This three-part workshop series introduces participants to natural language
processing (NLP) with Python. It builds on our text mining series,
"Getting Started with Textual Data," by extending the scope of
data-inflected text analysis to include various methods of modeling meaning.
Sessions will cover NLP topics ranging from segmentation and dependency parsing
to sentiment analysis and context-sensitive modeling. We will also discuss how
to implement such methods for tasks like classification. Basic familiarity with
analyzing textual data in Python is required. We welcome students, postdocs,
faculty, and staff from a variety of research domains, ranging from health
informatics to the humanities.
Workshop dates were May 23, May 25, and May 27, 2022, 10:00 AM – 12:00 PM.
By the end of this series, you should be able to:
- Use popular NLP frameworks in Python, including Gensim and spaCy
- Explain key concepts and terminology in NLP, including dependency parsing, named entity recognition, and word embedding
- Process texts to glean information about sentiment, subject, and style
- Classify texts on the basis of their features
- Produce models of word meanings from a corpus
- Perform a few core NLP tasks, including keyword analysis, relation extraction, document similarity analysis, and text summarization
Software needed: Python; Google Colab (instructors will provide notebooks and data).
The copyright on this video is owned by the Regents of the University of California
and is licensed for reuse under the Creative Commons Attribution 4.0
International (CC BY 4.0) License.