Machine Learning in R: Clustering and Classification (Part 1 of 2) 2024-05-23
From DataLab June 04, 2024
This two-part workshop series provides an introduction to using R for two popular machine learning techniques: clustering and classification. Clustering involves identifying groups of similar observations (called clusters) within data. It can be an effective tool for finding patterns and is an important part of exploratory data analysis. Classification refers to modeling a categorical response variable. Classification models can provide insight into the relationship between the predictors and the response, as well as a way to make predictions about new observations.
After this workshop series, learners should be able to:
- Assess whether classification or clustering is relevant to their research problems and data sets;
- Explain the tradeoffs between popular clustering algorithms;
- Run a clustering algorithm on their data;
- Build and train a classification model on their data;
- Use cross-validation to estimate accuracy and tune hyperparameters for classification models;
- Identify strategies to improve results from classification models.
Prerequisites: This workshop is designed for researchers who have data that they are already working with in R. Participants must have taken DataLab’s “Overview of Statistical Machine Learning,” “R Basics,” and “Regression in R” workshop series, or have equivalent prior experience. Completion of DataLab’s “Intermediate R” series is recommended but not required. Participants must be comfortable with basic R syntax and should bring a laptop with the latest versions of R and RStudio installed and running. The focus of the workshop is on implementing clustering and classification in R, not on learning the R language itself.