Machine Learning in R
A hands-on introduction to machine learning in R with an emphasis on
supervised predictive modeling using the {tidymodels} ecosystem.
Instructors:
Jeffrey Girard, PhD (University of Kansas)
Shirley Wang, PhD (Yale University)
Workshop Dates and Times:
Monday, July 14, 2025 at 9:30am-4:30pm EDT
Tuesday, July 15, 2025 at 9:30am-4:30pm EDT
Wednesday, July 16, 2025 at 9:30am-4:30pm EDT
Thursday, July 17, 2025 at 9:30am-4:30pm EDT
(The 2024 recording is available now)
Workshop Format:
Four-Day Online Workshop Recording
Whereas statistical methods traditionally used in the social and behavioral sciences emphasize interpretability and quantification of uncertainty, machine learning methods emphasize complexity and accuracy of predictions. Machine learning methods are thus particularly well-suited for applications where (1) there are nonlinear and complex relationships among a large number of predictor variables and (2) accurately predicting the outcome variable is more important than fully understanding the relationships between variables.
This workshop will provide a hands-on introduction to the application of machine learning techniques in R using the {tidymodels} packages. It will emphasize practical knowledge and conceptual intuitions (e.g., teaching you how to drive a car) rather than technical and theoretical mastery (e.g., teaching you how to build a car). In addition, rather than briefly surveying the full breadth of available machine learning techniques, this workshop will provide a deep dive into several supervised learning methods with broad applicability in the social and behavioral sciences: regularized regression models (GLMNET), random forest ensembles (RF), and support vector machines (SVM). Introductory theory for these methods will be included, and recommendations for further reading will be provided.
This workshop’s practical focus will allow attendees to learn about: formulating a good research question that machine learning can answer, preparing data for analysis, setting up a rigorous cross-validation procedure, evaluating predictive performance, and interpreting/reporting results for a scientific audience.
What you’ll learn
What can machine learning do, and what can it not? Learn about the types of questions that supervised predictive modeling can and cannot answer. Formulate good research questions that match the pros and cons of the approach.
What are best practices for applied machine learning? Learn about the dangers of overfitting and how to set up a rigorous cross-validation procedure to prevent your analyses from producing misleading results.
Which approaches are best for smaller datasets? Learn the theoretical intuitions behind and practical application of traditional machine learning techniques that work well with smaller datasets (i.e., fewer than 1,000 observations).
How do I write and review a machine learning paper? Learn all the steps necessary to design and run a machine learning experiment, as well as what information to include in your write-up and what to look for as a reviewer of such write-ups.
Syllabus
Day 1
Conceptual introductions
Tidyverse primer and data
Data splitting and validation
Model fitting and prediction
Day 2
Workflows and metrics
Feature engineering recipes
Resampling and cross-validation
Building a model: start to finish
Day 3
Regularization and elastic net
GLMNET example and tuning
Decision trees and random forests
RF example and reporting
Day 4
Support vector machines
Practical issues
Consulting
Registration Options
Machine Learning in R
- Professional: $799. Baseline price for faculty, staff, and other professionals. Click Register below.
- Trainee: $535 (regularly $799). 33% discount for students and postdocs. Use code "TRAINEE" at checkout.
- LMIC: $80 (regularly $799). 90% discount for learners in low- and middle-income countries. Apply for the code.
Note: All registration options for this workshop come with three things:
(1) Access to the video recording and materials of the 2024 version of the workshop until July 14, 2025
(2) The ability to attend the live sessions of the 2025 version of the workshop on July 14-17, 2025
(3) Access to the video recording and materials of the 2025 version of the workshop after July 17, 2025
If this workshop is offered again in future years (e.g., 2026+), then you will have continued “evergreen” access to the new recordings and materials.
FAQs
- Audience: Although attendees of all backgrounds are welcome and the skills taught will be broadly applicable, example datasets and advice will be tailored specifically to the social, behavioral, and medical sciences (e.g., psychology, medicine, education, and related fields).
- Level: Intermediate
- Prerequisites: Workshop attendees are not expected to have any background knowledge of machine learning, but some proficiency with R (e.g., knowledge of how to import data and manipulate data frames) will be assumed, and some familiarity with statistical modeling (e.g., multiple regression and generalized linear models) will be helpful. If an attendee is new to R and/or the {tidyverse} package ecosystem, we recommend they also consider enrolling in the “R for Researchers Combo.”
- Software: R
- Materials: Slides, example datasets, videos
- Topics not covered: This is meant to be an introductory workshop and, as such, we cannot cover all potential topics of interest. Advanced topics that will not be covered in this workshop include: neural networks and deep learning, unsupervised learning and clustering algorithms, recommender systems, generative models, domain-specific feature engineering (e.g., natural language processing, computer vision, and neuroscience), statistical comparison of model performance, model and prediction explanation, and forecasting from intensive longitudinal data. Some of these topics will be mentioned briefly but will not be covered extensively. Similarly, we are happy to discuss advanced or specialized topics during the consultation periods, but there are limits on how much time we can spend on them and how deeply we can go into them.