Machine Learning in R

A hands-on introduction to machine learning in R with an emphasis on
supervised predictive modeling using the {tidymodels} ecosystem.

Instructors:
Jeffrey Girard, PhD (University of Kansas)
Shirley Wang, PhD (Yale University)

Workshop Dates and Times:
Tuesday, May 28th, 10:00am – 3:30pm ET
Wednesday, May 29th, 10:00am – 3:30pm ET
Thursday, May 30th, 10:00am – 3:30pm ET
Friday, May 31st, 10:00am – 3:30pm ET

Workshop Format:
Four-Day Synchronous Online Workshop

Whereas statistical methods traditionally used in the social and behavioral sciences emphasize interpretability and quantification of uncertainty, machine learning methods emphasize complexity and accuracy of predictions. Machine learning methods are thus particularly well-suited for applications where (1) there are nonlinear and complex relationships among a large number of predictor variables and (2) accurately predicting the outcome variable is more important than fully understanding the relationships between variables.

This workshop will provide a hands-on introduction to the application of machine learning techniques in R using the {tidymodels} packages. It will emphasize practical knowledge and conceptual intuitions (e.g., teaching you how to drive a car) rather than technical and theoretical mastery (e.g., teaching you how to build a car). In addition, rather than briefly surveying the full breadth of available machine learning techniques, this workshop will provide a deep dive into several supervised learning methods with broad applicability in the social and behavioral sciences: regularized regression models (GLMNET), random forest ensembles (RF), and support vector machines (SVM). Introductory theory for these methods will be included, and recommendations for further reading will be provided.

This workshop’s practical focus will allow attendees to learn how to formulate a good research question that machine learning can answer, prepare data for analysis, set up a rigorous cross-validation procedure, evaluate predictive performance, and interpret and report results for a scientific audience.

What you’ll learn

  • What can machine learning do, and what can it not? Learn about the types of questions that supervised predictive modeling can and cannot answer. Formulate good research questions that match the pros and cons of the approach.

  • What are best practices for applied machine learning? Learn about the dangers of overfitting and how to set up a rigorous cross-validation procedure to prevent your analyses from producing misleading results.

  • Which approaches are best for smaller datasets? Learn the theoretical intuitions behind and practical application of traditional machine learning techniques that work well with smaller datasets (i.e., fewer than 1,000 observations).

  • How do I write and review a machine learning paper? Learn all the steps necessary to design and run a machine learning experiment, as well as what information to include in your write-up and what to look for as a reviewer of such write-ups.

Syllabus

Day 1

  1. Conceptual introductions

  2. Tidyverse primer and data

  3. Data splitting and validation

  4. Model fitting and prediction

Day 2

  1. Workflows and metrics

  2. Feature engineering recipes

  3. Resampling and cross-validation

  4. Building a model: start to finish

Day 3

  1. Regularization and elastic net

  2. GLMNET example and tuning

  3. Decision trees and random forests

  4. RF example and reporting

Day 4

  1. Support vector machines

  2. Practical issues

  3. Consulting

Registration Options

Machine Learning in R

  • Professional: $999
    Baseline price for faculty, staff, and other professionals
    Click Register below
  • Trainee: $666 (regularly $999)
    33% discount for students and postdocs
    Use code "TRAINEE" at checkout

FAQs

  • Although attendees of all backgrounds are welcome and the skills taught will be broadly applicable, example datasets and advice will be tailored specifically to the social, behavioral, and medical sciences (e.g., psychology, medicine, education, and related fields).

  • Level: Intermediate

  • Workshop attendees are not expected to have any background knowledge of machine learning, but some proficiency with R (e.g., knowledge of how to import data and manipulate data frames) will be assumed and some familiarity with statistical modeling (e.g., multiple regression and generalized linear models) will be helpful. If an attendee is new to R and/or the {tidyverse} package ecosystem, we recommend they also consider enrolling in the “R for Researchers Combo.”

  • Software: R

  • Materials: Slides, example datasets, videos

  • This is meant to be an introductory workshop and, as such, we cannot cover all potential topics of interest. Advanced topics that will not be covered in this workshop include: neural networks and deep learning, unsupervised learning and clustering algorithms, recommender systems, generative models, domain-specific feature engineering (e.g., natural language processing, computer vision, and neuroscience), statistical comparison of model performance, model and prediction explanation, and forecasting from intensive longitudinal data. Some of these topics will be mentioned briefly but will not be covered extensively. Similarly, we are happy to discuss advanced/specialized topics during the consultation periods but there are limits on how much time we can spend and how deep we can go into them.