Analysing Human Language using R:

From Audio and Text to Interpretable Results

A hands-on introduction to analysing and visualising human language in R, designed for researchers in the social, behavioral, and medical sciences. Participants will learn to transform audio recordings into transcriptions, generate scores for psychological constructs using pretrained language models, and visualise the language patterns that drive those scores. No prior experience with natural language processing or machine learning is required.

Instructor:
Oscar Kjell, PhD (Lund University; Vanderbilt University)

Workshop Format:
Three-Half-Day Live Online Workshop

Workshop Dates and Times:
August 11–13, 2026
(2:00pm to 5:00pm CEST / 8:00am to 11:00am ET)

Video Availability:
New Workshop for 2026!
New Videos Available on August 14, 2026

Researchers across the social, behavioral, and medical sciences increasingly have access to rich language data — from interviews and open-ended surveys to audio recordings — but lack accessible tools to analyse it rigorously. This workshop introduces the R Language Analysis Suite (talk, text, topics, and the L-BAM Library), an open-source toolkit that supports the full pipeline from audio to interpretable results.

Participants will work through this pipeline end-to-end — from raw audio to publication-ready results. Using talk (www.r-talk.org), text (www.r-text.org), and topics (www.r-topics.org), we will generate transcriptions, embeddings, and language-based assessment scores (e.g., depression, satisfaction with life, worry, trust, surprise, anger, and implicit motives such as power and achievement), visualise the language patterns driving these scores, and perform basic validity checks. Throughout, we will emphasise open science practices, reproducible workflows, and the ethical considerations of working with human language data.

What you’ll learn

Audio Processing with talk: Transform audio recordings into transcriptions and audio embeddings, bringing spoken language data directly into R
Language Embedding and L-BAM Scoring with text: Convert text into numeric representations using Large Language Models, apply pretrained L-BAM models to generate scores for psychological constructs (e.g., depression, well-being, worry, and implicit motives), and optionally train your own models
Visualising Language Patterns with text and topics: Identify statistically relevant language patterns and create interpretable plots that reveal which words and phrases drive model scores, supporting both scientific communication and hypothesis testing
Reporting Methods and Results: Learn how to describe language-based methods and present results clearly and completely for publication, including guidance on what reviewers expect
Reproducible and Ethical Workflows: Integrate the full pipeline into open, reproducible R code and learn best practices for transparent reporting, result interpretation, and ethical handling of human language data

Syllabus

Day 1: From Audio to Language Scores

Research background and motivation for language-based assessment
Introduction to the R Language Analysis Suite: talk, text, topics, and L-BAMs
Using talk: converting audio recordings to transcriptions and audio embeddings
Using text: embedding text with transformer models

Day 2: Language-Based Assessment, Model Training, and Reproducible Workflows

Applying L-BAMs: generating scores for psychological constructs (e.g., depression, well-being, worry, and implicit motives)
Training your own models with text
Open science practices, reproducible workflows, and ethical considerations
Hands-on exercise: running the full audio-to-score pipeline on example data

Day 3: Visualization, Interpretation, and Reporting

Using topics: identifying and visualising statistically relevant language patterns
Visualising results: word plots, topic plots, and composite figures
Writing up methods and results for publication
Hands-on exercise: producing interpretable visualisations and a complete reproducible pipeline

Registration Options

Analysing Human Language using R

Professional: Baseline price for faculty, staff, and other professionals. Click Register below.
Trainee: 33% discount for students and postdocs. Use code "TRAINEE" at checkout.
LMIC: 90% discount for learners in low- and middle-income countries. Apply for the code.

FAQs

Who should enroll in this workshop?

This workshop is designed for researchers in the social, behavioural, and medical sciences who work with, or plan to work with, text or audio data. It is particularly well-suited for psychologists, psychiatrists, clinical researchers, and health scientists interested in language-based psychological assessment, as well as quantitative researchers looking to expand their methodological toolkit. The workshop is also relevant for those who collect interview data, open-ended survey responses, or speech recordings and would like to extract psychological meaning from these in a rigorous, reproducible way. No prior experience with natural language processing or machine learning is required.

What level is this workshop?

Beginner to Intermediate. No prior experience with natural language processing, machine learning, or transformer models is assumed. Basic familiarity with R is expected: participants should be comfortable installing packages, running functions, and reading output. More experienced R users will also find value in the applied, domain-specific content.

What should learners already know?

Participants should have basic familiarity with R (comfortable installing packages, running functions, and reading output) and a general understanding of quantitative research methods (e.g., at the level of an introductory statistics course). No experience with natural language processing, machine learning, or text analysis is required; all relevant concepts will be introduced from the ground up. Participants who are new to R are encouraged to first complete the SMaRT Introduction to R for Researchers workshop.

What software will be used?

All analyses will be conducted in R (version 4.1 or later) using RStudio. The following packages will be used and should be installed in advance: talk (www.r-talk.org), text (www.r-text.org), and topics (www.r-topics.org). A setup guide with installation instructions will be shared before the workshop. No paid software licences are required. Python is used in the background by some packages but will be installed automatically via text or talk.

What materials are shared with learners?

Registered participants will receive annotated R scripts, slide decks, and example data sets (audio files, text data, and pre-computed embeddings). Video recordings of all sessions will be available to participants following the workshop, hosted and password-protected on the SMaRT website.
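
The software requirements described in the FAQs can be checked before the first session. The following is a minimal sketch in base R, not the official setup guide. It assumes only the minimum R version (4.1) and the three package names (talk, text, topics) given above; the helper name check_setup is made up for this illustration.

```r
# Minimal pre-workshop check (a sketch, not the official setup guide).
# Assumption: the packages are named "talk", "text", and "topics".
# `check_setup` is a hypothetical helper name used only in this example.
check_setup <- function(packages = c("talk", "text", "topics"),
                        min_r = "4.1.0") {
  # Compare the running R version against the stated minimum.
  r_ok <- getRversion() >= min_r

  # For each package, record whether it can be found in the library paths.
  # system.file() returns "" when a package is not installed.
  installed <- vapply(
    packages,
    function(pkg) nzchar(system.file(package = pkg)),
    logical(1)
  )

  list(
    r_version_ok = r_ok,
    installed = installed,
    missing = names(installed)[!installed]
  )
}

status <- check_setup()

if (!status$r_version_ok) {
  message("R ", as.character(getRversion()),
          " is older than 4.1; please update R.")
}
if (length(status$missing) > 0) {
  message("Not yet installed: ", paste(status$missing, collapse = ", "))
}
```

Any packages reported as missing can then be installed by following the setup guide shared before the workshop.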
