Analysing Human Language using R:

From Audio and Text to Interpretable Results

A hands-on introduction to analysing and visualising human language in R, designed for researchers in the social, behavioural, and medical sciences. Participants will learn to transform audio recordings into transcriptions, generate scores for psychological constructs using pretrained language models, and visualise the language patterns that drive those scores. No prior experience with natural language processing or machine learning is required.

Instructor:
Oscar Kjell, PhD (Lund University; Vanderbilt University)

Workshop Format:
Three-Half-Day Live Online Workshop

Workshop Dates and Times:
August 11–13, 2026
(2:00pm to 5:00pm CEST / 8:00am to 11:00am ET)

Video Availability:
New Workshop for 2026!
New Videos Available on August 14, 2026

Researchers across the social, behavioural, and medical sciences increasingly have access to rich language data, from interviews and open-ended surveys to audio recordings, but lack accessible tools to analyse it rigorously. This workshop introduces the R Language Analysis Suite (talk, text, topics, and the L-BAM Library), an open-source toolkit that supports the full pipeline from audio to interpretable results.

Participants will work through this pipeline end-to-end — from raw audio to publication-ready results. Using talk (www.r-talk.org), text (www.r-text.org), and topics (www.r-topics.org), we will generate transcriptions, embeddings, and language-based assessment scores (e.g., depression, satisfaction with life, worry, trust, surprise, anger, and implicit motives such as power and achievement), visualise the language patterns driving these scores, and perform basic validity checks. Throughout, we will emphasise open science practices, reproducible workflows, and the ethical considerations of working with human language data.

What you’ll learn

  • Audio Processing with talk: Transform audio recordings into transcriptions and audio embeddings, bringing spoken language data directly into R

  • Language Embedding and L-BAM Scoring with text: Convert text into numeric representations using Large Language Models, apply pretrained L-BAM models to generate scores for psychological constructs (e.g., depression, well-being, worry, and implicit motives), and optionally train your own models

  • Visualising Language Patterns with text and topics: Identify statistically relevant language patterns and create interpretable plots that reveal which words and phrases drive model scores, supporting both scientific communication and hypothesis testing

  • Reporting Methods and Results: Learn how to describe language-based methods and present results clearly and completely for publication, including guidance on what reviewers expect

  • Reproducible and Ethical Workflows: Integrate the full pipeline into open, reproducible R code and learn best practices for transparent reporting, result interpretation, and ethical handling of human language data

Syllabus

  • Research background and motivation for language-based assessment

  • Introduction to the R Language Analysis Suite: talk, text, topics, and L-BAMs

  • Using talk: converting audio recordings to transcriptions and audio embeddings

  • Using text: embedding text with transformer models

  • Applying L-BAMs: generating scores for psychological constructs (e.g., depression, well-being, worry, and implicit motives)

  • Open science practices, reproducible workflows, and ethical considerations

  • Hands-on exercise: running the full audio-to-score pipeline on example data

  • Using topics: identifying and visualising statistically relevant language patterns

  • Visualising results: word plots, topic plots, and composite figures

  • Training your own models and producing semantic similarity scores with text

  • Basic validity checks and interpreting outputs

  • Writing up methods and results for publication

  • Hands-on exercise: producing interpretable visualisations and a complete reproducible pipeline

Registration Options

Analysing Human Language using R

  • Professional: Baseline price for faculty, staff, and other professionals. Click Register below.
  • Trainee: 33% discount for students and postdocs. Use code "TRAINEE" at checkout.

FAQs

  • Who should attend? This workshop is designed for researchers in the social, behavioural, and medical sciences who work with, or plan to work with, text or audio data. It is particularly well-suited for psychologists, psychiatrists, clinical researchers, and health scientists interested in language-based psychological assessment, as well as quantitative researchers looking to expand their methodological toolkit. The workshop is also relevant for those who collect interview data, open-ended survey responses, or speech recordings and would like to extract psychological meaning from these in a rigorous, reproducible way. No prior experience with natural language processing or machine learning is required.

  • What level is the workshop? Beginner to Intermediate. No prior experience with natural language processing, machine learning, or transformer models is assumed. Basic familiarity with R is expected: participants should be comfortable installing packages, running functions, and reading output. More experienced R users will also find value in the applied, domain-specific content.

  • What are the prerequisites? Participants should have basic familiarity with R (comfortable installing packages, running functions, and reading output) and a general understanding of quantitative research methods (e.g., at the level of an introductory statistics course). No experience with natural language processing, machine learning, or text analysis is required; all relevant concepts will be introduced from the ground up. Participants who are new to R are encouraged to first complete the SMaRT Introduction to R for Researchers workshop.

  • What software is used? All analyses will be conducted in R (version 4.1 or later) using RStudio. The following packages will be used and should be installed in advance: talk (www.r-talk.org), text (www.r-text.org), and topics (www.r-topics.org). A setup guide with installation instructions will be shared before the workshop. No paid software licences are required. Python is used in the background by some packages but will be installed automatically via text or talk.

  • What materials are provided? Registered participants will receive annotated R scripts, slide decks, and example data sets (audio files, text data, and pre-computed embeddings). Video recordings of all sessions will be available to participants following the workshop, hosted and password-protected on the SMaRT website.
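
As a rough preview of the software setup described above, the sketch below checks the R version and installs whichever of the three packages are missing. It rests on two assumptions that do not come from the workshop description: that talk, text, and topics can all be installed from CRAN with install.packages(), and that the Python backend is set up through the text package's textrpp_install() and textrpp_initialize() helpers. If the setup guide shared before the workshop differs, follow the guide.

```r
# Sketch only: the package source (CRAN) and the Python helper calls are
# assumptions, not taken from the workshop's setup guide.

# The workshop asks for R 4.1 or later.
if (getRversion() < "4.1.0") {
  stop("R 4.1 or later is required; found ", as.character(getRversion()))
}

# Packages named in the workshop description.
pkgs <- c("talk", "text", "topics")

# Install only those not already present (assumes CRAN availability).
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Assumption: text provides these helpers to install and initialise the
# Python environment it uses in the background. They may ask for
# confirmation, so run them in an interactive session.
library(text)
textrpp_install()
textrpp_initialize(save_profile = TRUE)
```

Installing only the missing packages keeps the script safe to re-run, which suits the reproducible-workflow emphasis of the workshop.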