CRJ Theme Map — Analysis Process

This document describes how the lyrical theme analysis was conducted across Carly Rae Jepsen’s discography.

Overview

Lyrical themes and sentiments were analyzed for 183 songs across 22 album releases using Claude (claude-sonnet-4-6) running in Cowork mode. The work proceeded in three phases: data collection, theme extraction, and visualization.


Phase 1 — Data Collection

Songs and lyrics were scraped from the Genius API using scraper.py. The scrape produced:

  • 197 songs identified across CRJ’s catalog
  • 183 songs with lyrics successfully retrieved (93% coverage)
  • Coverage spans from Dear You EP (2004) through CRJ7* (2026)
  • Songs without lyrics (14 total) were excluded from analysis

Output: songs.json — a flat array of song objects with title, album, release year, and lyrics fields.
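
The coverage figures above can be checked directly against songs.json. The sketch below is illustrative only: the key name `lyrics` and the helper names are assumptions, since the exact schema written by scraper.py is not shown in this document.

```python
import json
from pathlib import Path


def has_lyrics(song: dict) -> bool:
    """True when the song carries non-empty lyrics text."""
    lyrics = song.get("lyrics")  # assumed key name, not confirmed by the source
    return isinstance(lyrics, str) and lyrics.strip() != ""


def coverage(songs: list[dict]) -> tuple[int, int, float]:
    """Return (total songs, songs with lyrics, percent with lyrics)."""
    total = len(songs)
    with_lyrics = sum(1 for song in songs if has_lyrics(song))
    percent = 100.0 * with_lyrics / total if total else 0.0
    return total, with_lyrics, percent


def load_songs(path: str = "songs.json") -> list[dict]:
    """Load the flat array of song objects produced by the scrape."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `coverage(load_songs())` on the real file should give 197 songs, 183 with lyrics, and a percentage that rounds to 93, provided the key name assumed here matches the file.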


Phase 2 — Theme Extraction

Splitting into individual files

songs.json was split into 183 individual files under songs/, one per song, named {index:03d}_{slug}.json (e.g. 014_run_away_with_me.json). This enabled parallel processing by multiple agents.
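
A minimal sketch of the split, assuming a simple slug rule (lowercase, with runs of non-alphanumeric characters collapsed to underscores) and 1-based numbering. Both are guesses that happen to reproduce the example filename; the function names and the `title` and `lyrics` keys are hypothetical.

```python
import json
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")


def song_filename(index: int, title: str) -> str:
    """Build a name following the {index:03d}_{slug}.json pattern."""
    return f"{index:03d}_{slugify(title)}.json"


def split_songs(songs: list[dict], out_dir: str = "songs", start: int = 1) -> list[Path]:
    """Write one JSON file per song that has lyrics; return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with_lyrics = [song for song in songs if song.get("lyrics")]  # assumed key
    for index, song in enumerate(with_lyrics, start=start):
        path = out / song_filename(index, song["title"])  # assumed key
        path.write_text(json.dumps(song, ensure_ascii=False, indent=2), encoding="utf-8")
        written.append(path)
    return written
```

The zero-padded prefix keeps the files in numeric order under a plain alphabetical sort, which makes batching by prefix straightforward.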

Parallel analysis

Ten Claude sub-agents were spawned simultaneously, each assigned a batch of 18 or 19 songs selected by numeric file prefix. Each agent:

  1. Read each song’s lyrics from its individual JSON file
  2. Identified 3–6 themes per song from a fixed controlled vocabulary of 20 terms
  3. Assigned a sentiment to each theme: positive, negative, or ambivalent
  4. Wrote a structured JSON result to analysis/, matching the source filename
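
The batch assignment can be sketched as an even partition of the sorted filenames. This is an assumption about how the batches were formed, not a record of the actual orchestration; `make_batches` is a hypothetical helper.

```python
def make_batches(filenames: list[str], n_agents: int = 10) -> list[list[str]]:
    """Split filenames into n_agents contiguous batches of near-equal size."""
    ordered = sorted(filenames)  # zero-padded prefixes sort in numeric order
    base, extra = divmod(len(ordered), n_agents)
    batches = []
    start = 0
    for i in range(n_agents):
        # The first `extra` batches take one additional file each.
        size = base + (1 if i < extra else 0)
        batches.append(ordered[start:start + size])
        start += size
    return batches
```

With 183 files and 10 agents this yields three batches of 19 and seven batches of 18, consistent with the batch sizes described above.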

Theme vocabulary

All agents used the same controlled vocabulary to ensure consistency across the dataset:

longing · desire · heartbreak · obsession · infatuation · self-worth · vulnerability · joy · euphoria · fantasy · escapism · nostalgia · resilience · empowerment · melancholy · intimacy · unrequited-love · new-love · loss · connection

Output format

Each file in analysis/ follows this structure:

```json
{
  "title": "Run Away with Me",
  "album": "E•MO•TION (10th Anniversary Edition)",
  "filename": "014_run_away_with_me.json",
  "themes": [
    { "theme": "escapism", "sentiment": "positive" },
    { "theme": "desire", "sentiment": "positive" },
    { "theme": "euphoria", "sentiment": "positive" },
    { "theme": "fantasy", "sentiment": "positive" },
    { "theme": "connection", "sentiment": "positive" }
  ]
}
```
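
Because ten agents write results independently, a small validator is one way to confirm that every file respects the constraints described in this section: the four fields, 3–6 themes, the 20-term vocabulary, and the three sentiment labels. The sketch below is not part of the original pipeline; `validate_result` is a hypothetical helper.

```python
VOCABULARY = frozenset({
    "longing", "desire", "heartbreak", "obsession", "infatuation",
    "self-worth", "vulnerability", "joy", "euphoria", "fantasy",
    "escapism", "nostalgia", "resilience", "empowerment", "melancholy",
    "intimacy", "unrequited-love", "new-love", "loss", "connection",
})
SENTIMENTS = frozenset({"positive", "negative", "ambivalent"})
REQUIRED_FIELDS = ("title", "album", "filename", "themes")


def validate_result(result: dict) -> list[str]:
    """Return a list of problems found in one analysis result (empty if valid)."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in result]
    themes = result.get("themes")
    if not isinstance(themes, list):
        problems.append("themes must be a list")
        return problems
    if not 3 <= len(themes) <= 6:
        problems.append(f"expected 3-6 themes, found {len(themes)}")
    for entry in themes:
        if not isinstance(entry, dict):
            problems.append(f"theme entry is not an object: {entry!r}")
            continue
        theme, sentiment = entry.get("theme"), entry.get("sentiment")
        if not (isinstance(theme, str) and theme in VOCABULARY):
            problems.append(f"theme not in vocabulary: {theme!r}")
        if not (isinstance(sentiment, str) and sentiment in SENTIMENTS):
            problems.append(f"unknown sentiment: {sentiment!r}")
    return problems
```

Running this over every file in analysis/ and reporting any non-empty result would surface off-vocabulary labels before they reach the visualization.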

Notes on album labeling

Genius tagged some albums with edition-specific names. For analysis purposes, the following mappings were applied:

| Genius label | Treated as |
| --- | --- |
| Kiss (Tour Edition) | Kiss |
| E•MO•TION (Japanese Edition) | E·MO·TION (sparse: 1 song) |
| E•MO•TION: Side B+ (Japan Exclusive) | E·MO·TION Side B |
| Dedicated (Japanese Edition) | Dedicated |
| Dedicated Side B (Japanese Edition) | Dedicated Side B |
| Tug of War (Japan CD) | Tug of War |

Remix albums, tour editions, and standalone singles were excluded from the primary visualization but remain in the analysis/ folder.
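
The label mappings listed above amount to a lookup applied to each song's album field before aggregation. A minimal sketch, assuming labels that are not listed pass through unchanged; `ALBUM_ALIASES` and `normalize_album` are hypothetical names.

```python
# Genius label -> album name used in the analysis (taken from the mappings above).
ALBUM_ALIASES = {
    "Kiss (Tour Edition)": "Kiss",
    "E•MO•TION (Japanese Edition)": "E·MO·TION",
    "E•MO•TION: Side B+ (Japan Exclusive)": "E·MO·TION Side B",
    "Dedicated (Japanese Edition)": "Dedicated",
    "Dedicated Side B (Japanese Edition)": "Dedicated Side B",
    "Tug of War (Japan CD)": "Tug of War",
}


def normalize_album(label: str) -> str:
    """Map an edition-specific Genius label to its analysis name."""
    cleaned = label.strip()
    # Assumption: any label without an explicit mapping is kept as-is.
    return ALBUM_ALIASES.get(cleaned, cleaned)
```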


Phase 3 — Visualization

An interactive HTML heatmap was generated at theme_chart.html showing:

  • Albums as columns (ordered chronologically, 2004–2026)
  • Themes as rows
  • Cell values as the percentage of an album’s songs that carry each theme
  • Tooltips listing the specific songs for each album × theme combination on hover
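
The cell values can be derived from the analysis results as sketched below. This is not the code behind theme_chart.html; `theme_prevalence` is a hypothetical helper that assumes each result carries the `album` and `themes` fields shown in the output format.

```python
from collections import defaultdict


def theme_prevalence(results: list[dict]) -> dict[str, dict[str, float]]:
    """Map album -> theme -> percentage of that album's songs carrying the theme."""
    songs_per_album: dict[str, int] = defaultdict(int)
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for result in results:
        album = result["album"]
        songs_per_album[album] += 1
        # A set guards against a theme being counted twice for one song.
        for theme in {entry["theme"] for entry in result["themes"]}:
            counts[album][theme] += 1
    return {
        album: {theme: 100.0 * n / songs_per_album[album] for theme, n in themes.items()}
        for album, themes in counts.items()
    }
```

For example, an album of 24 songs in which 17 carry a theme gets a cell value of 17 / 24 × 100, which rounds to 71%.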

A bar chart below the heatmap shows the average theme prevalence across all main studio albums.
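
"Average theme prevalence" can be read in more than one way. The sketch below assumes an unweighted mean of the per-album percentages, with a theme that is absent from an album counted as 0%. That reading is an assumption, and `average_prevalence` is a hypothetical helper.

```python
def average_prevalence(
    prevalence: dict[str, dict[str, float]], albums: list[str]
) -> dict[str, float]:
    """Average each theme's per-album percentage over the given albums."""
    if not albums:
        return {}
    themes = sorted({theme for album in albums for theme in prevalence.get(album, {})})
    return {
        theme: sum(prevalence.get(album, {}).get(theme, 0.0) for album in albums) / len(albums)
        for theme in themes
    }
```

An unweighted mean gives every album equal influence regardless of track count. Pooling all songs before computing percentages would instead weight longer albums more heavily.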


Files

| File/Folder | Description |
| --- | --- |
| scraper.py | Genius API scraper; fetches songs and lyrics |
| songs.json | Raw scrape output: 197 songs, 183 with lyrics |
| songs/ | Individual song JSON files (one per song with lyrics) |
| analysis/ | Theme analysis output: one JSON file per song |
| theme_chart.html | Interactive heatmap visualization |
| ANALYSIS.md | This document |

Key findings

  • Desire is the single most prevalent theme across CRJ’s catalog, appearing in songs on every major album
  • Vulnerability is consistently present but shifts in sentiment — predominantly negative in early albums (Tug of War, Kiss) and increasingly ambivalent in later work (Dedicated era onward)
  • E·MO·TION 10th Anniversary has the highest concentration of desire (17 songs, 71% of the album)
  • CRJ7* leads the catalog on longing (9 songs, 64% of the album)
  • Kiss has the highest vulnerability count in absolute terms (12 songs)
  • Obsession appears almost exclusively with negative sentiment; joy and euphoria are nearly always positive
  • The Dedicated era (2019–2020) shows the widest thematic range of any period, combining obsessive desire with empowerment and resilience
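
The sentiment observations above can be rechecked by tallying sentiment labels per theme across the analysis results. A minimal sketch, assuming the `themes` structure shown in the output format; `sentiment_breakdown` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def sentiment_breakdown(results: list[dict]) -> dict[str, dict[str, int]]:
    """Map theme -> sentiment -> number of songs tagged with that pairing."""
    breakdown: dict[str, Counter] = defaultdict(Counter)
    for result in results:
        for entry in result["themes"]:
            breakdown[entry["theme"]][entry["sentiment"]] += 1
    return {theme: dict(counter) for theme, counter in breakdown.items()}
```

Restricting `results` to a single album or era before calling the function gives the per-period view needed to compare sentiment in early and later work.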