CRJ Theme Map — Analysis Process
This document describes how the lyrical theme analysis was conducted across Carly Rae Jepsen’s discography.
Overview
183 songs across 22 album releases were analyzed for lyrical themes and sentiments using Claude (claude-sonnet-4-6) running in Cowork mode. The analysis was conducted in three phases: data collection, theme extraction, and visualization.
Phase 1 — Data Collection
Songs and lyrics were scraped from the Genius API using scraper.py. The scrape produced:
- 197 songs identified across CRJ’s catalog
- 183 songs with lyrics successfully retrieved (93% coverage)
- Coverage spans from Dear You EP (2004) through CRJ7* (2026)
- Songs without lyrics (14 total) were excluded from analysis
Output: songs.json — a flat array of song objects with title, album, release year, and lyrics fields.
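The coverage figure above can be reproduced from songs.json with a few lines of Python. This is a minimal sketch, not the project's own code: the key name `lyrics` and the convention that missing lyrics are stored as an empty string or null are assumptions, since the exact field names are not listed here.

```python
import json
from pathlib import Path


def load_songs(path="songs.json"):
    """Load the flat array of song objects written by the scraper."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def coverage(songs):
    """Return (total, with_lyrics, percent) for a list of song dicts."""
    total = len(songs)
    # Assumption: a song counts as covered only if "lyrics" is a non-empty string.
    with_lyrics = sum(1 for s in songs if (s.get("lyrics") or "").strip())
    percent = round(100 * with_lyrics / total) if total else 0
    return total, with_lyrics, percent


# Tiny made-up sample; the real input would be load_songs().
sample = [
    {"title": "A", "lyrics": "la la"},
    {"title": "B", "lyrics": ""},
    {"title": "C", "lyrics": None},
]
print(coverage(sample))
```

With the counts reported above, 183 of 197 songs works out to 92.9%, which rounds to the stated 93%.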
Phase 2 — Theme Extraction
Splitting into individual files
songs.json was split into 183 individual files under songs/, one per song, named {index:03d}_{slug}.json (e.g. 014_run_away_with_me.json). This enabled parallel processing by multiple agents.
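The naming scheme can be sketched as follows. The `slugify` rule (lowercase, alphanumeric runs joined by underscores), the 1-based index, and the `title` and `lyrics` key names are assumptions chosen to match the example filename; the actual splitting script is not shown in this document.

```python
import json
import re
import tempfile
from pathlib import Path


def slugify(title):
    """Lowercase the title and join its alphanumeric runs with underscores."""
    return "_".join(re.findall(r"[a-z0-9]+", title.lower()))


def split_songs(songs, out_dir):
    """Write one JSON file per song that has lyrics; return the filenames."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = []
    with_lyrics = [s for s in songs if s.get("lyrics")]
    # Assumption: indices start at 1 and follow the order of songs.json.
    for index, song in enumerate(with_lyrics, start=1):
        name = f"{index:03d}_{slugify(song['title'])}.json"
        text = json.dumps(song, ensure_ascii=False, indent=2)
        (out_dir / name).write_text(text, encoding="utf-8")
        names.append(name)
    return names


# Demo in a throwaway directory with made-up entries and placeholder lyrics.
with tempfile.TemporaryDirectory() as tmp:
    written = split_songs(
        [
            {"title": "Run Away with Me", "lyrics": "placeholder"},
            {"title": "No Lyrics Here", "lyrics": ""},
        ],
        tmp,
    )
    print(written)
```

Zero-padding the index to three digits keeps the files in order under a plain alphabetical sort, which is what makes batching by numeric prefix straightforward.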
Parallel analysis
Ten Claude sub-agents were spawned simultaneously, each assigned a batch of 18 or 19 songs by numeric file prefix. Each agent:
- Read each song’s lyrics from its individual JSON file
- Identified 3–6 themes per song from a fixed controlled vocabulary of 20 terms
- Assigned a sentiment to each theme:
positive,negative, orambivalent - Wrote a structured JSON result to
analysis/, matching the source filename
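One way to divide the files by numeric prefix is to sort them and cut contiguous slices whose sizes differ by at most one. The sketch below is an illustration under that assumption; `make_batches` is a hypothetical helper and the filenames are generated placeholders, not the real song files.

```python
def make_batches(filenames, n_agents=10):
    """Split sorted filenames into n_agents contiguous, near-equal batches."""
    ordered = sorted(filenames)
    base, extra = divmod(len(ordered), n_agents)
    batches, start = [], 0
    for i in range(n_agents):
        # The first `extra` batches take one additional file each.
        size = base + (1 if i < extra else 0)
        batches.append(ordered[start:start + size])
        start += size
    return batches


# Placeholder names standing in for the 183 files under songs/.
files = [f"{i:03d}_song.json" for i in range(1, 184)]
batches = make_batches(files)
print([len(b) for b in batches])
# -> [19, 19, 19, 18, 18, 18, 18, 18, 18, 18]
```

Since 183 = 10 × 18 + 3, three batches hold 19 songs and seven hold 18, which matches the batch sizes described above.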
Theme vocabulary
All agents used the same controlled vocabulary to ensure consistency across the dataset:
longing · desire · heartbreak · obsession · infatuation · self-worth · vulnerability · joy · euphoria · fantasy · escapism · nostalgia · resilience · empowerment · melancholy · intimacy · unrequited-love · new-love · loss · connection
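Because ten agents wrote results independently, a vocabulary check on each result is a cheap safeguard. The following is a sketch of such a check, not something this document says was run. It assumes each result has a `themes` list of objects with `theme` and `sentiment` keys, as described under Output format; the duplicate-theme rule is an extra assumption.

```python
VOCABULARY = {
    "longing", "desire", "heartbreak", "obsession", "infatuation",
    "self-worth", "vulnerability", "joy", "euphoria", "fantasy",
    "escapism", "nostalgia", "resilience", "empowerment", "melancholy",
    "intimacy", "unrequited-love", "new-love", "loss", "connection",
}
SENTIMENTS = {"positive", "negative", "ambivalent"}


def validate_result(result):
    """Return a list of problems found in one analysis result (empty if valid)."""
    problems = []
    themes = result.get("themes", [])
    if not 3 <= len(themes) <= 6:
        problems.append(f"expected 3-6 themes, got {len(themes)}")
    seen = set()
    for entry in themes:
        theme, sentiment = entry.get("theme"), entry.get("sentiment")
        if theme not in VOCABULARY:
            problems.append(f"unknown theme: {theme!r}")
        if sentiment not in SENTIMENTS:
            problems.append(f"unknown sentiment: {sentiment!r}")
        # Assumption: a theme should appear at most once per song.
        if theme in seen:
            problems.append(f"duplicate theme: {theme!r}")
        seen.add(theme)
    return problems


example = {
    "title": "Run Away with Me",
    "themes": [
        {"theme": "escapism", "sentiment": "positive"},
        {"theme": "desire", "sentiment": "positive"},
        {"theme": "euphoria", "sentiment": "positive"},
    ],
}
print(validate_result(example))
```

An empty list means the result uses only vocabulary terms, valid sentiments, and an allowed number of themes.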
Output format
Each file in analysis/ follows this structure:
{
  "title": "Run Away with Me",
  "album": "E•MO•TION (10th Anniversary Edition)",
  "filename": "014_run_away_with_me.json",
  "themes": [
    { "theme": "escapism", "sentiment": "positive" },
    { "theme": "desire", "sentiment": "positive" },
    { "theme": "euphoria", "sentiment": "positive" },
    { "theme": "fantasy", "sentiment": "positive" },
    { "theme": "connection", "sentiment": "positive" }
  ]
}
Notes on album labeling
Genius tagged some albums with edition-specific names. For analysis purposes, the following mappings were applied:
| Genius label | Treated as |
|---|---|
| Kiss (Tour Edition) | Kiss |
| E•MO•TION (Japanese Edition) | E·MO·TION (sparse — 1 song) |
| E•MO•TION: Side B+ (Japan Exclusive) | E·MO·TION Side B |
| Dedicated (Japanese Edition) | Dedicated |
| Dedicated Side B (Japanese Edition) | Dedicated Side B |
| Tug of War (Japan CD) | Tug of War |
Remix albums, tour editions, and standalone singles were excluded from the primary visualization but remain in the analysis/ folder.
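The label mappings in the table above amount to a simple lookup. A minimal sketch, assuming labels not listed in the table pass through unchanged; `ALBUM_ALIASES` and `normalize_album` are illustrative names, not identifiers from the project.

```python
# Genius label -> label used in the analysis, copied from the mapping table.
ALBUM_ALIASES = {
    "Kiss (Tour Edition)": "Kiss",
    "E•MO•TION (Japanese Edition)": "E·MO·TION",
    "E•MO•TION: Side B+ (Japan Exclusive)": "E·MO·TION Side B",
    "Dedicated (Japanese Edition)": "Dedicated",
    "Dedicated Side B (Japanese Edition)": "Dedicated Side B",
    "Tug of War (Japan CD)": "Tug of War",
}


def normalize_album(label):
    """Return the analysis label for a Genius album label."""
    # Assumption: any label without an explicit mapping is kept as-is.
    return ALBUM_ALIASES.get(label, label)


print(normalize_album("Dedicated (Japanese Edition)"))
```

Keeping the mapping in one dictionary means the raw Genius labels in analysis/ never need to be edited; normalization happens only when the data is read for charting.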
Phase 3 — Visualization
An interactive HTML heatmap was generated at theme_chart.html showing:
- Albums as columns (ordered chronologically, 2004–2026)
- Themes as rows
- Cell values as the percentage of an album’s songs that carry each theme
- Tooltips listing the specific songs for each album × theme combination on hover
A bar chart below the heatmap shows the average theme prevalence across all main studio albums.
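The heatmap cell values and the bar-chart averages can be derived from the per-song results roughly as follows. This is a sketch under stated assumptions: results are dicts with `album` and `themes` keys as in the output format, a theme absent from an album counts as 0%, and the function names are hypothetical. The code that actually generated theme_chart.html is not shown in this document.

```python
from collections import defaultdict


def theme_prevalence(results):
    """Map album -> theme -> percentage of that album's songs carrying the theme."""
    songs_per_album = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for result in results:
        album = result["album"]
        songs_per_album[album] += 1
        # Count each theme once per song, even if it were listed twice.
        for theme in {t["theme"] for t in result["themes"]}:
            counts[album][theme] += 1
    return {
        album: {t: 100 * n / songs_per_album[album] for t, n in themes.items()}
        for album, themes in counts.items()
    }


def average_prevalence(prevalence, albums, vocabulary):
    """Average each theme's percentage over the given albums (absent = 0)."""
    return {
        theme: sum(prevalence.get(a, {}).get(theme, 0.0) for a in albums) / len(albums)
        for theme in vocabulary
    }


# Made-up results for two placeholder albums.
results = [
    {"album": "A", "themes": [{"theme": "desire"}, {"theme": "joy"}]},
    {"album": "A", "themes": [{"theme": "desire"}]},
    {"album": "B", "themes": [{"theme": "joy"}]},
]
cells = theme_prevalence(results)
print(cells["A"])
print(average_prevalence(cells, ["A", "B"], ["desire", "joy"]))
```

The average here weights every album equally regardless of track count, which is one reasonable reading of "average theme prevalence"; weighting by number of songs would give different values.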
Files
| File/Folder | Description |
|---|---|
| scraper.py | Genius API scraper that fetches songs and lyrics |
| songs.json | Raw scrape output: 197 songs, 183 with lyrics |
| songs/ | Individual song JSON files (one per song with lyrics) |
| analysis/ | Theme analysis output, one JSON file per song |
| theme_chart.html | Interactive heatmap visualization |
| ANALYSIS.md | This document |
Key findings
- Desire is the single most prevalent theme across CRJ’s catalog, appearing in songs on every major album
- Vulnerability is consistently present but shifts in sentiment — predominantly negative in early albums (Tug of War, Kiss) and increasingly ambivalent in later work (Dedicated era onward)
- E·MO·TION 10th Anniversary has the highest concentration of desire (17 songs, 71% of the album)
- CRJ7* leads the catalog on longing (9 songs, 64% of the album)
- Kiss has the highest vulnerability count in absolute terms (12 songs)
- Obsession appears almost exclusively with negative sentiment; joy and euphoria are nearly always positive
- The Dedicated era (2019–2020) shows the widest thematic range of any period, combining obsessive desire with empowerment and resilience