Do we need AI for that?
Recently I listened to the latest episode of Hard Fork, a podcast about tech and, above all, AI. The hosts talked about a new Spotify feature that lets you create playlists from a text prompt. The example they mentioned was “Songs that do not have the title in the lyrics”. My first thought was: do we need AI for that? It should be a fairly simple query. Let’s try it out.
Download the dataset from Kaggle:
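library(tidyverse)
library(httr2)
library(fs)

# Download the zipped lyrics dataset and return the path of the extracted file
lyrics_csv <- function() {
  base_url <- str_glue(
    "https://www.kaggle.com/api/v1/datasets/download/carlosgdcj/",
    "genius-song-lyrics-with-language-information")
  # Write the response body straight to disk
  request(base_url) |>
    req_perform(path = "title_lyrics.zip")
  # unzip() returns the paths of the extracted files
  csv_path <- unzip("title_lyrics.zip", overwrite = TRUE)
  file_delete("title_lyrics.zip")
  return(csv_path)
}

csv_lyrics <- lyrics_csv()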
Read in the CSV file:
df_lyrics <- tarchetypes::tar_group_count_run(
  data = read_csv(csv_lyrics),
  100L)
In total, we get a data frame with 5,134,856 rows.
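To find the songs we are looking for, we first remove a few problematic characters from the title (invisible format characters, `?`, `\`, `{` and `}`). Then we keep only the songs whose lyrics do not match the cleaned title, ignoring case: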
title_not_in_lyrics <- function(df_lyrics) {
  df_lyrics |>
    mutate(
      # Drop format characters, "?", "\", "{" and "}" from the title
      title_clean = str_remove_all(title, "\\p{Cf}|\\?|\\\\|\\{|\\}")
    ) |>
    # Keep songs whose lyrics do not match the title (case-insensitive)
    filter(
      str_detect(
        lyrics,
        regex(title_clean, ignore_case = TRUE),
        negate = TRUE
      )
    )
}

df_title_not_in_lyrics <- title_not_in_lyrics(df_lyrics)
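The cleaned title is still interpreted as a regular expression, so titles that contain other metacharacters such as `(` or `*` may match differently than intended. As a minimal sketch of an alternative (not the approach used in this post), the title could be compared as a literal string with stringr's `fixed()` modifier. The toy data and the function name `title_not_in_lyrics_literal` are made up for illustration.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not taken from the Kaggle dataset
toy_lyrics <- tibble(
  title = c("Blue Sky", "Night Train", "Why (Not)?"),
  lyrics = c(
    "Under a blue sky we run",
    "The city sleeps while we ride on",
    "Tell me why (not)? Tell me now"
  )
)

# Hypothetical variant: treat the title as a literal string, ignoring case,
# so that no characters have to be stripped or escaped beforehand
title_not_in_lyrics_literal <- function(df_lyrics) {
  df_lyrics |>
    filter(
      str_detect(lyrics, fixed(title, ignore_case = TRUE), negate = TRUE)
    )
}

toy_result <- title_not_in_lyrics_literal(toy_lyrics)
```

In this toy example only "Night Train" remains, because the other two titles occur literally in their lyrics. Whether literal matching changes the overall share on the full dataset is not something this sketch can answer.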
In total, 45.4% of all songs meet this criterion.
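The share reported above is simply the number of filtered rows divided by the total number of rows. A minimal sketch of that arithmetic, where the helper `share_label` and the counts are hypothetical and chosen only for illustration:

```r
# Hypothetical helper: format the ratio of matching rows as a percentage
share_label <- function(n_match, n_total) {
  sprintf("%.1f%%", 100 * n_match / n_total)
}

# Made-up counts for illustration, not the real row counts of the dataset
example_share <- share_label(454, 1000)
```

With the data frames from this post, the call would presumably be `share_label(nrow(df_title_not_in_lyrics), nrow(df_lyrics))`.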
Some examples are:
Our analysis shows that roughly 45.4% of the songs have a title that does not appear in their lyrics. That answers the original question: no, we don’t necessarily need AI for this task. A well-crafted regular expression does the job.
This work was completed using R v. 4.5.2 (R Core Team 2025) and the following R packages: arrow v. 20.0.0 (Richardson et al. 2025), crew v. 1.3.0 (Landau 2025), distill v. 1.6 (Dervieux et al. 2023), fs v. 1.6.6 (Hester, Wickham, and Csárdi 2025), httr2 v. 1.2.2 (Wickham 2025), knitr v. 1.51 (Xie 2014, 2015, 2025), reactable v. 0.4.5 (Lin 2025), rmarkdown v. 2.30 (Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020; Allaire et al. 2025), scales v. 1.4.0 (Wickham, Pedersen, and Seidel 2025), shiny v. 1.12.1 (Chang et al. 2025), tarchetypes v. 0.14.0 (Landau 2021a), targets v. 1.12.0 (Landau 2021b), tidyverse v. 2.0.0 (Wickham et al. 2019).
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://codeberg.org/duju211/title_lyrics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
During (2026, Feb. 28). Datannery: Finding Song Titles that do not appear in the Lyrics. Retrieved from https://www.datannery.com/posts/finding-song-titles-that-do-not-appear-in-the-lyrics/
BibTeX citation
@misc{during2026finding,
author = {During, Julian},
title = {Datannery: Finding Song Titles that do not appear in the Lyrics},
url = {https://www.datannery.com/posts/finding-song-titles-that-do-not-appear-in-the-lyrics/},
year = {2026}
}