Finding Song Titles that do not appear in the Lyrics

Do we need AI for that?

Julian During https://datannery.com
2026-02-28

Idea

Recently I was listening to the latest episode of the Hard Fork podcast, a show about tech and, above all, AI. The hosts talked about a new Spotify feature that lets you create playlists from a prompt. The example they mentioned was “Songs that do not have the title in the lyrics”. That made me wonder: do we need AI for that? It should not be too complicated a query. Let’s try it out.

Data

Download the dataset from Kaggle:

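library(tidyverse)
library(httr2)
library(fs)

# Download the zipped dataset from Kaggle and return the extracted file path(s)
lyrics_csv <- function() {
  base_url <- str_glue(
    "https://www.kaggle.com/api/v1/datasets/download/carlosgdcj/",
    "genius-song-lyrics-with-language-information")

  # Write the response body straight to disk instead of holding it in memory
  request(base_url) |>
    req_perform(path = "title_lyrics.zip")

  # Extract the archive, then delete the zip file
  csv_path <- unzip("title_lyrics.zip", overwrite = TRUE)
  file_delete("title_lyrics.zip")

  return(csv_path)
}
csv_lyrics <- lyrics_csv()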

Read in the CSV file:

df_lyrics <- tarchetypes::tar_group_count_run(
  data = read_csv(csv_lyrics),
  100L
)

In total we get a data frame with 5,134,856 rows.
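With more than five million rows, memory can become a concern. A minimal sketch, assuming only the `title` and `lyrics` columns used later are needed: readr's `col_select` argument reads just those columns. The temporary file and the `artist` column are hypothetical stand-ins so the snippet runs on its own; they are not part of the original pipeline.

```r
library(readr)

# Hypothetical miniature CSV standing in for the Kaggle file
tmp_csv <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    title = c("Hello", "Yesterday"),
    artist = c("A", "B"),
    lyrics = c("Hello, it's me", "Yesterday, all my troubles")
  ),
  tmp_csv
)

# Read only the two columns needed for the title check
df_small <- read_csv(
  tmp_csv,
  col_select = c(title, lyrics),
  show_col_types = FALSE
)
```

In the real pipeline, `tmp_csv` would be replaced by `csv_lyrics`.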

To find the songs we are looking for, do the following:

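title_not_in_lyrics <- function(df_lyrics) {
  df_lyrics |>
    mutate(
      # Strip Unicode format characters and ? \ { } before using the title as a regex
      title_clean = str_remove_all(title, "\\p{Cf}|\\?|\\\\|\\{|\\}"),
    ) |>
    filter(
      # Keep songs whose lyrics contain no case-insensitive match for the cleaned title
      str_detect(
        lyrics,
        regex(title_clean, ignore_case = TRUE),
        negate = TRUE
      )
    )
}
df_title_not_in_lyrics <- title_not_in_lyrics(df_lyrics)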
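The function above uses each title as a regular expression, which is why a few special characters are stripped first. As an alternative sketch (not the method used for the numbers in this post), stringr's `fixed()` matches the title literally, so characters such as `?` need no cleaning. The toy data below is hypothetical and only illustrates the idea; results on the real dataset may differ from the regex approach.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not taken from the Kaggle dataset
toy_lyrics <- tibble(
  title = c("Hello", "Bohemian Rhapsody", "Why?"),
  lyrics = c(
    "Hello, it's me",
    "Is this the real life? Is this just fantasy?",
    "Tell me when, tell me how"
  )
)

# Keep rows whose lyrics do not contain the literal title (case-insensitive).
# As a raw regex, "Why?" would mean "Wh" plus an optional "y" and would
# match "when"; fixed() avoids that by treating "?" as a plain character.
toy_result <- toy_lyrics |>
  filter(
    str_detect(lyrics, fixed(title, ignore_case = TRUE), negate = TRUE)
  )
```

On this toy data the filter keeps “Bohemian Rhapsody” and “Why?”, since neither title occurs literally in its lyrics.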

In total, 45.4% of all songs meet this criterion.
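A share like this can be derived from the row counts of the two data frames. The helper below is a minimal sketch; `share_not_in_lyrics` is a hypothetical name and the toy data frames are stand-ins, not the real results.

```r
library(scales)

# Hypothetical helper: share of rows in df_subset relative to df_all
share_not_in_lyrics <- function(df_all, df_subset) {
  percent(nrow(df_subset) / nrow(df_all), accuracy = 0.1)
}

# Toy stand-ins for df_lyrics and df_title_not_in_lyrics
toy_all <- data.frame(id = 1:4)
toy_subset <- data.frame(id = 1L)

toy_share <- share_not_in_lyrics(toy_all, toy_subset)
```

With the real data, the call would be `share_not_in_lyrics(df_lyrics, df_title_not_in_lyrics)`.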

The analysis shows that roughly 45.4% of songs have titles that don’t appear in their lyrics. That answers the original question: no, we don’t necessarily need AI for this task. A well-crafted regular expression does the job.

Acknowledgments

This work was completed using R v. 4.5.2 (R Core Team 2025) and the following R packages: arrow v. 20.0.0 (Richardson et al. 2025), crew v. 1.3.0 (Landau 2025), distill v. 1.6 (Dervieux et al. 2023), fs v. 1.6.6 (Hester, Wickham, and Csárdi 2025), httr2 v. 1.2.2 (Wickham 2025), knitr v. 1.51 (Xie 2014, 2015, 2025), reactable v. 0.4.5 (Lin 2025), rmarkdown v. 2.30 (Xie, Allaire, and Grolemund 2018; Xie, Dervieux, and Riederer 2020; Allaire et al. 2025), scales v. 1.4.0 (Wickham, Pedersen, and Seidel 2025), shiny v. 1.12.1 (Chang et al. 2025), tarchetypes v. 0.14.0 (Landau 2021a), targets v. 1.12.0 (Landau 2021b), tidyverse v. 2.0.0 (Wickham et al. 2019).

Allaire, JJ, Yihui Xie, Christophe Dervieux, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, et al. 2025. rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Garrick Aden-Buie, Yihui Xie, et al. 2025. shiny: Web Application Framework for R. https://doi.org/10.32614/CRAN.package.shiny.
Dervieux, Christophe, JJ Allaire, Rich Iannone, Alison Presmanes Hill, and Yihui Xie. 2023. distill: R Markdown Format for Scientific and Technical Writing. https://doi.org/10.32614/CRAN.package.distill.
Hester, Jim, Hadley Wickham, and Gábor Csárdi. 2025. fs: Cross-Platform File System Operations Based on libuv. https://doi.org/10.32614/CRAN.package.fs.
Landau, William Michael. 2021a. tarchetypes: Archetypes for Targets.
———. 2021b. “The targets R Package: A Dynamic Make-Like Function-Oriented Pipeline Toolkit for Reproducibility and High-Performance Computing.” Journal of Open Source Software 6 (57): 2959. https://doi.org/10.21105/joss.02959.
———. 2025. crew: A Distributed Worker Launcher Framework. https://doi.org/10.32614/CRAN.package.crew.
Lin, Greg. 2025. reactable: Interactive Data Tables for R. https://doi.org/10.32614/CRAN.package.reactable.
R Core Team. 2025. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Richardson, Neal, Ian Cook, Nic Crane, Dewey Dunnington, Romain François, Jonathan Keane, Dragoș Moldovan-Grünfeld, Jeroen Ooms, Jacob Wujciak-Jens, and Apache Arrow. 2025. arrow: Integration to Apache Arrow. https://doi.org/10.32614/CRAN.package.arrow.
Wickham, Hadley. 2025. httr2: Perform HTTP Requests and Process the Responses. https://doi.org/10.32614/CRAN.package.httr2.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Thomas Lin Pedersen, and Dana Seidel. 2025. scales: Scale Functions for Visualization. https://doi.org/10.32614/CRAN.package.scales.
Xie, Yihui. 2014. “knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC.
———. 2015. Dynamic Documents with R and knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2025. knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, J. J. Allaire, and Garrett Grolemund. 2018. R Markdown: The Definitive Guide. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown.
Xie, Yihui, Christophe Dervieux, and Emily Riederer. 2020. R Markdown Cookbook. Boca Raton, Florida: Chapman and Hall/CRC. https://bookdown.org/yihui/rmarkdown-cookbook.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://codeberg.org/duju211/title_lyrics, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

During (2026, Feb. 28). Datannery: Finding Song Titles that do not appear in the Lyrics. Retrieved from https://www.datannery.com/posts/finding-song-titles-that-do-not-appear-in-the-lyrics/

BibTeX citation

@misc{during2026finding,
  author = {During, Julian},
  title = {Datannery: Finding Song Titles that do not appear in the Lyrics},
  url = {https://www.datannery.com/posts/finding-song-titles-that-do-not-appear-in-the-lyrics/},
  year = {2026}
}