Sentiment analysis of Genius song lyrics
Aug 14, 2020
Maria Dermit
5 minute read

Sentiment analysis of Katy Perry's upcoming album Smile.

Smile is the upcoming sixth studio album by American singer Katy Perry. It is scheduled for release in two weeks' time, on August 28, 2020. Perry describes her latest album Smile as her “journey towards the light, with stories of resilience, hope, and love”. According to her, she has gone through some emotionally challenging times lately, and expecting a child may well add to that emotional roller coaster 🎢.

library(tidyverse)
library(genius)
library(tidytext)
library(tidymodels)
library(blogdown)

I am going to obtain the lyrics of the album through Genius (the world’s biggest collection of song lyrics and musical knowledge) using the genius package.

Getting data

smile_raw <- genius_album(artist = "Katy Perry", album = "Smile")

Data preparation

tidy_smile <- smile_raw %>%
  unnest_tokens(word, lyric)

tidy_smile %>% count(track_title)
## # A tibble: 12 x 2
##    track_title                  n
##    <chr>                    <int>
##  1 Champagne Problems          18
##  2 Cry About It Later          92
##  3 Daisies                      1
##  4 Harleys in Hawaii          339
##  5 Never Really Over            1
##  6 Not the End of the World    60
##  7 Only Love                   18
##  8 Resilient                  213
##  9 Smile                      333
## 10 Teary Eyes                  81
## 11 Tucked                       1
## 12 What Makes a Woman           1
smile_raw %>% filter(track_title == "Tucked")
## # A tibble: 1 x 4
##   track_n  line lyric track_title
##     <int> <int> <chr> <chr>      
## 1       9    NA <NA>  Tucked

We can do text analysis using the fantastic tidytext package. To use tidytext’s sentiment analysis capabilities, we first need to tokenise the lyrics into words. The unnest_tokens() function uses the tokenizers package to separate each lyric line into words. The default tokenizing is for words, but other options include characters, n-grams, sentences, lines, paragraphs, or separation around a regex pattern. The track counts also reveal that the lyrics of some songs are not available yet: tracks with a count of 1, such as Tucked, contain a single row with a missing (NA) lyric.
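As a minimal sketch of the tokenizing options described above, the snippet below runs unnest_tokens() on a few made-up lyric lines. The toy_lyrics tibble and its contents are hypothetical and not taken from the album. It also drops rows whose lyric is NA before tokenizing, which is one possible way (not necessarily the one used in this post) to handle tracks without lyrics.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical toy data: two lines for one song, and a song with no lyrics yet
toy_lyrics <- tibble(
  track_title = c("Song A", "Song A", "Song B"),
  line        = c(1L, 2L, NA),
  lyric       = c("Smile on my face", "Finally grateful now", NA)
)

# Default tokenizer: one lowercase word per row
toy_words <- toy_lyrics %>%
  filter(!is.na(lyric)) %>%          # drop tracks whose lyrics are missing
  unnest_tokens(word, lyric)

# Alternative tokenizer: overlapping two-word sequences (bigrams)
toy_bigrams <- toy_lyrics %>%
  filter(!is.na(lyric)) %>%
  unnest_tokens(bigram, lyric, token = "ngrams", n = 2)

toy_words
toy_bigrams
```

Filtering first means that a track without lyrics no longer shows up with a count of 1 when counting rows per track_title.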

tidy_smile <- tidy_smile %>%
  anti_join(get_stopwords())

We can remove stop words (available via the function get_stopwords()) with an anti_join(). Let's count the frequency of the remaining words.

tidy_smile %>%
  count(word, sort = TRUE) 
## # A tibble: 233 x 2
##    word         n
##    <chr>    <int>
##  1 smile       19
##  2 oh          17
##  3 baby        16
##  4 back        15
##  5 now         15
##  6 cause       13
##  7 finally     12
##  8 got         11
##  9 grateful    11
## 10 jeweler     11
## # … with 223 more rows
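To make the stop-word step more concrete, here is a small sketch on invented words (toy_words is hypothetical). It assumes the stopwords package is installed, since get_stopwords() relies on it, and it passes by = "word" to anti_join() only to make the join column explicit.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Hypothetical tokens, one word per row, as produced by unnest_tokens()
toy_words <- tibble(
  word = c("i", "am", "grateful", "and", "grateful", "resilient")
)

toy_counts <- toy_words %>%
  anti_join(get_stopwords(), by = "word") %>%  # drop "i", "am", "and"
  count(word, sort = TRUE)                     # most frequent words first

toy_counts
```

Only the content words survive the join, so the counts are not dominated by words like "i" or "and".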

There is a range of methods and dictionaries for evaluating the opinion or emotion in text. The tidytext package comes with several sentiment lexicons.

get_sentiments("bing") %>% count(sentiment)
## # A tibble: 2 x 2
##   sentiment     n
##   <chr>     <int>
## 1 negative   4781
## 2 positive   2005
get_sentiments("nrc") %>% count(sentiment)
## # A tibble: 10 x 2
##    sentiment        n
##    <chr>        <int>
##  1 anger         1247
##  2 anticipation   839
##  3 disgust       1058
##  4 fear          1476
##  5 joy            689
##  6 negative      3324
##  7 positive      2312
##  8 sadness       1191
##  9 surprise       534
## 10 trust         1231
get_sentiments("loughran") %>% count(sentiment)
## # A tibble: 6 x 2
##   sentiment        n
##   <chr>        <int>
## 1 constraining   184
## 2 litigious      904
## 3 negative      2355
## 4 positive       354
## 5 superfluous     56
## 6 uncertainty    297
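A lexicon is simply a table that maps words to labels, so it can be inspected like any other tibble. The sketch below looks up a few example words in the bing lexicon; the chosen words are arbitrary, and whether each one appears in the lexicon is not guaranteed. As far as I know, bing ships with tidytext, while nrc and loughran are fetched through the textdata package and may ask you to accept a license the first time they are used.

```r
library(dplyr)
library(tidytext)

# The bing lexicon: one row per word, labelled "positive" or "negative"
bing <- get_sentiments("bing")

# Look up a few arbitrary example words (some may be absent from the lexicon)
toy_lookup <- bing %>%
  filter(word %in% c("smile", "grateful", "cry"))

toy_lookup
```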

Sentiment analysis

In this analysis I am going to use two lexicons, bing and nrc:

smile_sentiment <- tidy_smile %>%
  inner_join(get_sentiments("bing"), by = "word") %>% 
  count(track_title, sentiment) %>% 
  spread(sentiment, n, fill = 0) %>% 
  mutate(sentiment = positive - negative)
smile_sentiment
## # A tibble: 6 x 4
##   track_title              negative positive sentiment
##   <chr>                       <dbl>    <dbl>     <dbl>
## 1 Cry About It Later              5        6         1
## 2 Harleys in Hawaii               0        4         4
## 3 Not the End of the World        4        2        -2
## 4 Resilient                       6       15         9
## 5 Smile                          13       56        43
## 6 Teary Eyes                      0        4         4
smile_sentiment_nrc <- tidy_smile %>%
  inner_join(get_sentiments("nrc"), by = "word") %>% 
  count(track_title, sentiment) 
smile_sentiment_nrc
## # A tibble: 50 x 3
##    track_title        sentiment        n
##    <chr>              <chr>        <int>
##  1 Cry About It Later anticipation     5
##  2 Cry About It Later joy              4
##  3 Cry About It Later negative         8
##  4 Cry About It Later positive         5
##  5 Cry About It Later sadness          4
##  6 Cry About It Later surprise         1
##  7 Cry About It Later trust            1
##  8 Harleys in Hawaii  anger            1
##  9 Harleys in Hawaii  anticipation     4
## 10 Harleys in Hawaii  disgust          1
## # … with 40 more rows
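The net score used with the bing lexicon is the number of positive words minus the number of negative words per track. The sketch below reproduces that arithmetic on invented data: toy_words and toy_lexicon are hypothetical stand-ins for the tokenized lyrics and the real lexicon. It uses tidyr's pivot_wider(), the newer alternative to spread(), but the logic is otherwise the same.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical tokenized lyrics
toy_words <- tibble(
  track_title = c("Song A", "Song A", "Song A", "Song B", "Song B"),
  word        = c("smile", "grateful", "cry", "lost", "smile")
)

# Hypothetical two-class lexicon standing in for bing
toy_lexicon <- tibble(
  word      = c("smile", "grateful", "cry", "lost"),
  sentiment = c("positive", "positive", "negative", "negative")
)

toy_sentiment <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%   # keep only words found in the lexicon
  count(track_title, sentiment) %>%          # matches per track and class
  pivot_wider(names_from = sentiment,
              values_from = n,
              values_fill = list(n = 0L)) %>% # tracks with no matches in a class get 0
  mutate(sentiment = positive - negative)    # net score

toy_sentiment
```

Here Song A has two positive words and one negative word, giving a net score of 1, while Song B has one of each and scores 0. Because the score is a raw difference, longer songs with more matched words can reach larger absolute values.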

Plotting the data

ggplot(smile_sentiment, aes(track_title, sentiment, fill = track_title)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  theme_minimal(base_family = "") +
  theme(plot.title.position = "plot") +
  labs(x = "",
       y = "Sentiment",
       title = "Sentiment analysis per words across songs",
       subtitle = "Katy Perry's latest album Smile")


Although a song called Daisies might sound bucolic, it has the lowest sentiment rating (that is, more negative-associated than positive-associated words).

ggplot(smile_sentiment_nrc, aes(n, sentiment, fill = track_title)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  theme_minimal(base_family = "") +
  theme(plot.title.position = "plot") +
  facet_wrap(~track_title, ncol = 2, scales = "free") +
  labs(x = "",
       y = "Sentiment",
       title = "Sentiment analysis per words across songs",
       subtitle = "Katy Perry's latest album Smile")

Overall, the songs convey quite positive feelings. The song Smile has many positive and joyful terms and few fear-associated words. Daisies, on the other hand, is the saddest song of the album, with as many fear-associated as positive-associated words and little joy.

Therefore, if you want to listen to the latest, more positive Katy Perry, click on the video image to watch her latest song Smile 😄!

[Video image: Katy Perry, Smile]