Background

My friend Freddy Velazquez needs help figuring out what makes music popular. Canillas (Freddy) has 26 monthly listeners and wants to grow that number, which matters because his new album comes out soon. I will use linear regression, one song characteristic at a time, to see which characteristics are associated with popularity. The Spotify data I am using covers 2015 to 2020.




Data Analytics

Basic Stats

Summary of Popularity

pandary(spotify$popularity)
Min.   1st Qu.   Median   Mean    3rd Qu.   Max.
0      60        64.5     64.36   70        100

Favstats of Popularity by Year

pander(favstats(popularity ~ year, data = spotify)[,-10])
year   min   Q1   median   Q3   max   mean    sd      n
2015   48    54   59       64   91    59.55   6.636   1931
2016   50    57   61       65   92    61.37   6.329   1969
2017   54    60   64       68   87    64.86   6.059   2000
2018   56    62   66       71   92    67.28   6.177   2000
2019   58    65   68       74   95    69.66   6.538   2000
2020   0     63   68       73   100   63.11   21.28   1756
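As a consistency check on the two tables, the overall mean popularity should equal the yearly means weighted by each year's song count. This is a minimal base-R sketch that uses only the numbers copied from the favstats table; the vector names are just for illustration.

```r
# Yearly means and song counts copied from the favstats table
year_mean <- c(59.55, 61.37, 64.86, 67.28, 69.66, 63.11)
year_n    <- c(1931, 1969, 2000, 2000, 2000, 1756)

total_n      <- sum(year_n)                        # total number of songs
overall_mean <- sum(year_mean * year_n) / total_n  # count-weighted average of yearly means

cat("Songs:", total_n, "\n")
cat("Overall mean popularity:", round(overall_mean, 3), "\n")
```

The song total matches the 11656 observations used in every regression below, and the weighted mean agrees with the summary's mean of 64.36 to within the rounding of the yearly means.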



Linear Regression

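library(ggplot2)

simple <- function(x){

  # Perform regression of popularity on x (x is a numeric vector such as spotify$danceability;
  # lm() looks up popularity in spotify and x in this function's environment)

  mylm <- lm(popularity ~ x, data = spotify)

  # Diagnostics: residuals vs fitted, normal Q-Q, residuals in data order

  par(mfrow = c(1, 3))
  plot(mylm, which = 1:2)
  plot(mylm$residuals)

  # Create scatterplot with the fitted regression line

  myplot <- ggplot(spotify, aes(y = popularity, x = x)) +
    geom_point(color = "red") +
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "black") +
    theme_bw()

  # Create list of the model and plot, and return it

  mylist <- list("lm" = mylm, "plot" = myplot)
  return(mylist)
}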
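Because simple() receives a bare vector, every slope in the regression tables is labelled x. One possible alternative, sketched here as an assumption and not as the author's code, is to pass the column name and build the formula with base R's reformulate(), so the coefficient carries the variable's name. The function simple_named() and the toy data frame are hypothetical.

```r
# Hypothetical variant of simple(): take a column name instead of a vector
simple_named <- function(var, data) {
  # Build e.g. popularity ~ danceability from the string "danceability"
  f <- reformulate(var, response = "popularity")
  lm(f, data = data)
}

# Toy data frame standing in for spotify (values are made up for illustration)
toy <- data.frame(
  popularity   = c(55, 60, 62, 66, 70),
  danceability = c(0.40, 0.55, 0.60, 0.72, 0.85)
)

fit <- simple_named("danceability", toy)
print(coef(fit))  # coefficients are named "(Intercept)" and "danceability"
```

In the report this would be called as simple_named("danceability", spotify); the diagnostic plots and the ggplot could be added the same way, using aes(x = .data[[var]]) for the x aesthetic.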

Danceability

lm_danceability <- simple(spotify$danceability)

pandary(lm_danceability$lm)
              Estimate    Std. Error   t value     Pr(>|t|)
(Intercept)   60.19       0.367        164         0
x             6.623       0.5609       11.81       5.413e-32

Fitting linear model: popularity ~ x

Observations   Residual Std. Error   R²          Adjusted R²
11656          10.63                 0.01182     0.01174
lm_danceability$plot
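The fitted line is popularity ≈ 60.19 + 6.623 × danceability. Since danceability is on Spotify's 0-to-1 scale, even a large jump in danceability moves the predicted popularity by only a few points. This small sketch plugs two example danceability scores (chosen arbitrarily) into the coefficients from the table.

```r
# Coefficients copied from the danceability regression table
b0 <- 60.19
b1 <- 6.623

# Predicted popularity for two example danceability scores
dance <- c(0.5, 0.8)
pred  <- b0 + b1 * dance
print(round(pred, 2))
print(round(diff(pred), 2))  # change in predicted popularity from 0.5 to 0.8
```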




Duration

lm_duration <- simple(spotify$duration_ms)

pandary(lm_duration$lm)
              Estimate    Std. Error   t value     Pr(>|t|)
(Intercept)   68.67       0.3878       177.1       0
x             -2.066e-05  1.8e-06      -11.47      2.612e-30

Fitting linear model: popularity ~ x

Observations   Residual Std. Error   R²          Adjusted R²
11656          10.63                 0.01117     0.01108
lm_duration$plot
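The duration slope looks tiny only because duration_ms is measured in milliseconds. Multiplying it by 60,000 ms per minute puts it on a more readable scale. This is a quick sketch with the table's coefficients; the 3- and 4-minute song lengths are example values.

```r
# Coefficients copied from the duration regression table (x is in milliseconds)
b0 <- 68.67
b1 <- -2.066e-05

ms_per_minute    <- 60000
slope_per_minute <- b1 * ms_per_minute  # change in popularity per extra minute
print(slope_per_minute)

# Predicted popularity for a 3-minute and a 4-minute song
minutes <- c(3, 4)
pred    <- b0 + b1 * minutes * ms_per_minute
print(round(pred, 2))
```

Each extra minute of song length lowers the predicted popularity by about 1.24 points.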




Energy

lm_energy <- simple(spotify$energy)

pandary(lm_energy$lm)
              Estimate    Std. Error   t value     Pr(>|t|)
(Intercept)   65.31       0.2979       219.2       0
x             -1.572      0.4693       -3.35       0.0008116

Fitting linear model: popularity ~ x

Observations   Residual Std. Error   R²          Adjusted R²
11656          10.69                 0.0009619   0.0008761
lm_energy$plot
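Each t value in these tables is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with n - 2 = 11654 degrees of freedom. Recomputing the energy row with base R's pt() is a quick way to check a table. The recomputed values differ slightly from the table because the inputs are rounded.

```r
# Energy slope and standard error copied from the table above
est      <- -1.572
se       <- 0.4693
resid_df <- 11656 - 2  # observations minus two estimated coefficients

t_value <- est / se                              # about -3.35
p_value <- 2 * pt(-abs(t_value), df = resid_df)  # two-sided p-value

print(round(t_value, 3))
print(signif(p_value, 3))
```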




Liveness

lm_liveness <- simple(spotify$liveness)

pandary(lm_liveness$lm)
              Estimate    Std. Error   t value     Pr(>|t|)
(Intercept)   64.82       0.1591       407.3       0
x             -2.534      0.6936       -3.653      0.0002599

Fitting linear model: popularity ~ x

Observations   Residual Std. Error   R²          Adjusted R²
11656          10.69                 0.001144    0.001058
lm_liveness$plot
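In a regression with a single predictor, R² is the square of the correlation between that predictor and popularity. The correlation can therefore be recovered from the table as the square root of R², carrying the sign of the slope. For liveness this shows how weak the association is.

```r
# Values copied from the liveness regression table
slope     <- -2.534
r_squared <- 0.001144

# Correlation between liveness and popularity (sign taken from the slope)
r <- sign(slope) * sqrt(r_squared)
print(round(r, 4))

# Share of the variation in popularity explained by liveness, as a percentage
print(round(100 * r_squared, 2))
```

A correlation of about -0.03 means liveness explains roughly a tenth of one percent of the variation in popularity, even though its p-value is small.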




Loudness

lm_loudness <- simple(spotify$loudness)

pandary(lm_loudness$lm)
              Estimate    Std. Error   t value     Pr(>|t|)
(Intercept)   64.71       0.1794       360.6       0
x             0.04487     0.01949      2.302       0.02133

Fitting linear model: popularity ~ x

Observations   Residual Std. Error   R²          Adjusted R²
11656          10.69                 0.0004547   0.0003689
lm_loudness$plot
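Spotify reports loudness in decibels, so the slope of 0.04487 is the change in predicted popularity per extra dB. This sketch scales it up to a 10 dB difference and compares that effect with the residual standard error, the typical scatter of songs around the line. Both inputs come from the table.

```r
# Values copied from the loudness regression table
slope    <- 0.04487  # popularity points per dB
resid_se <- 10.69    # residual standard error

# Predicted change in popularity for a track 10 dB louder
effect_10db <- 10 * slope
print(effect_10db)

# The same effect expressed as a fraction of the typical scatter around the line
print(round(effect_10db / resid_se, 3))
```

Making a track 10 dB louder is associated with less than half a point of popularity, which is small next to the roughly 10.7-point scatter of individual songs.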




Tempo

lm_tempo <- simple(spotify$tempo)

pandary(lm_tempo$lm)
              Estimate    Std. Error   t value     Pr(>|t|)
(Intercept)   64.36       0.3916       164.3       0
x             -1.17e-06   0.003169     -0.0003692  0.9997

Fitting linear model: popularity ~ x

Observations   Residual Std. Error   R²          Adjusted R²
11656          10.69                 1.17e-11    -8.581e-05
lm_tempo$plot
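Tempo is the one predictor whose slope is indistinguishable from zero (p = 0.9997). A 95% confidence interval built from the table's estimate and standard error makes this concrete, because the interval sits almost symmetrically around zero.

```r
# Tempo slope and standard error copied from the table above
est      <- -1.17e-06
se       <- 0.003169
resid_df <- 11656 - 2

# 95% confidence interval for the slope
ci <- est + c(-1, 1) * qt(0.975, df = resid_df) * se
print(signif(ci, 3))

# TRUE when zero lies inside the interval, i.e. no detectable linear effect of tempo
print(ci[1] < 0 && ci[2] > 0)
```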




Year

lm_year <- simple(spotify$year)

pandary(lm_year$lm)
              Estimate    Std. Error   t value     Pr(>|t|)
(Intercept)   -2683       116.1        -23.11      1.31e-115
x             1.362       0.05753      23.67       5.323e-121

Fitting linear model: popularity ~ x

Observations   Residual Std. Error   R²          Adjusted R²
11656          10.44                 0.04586     0.04578
lm_year$plot
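Because year takes only six values, the slope of popularity on year can be recovered from the favstats table alone. An ordinary regression on individual songs gives the same line as a regression of the yearly means on year, weighted by each year's song count. This sketch uses only the table values, so rounding in the means causes small differences.

```r
# Yearly means and song counts copied from the favstats table
year      <- 2015:2020
year_mean <- c(59.55, 61.37, 64.86, 67.28, 69.66, 63.11)
year_n    <- c(1931, 1969, 2000, 2000, 2000, 1756)

# Weighted least squares of the yearly means on year, weights = song counts
fit_means <- lm(year_mean ~ year, weights = year_n)
print(round(coef(fit_means), 3))
```

The two fits agree because the song-level sum of squared errors splits into a within-year part, which does not depend on the line, and a count-weighted sum of squared errors for the yearly means.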




Interpretation

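First, newer songs tend to be more popular, which is good for you. Average popularity rose every year from 59.55 in 2015 to 69.66 in 2019. The 2020 mean drops to 63.11, but that year also contains songs with a popularity of 0 (min = 0, sd = 21.28), which pull its average down. Its median of 68 matches 2019. Next, let's see what each regression tells us.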

  • Danceability - popularity increases with danceability (slope = 6.623, p = 5.4e-32, R² = 0.012)
  • Duration - popularity decreases with duration (slope = -2.066e-05 per millisecond, about -1.24 points per extra minute; p = 2.6e-30, R² = 0.011)
  • Energy - popularity decreases with energy (slope = -1.572, p = 0.0008, R² = 0.001)
  • Liveness - popularity decreases with liveness (slope = -2.534, p = 0.0003, R² = 0.001)
  • Loudness - popularity increases with loudness (slope = 0.04487 per dB, p = 0.021, R² = 0.0005)
  • Tempo - no detectable relationship (slope = -1.17e-06, p = 0.9997, R² ≈ 0)
  • Year - popularity increases with year (slope = 1.362 points per year, p = 5.3e-121, R² = 0.046)
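Ranking the seven simple regressions by R² shows which single variable explains the most variation in popularity. This small sketch is built only from the slopes and R² values reported in the regression tables.

```r
# Slopes and R^2 values copied from the seven regression tables
results <- data.frame(
  variable  = c("Danceability", "Duration", "Energy", "Liveness",
                "Loudness", "Tempo", "Year"),
  slope     = c(6.623, -2.066e-05, -1.572, -2.534, 0.04487, -1.17e-06, 1.362),
  r_squared = c(0.01182, 0.01117, 0.0009619, 0.001144, 0.0004547, 1.17e-11, 0.04586)
)

# Sort from most to least variation explained
ranked <- results[order(results$r_squared, decreasing = TRUE), ]
print(ranked, row.names = FALSE)
```

Year, danceability, and duration explain the most variation. Loudness and tempo explain the least.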

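So, in short, songs that are more danceable and louder tend to be more popular, and shorter songs do slightly better than long ones. Each variable on its own explains only a small share of the variation in popularity (every R² is below 0.05), so these are tendencies rather than guarantees.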

Freddy, your music is both danceable and loud, so as long as your songs are not too long, it fits the profile of the popular songs in this data.