Spotify

Background

My friend Freddy Velazquez, who goes by Canillas, needs help figuring out what makes music more popular. He has 26 monthly listeners and is trying to get more, which would help since his new album comes out soon. I will be using linear regression models to figure out what makes music popular. The Spotify data I am using is from 2015 to 2020.

Data Analytics

Basic Stats

Summary of Popularity

pander(summary(spotify$popularity))

Min.	1st Qu.	Median	Mean	3rd Qu.	Max.
0	60	64.5	64.36	70	100

Fav Stats of Popularity based on Year

pander(favstats(popularity ~ year, data = spotify)[,-10])

year	min	Q1	median	Q3	max	mean	sd	n
2015	48	54	59	64	91	59.55	6.636	1931
2016	50	57	61	65	92	61.37	6.329	1969
2017	54	60	64	68	87	64.86	6.059	2000
2018	56	62	66	71	92	67.28	6.177	2000
2019	58	65	68	74	95	69.66	6.538	2000
2020	0	63	68	73	100	63.11	21.28	1756

Linear Regression

simple <- function(x){
  
  # Perform Regression
  
  mylm <- lm(popularity ~ x, data = spotify)
  
  # Diagnostic
  
  par(mfrow=c(1,3))
  plot(mylm, which=1:2)
  plot(mylm$residuals)
  
  # Create Plot
  
  myplot <- ggplot(spotify, aes(y = popularity, x = x)) +
    geom_point(color = "red") +
    geom_smooth(method = "lm", se=FALSE, color = "black") +
    theme_bw()
  
  # Create List and Return List
  
  mylist <- list("lm" = mylm, "plot" = myplot)
  return(mylist)
}
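One side effect of passing a column such as spotify$danceability into simple() is that every coefficient table labels the predictor as x. Below is a minimal sketch of an alternative, assuming the predictor is passed by name instead. The helper simple_by_name and the toy data frame are hypothetical, made up for illustration, and not part of the analysis in this report.

```r
# Hypothetical variant of simple(): takes the predictor's column name as a
# string so the fitted model keeps the real variable name instead of "x".
simple_by_name <- function(var, data) {
  # reformulate() builds the formula popularity ~ <var>
  f <- reformulate(var, response = "popularity")
  lm(f, data = data)
}

# Toy data, made up for illustration only (not the Spotify data)
toy <- data.frame(
  popularity   = c(50, 55, 61, 64, 70),
  danceability = c(0.2, 0.4, 0.5, 0.7, 0.9)
)

fit <- simple_by_name("danceability", toy)
names(coef(fit))  # "(Intercept)" "danceability"
```

With this form the summary table would show danceability, energy, and so on in the row label, which makes the outputs easier to tell apart.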

Danceability

lm_danceability <- simple(spotify$danceability)

pander(summary(lm_danceability$lm))

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	60.19	0.367	164	0
x	6.623	0.5609	11.81	5.413e-32

Fitting linear model: popularity ~ x
Observations	Residual Std. Error	\(R^2\)	Adjusted \(R^2\)
11656	10.63	0.01182	0.01174

lm_danceability$plot
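To read the danceability table above: the t value is the estimate divided by its standard error, and the p-value comes from a t distribution with n - 2 degrees of freedom. Here is a quick check using the rounded numbers as printed, so the results will only match the table approximately.

```r
# Values copied from the danceability summary (rounded as printed)
estimate  <- 6.623
std_error <- 0.5609
n         <- 11656
r_squared <- 0.01182

t_value <- estimate / std_error            # test statistic for the slope
df      <- n - 2                           # intercept and slope are estimated
p_value <- 2 * pt(-abs(t_value), df = df)  # two-sided p-value

round(t_value, 2)          # 11.81, matching the table
p_value                    # extremely small
round(100 * r_squared, 2)  # percent of variation in popularity explained
```

So the slope is clearly different from zero, but danceability alone accounts for only about 1.2% of the variation in popularity.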

Duration

lm_duration <- simple(spotify$duration_ms)

pander(summary(lm_duration$lm))

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	68.67	0.3878	177.1	0
x	-2.066e-05	1.8e-06	-11.47	2.612e-30

Fitting linear model: popularity ~ x
Observations	Residual Std. Error	\(R^2\)	Adjusted \(R^2\)
11656	10.63	0.01117	0.01108

lm_duration$plot

Energy

lm_energy <- simple(spotify$energy)

pander(summary(lm_energy$lm))

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	65.31	0.2979	219.2	0
x	-1.572	0.4693	-3.35	0.0008116

Fitting linear model: popularity ~ x
Observations	Residual Std. Error	\(R^2\)	Adjusted \(R^2\)
11656	10.69	0.0009619	0.0008761

lm_energy$plot

Liveness

lm_liveness <- simple(spotify$liveness)

pander(summary(lm_liveness$lm))

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	64.82	0.1591	407.3	0
x	-2.534	0.6936	-3.653	0.0002599

Fitting linear model: popularity ~ x
Observations	Residual Std. Error	\(R^2\)	Adjusted \(R^2\)
11656	10.69	0.001144	0.001058

lm_liveness$plot

Loudness

lm_loudness <- simple(spotify$loudness)

pander(summary(lm_loudness$lm))

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	64.71	0.1794	360.6	0
x	0.04487	0.01949	2.302	0.02133

Fitting linear model: popularity ~ x
Observations	Residual Std. Error	\(R^2\)	Adjusted \(R^2\)
11656	10.69	0.0004547	0.0003689

lm_loudness$plot

Tempo

lm_tempo <- simple(spotify$tempo)

pander(summary(lm_tempo$lm))

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	64.36	0.3916	164.3	0
x	-1.17e-06	0.003169	-0.0003692	0.9997

Fitting linear model: popularity ~ x
Observations	Residual Std. Error	\(R^2\)	Adjusted \(R^2\)
11656	10.69	1.17e-11	-8.581e-05

lm_tempo$plot

Year

lm_year <- simple(spotify$year)

pander(summary(lm_year$lm))

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	-2683	116.1	-23.11	1.31e-115
x	1.362	0.05753	23.67	5.323e-121

Fitting linear model: popularity ~ x
Observations	Residual Std. Error	\(R^2\)	Adjusted \(R^2\)
11656	10.44	0.04586	0.04578

lm_year$plot
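The intercept of -2683 is the fitted popularity at year 0, which is far outside the data, so it is easier to read the line at the years that were actually observed. Here is a small sketch using the coefficients as printed above. Because they are rounded, the fitted values are only approximate.

```r
# Coefficients copied from the year summary (rounded as printed)
intercept <- -2683
slope     <- 1.362

years <- 2015:2020
fitted_popularity <- intercept + slope * years  # points on the fitted line

data.frame(year = years, fitted = round(fitted_popularity, 2))
```

Each step of one year moves the fitted line up by the slope, about 1.36 popularity points.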

Interpretation

First, I would like to mention that newer songs tend to be more popular as the years go on, which is good for you. Next, let's see what each regression tells us.

Danceability - Increasing popularity with increasing Danceability - Slope = 6.623
Duration - Decreasing popularity with increasing duration - Slope = -2.066e-05
Energy - Decreasing popularity with increasing energy - Slope = -1.572
Liveness - Decreasing popularity with increasing liveness - Slope = -2.534
Loudness - Increasing popularity with increasing loudness - Slope = 0.04487
Tempo - Essentially no relationship between popularity and tempo (p = 0.9997) - Slope = -1.17e-06
Year - Increasing popularity with increasing year - Slope = 1.362
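
The slopes above are in different units, so their sizes cannot be compared directly. Here is a rough sketch that rescales each slope to a more natural step. The step sizes are my own assumptions: 0.1 for the features Spotify scores from 0 to 1, one minute for duration, 1 dB for loudness, 10 BPM for tempo, and one year for year.

```r
# Slopes copied from the regression summaries above
slopes <- c(danceability = 6.623, duration_ms = -2.066e-05, energy = -1.572,
            liveness = -2.534, loudness = 0.04487, tempo = -1.17e-06,
            year = 1.362)

# Assumed "natural" step for each predictor (illustrative choices, not from the data)
steps <- c(danceability = 0.1, duration_ms = 60000, energy = 0.1,
           liveness = 0.1, loudness = 1, tempo = 10, year = 1)

# Change in fitted popularity for one step of each predictor
effect <- slopes * steps
round(effect, 3)
```

On this scale, one extra minute of duration goes with roughly 1.2 fewer popularity points, a 0.1 increase in danceability goes with about 0.66 more, and the tempo effect is effectively zero.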

So basically, if a song is danceable and loud, then it tends to have higher popularity.

Freddy, your music is both danceable and loud, so as long as it's not super long, your music should have a good chance of being popular.