Background
My friend Freddy Velazquez needs help figuring out what makes music popular. Canillas (Freddy) has 26 monthly listeners and is trying to get more. More listeners would help because his new album comes out soon. I will use a linear regression model to figure out what makes music popular. The Spotify data I am using covers 2015 to 2020.
Data Analytics
Basic Stats
Summary of Popularity
pander(summary(spotify$popularity))
Fav Stats of Popularity based on Year
pander(favstats(popularity ~ year, data = spotify)[,-10])
| year | min | Q1 | median | Q3 | max | mean | sd | n |
|------|-----|----|--------|----|-----|-------|-------|------|
| 2015 | 48 | 54 | 59 | 64 | 91 | 59.55 | 6.636 | 1931 |
| 2016 | 50 | 57 | 61 | 65 | 92 | 61.37 | 6.329 | 1969 |
| 2017 | 54 | 60 | 64 | 68 | 87 | 64.86 | 6.059 | 2000 |
| 2018 | 56 | 62 | 66 | 71 | 92 | 67.28 | 6.177 | 2000 |
| 2019 | 58 | 65 | 68 | 74 | 95 | 69.66 | 6.538 | 2000 |
| 2020 | 0 | 63 | 68 | 73 | 100 | 63.11 | 21.28 | 1756 |
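The 2020 row looks different from the others: the minimum is 0, the standard deviation (21.28) is more than three times that of any earlier year, and the mean (63.11) sits below the median (68). One possible explanation, which is an assumption and not something the table confirms, is a group of 2020 tracks with a popularity of 0. A minimal sketch of how that could be checked is below; `toy` is a tiny made-up stand-in for `spotify`, used only so the example runs on its own.

```r
# Tiny made-up stand-in for the spotify data frame (NOT real data)
toy <- data.frame(
  year       = c(2019, 2019, 2019, 2020, 2020, 2020, 2020),
  popularity = c(65, 70, 74, 0, 0, 68, 73)
)

# Count tracks with popularity 0 in each year
zeros_by_year <- tapply(toy$popularity == 0, toy$year, sum)
zeros_by_year

# Compare the yearly mean with and without the zero-popularity tracks
mean_all     <- tapply(toy$popularity, toy$year, mean)
nonzero      <- toy[toy$popularity > 0, ]
mean_nonzero <- tapply(nonzero$popularity, nonzero$year, mean)
mean_all
mean_nonzero
```

Running the same three `tapply()` calls on `spotify` instead of `toy` would show whether zeros are what pulls the 2020 mean down.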
Linear Regression
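simple <- function(x) {
  # Perform the regression of popularity on the supplied predictor vector
  mylm <- lm(popularity ~ x, data = spotify)

  # Diagnostics: residuals vs fitted, normal Q-Q, residuals vs order
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))  # restore the plotting layout when the function exits
  plot(mylm, which = 1:2)
  plot(mylm$residuals)

  # Scatterplot of popularity against the predictor with the fitted line
  myplot <- ggplot(spotify, aes(y = popularity, x = x)) +
    geom_point(color = "red") +
    geom_smooth(method = "lm", se = FALSE, color = "black") +
    theme_bw()

  # Return the model and the plot together in a named list
  mylist <- list("lm" = mylm, "plot" = myplot)
  return(mylist)
}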
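Because `simple()` receives a bare vector, every model below reports its slope under the name `x`. A minimal sketch of one way to keep the real variable name is to build the formula from a column name with `reformulate()`. The helper `fit_by_name` and the simulated `toy` data frame are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: fit popularity ~ <var> from a column name, so the
# coefficient is labelled with the variable name instead of "x".
fit_by_name <- function(var, data) {
  lm(reformulate(var, response = "popularity"), data = data)
}

# Simulated stand-in data (NOT the Spotify data), only to show the call
set.seed(1)
toy <- data.frame(danceability = runif(50))
toy$popularity <- 60 + 6 * toy$danceability + rnorm(50, sd = 10)

fit <- fit_by_name("danceability", toy)
names(coef(fit))  # "(Intercept)" "danceability"
```

With the real data the call would be `fit_by_name("danceability", spotify)`, and the summary table would then show `danceability` in place of `x`.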
Danceability
lm_danceability <- simple(spotify$danceability)
pander(summary(lm_danceability$lm))
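| | Estimate | Std. Error | t value | p-value |
|-------------|----------|------------|---------|-----------|
| (Intercept) | 60.19 | 0.367 | 164 | 0 |
| x | 6.623 | 0.5609 | 11.81 | 5.413e-32 |

Fitting linear model: popularity ~ x

| Observations | Residual Std. Error | R² | Adjusted R² |
|--------------|---------------------|---------|-------------|
| 11656 | 10.63 | 0.01182 | 0.01174 |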
lm_danceability$plot
Duration
lm_duration <- simple(spotify$duration_ms)
pander(summary(lm_duration$lm))
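| | Estimate | Std. Error | t value | p-value |
|-------------|------------|------------|---------|-----------|
| (Intercept) | 68.67 | 0.3878 | 177.1 | 0 |
| x | -2.066e-05 | 1.8e-06 | -11.47 | 2.612e-30 |

Fitting linear model: popularity ~ x

| Observations | Residual Std. Error | R² | Adjusted R² |
|--------------|---------------------|---------|-------------|
| 11656 | 10.63 | 0.01117 | 0.01108 |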
lm_duration$plot
Energy
lm_energy <- simple(spotify$energy)
pander(summary(lm_energy$lm))
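| | Estimate | Std. Error | t value | p-value |
|-------------|----------|------------|---------|-----------|
| (Intercept) | 65.31 | 0.2979 | 219.2 | 0 |
| x | -1.572 | 0.4693 | -3.35 | 0.0008116 |

Fitting linear model: popularity ~ x

| Observations | Residual Std. Error | R² | Adjusted R² |
|--------------|---------------------|-----------|-------------|
| 11656 | 10.69 | 0.0009619 | 0.0008761 |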
lm_energy$plot
Liveness
lm_liveness <- simple(spotify$liveness)
pander(summary(lm_liveness$lm))
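| | Estimate | Std. Error | t value | p-value |
|-------------|----------|------------|---------|-----------|
| (Intercept) | 64.82 | 0.1591 | 407.3 | 0 |
| x | -2.534 | 0.6936 | -3.653 | 0.0002599 |

Fitting linear model: popularity ~ x

| Observations | Residual Std. Error | R² | Adjusted R² |
|--------------|---------------------|----------|-------------|
| 11656 | 10.69 | 0.001144 | 0.001058 |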
lm_liveness$plot
Loudness
lm_loudness <- simple(spotify$loudness)
pander(summary(lm_loudness$lm))
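| | Estimate | Std. Error | t value | p-value |
|-------------|----------|------------|---------|---------|
| (Intercept) | 64.71 | 0.1794 | 360.6 | 0 |
| x | 0.04487 | 0.01949 | 2.302 | 0.02133 |

Fitting linear model: popularity ~ x

| Observations | Residual Std. Error | R² | Adjusted R² |
|--------------|---------------------|-----------|-------------|
| 11656 | 10.69 | 0.0004547 | 0.0003689 |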
lm_loudness$plot
Tempo
lm_tempo <- simple(spotify$tempo)
pander(summary(lm_tempo$lm))
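| | Estimate | Std. Error | t value | p-value |
|-------------|-----------|------------|------------|---------|
| (Intercept) | 64.36 | 0.3916 | 164.3 | 0 |
| x | -1.17e-06 | 0.003169 | -0.0003692 | 0.9997 |

Fitting linear model: popularity ~ x

| Observations | Residual Std. Error | R² | Adjusted R² |
|--------------|---------------------|----------|-------------|
| 11656 | 10.69 | 1.17e-11 | -8.581e-05 |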
lm_tempo$plot
Year
lm_year <- simple(spotify$year)
pander(summary(lm_year$lm))
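| | Estimate | Std. Error | t value | p-value |
|-------------|----------|------------|---------|------------|
| (Intercept) | -2683 | 116.1 | -23.11 | 1.31e-115 |
| x | 1.362 | 0.05753 | 23.67 | 5.323e-121 |

Fitting linear model: popularity ~ x

| Observations | Residual Std. Error | R² | Adjusted R² |
|--------------|---------------------|---------|-------------|
| 11656 | 10.44 | 0.04586 | 0.04578 |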
lm_year$plot
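The seven fits above are easier to compare side by side. The following is a rough sketch of collecting the slope, p-value, and R² of each single-predictor model into one table. It runs on a simulated `toy` data frame with only three made-up predictors, so the numbers it prints are not the Spotify results; the names `toy`, `predictors`, and `compare` are illustrative.

```r
# Simulated stand-in data (NOT the Spotify data)
set.seed(42)
n <- 200
toy <- data.frame(
  danceability = runif(n),
  energy       = runif(n),
  tempo        = runif(n, 60, 200)
)
toy$popularity <- 60 + 6 * toy$danceability + rnorm(n, sd = 10)

predictors <- c("danceability", "energy", "tempo")

# Fit popularity ~ <predictor> for each name and keep slope, p-value, R^2
compare <- do.call(rbind, lapply(predictors, function(var) {
  s <- summary(lm(reformulate(var, response = "popularity"), data = toy))
  data.frame(
    predictor = var,
    slope     = s$coefficients[2, "Estimate"],
    p_value   = s$coefficients[2, "Pr(>|t|)"],
    r_squared = s$r.squared
  )
}))

# Sort so the predictor explaining the most variance comes first
compare[order(-compare$r_squared), ]
```

On the real data, `toy` would be replaced by `spotify` and `predictors` by the seven column names used above.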
Interpretation
First, I would like to mention that newer songs tend to be more popular as the years go on, which is good for you. Next, let's see what each regression tells us.
- Danceability - Increasing popularity with increasing Danceability - Slope = 6.623
- Duration - Decreasing popularity with increasing duration - Slope = -2.066e-05 per millisecond (about -1.24 points per extra minute)
- Energy - Decreasing popularity with increasing energy - Slope = -1.572
- Liveness - Decreasing popularity with increasing liveness - Slope = -2.534
- Loudness - Increasing popularity with increasing loudness - Slope = 0.04487
- Tempo - No meaningful relationship between tempo and popularity (p = 0.9997) - Slope = -1.17e-06
- Year - Increasing popularity with increasing year - Slope = 1.362
So basically, if a song is danceable and loud, it tends to have higher popularity.
Freddy, your music is both danceable and loud, so as long as your songs are not super long, your music should have a good chance of being popular.
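To put the slopes into more familiar units, here is a short worked calculation using the coefficients reported in the tables above. It assumes `duration_ms` is measured in milliseconds and that danceability runs on Spotify's 0 to 1 scale; the two danceability values plugged in are arbitrary examples, not measurements of any real song.

```r
# Coefficients copied from the regression output above
dance_intercept <- 60.19
dance_slope     <- 6.623       # popularity points per 1.0 of danceability
duration_slope  <- -2.066e-05  # popularity points per millisecond

# Duration is in milliseconds, so rescale the slope to points per minute
duration_per_minute <- duration_slope * 60000
duration_per_minute  # -1.2396

# Predicted popularity from the danceability model at two example values
predict_dance <- function(d) dance_intercept + dance_slope * d
predict_dance(0.5)  # 63.5015
predict_dance(0.9)  # 66.1507
```

In other words, each extra minute of length goes with about 1.2 fewer popularity points, and moving danceability from 0.5 to 0.9 goes with about 2.6 more points, both on average.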