Data Science and Books – Eugene Burmistrov

Last week I completed a basic R course on edX by Harvard University. It is not as challenging as the Johns Hopkins course, but it is well organized and has lots of examples. The only bad thing happened at the end of the course, when they raised the certificate price from 49 to 149 dollars. At first I was sure it was a joke or some kind of bug, but then I had to apply for financial support and got a 90% discount.

After the course, I decided to play around with open data. I ended up on the Moscow city open data site. After a short look, I decided to find the most popular books for different years among the users of Moscow libraries. I wrote a quick script to find the book and author with the best rating for each year.

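library(readxl)
library(dplyr)

# Load the spreadsheet downloaded from the open data site
my_data <- read_excel("data-29434-2020-04-29.xlsx")
# Keep only the top-rated book for each year
top_books <- my_data %>%
  filter(PopularityRating == 1) %>%
  select(Year, BookTitle, Author, Genre)
top_books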

The results were predictable at first, but then I saw that an author I had never heard of came first twice!
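
To double-check an observation like this without scanning the table by eye, the top-rated rows can be counted per author. The snippet below is only a rough sketch: count_top_authors and the YearsAtTop column are names I made up for illustration, and it assumes the data frame has the same PopularityRating and Author columns used in the script above.

```r
library(dplyr)

# Hypothetical helper: count how many times each author appears
# with the best rating (PopularityRating == 1).
# Assumes columns named PopularityRating and Author, as in the script above.
count_top_authors <- function(books) {
  books %>%
    filter(PopularityRating == 1) %>%
    count(Author, name = "YearsAtTop", sort = TRUE)
}

# Usage with the data frame loaded earlier (needs the .xlsx file):
# count_top_authors(my_data) %>% filter(YearsAtTop > 1)
```

Sorting by the count puts repeat winners at the top, so any author who led more than once is visible right away.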

Looks like I need to put aside English literature and fill the gap in my knowledge of Russian culture.

Unfortunately, I was not able to read the data in CSV or JSON format, and the dataset lacks quantitative measures such as the number of checkouts. But anyway, it was a good experience and a nice way to practice working with data.
