A data analysis using Goodreads

For the last year or so I’ve tracked the books I’ve read using the online semi-social media website Goodreads. It took me a while to finally check it out after first hearing about it, but it’s quickly become a favourite tool for tracking my reading and deciding what to read next. I really like that I can log in and quickly see how many books I’ve read in a month or year, and in what order, and when I read particular titles. I rarely bother with written reviews, but I do rate each book I finish with their five-star rating system, so I can recall how much I liked something or easily pull up a list of my favourite books to share with others.

I also use the ratings system for deciding whether or not to invest the time in reading a book. The site averages all users’ ratings to give each book an overall rating. It’s a two-decimal number, with most books falling within half a star of 4.00.

(The rest of this might get boring. Feel free to skip it! :) But do go check out Goodreads yourself, if you haven’t yet. You can find me and add me as a friend there.)

Even though half a star doesn’t seem like much of a difference, I’ve begun consistently noticing a pattern in which books I love versus which books I’m meh about. I’m a science geek; I love charts and graphs and data-mining and interesting trends and correlations. So naturally, because I was wondering whether the pattern was actually borne out by the data, I imported all my ratings and the books’ ratings into Excel and did up a quick scatterplot.

On the vertical axis is my rating. It can only be whole numbers (3, 4, 5). On the horizontal axis is the Goodreads averaged rating, which is to two decimal places. I’ve got 150 titles in my Goodreads "read" category; most of these are YA, but there are a few adult and non-fiction titles in there as well. I removed the non-fiction but included everything else in the graph.


The thing I’d been picking up on was that if a book was rated less than 3.8 on Goodreads, I was never going to love it (5-star – I keep thinking about it for a day or two after closing the cover; any book that makes me cry is an automatic 5-star). Whether I liked it (4-star – I really enjoyed it but forget about it quickly after I’m done) or thought it was meh (3-star – books I was interested in enough to finish but probably wouldn’t bother picking up a sequel to) was hit or miss. (I rarely give books 1- or 2-star ratings – if I dislike a book that much, normally I won’t finish it. There have only been a couple of exceptions to this, seen in this chart.)

And the data, it turns out, generally holds up. I have never given a 5-star rating to any book rated below 3.75 on Goodreads – and in fact, of the 43 titles I’ve rated 5 stars, only 7 are rated below 4.00. Of those 7, 4 are by two particular authors I happen to like, so only 3 are the sort I might randomly pick up. On the flip side of things, of the 33 books I’ve rated 3 stars, just 9 have a Goodreads rating over 4.00.

Things get a bit muddier with my 4-star ratings, but the graphs seem to show a thicker cluster of books in the middle of each line, with thinner points at each end. This does actually translate to different averages/medians:

For books I’ve given 3 stars, the average Goodreads rating is 3.87 and the median is 3.82. For 4-star books the average is 4.04 and the median 4.06. And for my 5-star reads the average is 4.17 and the median 4.13.
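I did all of this in Excel, but for anyone who would rather script it, here is a minimal Python sketch of the same per-star average and median calculation. The function name `summarize_by_star` and the sample pairs are assumptions for illustration only; the numbers are made-up placeholders, not my actual shelf data.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up placeholder pairs: (my star rating, Goodreads average rating).
# These are NOT the real numbers from my shelf.
SAMPLE_BOOKS = [
    (3, 3.70), (3, 3.82), (3, 4.05),
    (4, 3.90), (4, 4.06), (4, 4.20),
    (5, 3.95), (5, 4.13), (5, 4.40),
]


def summarize_by_star(books):
    """Return {my_rating: (mean, median)} of the Goodreads averages."""
    groups = defaultdict(list)
    for my_rating, goodreads_avg in books:
        groups[my_rating].append(goodreads_avg)
    return {
        stars: (round(mean(values), 2), round(median(values), 2))
        for stars, values in sorted(groups.items())
    }


if __name__ == "__main__":
    for stars, (avg, med) in summarize_by_star(SAMPLE_BOOKS).items():
        print(f"{stars}-star: mean {avg:.2f}, median {med:.2f}")
```

Swapping the placeholder list for your own exported ratings should be all it takes to try it on a different shelf.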

Here’s an even more interesting graph. This is for all books/formats. The red squares represent the middle 50% of books with that star rating, with the upper and lower 25% on either side. The 4- and 5-star books overlap quite a bit, but the middle 50% of the 3-star books is distinctly lower.
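The "middle 50%" is just the span between the first and third quartiles. As a rough sketch of how that could be computed outside Excel, the snippet below uses Python's `statistics.quantiles`. The `middle_half` helper, the choice of the `"inclusive"` quartile method, and the sample list are all assumptions on my part, and the ratings are made-up placeholders rather than my real data.

```python
from statistics import quantiles

# Made-up placeholder Goodreads averages for books sharing one star rating.
SAMPLE_FOUR_STAR = [3.60, 3.80, 3.90, 4.00, 4.20]


def middle_half(ratings):
    """Return (q1, median, q3); q1 to q3 spans the middle 50% of ratings."""
    # n=4 splits the data into quartiles, which gives three cut points.
    q1, q2, q3 = quantiles(ratings, n=4, method="inclusive")
    return q1, q2, q3


if __name__ == "__main__":
    q1, q2, q3 = middle_half(SAMPLE_FOUR_STAR)
    print(f"middle 50%: {q1:.2f} to {q3:.2f} (median {q2:.2f})")
```

Different quartile conventions give slightly different cut points on small samples, so the exact edges of the boxes may shift a little depending on the method chosen.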


And one final graph, because I love graphs. This one counts the number of books I’ve given each star rating, grouped by Goodreads rating.
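A count like this can also be sketched in a few lines of Python. The 0.1-wide bins here are an assumption (a different bin width would work the same way), `count_by_bin` is a hypothetical helper name, and the sample pairs are made-up placeholders, not my actual ratings.

```python
from collections import Counter

# Made-up placeholder pairs: (my star rating, Goodreads average rating).
SAMPLE_BOOKS = [
    (3, 3.70), (3, 3.82), (3, 3.88),
    (4, 3.90), (4, 4.06), (4, 4.09),
    (5, 4.13), (5, 4.40),
]


def count_by_bin(books, bin_hundredths=10):
    """Count books per (my_rating, bin_start), with bins 0.1 wide by default."""
    counts = Counter()
    for my_rating, goodreads_avg in books:
        # Work in whole hundredths to avoid floating-point edge cases.
        hundredths = round(goodreads_avg * 100)
        bin_start = (hundredths // bin_hundredths) * bin_hundredths
        counts[(my_rating, bin_start / 100)] += 1
    return counts


if __name__ == "__main__":
    for (stars, bin_start), n in sorted(count_by_bin(SAMPLE_BOOKS).items()):
        print(f"{stars}-star, Goodreads {bin_start:.1f}s: {n}")
```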


So, takeaway: a book being rated over 4.00 is not a guarantee that I’ll love it, but being rated less than 4.00 is a pretty good indication I won’t.

With so many books out there, I’ve started using this information to help me decide what to read next. If I find something that looks interesting in the library, I’ll use their computer to look up its Goodreads rating. If something’s rated less than 3.8 I return it to the shelf, even if the blurb sounds really interesting. And between 3.8 and 4.0 I might waffle over it, making my decision based on whether I’ve got other stuff in the queue or have been hearing rave reviews of the book.
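Written out as a tiny Python function, that rule of thumb looks roughly like this. It is only a sketch: `shelf_decision` is a hypothetical name, and how to treat a rating of exactly 3.8 or exactly 4.0 is an assumption, since in practice I decide those by feel.

```python
def shelf_decision(goodreads_avg, skip_below=3.8, maybe_below=4.0):
    """Map a Goodreads average to a rough library-shelf decision."""
    if goodreads_avg < skip_below:
        return "put it back"
    if goodreads_avg < maybe_below:
        # Assumption: exactly 3.8 lands here, in the "waffle" zone.
        return "maybe"
    # Assumption: exactly 4.00 counts as high enough to read.
    return "read it"


if __name__ == "__main__":
    for rating in (3.75, 3.90, 4.20):
        print(f"{rating:.2f}: {shelf_decision(rating)}")
```

The two thresholds are parameters so that anyone with different tastes can plug in their own cut-offs.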

How about you? Noticed any patterns in your reading?


7 responses to “A data analysis using Goodreads”

  1. I read through the text in my email update and then clicked on this, so I had to go hunting for axes labels & was quite confused until I found them. But once I found them, I thought it was really interesting.
    I’ve been picking up random middle grade books from the library that look interesting (or that I’ve heard a lot about) – the main trend: if I’ve heard a lot about it, it won’t be checked into the library.
    Also, I’ve been thinking of my goodreads reviews as practice for writing summaries/query letters and thinking about what I like to see in a review. I don’t think it’s the same though – I’m too vested in my own work.

    • Sorry about the axis labels, Sarah. I used to make graphs in Excel all the time, but hadn’t yet in the new Office 2007 version and couldn’t figure out how to insert them (or, at least, not in the amount of time I wanted to bother fiddling with it).

      That’s a funny comment re: your library. Perhaps it’s because our library/town is small, but I find the hot new YA titles do usually show up soon after release on the new releases shelves if I check each time I’m in.

      Good thought about writing reviews. I think, too, it’s a slightly different approach simply because the review should focus more on the strengths and weaknesses of the work (the blurb being already provided on the page), but that in itself is useful to practice.

  2. I use LibraryThing, not Goodreads, but I’m glad to see I’m not the only one who’s obsessed with tracking and analyzing their reading habits! Just a comment on using the Goodreads rating to decide whether or not to read a book: I would also think that the usefulness of this would depend on how many members on Goodreads have rated the book. The more ratings, the more accurate that average rating would likely be. An interesting analysis indeed… now I may have to try something similar with my own books.

    • That’s true, too, Heather. For the most part the books I read have lots of ratings (like, in the thousands or tens of thousands), so I don’t think it’s too biased. Nice to know someone else is intrigued by trends in habits. :)

      I don’t know LibraryThing at all, though I recognize the name. Does it work the same way?

      • I don’t know much about Goodreads (I was on it briefly once), but LibraryThing is fairly similar to it – basically a way to catalogue and organize your library. I think LibraryThing allows you to have more control over the way your library looks, and it has more opportunities for you to directly contribute to the site (adding information about books and authors and local stores and events, for example). Another difference with LibraryThing is that you do have to pay to use it if you have more than 200 books in your library. Most users opt for the lifetime subscription, which is totally and completely worth every cent, in my opinion. I’m sure there are advantages and disadvantages to both Goodreads and LibraryThing, but I am totally in love with LibraryThing. And if you’re interested to see what it looks like, you can check out my LT profile and library: http://www.librarything.com/profile/Heather39

  3. I love Goodreads and didn’t know about these reports – thanks so much for the tip!

  4. I have created a site as a personal project that gives you a detailed analysis of your books, including how many books you have read from each country, the male vs. female author distribution, and a graph of when the books you’ve read were published.

