New York Times crossword analysis
Background
My partner and I started doing crosswords sometime in 2017 and we’ve been hooked since. This led us to get a New York Times (NYT) crossword subscription in 2018. The NYT app marks a completed crossword with a gold or blue star, where gold means you solved it without revealing any answers and before the answers came out. The app also provides some stats, but I wanted to see what else the data is telling us.
The data
I found a Python package by Matt Dodge that retrieves your NYT crossword stats and used this to pull in our stats. This data spans from January 01, 2018 to January 23, 2023.
The analysis
I’m using the “Hiroshige” palette from the MetBrewer package.
Total solves
Let’s start off by looking at total solves. ~I’ll note that these crosswords are mostly a team effort–at least in the beginning.~ If you’re unfamiliar with crosswords, you’ll notice most solves are on Mondays. That’s because Monday is the easiest day, and the puzzles get progressively harder through the week, with Saturday being the hardest (the Sunday crossword is bigger rather than harder). This plot also separates the solves by what I refer to as “check status” here: if a crossword was solved without any reveals and within the time limit, it’s a legit solve; otherwise we’ll refer to it as checked, which isn’t as fun.
So this plot is telling us that Mondays are the easiest solves, with the highest count (by far), but it doesn’t really give us the full picture. If we were to take this as is, we’d see that about half of the Monday solves are legit, whereas the remaining solves needed an extra little nudge.
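To make the “check status” split concrete, here is a minimal Python sketch of the kind of tally that sits behind a plot like this. The record layout (a puzzle date plus `revealed` and `on_time` flags) and the sample rows are assumptions for illustration only; they are not the actual fields returned by the stats package, nor our real data.

```python
from collections import Counter
from datetime import date

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical solve records:
# (puzzle date, answers revealed?, solved before the answers came out?)
solves = [
    (date(2018, 1, 1), False, True),   # Monday, legit
    (date(2018, 1, 8), True, True),    # Monday, needed a reveal
    (date(2018, 1, 9), False, False),  # Tuesday, finished too late
    (date(2018, 1, 15), False, True),  # Monday, legit
]


def check_status(revealed, on_time):
    """Label a solve 'legit' only if nothing was revealed and it was on time."""
    return "legit" if (not revealed and on_time) else "checked"


# Count solves per (weekday, check status) pair
counts = Counter(
    (DAYS[puzzle_date.weekday()], check_status(revealed, on_time))
    for puzzle_date, revealed, on_time in solves
)

for (weekday, status), n in sorted(counts.items()):
    print(weekday, status, n)
```

Each (weekday, status) pair corresponds to one bar segment in a stacked bar chart of total solves.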
Let’s look at our progress over the years.
We were really keen early on; there’s a bit of a dip in 2020, but we’re up again in 2021 and 2022. Of course, it’s still early in 2023, so there’s plenty of room to catch up. At a glance, it looks like the legit Monday solves are slowly increasing, but we can confirm this by digging into the data a little more.
Solved Monday crosswords
We’re omitting 2023 from these analyses because this year is incomplete and would throw off the mean and the regression models.
This line plot shows solved Mondays by their check status. The solid lines show the count of solved Mondays and the dotted lines show the average (or mean) per respective check status.
We see here we’re above the average in 2018, 2019, and 2022.
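As a rough sketch of how the dotted average lines can be computed, the Python snippet below drops the incomplete year, takes the mean per check status, and lists the years that land above it. The yearly counts are made-up placeholder numbers, not our actual solve counts, and the dictionary layout is an assumption.

```python
from statistics import mean

# Hypothetical yearly counts of solved Mondays per check status
# (placeholder numbers, not the real data)
monday_counts = {
    "legit":   {2018: 20, 2019: 22, 2020: 12, 2021: 15, 2022: 21, 2023: 3},
    "checked": {2018: 25, 2019: 18, 2020: 10, 2021: 9, 2022: 8, 2023: 1},
}

INCOMPLETE_YEAR = 2023  # dropped so a partial year does not drag the mean down

averages = {}
above_average = {}
for status, by_year in monday_counts.items():
    complete = {y: n for y, n in by_year.items() if y != INCOMPLETE_YEAR}
    averages[status] = mean(complete.values())
    # Years whose count sits above the dotted average line
    above_average[status] = [y for y, n in complete.items() if n > averages[status]]

print(averages)
print(above_average)
```

Leaving 2023 in would pull both averages down, which is the reason for excluding it.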
This line plot shows solved Mondays by their check status. The solid lines show the count of solved Mondays and the dotted lines show the regression line (using a linear model).
The legit line looks pretty steady here, which is at least encouraging, seeing that there is a decrease for the non-legit solves.
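For readers curious about what the regression line is doing, here is a small Python sketch of an ordinary least squares fit of count against year, one fit per check status. It is a hand-rolled illustration under assumed placeholder counts, not the code or the numbers behind the plot.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


years = [2018, 2019, 2020, 2021, 2022]  # complete years only
legit = [20, 22, 12, 15, 21]            # hypothetical counts
checked = [25, 18, 10, 9, 8]            # hypothetical counts

legit_slope, legit_intercept = fit_line(years, legit)
checked_slope, checked_intercept = fit_line(years, checked)

# A slope near zero reads as "steady"; a clearly negative slope is a decline
print("legit solves per year:", legit_slope)
print("checked solves per year:", checked_slope)
```

The slope is the change in solved Mondays per year, so comparing the two slopes is a quick way to put a number on "steady" versus "decreasing".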
Breakdown by day
The blue line indicates the time in seconds to complete a crossword, the red dotted line indicates the moving average (yearly), and the orange dashed line indicates the linear regression. Looking at completion time is another take on the data to see if there’s any improvement in our crosswording. Ideally, the time to complete Monday crosswords decreases over time, and that’s what we’re seeing here.
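The moving average can be sketched in a few lines of Python. This is a trailing-window version under assumptions: the solve times are invented, and the window of 3 is only there to keep the example short (for one puzzle per week, a yearly window would be roughly 52 puzzles).

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window are None."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window
            running -= values[i - window]
        smoothed.append(running / window if i >= window - 1 else None)
    return smoothed


# Hypothetical Monday completion times in seconds, oldest first
monday_seconds = [900, 840, 870, 780, 750, 720]

smoothed = moving_average(monday_seconds, window=3)
print(smoothed)
```

The smoothed series evens out single fast or slow solves, which makes a downward trend in completion time easier to spot than in the raw line.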
The data for Tuesdays isn’t as good as for Mondays because we only recently graduated to being able to do Tuesdays consistently.
Backtracking
I wanted to take a peek at one more thing. As we’ve become more comfortable with crosswords, we’ve been going back into the archives (and sometimes I’ve gone into the archives on my own) to do Mondays. So for this last plot, we look at solved crosswords that weren’t checked and didn’t have answers revealed. This plot supports, and potentially explains, some of the weird behaviour seen in the plots above. It also includes a dotted line to indicate the average number of solved crosswords that were not checked or revealed from 2018 to 2023.
This plot is important because it reminds us to consider things outside of what you see in the data when explaining the data. But this is only the beginning of data exploration!
I hope you’ve enjoyed this brief analysis of our NYT crossword journey :smile: