New and improved PITCHf/x data
A couple of Fridays ago a bomb was dropped on the analytical baseball community, and in this case it was perhaps the greatest bomb ever deployed. You see, my friends, Dan Brooks of the renowned Brooks Baseball announced with zero fanfare that Brooks (a terrific asset as far as individual game data goes, but lagging behind TexasLeaguers.com and JoeLefkowitz.com in multi-season data) would now be carrying Player Cards featuring seasonal data, and that the PITCHf/x data dating back to 2007 (the first year data became available) had been manually reclassified by PITCHf/x gods Lucas Apostoleris and Harry Pavlidis.
That’s right; somehow, some way, Lucas and Harry sifted through three-and-a-half million pitches’ worth of PITCHf/x data so that amateur analysts like me would have the most accurate data possible to play with.
Why is this important? Well, for starters, pretty much any time I’ve talked about Ivan Nova over the last six months, it came with the caveat that we knew his second-half success was due in part to increased deployment of his slider, but I didn’t have the data to back this assertion up, as the PITCHf/x system stubbornly insisted that Nova only threw a slider 3.9% of the time. Now we know the truth.
Check out the following table, showing Nova’s non-reclassified 2011 PITCHf/x data, against Lucas and Harry’s reclassified 2011 PITCHf/x data:
The four-seam classification was pretty much on the money, as was the curveball, but the rest of Nova’s repertoire was pretty butchered by PITCHf/x. As you can see, Nova actually threw his slider 13% of the time instead of 3.9%, while the reclassification also determined that Nova threw a sinker, not a two-seamer. He also threw about half as many changeups as the un-reclassified data said he did, and he doesn’t actually have a cutter at all.
Lucas and Harry could’ve called it a project right there, and we would’ve been plenty happy simply having accurate PITCHf/x data. But no, they decided to go even further, providing pitch and sabermetric outcome breakdowns by pitch type. While some of these categories have been available at T-Leaguers and Lefkowitz, never before has all of this data been available in one place. In particular, the Whiff/Swing% on an individual pitch level is simply astounding, and something that’s never been freely available. Check out the remainder of Nova’s 2011 stats:
Now, we had a pretty good idea that Nova’s new-and-improved slider was nasty, but I don’t think anyone realized it was 43.1% Whiff-per-Swings-Taken nasty! As a frame of reference, CC Sabathia, who boasts one of the top sliders in the game, recorded a Whiff/Swing of 40.9% last season (though in fairness, he also threw it 27% of the time).
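For anyone who wants to see how a number like Whiff/Swing% comes together, here is a minimal sketch in Python. It assumes the stat is simply swinging strikes divided by total swings (whiffs, fouls and balls in play). The outcome labels and the sample pitches are made up for illustration; they are not real PITCHf/x field names and not Nova's or Sabathia's actual pitches.

```python
# Hypothetical outcome labels for illustration only; real PITCHf/x
# pitch descriptions use different wording.
SWING_OUTCOMES = {"whiff", "foul", "in_play"}


def whiff_per_swing(outcomes):
    """Return whiffs as a percentage of swings, or None if nobody swung."""
    swings = sum(1 for o in outcomes if o in SWING_OUTCOMES)
    whiffs = sum(1 for o in outcomes if o == "whiff")
    if swings == 0:
        return None  # avoid dividing by zero on a pitch nobody offered at
    return 100.0 * whiffs / swings


# Made-up sample of eight sliders: five swings, two of them whiffs.
sample = ["ball", "whiff", "foul", "called_strike",
          "in_play", "whiff", "ball", "foul"]

rate = whiff_per_swing(sample)
print(f"Whiff/Swing: {rate:.1f}%")
```

Note that takes (balls and called strikes) drop out of the denominator entirely, which is why a pitch can post a big Whiff/Swing% even when it isn't thrown all that often.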
In the aftermath of this insane treasure trove of new data, I couldn’t help but wonder whether they’d be adding league average data (helpful as an additional reference point), and also if we could expect to have manually reclassified data for the upcoming 2012 season, as it’d be quite helpful to have the full spectrum of accurate data when looking at a given pitcher’s offerings across multiple seasons. Incredibly, both Lucas and Harry confirmed via e-mail that they do indeed plan to reclassify pitches on an ongoing basis throughout the season.
This is probably one of the most important sabermetric projects undertaken in the last 10 years. It’s incredible that they’ve devoted their time and energy to delivering a product any of us can access free of charge, and the fact that they’ve also committed to maintaining an accurate set of data on a go-forward basis is just mind-blowingly awesome.