New and improved PITCHf/x data


A couple of Fridays ago, a bomb was dropped on the analytical baseball community, though in this case it was perhaps the greatest bomb ever deployed. You see, my friends, Dan Brooks of the renowned Brooks Baseball announced with zero fanfare that the site would now do two new things. Brooks had been a terrific asset for individual game data, but it lagged behind TexasLeaguers.com and JoeLefkowitz.com in multi-season data. It would now carry Player Cards featuring seasonal data. More importantly, its PITCHf/x data dating back to 2007 (the first year the data became available) had been manually reclassified by PITCHf/x gods Lucas Apostoleris and Harry Pavlidis.

That’s right: somehow, some way, Lucas and Harry sifted through three and a half million pitches’ worth of PITCHf/x data, so that amateur analysts like myself would have the most accurate data possible to play with.

Why is this important? Well, for starters, pretty much any time I’ve talked about Ivan Nova over the last six months, it came with the caveat that we knew his second-half success was due in part to increased deployment of his slider, but I didn’t have the data to back this assertion up, as the PITCHf/x system stubbornly insisted that Nova only threw a slider 3.9% of the time. Now we know the truth.

Check out the following table, showing Nova’s non-reclassified 2011 PITCHf/x data, against Lucas and Harry’s reclassified 2011 PITCHf/x data:

The four-seam classification was pretty much on the money, as was the curveball, but PITCHf/x pretty much butchered the rest of Nova’s repertoire. As you can see, Nova actually threw his slider 13% of the time rather than 3.9%. The reclassification also determined that he threw a sinker, not a two-seamer. He also threw about half as many changeups as the non-reclassified data indicated, and he doesn’t actually have a cutter at all.

Lucas and Harry could have stopped there, and we would have been plenty happy simply having accurate PITCHf/x data. But no, they decided to go even further and provide pitch-outcome and sabermetric breakdowns by pitch type. Some of these categories have been available at T-Leaguers and Lefkowitz, but never before has all of this data been available in one place. In particular, Whiff/Swing% broken down by pitch type is simply astounding, and it has never been freely available. Check out the remainder of Nova’s 2011 stats:

Now, we had a pretty good idea that Nova’s new-and-improved slider was nasty, but I don’t think anyone realized it was 43.1% Whiff-per-Swings-Taken nasty! As a frame of reference, CC Sabathia, who boasts one of the top sliders in the game, recorded a Whiff/Swing of 40.9% last season (though in fairness, he also threw it 27% of the time).
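For anyone who wants to compute this kind of number themselves, Whiff/Swing% is just whiffs divided by total swings, calculated separately for each pitch type. Pitches the hitter takes (balls and called strikes) are left out of the count. Below is a minimal sketch under stated assumptions. The outcome labels ("swinging_strike", "foul", "in_play", and so on) and the whiff_per_swing function name are hypothetical illustrations. They are not Brooks Baseball's actual codes or methodology. The sample data is made up and does not reflect Nova's or Sabathia's real pitches.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical outcome labels (an assumption, not an official PITCHf/x code list).
WHIFFS = {"swinging_strike"}     # swings that missed
CONTACT = {"foul", "in_play"}    # swings that made contact


def whiff_per_swing(pitches: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return Whiff/Swing% (whiffs / total swings * 100) per pitch type.

    `pitches` is an iterable of (pitch_type, outcome) pairs.
    """
    whiffs: Counter = Counter()
    swings: Counter = Counter()
    for pitch_type, outcome in pitches:
        if outcome in WHIFFS:
            whiffs[pitch_type] += 1
            swings[pitch_type] += 1
        elif outcome in CONTACT:
            swings[pitch_type] += 1
        # Balls and called strikes are not swings, so they are skipped.
    return {pt: 100.0 * whiffs[pt] / n for pt, n in swings.items()}


if __name__ == "__main__":
    # Made-up sample: 4 slider swings (2 whiffs) and 1 four-seam swing (0 whiffs).
    sample = [
        ("SL", "swinging_strike"),
        ("SL", "foul"),
        ("SL", "ball"),
        ("FF", "in_play"),
        ("FF", "called_strike"),
        ("SL", "in_play"),
        ("SL", "swinging_strike"),
    ]
    print(whiff_per_swing(sample))  # {'SL': 50.0, 'FF': 0.0}
```

Note that the denominator is swings, not pitches thrown. That is why a pitch can post a high Whiff/Swing% even when it is thrown relatively rarely.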

In the aftermath of this insane treasure trove of new data, I couldn’t help but wonder whether they’d be adding league-average data, which would be helpful as an additional reference point. I also wondered whether we could expect manually reclassified data for the upcoming 2012 season. That would be quite helpful, since it would give us the full spectrum of accurate data when looking at a given pitcher’s offerings across multiple seasons. Incredibly, both Lucas and Harry confirmed via e-mail that they do indeed plan to reclassify pitches on an ongoing basis throughout the season.

This is probably one of the most important sabermetric projects undertaken in the last 10 years. It’s incredible that they have devoted their time and energy to delivering a product any of us can access free of charge. The fact that they’ve also committed to maintaining an accurate data set going forward is just mind-blowingly awesome.

Categories : PITCHf/x


  1. Gonzo says:

    Wow, this was a massive undertaking. Wow.

  2. Johnny says:

    How do they determine that he throws a sinker instead of a two-seamer? I thought everybody had pretty much agreed that it was the latter. I assume they’re looking at break, yes? In any case, the whole thing is pretty cool.

    • Harry Pavlidis says:

      That’s a good question: I don’t. By convention, two-seamers are called sinkers, even if it’s a tailing pitch. Further confounding things is Tim Lincecum, who supposedly does not employ a four-seam grip but has two fastballs. I call one of them a fastball and the other a sinker/two-seamer … they’re both two-seam fastballs, and neither is a sinker. The main thing is finding the clusters/groups; labeling is another matter.

  3. Michael says:

    Nerd alert! #ILoveIt

  4. Tom Zig says:

    Good god. This is tremendous. Good work.

  5. CMP says:

    Are these pitches categorized manually, with someone watching every pitch of every game, or by computer?

    • Harry Pavlidis says:

      PITCHf/x systems include a couple of calibrated cameras that give us the path of the ball. From the deflection from the path of a spinless ball, we can distinguish pitch types quite well. Most of the time.

  6. Jeff says:

    Sweet statistical jesus

  7. JoeyA says:

    So we aren’t using wins and ERA anymore?

    In all seriousness, did anyone think to simply ask the dude what he throws? How did it take all of this analysis and work to find out Nova doesn’t throw a cutter?

    Anyway, this is amazing and only furthers my need to understand all of these statistics in more detail.

  8. iftheshoe_fits says:

    Lucas and Harry are amazing…

  9. FP says:

    Or you could have just looked at his Fangraphs profile with Baseball Info Solutions data months ago and seen virtually the exact pitch percentage breakdown, minus the sinkers separated out.

    • Bo Knows says:

      I’ve never found FanGraphs all that good at determining pitches, or at the advanced breakdowns of said pitches.

    • Larry Koestler says:

      Though BIS may have come closer to the correct SL% for Nova than the non-reclassified PITCHf/x data did, the BIS data — which, as you note, lumps all four- and two-seamers into one catchall and unhelpful “FB” category — is generally considered inadequate by most analysts.

      I’d rather have the best possible data available than flawed data, even if the latter may be slightly easier to come by.

      • Harry Pavlidis says:

        Nova and Melancon have split IDs that are unpublished. Actually, as I noted in a comment below, the Nova split would be in Lucas’s data; I’ve got the Melancon sinkers and cutters.

  10. Jesse says:

    Holy shit, this is amazing.

  11. I am not the droids you're looking for... says:

    Hey Larry maybe you and the guys can now go back and re-do all those posts wherein you analyzed why it was possible that AJ might not completely suck this year.

  12. Harry Pavlidis says:

    Hey, thanks all. I did classify every pitch, Lucas has reviewed many, and we are going to merge his data, and others’, over time. No, I’m not crazy (that’s a lie).

    Not everything is right, that’s for sure, especially since this project was done over a four-year period. We’ve already made some improvements.

    So, please sign up for the forums and let us know when you see something that needs work. Otherwise, I just hope you all enjoy it.


  13. Dick Whitman says:

    Dear Lucas and Harry,

    Thank you.

  14. Uli440 says:

    Swing% and Swing-Miss% (whiffs/total swings) aren’t quite new. Joe Lefkowitz’s pitcher cards have them broken down by pitch type and batter handedness.

  15. BobCano says:

    This is remarkable from a data perspective, great work by all.

    Now my comment is very nit-picky:

    When you prepare these charts, please rank both against the same metric (in this case Sel/Freq). It would make it easier to see what’s thrown most -> least in one quick glance.

    Also, for the layman of the stat world (myself included within said bucket) a legend would be extremely helpful. This way one would avoid thinking that CU was referring to Cutter as opposed to Change-Up.

    Thanks in advance,

    Your brother
