Have you ever read an article on this site, only to encounter a strange acronym that you don’t understand? For the most part they’re either inside jokes or advanced metrics. The increasing amount of data available makes it easier for us to put raw numbers into context, so we can compare players using stats that account for the circumstances behind the numbers. A traditional stat like OBP tells us just one thing and ignores other factors; these advanced stats try to capture the many things that go into a player’s value.
Over the next week or so we’ll discuss the most commonly used stats on this site. Many of these require heavy math, and we know that can turn off many people. This series of articles will attempt to explain what goes into these stats without getting into any of the heavy math. We’ll include as many resources as possible, however, in case you want to dive into the calculations yourself. By the end of the series, we’ll replace our woefully outdated and partly inaccurate guide to stats.
Linear run estimators
You might have heard the term linear weights, and maybe it intimidated you. For our purposes, however, the term linear run estimators works better, though it means the same thing. Using data from actual baseball games, a linear run estimator assigns a certain run value to each offensive event. On average, the event is said to be worth that many runs. Here’s a simple table to describe the value of each hit type (from Tango):
There are other linear models that put these events into context, such as the base-out situation, but for the main UZR calculation we need only worry ourselves with the linear estimator.
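To make the idea concrete, here is a minimal Python sketch of a linear run estimator. The run values in it are made-up placeholders chosen for illustration, not Tango's figures, and the names `PLACEHOLDER_RUN_VALUES` and `estimate_runs` are hypothetical.

```python
# Illustrative placeholder run values per event. These are NOT Tango's
# published numbers; substitute real linear weights before drawing conclusions.
PLACEHOLDER_RUN_VALUES = {
    "single": 0.5,
    "double": 0.75,
    "triple": 1.0,
    "home_run": 1.5,
}


def estimate_runs(event_counts, run_values=PLACEHOLDER_RUN_VALUES):
    """Sum each event count times its average run value."""
    return sum(run_values[event] * count for event, count in event_counts.items())


# Hypothetical batting line: 100 singles, 30 doubles, 2 triples, 18 home runs.
line = {"single": 100, "double": 30, "triple": 2, "home_run": 18}
print(estimate_runs(line))
```

The estimator is "linear" because each event contributes its count times a fixed value, and the contributions are simply added up.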
To determine a fielder’s range, we need to determine a zone for which he is responsible. A baseball field can be broken into 78 zones, of which UZR uses 64. They’re labeled by the position or positions they’re closest to, and also by depth. You can see a diagram of all field zones at Retrosheet.
Determining league average
If I told you that a certain player, say Herp Aherpaderp, hit 18 home runs, what could you make of that data point? Not much, really, unless you also knew the league average number of home runs that year. If Herp Aherpaderp played in the NL in 1920, he would have led the league. If he played in the NL in 2001 he would have been tied for 51st. Comparing players to their peers is important in rating them.
What we need to know here is how many hits and how many outs passed through each zone. Thanks to baseball stats companies, like Baseball Info Solutions and STATS, those numbers go on record. We further need to know which player made the plays in that zone. For example, if we’re looking at Zone 56, the one between short and third, we want to know how many outs the shortstop recorded, and how many outs the third baseman recorded. Finally, we want to know the run value per hit to that zone, which we can find using the above table.
Assigning credit and fault
At this stage we encounter math, so bear with me as I explain. Obviously, we want to know the rate at which balls in play in a particular zone turned into outs and hits. Using that and the league averages, we can then determine how many more or fewer balls a player got to than the league average player. Then, using the run value per hit, we can calculate how many runs the player cost or saved his team.
To make things a bit clearer, we’re just trying to determine which player was responsible for which hits. So if there are 1,000 hits and 1,500 outs in Zone 56, we want to know how many of those outs the third baseman converted, and how many the shortstop converted. Using this ratio, we can determine the responsible party for the hits. So, if the third baseman made 70% of the outs recorded from Zone 56, 1,050 in this example, he’s also responsible for 70% of the hits, or 700. That’s the baseline we apply to individual fielders.
Then we find the value of the balls the player caught and subtract the value of the balls the player allowed for hits. This tells us how many more or fewer balls the player got to than the league average player. Multiply that by the run value per hit, and you have UZR runs.
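The steps above can be sketched in Python for a single zone. This is a simplified reading of the description, not Lichtman's actual implementation: the function names are hypothetical, the individual fielder's totals are invented, and the run value of 0.5 is a placeholder.

```python
def position_baseline(zone_hits, zone_outs, position_outs):
    """Return (hits charged to the position, league out rate for the position)."""
    # The position is charged the same share of hits as its share of outs.
    charged_hits = zone_hits * position_outs / zone_outs
    out_rate = position_outs / (position_outs + charged_hits)
    return charged_hits, out_rate


def zone_runs(player_outs, player_hits_charged, league_out_rate, run_value_per_hit):
    """Runs saved (positive) or cost (negative) by one fielder in one zone."""
    chances = player_outs + player_hits_charged
    expected_outs = chances * league_out_rate
    plays_above_average = player_outs - expected_outs
    return plays_above_average * run_value_per_hit


# League numbers from the Zone 56 example: 1,000 hits, 1,500 outs,
# 1,050 of those outs made by third basemen.
charged_hits, out_rate = position_baseline(1000, 1500, 1050)

# Hypothetical third baseman: 60 outs, 30 hits charged, placeholder run value 0.5.
runs = zone_runs(60, 30, out_rate, run_value_per_hit=0.5)
print(charged_hits, round(out_rate, 3), round(runs, 2))
```

With these inputs the league baseline for third basemen in Zone 56 is a 60% out rate (1,050 outs on 1,750 chances), so a fielder who records 60 outs on 90 chances is about six plays better than the 54 an average third baseman would be expected to make.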
What about errors?
Errors can be tricky when calculating range, because when a fielder makes an error he usually got to the ball, but couldn’t field or throw it cleanly. To this end, UZR initially counts errors as outs, again because the fielder got to the ball and therefore has the range to get to that ball in the future. But we still need to factor in errors somehow.
Instead of calculating the error rate for each zone, UZR calculates it by position. It takes the total number of errors committed by players at a certain position and divides it by the total number of balls players at that position got to. We can then determine how many more or fewer errors a particular player made than average, and then, multiplying by the run value of an error, we can determine how many runs he cost or saved his team.
That counts only reached-on-errors, ones which caused the batter to reach first base. There is another type of error, non-ROE, which means a different calculation, since it means a runner moving up rather than a runner reaching base. That’s taken care of with UZR, too.
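The reached-on-error adjustment can be sketched the same way. This is a minimal illustration under stated assumptions: `error_runs` is a hypothetical name, all totals are invented, and the run value of an error (0.5 here) is a placeholder rather than a published figure. It does not cover the non-ROE case.

```python
def error_runs(player_errors, player_balls_reached,
               position_errors, position_balls_reached,
               run_value_per_error):
    """Runs saved (positive) or cost (negative) relative to the position's error rate."""
    league_error_rate = position_errors / position_balls_reached
    expected_errors = player_balls_reached * league_error_rate
    errors_avoided = expected_errors - player_errors
    return errors_avoided * run_value_per_error


# Hypothetical league totals for one position: 400 errors on 10,000 balls reached.
# Hypothetical player: 9 errors on 300 balls reached.
print(round(error_runs(9, 300, 400, 10_000, run_value_per_error=0.5), 2))
```

In this made-up case the position's error rate is 4%, so an average fielder would be expected to make 12 errors on 300 balls; making only 9 puts the player three errors ahead of average.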
Adjusting for other factors
Obviously, other factors play into how a defender fields a ball. First up is Park Factor, a metric I’m assuming we’re all relatively familiar with. UZR breaks down park factor into positions, including the infield as one position. The idea here is to make small adjustments for how certain parks play. If an infield plays badly — has high grass, has messy lips — that factors into defense. So does outfield space. Those all get factored into UZR.
Batted ball speed is seemingly the most important adjustment. A third baseman might be able to not only field a tapper between him and the shortstop, but have enough time to set his feet and throw. On a sharp grounder, however, the play becomes more difficult. Game stringers — people who watch the game and record every event — classify ground balls as soft, medium, or hard, and fly balls as easy, medium, and hard. Those all factor into UZR as well, with each zone getting a weight for each batted ball speed. It takes into account the difficulty of catching a lightly hit fly ball to a shallow zone, as well as a hard fly ball to a deep zone and everything in between.
Batter handedness plays a part, too, since that can cause fielders to adjust. Also, batters of a certain handedness tend to hit balls harder to certain zones and softer to others. This adjustment is made so that, for example, a shortstop defending against a left-handed batter doesn’t get extra credit for fielding a ball in the 6M zone — i.e., shortstop up the middle — when he might have been positioned there in the first place.
Then there’s pitcher ground ball ratio. This doesn’t make a huge difference, since on average a pitching staff has a, well, average GB/FB ratio. It still gets factored in, though, to ensure accuracy.
Finally, we get to the base-out situation that I mentioned earlier. Again, this has to do with positioning. Middle infielders are more likely to get to their middle zones with a runner on first, since they’re playing closer to the bag for the double play.
Earlier this week, Mike Rogers at Bless You Boys examined UZR, noting its ups and downs. He makes good points as to the limitations of UZR. The main point is the subjectivity of the batted ball type. What one stringer sees as a medium hit fly ball another might see as hard. Also, limiting the data to just three classifications might provide simplicity, but it also detracts from accuracy. And, as far as accuracy goes, UZR doesn’t always agree with other defensive metrics, most notably John Dewan’s plus/minus.
Yet despite its limitations, UZR remains the best tool we currently have to measure defense. When using it, however, Mike points out five rules we should abide by.
1. 1 year of UZR data is on par with about 50-55 games worth of offense. Would you judge Miguel Cabrera’s talents at the plate on just his games from April 1st through June? I wouldn’t, and neither would you (or so I hope). So don’t do it with defense. Personally, if I have three years of UZR data for a player, I’d rather have four. If I have four years of UZR data, I’d rather have five. I don’t believe that you can have enough.
2. One full year of defensive data is at least 1200 innings worth of data.
3. Do not use UZR per 150 games (UZR/150; found on Fangraphs’ player pages) if at all possible. It’s way too misleading.
4. If Player A is a -10 one year, +10 the next year and then +0 the next year, he’s likely an average fielder. Large swings in year-to-year data aren’t out of the norm, but you should always use an average (preferably, a weighted average) and be conservative with it.
5. When possible, use multiple defensive systems to grade a player (UZR, John Dewan’s Plus/Minus system, etc).
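The weighted average suggested in rule 4 can be sketched in Python as follows. The weighting scheme is an assumption: the rule does not specify weights, so the recency weights (3, 4, 5) below are placeholders, and `weighted_uzr` is a hypothetical name.

```python
def weighted_uzr(seasons):
    """Weighted average of seasonal UZR, given (uzr_runs, weight) pairs."""
    total_weight = sum(weight for _, weight in seasons)
    if total_weight == 0:
        raise ValueError("at least one season needs a nonzero weight")
    return sum(uzr * weight for uzr, weight in seasons) / total_weight


# Player A from rule 4: -10, then +10, then 0.
equal = weighted_uzr([(-10, 1), (10, 1), (0, 1)])    # plain average
recent = weighted_uzr([(-10, 3), (10, 4), (0, 5)])   # placeholder recency weights
print(equal, round(recent, 2))
```

Either way the result lands close to zero, which matches the rule's reading of Player A as a roughly average fielder. Weighting by innings played instead of recency would be another reasonable choice.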
The biggest shortcoming with using multiple years of data, as I see it, is that if a player is in physical decline we still might rate him positively because of previous years’ data. But that’s just a minor quibble.
The most frequent criticism I hear of UZR is that we can better assess defense by just watching. Observation, combined with a knowledge of the game, should allow us to assess the defensive abilities of a player. Unfortunately, as Mike mentioned in his defensive stats post, our eyes can deceive us. Our memories get distorted, and the effect gets multiplied as we become further removed from the event. The data used in calculating UZR was observed by human eyes and recorded as such — usually by multiple people per game, to weed out bias.
In other words, UZR is based on eyeball data. It just takes a heap of such data and compiles it into a workable statistic. It tries to factor in all those contextual questions we have after seeing raw data — like how Herp Aherpaderp hit 18 home runs. Well, when did he hit those 18 home runs? In what park did he hit those home runs? Were a bunch of them cheapies that barely cleared the wall? With these questions reasonably answered, or at least accounted for, we can get a better idea of a player’s defensive abilities.
The best and, in my opinion, only place to get started is by reading explanations by the man who created UZR. That’s Mitchel Lichtman, co-author of The Book. Here are his two UZR primers.