# Misunderstanding randomness, by Tom Verducci

The one thing I love about the evolving use of mathematics in baseball is that it helps us learn math concepts outside the classroom. I played Tetris on my graphing calculator in the back of calculus class in high school, but once I see a concept applied to baseball stats, I’m all ears. My fascination with randomness started with baseball, and was amplified by this guy. I’m still a fledgling in randomness, but I can usually recognize when someone is misusing the concept, as Tom Verducci did.

His argument goes a little like this. When you seed all the playoff teams by record, and then see how those seeds fared in the postseason, you’ll see that seed has no correlation to World Series championships. In fact, in the past nine years, the first through seventh seeds have each won the series once, while seeds eight or worse have won it twice. Verducci presents a table with the data, but Flip Flop Fly Ball has an even better one.

Unfortunately, Verducci misses something in all this. Yes, in the playoffs it appears not to matter what seed you were in the regular season. But this information by itself does not denote a completely, or even somewhat, random situation, as Verducci believes.

So the next time an expert tells you that they know who is going to win the World Series because a certain team is “built for the postseason” or because of how well that team played in the regular season, don’t believe it. As the chart shows, the postseason is incredibly random, partly because all the off days make for very different circumstances than teams find all year, but mostly because it’s such a small sample. The best team doesn’t always win the World Series — or even anything close to most of the time. The hottest team wins it.

The data, as presented, does not dispel the “built for the postseason” argument. Just last month we looked at some research regarding playoff success, conducted by Nate Silver, who knows a bit more about randomness than Verducci (it’s part of Nate’s vocation). They found that three factors (strikeout rate, closer, and defense) predict postseason success pretty accurately. So what gives here?

Just because there’s a random distribution of seeded teams winning the World Series does not make the process random. It could just be — and I’m sure Silver would argue this — that the teams best built for the postseason happen to finish in different spots every year. As we’ve learned this decade, what works in the regular season doesn’t necessarily work in the postseason. A team could have a mediocre offense, but could also have a staff that strikes out a lot of hitters, a shutdown closer, and a solid defense. Those attributes would help them win in the playoffs, regardless of their seed.
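To make that concrete, here is a rough sketch of a toy world where the champion is never random: the title always goes to the team with the best "postseason fit." The fit score, the eight-team field, and the 9,000 simulated seasons are all assumptions made up for illustration, and fit is drawn independently of seed, which is the scenario described above. Counted by seed, the winners still look evenly spread.

```python
import random
from collections import Counter

def simulate_champion_seeds(n_seasons=9000, n_teams=8, rng_seed=0):
    """Count titles per regular-season seed when the champion is ALWAYS
    the team with the highest hidden 'postseason fit' score."""
    rng = random.Random(rng_seed)
    counts = Counter()
    for _ in range(n_seasons):
        # Hypothetical fit score, drawn independently of seed (an assumption).
        fit = [rng.random() for _ in range(n_teams)]
        # No luck in the playoffs themselves: the best fit always wins.
        champion_seed = max(range(n_teams), key=lambda s: fit[s]) + 1
        counts[champion_seed] += 1
    return counts

counts = simulate_champion_seeds()
for seed in sorted(counts):
    print(seed, counts[seed])
```

Every seed ends up with roughly one eighth of the titles, even though each postseason in this toy model is completely determined by fit. A flat table of winners by seed cannot tell these two worlds apart.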

In some ways, I do think that the playoffs are a crapshoot. A team can get hot at the right time and mow down the competition. A great team can get cold and take an early exit. But I also understand that there are other factors that play into the playoff equation, and I wouldn’t write them off as completely random. Verducci does a good job of showing that teams of all seeds can win and have won the World Series. That doesn’t mean that the process is completely random.

man 2001 still hurts

very very much

This post oughta be a primer.

Nah. Nate Silver and Nassim Nicholas Taleb should be a primer. I’m just here to dispel fallacies.

Yankees are sure due for some randomness to go their way.

we really haven’t had much go our way since Game 3 in the 2004 ALCS. in fact we’ve only had the exact opposite

Agreed. Saying that the playoffs are totally random and/or a “total crapshoot” is like saying that baseball as a whole is a total crapshoot because of how many ways things can turn out. Baseball as a whole lends itself to much more variation than other sports, hence the 162 game schedule. Just because the factors that will increase propensity to win over the course of a 162 game season are different from those that will increase the propensity to win over the course of 3 postseason series does not render the whole process random. It just so happens that over the course of a regular season, the factors that tend to produce winning teams are a consistent, deep, high OBP offense with power, and consistent, high IP, decent ERA pitching. In other words, if you have 5 starters with ERAs of 4.50 who log 200 innings each, and have a cumulative OPS of .800+, there is a very good chance you will win at least 90 games, which may or may not get you in to the postseason, depending on the talent in your division, but nevertheless constitutes a fairly successful season by any objective measurement. Carry that into the postseason, however, and you probably won’t get out of the first round. Good pitchers generally don’t walk a lot of guys and if they do they have the K stuff to get out of jams, making the 3 run hr much more improbable for an offense-dependent team. Meanwhile, your 4.50 ERA guys aren’t going to be as effective against good teams in the playoffs, and you’re going to end up losing three games 5-2. Like exactly. The chances of that happening, given the previously mentioned outliers, is like 100%. For example, let’s say it’s the Yankees vs. Phillies, and somehow it’s CC vs. Joe Blanton. Blanton is going to go 5 2/3, giving up 4 runs. Cano will add a solo shot in the 8th. CC, meanwhile will go 7, giving up 2 runs, one earned. I mean seriously there would be no point in watching, because that is exactly what would happen.
Ok i really don’t know where i’m going with this but I had fun so w/e.

http://en.wikipedia.org/wiki/Paragraph

hahahha IETCVM

technically, there weren’t any grammatical/stylistic errors. It was a rambling, early morning nothing of a paragraph, but it was a paragraph nonetheless. Sorry though, I’ll try to better construct my non-arguments next time I suppose.

I mean…did you want people to read it? An unbroken block of text is going to be skipped over by many people, because it just looks like it’ll be a chore to wade through.

i didn’t really care i guess while i was writing it. It’s kind of embarrassing now, but there’s no delete comment function (that I know of). I was just ranting really, although I stand by some of the things I said, so I’ll say them in a more comprehensive and succinct fashion:

The factors that indicate regular season success are different from those that indicate postseason success, but there are still very real indicators for both. And a team with 5 Joe Blantons and 9 Jorge Posada-esque bats would probably make the postseason, but would lose 3 consecutive games by a score of 5-2 once they got there.

What the hell…?

Also, you should get in touch with this guy: [bad link removed] whose brilliant website your text would be perfect for.

That site gave me a malware alert.

Me too, please mods, remove the link.

Reduce the season by ~10 games and change the playoff format to 7-7-9.

9? What is this, 1903?

Haha I know right. But really, I think 7-7-9’s a great idea.

Verducci doesn’t really have it right, but it goes both ways. You see lots of quantitative analysts misuse the concept of randomness grossly as well.

One of the dominant uses of randomness in statistics relates to the process of drawing valid inference from a sample of data and projecting that onto a population.

That doesn’t apply to baseball in the same way because baseball is one of the rare domains in which there is no statistical sampling: the vast majority of statistics are derived from the entire population of data to begin with.

For these reasons, conventional statistical notions of randomness often don’t hold when applied to baseball.

The issues that factor into baseball derive much more from how to use panel data and time series analysis.

+,-1x

The regular season isn’t designed to find the best overall team, but rather the best team in each division. The best path to the playoffs is by being the best team in your division, and the schedule is stacked to make you prove you’re better than your divisional opponents. AL and NL schedules have very little intersection, so it’s meaningless to compare records between leagues. Just from that, you should expect a low correlation between regular season record and postseason success.

There are also other factors that come into play, such as the differences between constructing your roster to win as much as possible in a 162 game season vs trying to win 11 games over a span of about a month. In the regular season it’s often ok to sacrifice today to stand a better chance of winning tomorrow, but in the postseason you rarely play that way. It’s just a different playing field in the postseason, and it’s rare that any one team is built so well that they can play either style equally well.

This is correct, mostly. It does seem that a team built for the playoffs will not necessarily be the best team over a long season. The top of the rotation is relatively much more important, because they will start a bigger share of games. I also think power is more important, because long-sequence offenses lose more effectiveness against good pitching than short-sequence offenses.

The point about defense is not quite right, I think. Defense does become more important in the post-season, but that’s because the teams are more closely matched in hitting and pitching. When that happens, more marginal differences, like defense, become more important in determining the outcome.

By the way, I think the difference between 5-game and 7-game series is exaggerated. If you have the better team, the chance of winning the 5-game series is only a tiny bit less than the chance of winning a 7-game series.
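For anyone who wants to check that claim, here is a quick sketch. It assumes every game is an independent coin flip with the same win probability `p` for the better team, which ignores home field and pitching matchups, so treat the numbers as a rough guide only. The function name `series_win_prob` is just something made up for this example.

```python
from math import comb

def series_win_prob(p, games):
    """Chance of winning a best-of-`games` series when each game is won
    independently with probability p (no home-field or pitching effects)."""
    need = games // 2 + 1  # wins required to clinch
    # Sum over the number of losses taken before the clinching win.
    return sum(
        comb(need - 1 + losses, losses) * p**need * (1 - p)**losses
        for losses in range(need)
    )

for p in (0.52, 0.55, 0.60):
    print(p, round(series_win_prob(p, 5), 3), round(series_win_prob(p, 7), 3))
```

Under these assumptions, a team that wins 55% of individual games takes a five-game series about 59.3% of the time and a seven-game series about 60.8% of the time, a gap of roughly a point and a half.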

“The point about defense is not quite right, I think. Defense does become more important in the post-season, but that’s because the teams are more closely matched in hitting and pitching. When that happens, more marginal differences, like defense, become more important in determining the outcome.”

What “point about defense” is not quite right? They looked at defensive quality, independent of other factors, and noticed a strong correlation between that and winning in the post-season. That is an inarguable fact.

First of all, what they did was, as the linked post points out, fairly simplistic, and the results could hardly be described as “inarguable fact.”

But suppose they did a careful study of playoff teams and found that defense was an important factor in postseason success. That still wouldn’t mean that you should trade offense for defense to succeed in the post-season. My point is that when the most important factors are close to equal, less important ones start to make the difference.

A good-hitting team with weak defense is better than a weak offensive team with good defense. It’s not until you improve the offense of the second team that its superior defense starts to matter.

How important is a punter to a football team? Well, he’s important, but a particularly good punter doesn’t make up for a lousy QB or a porous defense. Give a lousy team a great punter and it’s still a lousy team. If you correlate regular season records with the quality of punting you’ll get some small relationship, but not a strong one. Game outcomes will mostly be determined by the relative strengths of the offensive and defensive units.

In the playoffs, though, you are likely to get a much stronger correlation. That’s because the teams are more closely matched in other respects. Otherwise they wouldn’t be in the playoffs. So a big difference in punting ability accounts for a bigger part of the difference between two playoff teams than between two random teams.

So it is with defense in baseball.
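The punter argument can be put into numbers with a small sketch. Assume the strength gap between two teams is the sum of a gap in major factors (hitting and pitching) and a gap in a minor factor (defense or punting), and that the two are independent. The spreads used below are invented for illustration, not measured from any data.

```python
import math

def minor_factor_correlation(sd_major, sd_minor):
    """Correlation between the minor-factor gap and the total strength gap,
    where total gap = major gap + minor gap and the two are independent."""
    return sd_minor / math.sqrt(sd_major**2 + sd_minor**2)

# Hypothetical spreads (assumptions): the minor factor varies the same amount
# everywhere, but playoff teams are bunched together on the major factors.
all_teams = minor_factor_correlation(sd_major=10.0, sd_minor=2.0)
playoff_teams = minor_factor_correlation(sd_major=3.0, sd_minor=2.0)
print(round(all_teams, 3), round(playoff_teams, 3))
```

With these made-up spreads the correlation rises from about 0.20 across all teams to about 0.55 among playoff teams, even though the minor factor itself is no more valuable than before. Only the spread of everything else has shrunk.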

Joe — Thanks for the link to Flip Flop Fly Ball. Those are some cool graphics posted there!

You have actually misunderstood Taleb, who would likely side with Verducci against Silver. What Silver is trying to do is predict the future based on modeling prior events, that is, taking history, trying to extract measurable variables and applying statistical models (i.e., implying a distribution) to arrive at a predicted outcome. Taleb would say that Silver’s model will correctly predict only what has occurred in the past and may predict future events until it doesn’t, which is also a certainty. This is because actual events do not behave according to Gaussian distributions. Taleb would also laugh at the number of years of history upon which Silver based his model. Taleb’s focus is finance, where modeling along the same lines as Silver’s created models that are used to price almost everything and where you daily run into people who think they can predict markets based on these models, and they often are able to do just that, until they aren’t. That isn’t to say that Taleb doesn’t use models himself; he just reminds himself that the models are wrong but might be useful. But Taleb is a randomness extremist.

I think you nailed it here. I don’t think I’ve misunderstood Taleb. I only cited him here because I read most of what he writes.

As in, not citing him in terms of the argument.

Yes, but baseball is not finance. The possibilities and the variations are vastly more limited. In fact, events in baseball do occur in accordance with probabilities based on the usual normal and binomial distributions.

The playoffs being a crapshoot is actually a good analogy. Dice behave very predictably over the long run. So do the playoffs. This doesn’t mean the better team almost always wins. That’s not what probability says. It says first, that small differences in regular season performance don’t imply that one team is clearly better than another. It also says that even if one team can be determined to be a little better, the worse team still has a decent chance to win.
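On the first point, here is a back-of-the-envelope sketch of how much a 162-game record can move on luck alone. It assumes games are independent with a fixed win probability, which is a simplification, and the helper name `wins_sd` is made up for this example.

```python
import math

def wins_sd(true_win_pct, games=162):
    """Standard deviation of season wins for a team whose games are
    independent with a fixed win probability (a simplifying assumption)."""
    return math.sqrt(games * true_win_pct * (1 - true_win_pct))

sd = wins_sd(0.5)
# Spread of the gap in wins between two equally good, independent teams.
gap_sd = math.sqrt(2) * sd
print(round(sd, 2), round(gap_sd, 2))
```

Under those assumptions a true .500 team has a standard deviation of about 6.4 wins, and the gap between two identical teams has a standard deviation of about 9 wins. A difference of a few games in the standings says very little about which team is actually better.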