Friday, May 24, 2013

Baseball Mistakes Quantified

For the past month or so, I've been updating the Mistake Index on a daily basis. Click on the page to the right for a fuller explanation, but briefly this is what I quantify:
1. Base running mistakes--any time a player is thrown out on the bases (does NOT include forceouts or GDPs).
2. Bunting mistakes--any bunt with runners on base that does NOT result in a hit or sacrifice (failed bunts with bases empty do NOT count).
3. Pitching mistakes that DIRECTLY lead to runs--walks, balks, passed balls and wild pitches that allow a run to score. Blown saves are also tabulated.
4. Fielding errors, which are double-counted when they result in unearned runs. Too bad.
5. A stolen base index which adds the positive components (stolen bases, throwing out and picking off opponents) and subtracts the negative (caught stealing, pickoffs and opponent stolen bases).

If turnovers are considered bad in football and basketball (try to find any "key to the game" that DOESN'T include them), then baseball's equivalent of turnovers should also be emphasized. Every manager says he's going to "stress the fundamentals" in spring training, and he should, because fundamentals are within a team's control: opponents can hit home runs, but only teams themselves can throw wild pitches, make errors, etc. I challenge you to find these numbers quantified elsewhere (or, in some cases, to find them at all). I also add one unique feature--I tabulate the mistakes their OPPONENTS make. Seriously, find all THAT in one place.

Here's the index through Thursday, May 23rd:

But DOES it translate? Do teams that keep the mistakes to a minimum have greater success? This scatter chart measures whether a team's winning percent correlates with the DIFFERENCE (opponent - team) in their mistakes. This will NOT include the stolen base index for reasons I'll describe later:

Team winning percent is plotted on the horizontal axis, the difference in mistakes on the vertical. For example, Arizona has made 85 mistakes, its opponents have made 108 (a differential of 23) and the team has a .553 winning percent through Thursday--its point is (probably) the dot right above the "0" in the R^2 value. The trend line shows a positive relationship between a greater differential and higher winning percent, but I'd be more comfortable with more data. Some day I'll run this with the data I have going back to 2009, which will give me 150 data points.
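For anyone who wants to check the relationship by hand, here is a rough Python sketch of the correlation behind a chart like this. It uses only the ten teams whose numbers are quoted in this post (Arizona, plus the bottom five and four of the top five listed further down), so it is NOT the full 30-team chart and its result shouldn't be read as the chart's R^2. The names `teams` and `pearson_r` are my own, made up for illustration.

```python
from math import sqrt

# (team, winning percent, mistake differential = opponent mistakes - team mistakes)
# Only teams with both figures quoted in the post; the real chart uses all 30.
teams = [
    ("Arizona", 0.553, 108 - 85),
    ("Brewers", 0.400, 0),
    ("Cubs", 0.391, -45),
    ("Mets", 0.386, 3),
    ("Astros", 0.298, -48),
    ("Marlins", 0.277, -33),
    ("Cardinals", 0.652, -1),
    ("Texas", 0.638, 25),
    ("Reds", 0.617, 47),
    ("Yankees", 0.609, 30),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


win_pct = [t[1] for t in teams]
diff = [t[2] for t in teams]

r = pearson_r(win_pct, diff)
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

Because these ten teams sit at the extremes of the standings, a subset like this will tend to look tidier than the full scatter, which is one more reason to want the 150-point version.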

There are outliers, the greatest being the point that represents an approximate .400 winning percent and a positive differential of around 37 mistakes--that would be the Twins, living proof that solid fundamentals alone won't suffice. Consider the bottom five teams in baseball in winning percent:
5th worst: Brewers, .400, 0 mistake differential (MD)
4th worst: Cubs, .391, -45 MD
3rd worst: Mets, .386, 3 MD
2nd worst: Astros, .298, -48 MD
WORST: Marlins, .277, -33 MD



Here are the top 5 in winning percent:
BEST: Cardinals, .652, -1 MD
2nd best: Texas, .638, 25 MD
3rd best: Reds, .617, 47 MD, and Pirates, -1 MD
5th best: Yankees, .609, 30 MD

Look at that, the Pirates with the FOURTH-BEST RECORD in baseball! NO measure is perfect, but this seems to have potential. If nothing else, it quantifies those things that the lazy (Ken Harrelson, for one) lump in as "intangibles," the catch-all phrase that is shorthand for "I have no idea what I'm talking about." This chart introduces a measure that baseball has been waiting for since the National Association was established in 1871:

PYTHAGOREAN MISTAKES

Pythagorean Wins, as developed by Bill James, is simple:
Runs^2 / (Runs^2 + Runs Allowed^2), which yields an expected winning percent.

I did the same thing using baseball mistakes:
Opponent mistakes^2 / (Opponent mistakes^2 + Team mistakes^2)
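As a quick worked example, here is a minimal Python sketch of that formula, run on the Arizona totals quoted earlier (85 team mistakes, 108 opponent mistakes). The function name `pythagorean_pct` and the `exponent` parameter are my own labels for illustration; the formula above always uses an exponent of 2.

```python
def pythagorean_pct(favorable, unfavorable, exponent=2):
    """Pythagorean-style percentage: fav^e / (fav^e + unfav^e).

    For Pythagorean Wins, favorable = runs scored, unfavorable = runs allowed.
    For Pythagorean Mistakes, favorable = OPPONENT mistakes (good for the team),
    unfavorable = the team's own mistakes.
    """
    fav = favorable ** exponent
    unfav = unfavorable ** exponent
    return fav / (fav + unfav)


# Arizona through May 23rd, as quoted in the post.
team_mistakes = 85
opponent_mistakes = 108

pct = pythagorean_pct(opponent_mistakes, team_mistakes)
print(f"Pythagorean Mistake win percent: {pct:.3f}")
```

For Arizona this comes out to roughly .618 against an actual .553. Note that the sketch divides by zero if both totals are zero, which can't happen once a season is more than a few games old.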

Here's a scatter chart that shows this:

This looks very similar to the prior one and shows a positive relationship between actual win percent and Pythagorean Mistake win percent. THIS is why you need to check the Mistake Index at least once a week, because it clearly depicts the vast majority of mistakes that teams make AND helps provide a snapshot of a team's success. 

Nothing is absolute. A team can commit errors without limit, allow stolen bases in quantities that would make Rickey Henderson blush or field like an outfield of Greg Luzinski, Dave Kingman and Travis Hafner, and still overcome all of it with stellar pitching or lights-out hitting. That probably can't be sustained, though--the Mistake Index suggests that teams that do this are whistling past the graveyard. If you read my Box Score Ephemera each day (and seriously, why don't you?), you'll find that many of my entries focus on a "little thing" like a passed ball, a walk or a series of walks that led not only to a run, but often to the run that made the difference in a game. The Cubs aren't where they are because of their starting pitching, which has been among the most effective in baseball so far this year; they're making mistakes in the field and giving up runs on the base paths that a team with no margin for error can't afford.


I quantify stolen bases and opponent stolen bases because I cannot understand why these two facets of the game are viewed separately. For example, the Diamondbacks are doubly abysmal on the base paths--they're 15-of-29 in stolen base attempts, have been picked off 3 times and have allowed their opponents to steal 15 bases in 18 attempts. It also doesn't appear to matter all that much, since they have a .553 winning percent and are tied with the Rockies for first in the NL West. There is NO correlation between base-stealing and winning, absolutely NONE. Typically, teams that are lower on the payroll scale emphasize speed on the base paths, precisely because baseball in this age doesn't value speed. That may be changing as we speak, but it's too soon to tell. This scatter chart shows the correlation between a team's stolen base index as described in the beginning and winning percent:

Using the example of the Diamondbacks above, they're a -12 in the stolen base index, 6th-worst in baseball. Conversely, the Orioles are the best with 27, a combination of a speedy team and a good catcher in Matt Wieters. The correlation line IS there but hard to see as it straddles the horizontal axis. This is the epitome of a chart that shows no correlation--in the modern era, teams that are good at stealing bases may win or may not win, but they're not winning because of it. That's why I left it out of the charts above, but the information is still important.
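Here is how the Diamondbacks' -12 could be pieced together from the components listed at the top of the post, as a rough Python sketch. Two loud caveats: the function and parameter names are mine, and the post never states how many opposing runners Arizona has picked off. The value of 2 used below is an ASSUMPTION, chosen only because it reconciles the quoted numbers with the quoted -12; the actual tabulation may count things differently.

```python
def stolen_base_index(sb, cs, picked_off, opp_sb, opp_cs, opp_picked_off):
    """Stolen base index: positives minus negatives.

    Positives: own stolen bases, opponents thrown out, opponents picked off.
    Negatives: own caught stealing, own pickoffs, opponent stolen bases.
    """
    positives = sb + opp_cs + opp_picked_off
    negatives = cs + picked_off + opp_sb
    return positives - negatives


arizona = stolen_base_index(
    sb=15,             # 15-of-29 stealing
    cs=29 - 15,        # 14 times caught
    picked_off=3,      # picked off 3 times
    opp_sb=15,         # opponents are 15-of-18
    opp_cs=18 - 15,    # 3 opponents thrown out
    opp_picked_off=2,  # ASSUMPTION: not stated in the post
)
print(arizona)
```

With no opponent pickoffs at all the same inputs give -14, so the difference between that and the quoted -12 is where the assumed figure comes from.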

For all the talk that turnover ratios get in football and basketball, actually tracking them can be difficult. Pro-Football-Reference allows you to do it with the Play Index feature, and when the data is compiled it IS important. Why wouldn't it be? Any time the ball is turned over, not only is the other team given another opportunity to score, the team that lost it is deprived of an opportunity as well. If it's that important, it should be easy to find. In baseball, there's no excuse--the sport has the longest history of abundant statistical data and a proud heritage of truly brilliant people who crunch it. That's why I'm surprised that baseball mistakes like the ones I tabulate aren't viewed together the way they should be. There are no linear relationships in baseball (or in life, for that matter)--one thing by itself rarely leads to a result. However, the Mistake Index isn't one metric--it's 9, expanded to 18 when opponent miscues are included and up to 24 when base stealing is included.

Tell your friends--it tells a very compelling story that isn't available ANYWHERE ELSE.