Monday, April 29, 2013

Percent of Runs from Home Runs

STATS LLC had a tweet last week stating that the White Sox led the majors in the percent of runs that were scored on home runs, somewhere over 50% as I recall. I measured that number earlier in the year, but it's buried somewhere in one of my Box Score Ephemera posts and it deserves its own spot. This table shows the percentage of runs scored by home runs for all teams (games through Sunday, April 28th):























Using the Red Sox as an example, they've scored 128 runs this year, hit 23 home runs which drove in 35 runs, for a percentage of 27.3% of their runs coming from home runs. There are many moving parts in this chart:
1. Runs are runs, no matter how they're scored. The Red Sox are in the middle of the pack for homers, don't have many runners on base when those homers are hit, yet are still second in the majors in runs scored. Results, not how they occur, are what matter.
2. You can't score what isn't on base, so this becomes a proxy measure (and a not-particularly-accurate one, I'll admit) of the number of base runners a team has. You can see how well a team is doing in getting runners on base AND scoring them at this Baseball-Reference.com chart--it tells a much more complete story. The reason the Red Sox are second in the majors in runs scored is that they're fourth in runners on base with 625.
3. Even after approximately 25 games, it's easy to see the teams that can score runs and contrast them with the teams that can't. This is just another measure of how woeful the Marlins are going to be this year--they have no power outside of Giancarlo Stanton, and he's only hit three homers so far. At least the Astros are young and bad--the Marlins are OLD and bad.

Two teams are over the 50% threshold in runs scored on home runs, both Chicago teams. Both are well below the league average in the number of base runners (451 for the Cubs, 425 for the Sox), and runners that aren't on base can't score. Ultimately, that's what is measured using this stat--teams aren't over-reliant on the home run as much as challenged in getting runners on base.
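
For anyone who wants to check the arithmetic, here's a minimal Python sketch of the percentage, using the Red Sox numbers quoted above. The function name is made up for illustration; the only inputs are total runs scored and runs driven in by home runs.

```python
def pct_runs_from_homers(runs_scored, runs_on_homers):
    """Return the percentage of a team's runs that scored on home runs."""
    if runs_scored == 0:
        return 0.0  # a team with no runs has no percentage to report
    return 100.0 * runs_on_homers / runs_scored

# Red Sox through April 28, 2013: 128 runs, 35 of them driven in by 23 homers
red_sox = pct_runs_from_homers(128, 35)
print(f"{red_sox:.1f}%")  # 27.3%
```

Note that the home run count itself never enters the formula; only the runs those homers drove in matter.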

Saturday, April 27, 2013

The Value of Baseball Statistics

I was mowing the lawn this morning and musing over Ken Harrelson and his inane comments regarding sabermetrics (and by extension, ANY baseball measure) in his diatribe on the MLB Network on Thursday, April 25th. I shouldn't be disappointed with Hawk, since this is a common refrain which hasn't changed one iota over time (a clear indicator of a man striving to improve his knowledge). Hawk is what he is (and here's what the Wall Street Journal thought he was a couple of months back), but it was the outpouring from folks who actually THOUGHT HE WAS RIGHT that made me sad. As such, I offer this polemic on statistics and why we use them.

At the heart, we use stats for one reason, and one reason only--to tell a story. Words are nice, but numbers can amplify, quantify and supplement that narrative. Just this morning, my wife was describing her place of work and said that something was removed "a long time ago," which meant at least 30 or so years ago in that context. NOT 30 SECONDS LATER she described a restaurant that had changed names "a long time ago," which in THAT context meant two years. Words have meanings, but they're fluid and can be confusing.

Numbers add context. They help explain the why behind what we see, and with any luck, add predictive ability to what we MIGHT SEE in the future. Numbers are only noteworthy when compared to other numbers. Frank Baker was considered a great power hitter, leading the AL in homers from 1911-1914, never hitting more than 12 in that span--he wouldn't make it out of Double-A ball today. But that was the Dead Ball Era--the home run hadn't become a potent offensive tool. In 1911, Ty Cobb drove in 127 runs, leading the AL--and hit eight home runs.

Why were Babe Ruth's 54 home runs in 1920 such a big deal? Two reasons:
1. Because the previous record for home runs in a year had been set by Ruth the year before--with 29. In one year he moved the benchmark by almost 100% (86.2%, to be precise). Very rarely do records fall with that kind of magnitude.
2. George Sisler was second in homers in 1920--with 19. Ruth hit nearly three times as many homers as the runner-up.
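
Both comparisons are simple ratios, and a quick sketch in Python confirms them. The helper name is mine, and the only inputs are the three home run totals given above.

```python
def pct_increase(old, new):
    """Percentage increase from old to new."""
    return 100.0 * (new - old) / old

ruth_1919, ruth_1920, sisler_1920 = 29, 54, 19

# How far Ruth moved his own record in one year
print(f"{pct_increase(ruth_1919, ruth_1920):.1f}%")  # 86.2%

# How Ruth compared to the runner-up in 1920
print(f"{ruth_1920 / sisler_1920:.2f}x")  # 2.84x
```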

Fast forward to Hawk's comments, which focused on easily my (least) favorite things:
1. Intangibles
2. Leadership
3. The Will To Win

I was a debater in high school and college (not a very good one) for the obvious reason--to meet women. My first girlfriend was a debater at another high school, and she came home from a tournament and said "The other team said 'God said we can't do it,' so we can't do it--how do you answer that?" You can't--those of us of faith (and I'm one) have some fundamental beliefs that we consider foundational, the building blocks upon which our faith is based. The term dogma can rightly be used, since these beliefs are accepted on faith and not provable using human means. That's why we didn't invoke God in debates--how can you argue it without completely getting off the topic?

That's what "leadership" and "intangibles" are--the dogma, the foundational arguments of the ignorant and uninformed that CAN'T BE CHALLENGED. They're true because they're true, and everyone knows it. They are beyond question. They are accepted by anyone who knows anything, and anyone or anything that questions them is without value. There's one slight problem with this line of reasoning--if everyone has some level of "leadership" or "intangible" in them, how can we use this as any kind of yardstick? Everyone has it--what makes it special? Bryce Harper isn't a great player because he has two legs, four fingers and two thumbs--EVERY player has those. Something else must set him apart.

These qualities are the argument-stopping, I-don't-want-to-discuss-it-anymore linchpins of the non-metric Luddites. It's their life preserver in an ever-changing world of stats and metrics that THEY DON'T UNDERSTAND, and they have no intention of investing the effort necessary to learn what can be learned from them. It's laziness disguised as knowledge. The problem with injecting a term like "leadership" into the discussion is that it carries a tacit expectation that it's VISIBLE, and if it's visible, it's measurable. The only way "The Will To Win" works is if someone can recognize it, as opposed to merely invoking it, which is all Hawk does.

Other than having to watch the Cubs, watching their broadcasts with Len Kasper and Jim Deshaies is a joy because they bring meaning and understanding to these numbers. If a TV is developed where you can block out Hawk and listen only to Steve Stone, you'll have the chance to hear one of the best analysts in the game and not feel your brain melt as you listen to Hawk make you dumber with each sentence. 

A couple of years ago, Chip Caray, Braves TV announcer, was on the Boers and Bernstein Show on The Score, AM 670 in Chicago, and Dan Bernstein innocuously asked him about modern baseball metrics, to which Caray responded that he had no use for them. After making this point for about a minute, he then recounted how Chipper Jones is a sure Hall of Fame inductee (which is true)...by relating a steady stream of his stats. The only stats Caray didn't like were the ones he didn't understand, and if he took the trouble, he'd see they tell a BETTER and more complete story about how Chipper is easily one of the ten best third basemen in baseball history. It's one thing to know something--isn't it nice to know the foundation of that knowledge?

Folks who don't like advanced metrics seem to believe that people who do like them look at numbers only and ignore everything else. If you look up the definition of "strawman argument" in the dictionary, this is the working definition. No serious metrics person would even think that way, let alone say it. Only idiots like Hawk think that, for the same reason he knows nothing about advanced metrics--because he's an idiot. I'm sure Hawk would be surprised to learn that he's a poster boy for 40+ years of post-modern thought, where nothing is real and meaning is changeable at the drop of a hat, but that's what he does when he throws out his cornpone terms. Just because he says it doesn't make it true.

That's why we use statistics. We measure and LOOK for truth, instead of blithely asserting it.

Thursday, April 25, 2013

NFL Draft History (Part 3)

As the hours count down to the 2013 NFL Draft, this chart shows just how important draft POSITION is to overall projected NFL success. This shows the average Career Approximate Value (CAV) by what spot the player was taken, from 1-32 and from 1970-2012:

The average value of the 10th pick is around 75% of the #1, and the #20 pick roughly half the value of the #1 pick. This should come as no surprise, since a dispassionate analysis of most drafts will show about 10-15 (and that 15 is being generous) true can't-miss prospects, which assumes they can stay healthy. The problem is confusing potential All-Pros with serviceable players, since a team doesn't need to have the best player at every position as much as a good one.

There are no surprises in those 43 #1 picks--by position:
Quarterback--20
Defensive End--9
Running Back--6
Defensive Tackle--2
Linebacker--2
Tackle--2
Wide Receiver--2

And even that's deceptive--here's the last time any position OTHER than QB was taken #1:
Defensive End--2006 (Mario Williams)
Running Back--1995 (Ki-Jana Carter--how'd THAT turn out?)
Defensive Tackle--1994 (Dan Wilkinson)
Linebacker--1988 (Aundray Bruce)
Tackle--2008 (Jake Long)
Wide Receiver--1996 (Keyshawn Johnson)



Now, to really break things out for these top picks. This chart shows these picks by:
1. Busts (no games)
2. Flameouts (1-49 games)
3. Solid (50-99 games)
4. Great value (100+ games)

There are no judgments on the QUALITY of the games played. For example, John Elway played 234 games in the NFL and Vinny Testaverde played 233, which places them in the same group here when anyone lacking sawdust between their ears knows this isn't true. However, in one sense, Testaverde answered questions for his teams for many years--they had no doubts who was going to start at QB, freeing them up to address other needs. Say what you want about him, he didn't play 233 games because coaches and GMs were throwing up their hands and saying "Oh well, it could be worse, I guess..."
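
The four groups are nothing more than cutoffs on games played. Here's a rough sketch of that rule in Python; the function name is invented for illustration, and the thresholds are the ones listed above.

```python
def career_bucket(games):
    """Classify a draft pick by career length in games played."""
    if games == 0:
        return "Bust"
    if games < 50:
        return "Flameout"
    if games < 100:
        return "Solid"
    return "Great value"

# Elway (234 games) and Testaverde (233) land in the same group,
# because the rule looks only at games played, not their quality.
print(career_bucket(234), "/", career_bucket(233))  # Great value / Great value
```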


























This chart breaks down the players by the CAV to separate the great from the merely serviceable. This is an important distinction, since whoever is chosen #1 tonight won't be picked because they might play a long time--teams expect IMPACT from the #1 pick. I was all set to put in a chart, but a line graph shows it quite nicely:

Not too often do you see a correlation that clear. If it isn't obvious, the horizontal axis is draft picks 1-32, and the value is the Career Approximate Value per Game played. This helps normalize careers that are still in progress, and what was discussed earlier is graphically shown:
1. Picks 1-5 are that much better than the rest
2. Picks 6-15 are roughly similar, meaning teams are advised to fill needs as opposed to drafting the "best available player."
3. Other than some blips (the spike at #18 comes courtesy of one Art Monk, and the blip at #24 is Ed Reed), it becomes a slow but steady decline in value.
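
The CAV-per-game figure behind that graph can be sketched as follows. This assumes the value for each pick is total CAV divided by total games across everyone taken at that slot; averaging each player's individual rate would be a different (and noisier) calculation. The function name and sample rows are made up for illustration.

```python
from collections import defaultdict

def cav_per_game_by_pick(players):
    """Aggregate CAV per game for each draft slot.

    players: iterable of (pick, career_av, games_played) tuples.
    A slot whose players never played a game is left out.
    """
    totals = defaultdict(lambda: [0, 0])  # pick -> [total CAV, total games]
    for pick, career_av, games in players:
        totals[pick][0] += career_av
        totals[pick][1] += games
    return {
        pick: cav / games
        for pick, (cav, games) in sorted(totals.items())
        if games > 0
    }

# Made-up rows for illustration only, NOT real draft data
sample = [(1, 100, 200), (1, 20, 50), (2, 60, 150)]
rates = cav_per_game_by_pick(sample)
for pick, rate in rates.items():
    print(pick, round(rate, 2))
```

Dividing by games is what lets a player three years into his career sit on the same scale as one who played fifteen.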




This last chart ties it all together and shows the best player by CAV per pick, the person in the mid-range and the person with the lowest CAV:









































This is a ton of information between the three posts I made. I don't make judgments, merely present the information. 

Wednesday, April 24, 2013

NFL Draft History (Part 2)

This chart shows the success teams have had with their picks by round since 1970:

I had a big old mess of a chart that broke all this down from 1970-2012 with about 20 columns and 40+ rows of data and I realized it would look terrible. This summarizes it much more cleanly.

To explain, there have been 1,249 first round draft picks since 1970. EVERY SINGLE ONE has played at least one game in the NFL, with 272 of them with careers between 1-49 games and 977 with careers of 50 games or more, close to a four-year career. Of course first round picks are expected to produce and have productive careers, but to have every single first round pick play at least one game is simply amazing.
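
To put those first-round counts in percentage terms, here's a small Python sketch. The dictionary labels are mine; the counts are the ones quoted above.

```python
# First-round picks since 1970, grouped by career length in games
first_round = {"0 games": 0, "1-49 games": 272, "50+ games": 977}

total = sum(first_round.values())  # 1,249 picks in all
shares = {bucket: count / total for bucket, count in first_round.items()}

for bucket, share in shares.items():
    print(f"{bucket}: {share:.1%}")
# 0 games: 0.0%
# 1-49 games: 21.8%
# 50+ games: 78.2%
```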

As would be expected, the number of players who don't play increases with the lower rounds. There is a slight bias to the numbers in that players that were drafted from 2010 on haven't had the opportunity to play 50 games yet, but the general trend is clear and not unexpected. I'm sure someone has done the analysis, but the primary reason why the draft was reduced to seven rounds was that players drafted lower rarely made rosters. In the 1970-1993 time frame when there were more rounds in the draft, approximately 15% of players drafted in those lower rounds made rosters and had decent careers. I would be very curious to know whether a similar percentage of undrafted free agents are making rosters today.

These next two charts show the strike rates by position. Some quick points:
1. I did some cleaning up of how PFR listed the players. Since these charts only deal with players from 1970 on, it won't matter much, but terms such as middle guard, half back, wing back and the like were changed to reflect their modern names.
2. For some reason, the 2011 draft had a number of players listed as offensive linemen instead of guards or tackles. Since I couldn't make accurate calls, I left those players out. Likewise, there were a number of players listed as defensive linemen instead of defensive tackles or ends in the same draft, so they're left out as well. Upon further review, I see why that happened--none of them made pro rosters, but since we're talking about 25 players on offense and defense, it won't make material differences in the numbers.
3. In this case, I truncated the chart down to players drafted and those who had careers of at least 50 games. As mentioned above, this will skew the numbers slightly since the players drafted from 2010 on haven't been able to play enough games to make the threshold, but the trends are still clear.

Offensive positions first:




































Sorry for the typo in the 6th line--it wasn't worth the effort to make new pictures. I beg your forgiveness. Defensive chart:
















The best way to look at these charts isn't in the aggregate, because with the knowledge and data available, teams rarely (not never) make mistakes like Mike Mamula, Tony Mandarich or (heaven forbid) Ryan Leaf. What these charts do is allow one to review a team's picks and see how well they did by round. In the modern NFL, teams just can't afford to have second and third round picks not produce, and these numbers suggest that even sixth and seventh round players have significant careers around 25% of the time.

Just for fun, punters and kickers:
Chances are pretty good you'll recognize those first round punters and kickers--they were:
1. Ray Guy, Oakland, 1973
2. Steve Little, St. Louis Cardinals, 1978, K--33 games
3. Russell Erxleben, New Orleans, 1979, P--59 games
4. Sebastian Janikowski, Oakland, 2000

The moral of the story being that if you're either:
1. Al Davis, or
2. Bad
You'll draft a punter or kicker in the first round.  

NFL Draft History (Part 1)

The NFL is the league that never sleeps, and with the draft coming up, there's lots of chatter regarding the "best" drafts ever. I won't argue with draft experts because they know more than I do, but there is a way that we can effectively measure drafts. Pro-football-reference.com has data available on every NFL draft since 1936, and includes two metrics on their charts:
CarAV-- Career Approximate Value (will be shortened to CAV from here on)
DrAV--the Career Approximate Value that the drafted player accumulated for the drafting team
I rooted around and can't find how they calculate Approximate Value, and I'm not sure I really care--what matters to me is that we can rank drafts using these values. Accept or reject them, but they are a yardstick.

In all cases (unless otherwise noted), I only use the first seven rounds of the draft. Since the NFL went to a seven round draft in 1994, it's not fair to use straight sums. We can safely assume that the contribution of players drafted in the eighth round or lower before 1994 was very small (and I'll quantify that at some point), but using seven rounds only allows for a similar point of reference. With that, this chart shows the best drafts in NFL history, as ranked by the CAV measure:


Expansion needs to be acknowledged, so this includes only drafts from 1980 on. To explain, in 1993, there were 196 picks in the first seven rounds, and those players had a cumulative Career Approximate Value of 4,374, or an average of 22.3. The QB-rich draft of 1983 was right on its heels.
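
The ranking itself is just total CAV divided by the number of picks, sorted from best to worst. Here's a minimal sketch in Python; the function name is hypothetical, and the only real numbers in it are the 1993 totals quoted above.

```python
def rank_drafts(drafts):
    """Rank draft classes by average CAV per pick, best first.

    drafts: dict mapping year -> (total_cav, number_of_picks).
    Returns a list of (year, average_cav) pairs.
    """
    averages = {year: total / picks for year, (total, picks) in drafts.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Only the 1993 figures come from the post; add other years as needed
drafts = {1993: (4374, 196)}
ranked = rank_drafts(drafts)
print(f"{ranked[0][0]}: {ranked[0][1]:.1f}")  # 1993: 22.3
```

Using the average instead of the raw sum keeps years with more picks in the first seven rounds from getting an automatic edge.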

Some of the marquee players in that 1993 draft were Michael Strahan (121 CAV), Drew Bledsoe (103 CAV) and Hall of Famer Willie Roaf (101 CAV). Other notables included Jerome Bettis, Mark Brunell, Lincoln Kennedy and Garrison Hearst. By using this measure, steady is preferred to stellar. For example, the 1985 draft included Jerry Rice (160), Bruce Smith (143) and Chris Doleman (112), Hall of Famers all, but didn't have a broad base of success as a draft class. It's not a perfect measure, but it allows for meaningful comparisons.




This next chart shows two things--the best drafts by each franchise for all time, and also since 1980. For relocated teams, where they currently reside is the place shown:

I meant to rank this, but no matter, you can do that on your own. According to the CAV measure, the Packers 1958 draft was the best ever. It included first round pick LB Dan Currie (60 CAV), second round pick RB Jim Taylor (103 CAV) and third round pick LB Ray Nitschke (110 CAV), absolutely nothing wrong with those choices. The second-best was the Steelers draft of 1987, with solid pros Rod Woodson (140 CAV), Hardy Nickerson (122 CAV) and Greg Lloyd (89 CAV), with Nickerson and Lloyd going in the fifth and sixth rounds, respectively.

For those teams that had their best drafts prior to 1980, the right-hand columns list their best drafts since 1980. As you check those drafts, you'll find that the best aren't usually when the biggest stars were drafted, but when a solid core of players that lasted 8-10 years was picked. The Patriots 2000 draft doesn't rank well even though it includes Tom Brady because it didn't have many other players with sustained careers.

I'll show one more table, which shows how NFL teams rank since 2002 in cumulative draft CAVs (scroll down a bit):

























There's a pretty solid correlation between success over the past 11 years and how well these teams have drafted. I've paid less attention to the Draft AV column, since in the free agent era it's more difficult for teams to hang onto every player they want, but it sure helps if a team can keep the players they draft, especially if they're good. San Diego was hit particularly hard, losing both Philip Rivers and Eli Manning, but they still drafted extremely well in the past 11 years. There's little doubt that effective drafting is a key component to NFL success, and this buttresses that.

Tuesday, April 23, 2013

Game Length

I had the database open to check something else, so here's the trend in game length since 1920 (all data adapted from baseball-reference.com):

























After the slow but steady increase, the decrease in offense and some other minor changes appeared to usher in faster games--a decrease of around 12 minutes, but what was gained has been given back in the meantime.

Unfortunately, I suspect that only one thing can speed the game up, and it's not something that people will like. If baseball were able to migrate more and more people to pay TV (NOT just cable, but more like MLB Network), the possibility exists that the (potential) increased revenues could offset the need for advertising dollars, which could decrease the amount of ad time sold. The Wall Street Journal recently ran a chart that showed how much of an NCAA tournament game broadcast was really basketball:


















You can read the brief article here. 30% of a basketball broadcast is actually the game itself, which should come as a surprise to no one. A couple of years ago, I went to a Brewers game and was watching the replay in my hotel later that night--they only showed the pitches, and the game went by FAST. It's not the game play that slows down the games.

But that would also require the individual teams to give up their most lucrative revenue streams, the sports networks that the larger teams have built. That is simply not going to happen absent some drastic changes in the game, and as long as the teams own those broadcast rights, they'll sell as much advertising as they can to maximize that revenue stream, which is pure common sense. To pray for shorter games is futile--I think we've entered the era where we can only hope they don't get any LONGER.

Saturday, April 20, 2013

Funny (and Generally Tasteless) Baseball Names

A couple of years ago, I was going through my baseball databases and was interested in some of the names that I saw--some were funny, some odd, and many were just humorous at the 4th Grade level. For those that love a good humorous name, this list is for you. It's separated into two categories--the non-Richard category and the (poor) Richards. All links go to their baseball-reference.com page if you want more information on them. The non-Richards come first:
Alamazoo Jennings—played 1 game in 1878, was neither from Kalamazoo nor played anywhere near it
Alex Mustaikis—can I get a “By cracky”?
Anderson Hernandez –the old double last name treatment
Antonio Bastardo –and Jimmy Piersall would say he’s consistently a…
Astyanax Douglass –played in the 20s, have no clue what that first name is
Austin Knickerbocker –no, he wasn’t a Knick, but briefly a Philadelphia Athletic in 1947
Bake McBride –just because it’s a great name
Beauty McGowan—not an ugly man, but it’s still no name for a man
Bill Knickerbocker –also not a Knick, because he played in the 30s, before the Knicks were formed
Bill Mountjoy –I’d have to get a look at her first…
Bill Peterman –gets me thinking of Diedrich Bader yelling “Hey Peterman” in “Office Space”—also had one AB in majors
Boileryard Clarke –just because
Bots Nekola –I think Rick Telander wrote about him once
Brickyard Kennedy—the location and how to get there in one name
Bris Lord—an alternate term for “mohel”
Burleigh Grimes –just for Terry Boers, 670 The Score afternoon host
Casper Asbjornson –just a fun name to say
Charlie Furbush—can I get a…
Cleatus Davidson –he played in 1999
Cookie Cuccurullo –not sure on the pronunciation, but it could be very alliterative
Count Sensenderfer –Sensenderfer sounds like a fun name to pronounce
Creepy Crespi –after retiring, became a Canadian junior hockey coach
Cub Stricker –this year, it’s more like Cub StrickeN
Dave DeBusschere –had a brief baseball career—can I get a…
Davey Crockett –shot him a bar when he was only three…
Drungo Hazewood –just because
Fred Woodcock—not to brag, but…
Gene Woodburn—OUCH!
Harry Chappas—I’m torn as to his popularity at TB Diddler’s
Heinie Meine –one can only hope the last name has two syllables
Ken Szotkiewicz –spelled just like it sounds
Kila Ka'aihue –I’ve never had the slightest idea how to pronounce this--I think it's "not-in-the-majors-now"
Lil Stoner –does Steve Stone have any kids?
Mark Woodyard—now that’s just bragging…
Mel Held –probably would have been a poor offensive lineman
Merlin Nippert –played in 1962, so no old-timer excuse
Mickey Klutts –always one of my favorites
Miller Huggins –also for Terry Boers
Moonlight Graham –not just a figment of W.P. Kinsella’s imagination, this guy, played by Burt Lancaster in “Field of Dreams,” was real and he really was in one game and never got up to bat
Nick Strincevich –sounds like a great name for a Chicago cop
Nino Bongiovanni –his name is longer than his career
Oil Can Boyd –I don’t believe I’ve ever heard the story behind his name
Pat Listach –can I get a…
Pepper Peploski –don’t stand next to a spitter if they pronounce this one
Pickles Dillhoefer –cue the clown sound drop
Pinky Woods –hey now!
Quinton McCracken –can I get a…
Red Woodhead—somehow, that’s too much information
So Taguchi –just because he’s…
Socks Seibold and Socks Seybold –these are two completely different and unrelated players
Twink Twining –say that one fast
Verle Tiefenthaler –another name longer than his career—3 games, 3.2 innings for the 62 White Sox
Woodie Held—thank you!

Now, the Richard category:



Dick Bayless—that’s no way to talk about Skip
Dick Brown—that’s getting a little personal
Dick Cox –redundant
Dick Green –get to the doctor, stat!
Dick Hall –isn’t that the auditorium at T.B. Diddler’s?
Dick Groat –Duke grad, right?
Dick Hunt –probably best not to read this one on-air, or from his brother Mike
Dick Pole –was anyone ever pulling for his staff?
Dick Such –I would KILL to know how that last name is pronounced
Dick Wantz --…what the dick wants…
Les Cox—OUCH!
Terry Cox

 Crass, childish and not worthy of the bandwidth on which it's stored. You're welcome.