I've been interested in sports statistics since I was a kid re-typing the weekly batting and pitching stats that were included in the Sunday sports sections. The numbers themselves weren't as important to me as the stories they told--even then, I wasn't as interested in whether a pitcher had a 3.47 ERA as in how well that compared to other pitchers. For as long as I can remember, the CONTEXT of the numbers was much more important than the number itself.
Around 2000, four events occurred that each played a role in my love of sports stats:
1. While taking a basic statistics course for my MBA, one of the problems in the multiple linear regression section had us identify the prime determinants of wins using data from the 1990 season. Combining baseball stats with the processing power of a computer, instead of enduring the absolute tedium of fitting a regression line manually, was as close to an epiphany as I'd ever had.
2. Bill James published his updated Historical Baseball Abstract in 2001, and in it he ranked the best players in baseball history at each position from 1 to 100 (with bios) and then 101-125 (just a list). My interest was piqued to see who DIDN'T make the list and if I thought they DID belong in the top 100, and if I could replicate it.
3. 670 The Score in Chicago switched frequencies so that I could hear them in eastern Iowa. At the time, I was a pharmaceutical sales rep for GlaxoSmithKline and spent up to 5-6 hours a day in a car, so good radio content was always appreciated during the hours Rush Limbaugh wasn't on.
4. I can't state how I became aware of it, but when I first saw baseball-reference.com, I was hooked.
I started creating a database of baseball data so I could make meaningful comparisons. At first, I was manually entering data from Total Baseball editions, but as I learned advanced techniques, I was able to amass data more quickly. By 2005, I had created my first complete database of baseball data from 1871-2004, covering all of the approximately 17,000 players who had played and the 1,000+ managers. I included trades, salaries and the best metrics available at that time (Total Baseball's Total Player Index), and was at last able to start making some comparisons.
I collect data the way some people hoard trash or old magazines, so my data sets have kept growing. Most of these databases wouldn't have been possible prior to the release of Microsoft Excel 2007, which allowed for spreadsheets that could include over 1,000,000 lines vs. the old limit of 65,536. For example, getting every player's hitting stats into a spreadsheet requires about 87,000 lines (Hank Aaron played 23 seasons, requiring 23 lines to account for his seasons), so in the old days, that would require 2 spreadsheets (or two tabs in one spreadsheet, which is effectively just as useless), adding complexity in making comparisons. Right now, I'm working on play-by-play data to investigate any number of questions, and I've honed my techniques to where I can get a full slate of baseball games done in an hour.
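To put the row-limit problem in concrete terms, here is a minimal Python sketch. It assumes the documented Excel worksheet limits (65,536 rows before Excel 2007, 1,048,576 from Excel 2007 on) and the rough 87,000-row figure quoted above; the function name `sheets_needed` is made up for illustration and ignores header rows.

```python
import math

OLD_ROW_LIMIT = 65_536      # rows per worksheet before Excel 2007
NEW_ROW_LIMIT = 1_048_576   # rows per worksheet from Excel 2007 on


def sheets_needed(total_rows, row_limit):
    """Return how many worksheets it takes to hold total_rows rows."""
    return math.ceil(total_rows / row_limit)


# Approximate count: one row per player per season of hitting stats.
hitting_rows = 87_000

print("Sheets needed, old limit:", sheets_needed(hitting_rows, OLD_ROW_LIMIT))
print("Sheets needed, new limit:", sheets_needed(hitting_rows, NEW_ROW_LIMIT))
```

Once the data is split across sheets, every comparison has to stitch the pieces back together first, which is where the extra complexity comes from.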
My intentions remain the same--numbers are just numbers; I'm much more interested in what the numbers can tell us. For example, Babe Ruth's 1920 season, when he hit 54 HRs, was amazing not only because he hit that number, but because:
1. No one had EVER hit that many homers in a season
2. No one had EVER hit EVEN REMOTELY CLOSE to that many homers in a season (the previous best had been 29 by Ruth in 1919 and 27 by Ned Williamson in 1884)
3. No one else hit EVEN REMOTELY CLOSE to that many homers in 1920 (#2 in the majors was George Sisler of the St. Louis Browns--with 19)
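The three points above boil down to simple ratios. Here is a minimal Python sketch of that comparison, using only the home run totals quoted in the list; the function name `times_better` is made up for illustration and is not a standard metric.

```python
def times_better(value, baseline):
    """Return how many times larger value is than baseline."""
    return value / baseline


ruth_1920 = 54          # Babe Ruth, 1920
previous_record = 29    # Babe Ruth, 1919
runner_up_1920 = 19     # George Sisler, 1920

vs_record = times_better(ruth_1920, previous_record)
vs_runner_up = times_better(ruth_1920, runner_up_1920)

print(f"vs. previous record: {vs_record:.2f}x")
print(f"vs. 1920 runner-up:  {vs_runner_up:.2f}x")
```

The raw total is the number; the ratios are the context.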
I will use numbers like WAR (Wins Above Replacement), FIP (Fielding-Independent Pitching) and other stats that may not be immediately recognizable, but I will never let the numbers get in the way of the story, or even worse, replace the story. All people like me are doing is providing context--HOW MUCH BETTER was this player (quick example--Mariano Rivera has a career WAR of around 66, and the next-best relief pitcher is Rich Gossage, with a WAR around 33--Rivera's metric in that case is TWO TIMES BETTER than that of the second person, which is HUGE).
I'm always ready and willing to answer questions or expand on methods. Never forget my bottom-line hope--to explain. Very rarely will I state my opinion--I find my opinions boring, and I assume others will as well. There is no shortage of outlets available for people to tell you what they think, but I'm more interested in laying out a case and supporting it with relevant data. Whether I'm successful in that regard is up to you to decide.