Archive for the ‘Statistics’ Category

NBA Draft Math: Strength of Draft Class

June 2, 2012 7 comments

After creating a simple metric to evaluate the success of an NBA draft pick, I realized that the same approach could be used to evaluate the overall strength of a draft class.

To quantify the success of an individual draft pick I’m looking at the total minutes played by a player during the first two years of his contract.  As far as simple evaluations are concerned, I think minutes played is as good a measure as any of a player’s value to a team, and I’m only looking at the first two years as those are the only guaranteed years on a rookie’s contract.  This is by no means a thorough measure of value–it’s meant to be simple while still being relevant.

After using this measure to compare the performance of individual draft picks, I used the same strategy to evaluate the entire “Draft class”.  I computed the average total minutes per player for the entire first round (picks 1 through 30, in most cases) of each draft from 2000 to 2009.  Here are the results.

There doesn’t seem to be much variation among the draft classes, but the 2006 draft certainly looks weak by this measure.  Upon closer inspection, that year does seem like a weak draft:  the best players were LaMarcus Aldridge (2), Brandon Roy (6), and Rajon Rondo (21).  The weakness of the 2000 draft also seems reasonable upon closer inspection.
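For anyone who wants to reproduce the class-level numbers, here is a minimal sketch of the computation described above: total each player’s minutes over his first two seasons, then average those totals within each draft year.  The records and the function name `class_strength` are made up for illustration; they are not the data behind the chart.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (draft_year, pick, minutes_year1, minutes_year2).
# These numbers are invented for illustration; they are not real player data.
picks = [
    (2005, 1, 2000, 2400),
    (2005, 2, 1500, 1900),
    (2006, 1, 1200, 1000),
    (2006, 2, 800, 600),
]

def class_strength(records):
    """Average two-year total minutes per player, grouped by draft year."""
    totals_by_year = defaultdict(list)
    for year, _pick, min_y1, min_y2 in records:
        totals_by_year[year].append(min_y1 + min_y2)
    return {year: mean(totals) for year, totals in totals_by_year.items()}

strength = class_strength(picks)
for year in sorted(strength):
    print(year, strength[year])
```

With real data, each draft year would contribute roughly 30 rows, one per first-round pick.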

Another approach would be to somehow aggregate the career stats of each player in a draft, rather than looking at only the first two years, but that would make it difficult to compare younger and older players.

Are there any other suggestions for rating the overall strength of an NBA draft class?

Click here to see more in Sports.

NBA Draft Math, Part I

May 31, 2012 9 comments

Having put some thought into the mathematics of the NFL draft, I decided to turn my attention to basketball.  From an anecdotal perspective, the NBA draft seems to be more hit-or-miss than the NFL draft:  teams occasionally have success and draft a great player, but it seems more common that a draft pick doesn’t achieve success in the league.

In an attempt to quantify the “success” of an NBA draft pick, I researched some data and ended up choosing a very simple data point:  the total minutes played by the draft pick in his first two seasons.

Total minutes played seems like a reasonable measure of the value a player provides a team:  if a player is on the floor, then that player is providing value, and the more time on the floor, the more value.  I looked only at the first two seasons because rookie contracts are guaranteed for two years; after that, the player could be cut although most are re-signed.  In any event, it creates a standard window in which to compare.

There are plenty of shortcomings of this analysis, but I tried to strike a balance between simplicity and relevance with these choices.

I looked at data from the first round of the NBA draft between 2000 and 2009.  For each player drafted, I computed the total minutes played in his first two years.  I then found the average total minutes played at each draft position over those ten drafts.

Not surprisingly, the average total minutes played generally drops as the draft position increases.  If better players are drafted earlier, then they’ll probably play more.  In addition, weaker teams tend to draft higher, and weak teams likely have lots of minutes to give to new players.  A stronger team picks later in the draft, in theory drafts a weaker player, and probably has fewer minutes to offer that player.

However, when I looked at the standard deviation of the above data, I found something more interesting.  Standard deviation is a measure of dispersion of data:  the higher the deviation, the farther a typical data point is from the mean of that data.

Notice that the deviation, although jagged, seems to bounce around a horizontal line.  In short, the deviation doesn’t decrease as the average (above in blue) decreases.

If the total number of minutes played decreases with draft position, we would expect the data to tighten up a bit around that number.  The fact that it isn’t tightening up suggests that there are lots of lower picks who play big minutes for their teams.  This might be an indication that value in the draft, rather than heavily weighted at the top, is distributed more evenly than one might think.
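The per-position average and standard deviation discussed above can be sketched as follows.  The data are invented, and the use of the population standard deviation (`pstdev`) is an assumption on my part; the sample version (`stdev`) would work just as well for this comparison.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical (pick, two_year_minutes) pairs pooled across several drafts.
# These values are made up to illustrate the idea, not taken from real drafts.
records = [
    (1, 5000), (1, 4000), (1, 6000),
    (30, 500), (30, 2500), (30, 0),
]

def summarize_by_pick(rows):
    """Return {pick: (mean_minutes, std_dev_minutes)} for each draft position."""
    by_pick = defaultdict(list)
    for pick, minutes in rows:
        by_pick[pick].append(minutes)
    return {pick: (mean(vals), pstdev(vals)) for pick, vals in by_pick.items()}

summary = summarize_by_pick(records)
for pick in sorted(summary):
    avg, spread = summary[pick]
    print(f"pick {pick}: mean={avg:.0f}, std dev={spread:.0f}")
```

In this toy data set the 30th pick has a much lower average than the 1st pick but a larger spread, which is the same pattern the charts suggest:  a lower mean without a correspondingly tighter distribution.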

This rudimentary analysis has its shortcomings, to be sure, but it does suggest some interesting questions for further investigation.

Click here to see more in Sports.

Joe Girardi, Probability, and Expected Value

During last night’s Yankees-Twins baseball game, the commentators were discussing the Yankees’ increased use of defensive shifts.

A “shift” is a defensive realignment of the infield to guard against a particular player’s hitting tendencies.  For example, if a player is much more likely to hit the ball to the right side of the infield (as, say, a strong left-handed hitter might be), a team may move an infielder from the left side to the right side to increase the chance of defensive success.

Dramatic infield shifting was once a rarity in the game, employed against only a few hitters in the league.  It is now being used with increasing frequency.  “All the data is out there,” said the announcers when discussing Yankees manager Joe Girardi’s explanation of why he was using it more.  (This sounded remarkably like what Rays manager Joe Maddon, a pioneer in increased defensive shifting, had to say when asked about it some time ago.)

The essential idea is that, given the reams of data now recorded on player performance, teams have a much more refined understanding of what a player will do.  No longer is the projection “The player has a 30% chance of getting a hit”; now, it’s “The player pulls 83% of ground balls to the left side of the infield”.  Naturally, teams try to use such information to their advantage.

It’s good that Joe Girardi is demonstrating an increased appreciation for, and understanding of, probability.  But as last night’s game suggests, he may need to learn more about the principle of expected value.

Early in the game, the bases were loaded with two outs, and a left-handed batter came to the plate.  Girardi put the defensive shift on, responding to data on this player that suggested he was extremely likely to ground out to the right side of the infield.  But probability considerations should be only one part of the analysis.  By leaving so much of the left side of the infield undefended, a situation was created where a weakly hit ground ball that would usually be an easy out actually produced two runs for the Twins.

In short, although the probability of that event (ground ball to the left side) was low, the risk (giving up two runs) was high.  Considering both the probability and the payoff is essential to long-term success.
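To make the distinction concrete, here is a small sketch of an expected value calculation.  Every probability and payoff below is hypothetical, chosen only to show how an alignment that produces an out more often can still give up more runs on average; none of it is real data on the at-bat in question.

```python
def expected_runs(outcomes):
    """Expected runs allowed, given (probability, runs_allowed) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    # Guard against scenarios whose probabilities do not sum to 1.
    assert abs(total_probability - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * runs for p, runs in outcomes)

# Hypothetical scenario with the shift on: the out is more likely,
# but a ball to the vacated left side scores two runs.
with_shift = [(0.85, 0), (0.15, 2)]

# Hypothetical scenario with a standard alignment: the out is a bit less
# likely, but a ball that gets through scores only one run.
without_shift = [(0.80, 0), (0.20, 1)]

ev_shift = expected_runs(with_shift)
ev_standard = expected_runs(without_shift)
print(f"shift: {ev_shift:.2f} runs, standard: {ev_standard:.2f} runs")
```

Under these made-up numbers the shift gets the out more often (85% versus 80%) yet allows more runs on average, which is exactly why the payoff has to be weighed along with the probability.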

I’d be surprised if the Yankees employ the shift again in that situation.  And if the Yankees need a special quantitative consultant, I am available during the summer.

Click here to see more in Sports.

Statistically Solving Crossword Puzzles

I am a lover of crossword puzzles.  I do the NYT crossword puzzle regularly, I’ve competed in the American Crossword Puzzle Tournament, and I’ve even dabbled in constructing puzzles myself.

There’s a great deal of crossover between math lovers and crossword puzzle lovers, and one example of this crossover is Matthew Ginsberg.  Ginsberg is a regular puzzle constructor, has a PhD in math from Oxford, and is an expert in artificial intelligence.

Not a huge stretch, then, that he has developed a rather effective crossword puzzle solving robot, Dr. Fill, that is now challenging the top human performers.

Ginsberg runs a company that produces software for the Air Force that helps calculate the most efficient flight path for airplanes.  Here’s the cool part:  “Some of the statistical techniques [used to calculate optimal paths of airplanes] are also handy, it turns out, for solving crossword puzzles.”

Yet another example of how statistical reasoning is emerging as a primary tool in modern science and society!

Click here to see more in Application.

Superbowl Scoring

February 6, 2012 Leave a comment

After enjoying a well-contested Superbowl that seemed to appropriately represent the teams, the season, and the league in terms of the level of play and competitiveness, I started wondering about how the big game compares to regular season play.  I wondered if teams performed better or worse, on average, given the pressure and scrutiny of the championship game.

I thought a simple place to start examining this question would be to look at Superbowl scoring versus regular season scoring.  Below is a chart showing the difference (Superbowl Score – Average Regular Season Score) for all 46 Superbowls.

At the far right, we see the results of Superbowl 46:  Giants 21, Patriots 17.  The league average in scoring this year was 22 points per team per game, or 44 points combined, so the difference here is 38 – 44 = -6.
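The arithmetic for a single game can be written as a short function.  This is only a sketch; the function name is my own, and it assumes the league average is quoted per team, so it is doubled to compare against the combined score.

```python
def super_bowl_difference(score_a, score_b, league_avg_per_team):
    """Combined Superbowl score minus the combined regular season average."""
    combined_score = score_a + score_b
    combined_average = 2 * league_avg_per_team
    return combined_score - combined_average

# Superbowl 46: Giants 21, Patriots 17, league average 22 points per team.
diff = super_bowl_difference(21, 17, 22)
print(diff)
```

A positive value means more points were scored than in a typical regular season game that year; a negative value, as here, means fewer.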

It seems as though it is more common for more points to be scored in the Superbowl than in an average regular season game.  Unfortunately, there are a lot of stories one could tell about why that might be so:  better teams (and therefore better offenses) make it to the Superbowl; defenses are more susceptible to the pressures of the big game; the extra preparation time gives offensive coordinators an advantage.

So how could we more rigorously explore the quantitative characteristics of the Superbowl?

Click here to see more in Sports.

Yet Another Way to Lie With Statistics

January 12, 2012 Leave a comment

This is a nice takedown of some spurious economic analysis, courtesy of Freakonomics:

Looking at the graph at the right, it’s hard not to notice the negative correlation between the two given variables, and the economist in question uses that correlation to bolster his policy argument.

The graph looks a lot different, however, when you look at all the available data, not just the data between today and the arbitrarily chosen cut-off of 1990.  But that chart doesn’t support the argument as decisively.

As the author suggests, “Be wary of economists wielding short samples.”

Click here to see more in Statistics.

The Year in NFL Scoring

As the books close on the 2011 NFL regular season, it’s time to revisit my pre-season prediction that the new kickoff rule would result in a slight decrease in per-game scoring.

The pre-season predictions on the number of touchbacks turned out to be fairly accurate.  In 2011, about 43% of kickoffs (922 out of 2151) resulted in touchbacks; in 2010, only 16% of kickoffs (359 out of 2221) resulted in touchbacks (thanks to for the data).
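The touchback percentages follow directly from the counts quoted above.  As a quick check, using a helper name of my own choosing:

```python
def touchback_rate(touchbacks, kickoffs):
    """Fraction of kickoffs that resulted in a touchback."""
    return touchbacks / kickoffs

rate_2011 = touchback_rate(922, 2151)
rate_2010 = touchback_rate(359, 2221)

print(f"2011: {rate_2011:.1%}")
print(f"2010: {rate_2010:.1%}")
```

Rounded to the nearest whole percent, these give the 43% and 16% figures cited.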

Did the increase in touchbacks reduce overall scoring in 2011, as hypothesized?  No.  In 2011, around 44.4 points were scored per game in the NFL; in 2010, around 44.1 points were scored per game.  Per-game scoring actually increased slightly this year!

One issue worth mentioning, however, is the disproportionate effect the top three scoring teams have on the data.  During the 2010 season, New England was the highest scoring team in the league with 518 total points; this was nearly 80 points more than the second highest scoring team.  In 2011, the Packers, Saints, and Patriots all scored over 500 points!  If we remove the three highest-scoring teams from each season, scoring for the rest of the league actually drops about 0.7 points per game.

It’s been fun drilling down into the data this year, and many other interesting questions popped up along the way.  And off-season changes always create new opportunities for analysis.

Click here to see more in Sports.
