## Google and Conditional Probability

Conditional probability is one of my favorite topics to teach. Whereas normal probability calculations simply compare *favorable outcomes* to *total outcomes*, conditional probability allows us to consider the impact of certain knowledge on the likelihood of those outcomes.

For example, the probability of rolling a 6 on a six-sided die is 1/6, but if it is known that the number showing is greater than 3, then the conditional probability that a 6 is rolled is 1/3.
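Here is a minimal sketch of that die calculation in Python. The helper name `conditional_probability` is my own, and it assumes every outcome is equally likely, so it simply counts outcomes inside the restricted sample space.

```python
from fractions import Fraction

def conditional_probability(event, given, outcomes):
    """P(event | given), assuming all outcomes are equally likely."""
    # Restrict the sample space to outcomes consistent with what we know.
    given_outcomes = [o for o in outcomes if given(o)]
    # Count the favorable outcomes inside that restricted space.
    favorable = [o for o in given_outcomes if event(o)]
    return Fraction(len(favorable), len(given_outcomes))

die = range(1, 7)

def is_six(n):
    return n == 6

def greater_than_three(n):
    return n > 3

p_six = conditional_probability(is_six, lambda n: True, die)
p_six_given_gt3 = conditional_probability(is_six, greater_than_three, die)

print(p_six)            # 1/6
print(p_six_given_gt3)  # 1/3
```

Knowing the roll is greater than 3 shrinks the sample space from six outcomes to three, which is why the probability doubles.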

There are many applications of conditional probability, but a recent “Math Encounter” from the Museum of Math made me aware of an application of conditional probability that all of us see on a regular basis: Google search autocomplete.

Suppose I type in the search term “under”:

Here, Google is trying to *autocomplete* my search query. In essence, Google is trying to guess the next word I’m going to type. How does it make its guess? It computes a conditional probability!

Google has a lot of data on when words follow other words. When I enter “under” into the search bar, Google looks for the word/phrase with the highest conditional probability of being next. Here it turns out to be “armour”; the word with the second highest conditional probability is “world”, and so on.
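To make the idea concrete, here is a toy sketch of next-word suggestion. It is not Google's actual method, which I have no inside knowledge of. The query log is invented, and the function names `build_follower_counts` and `suggest` are hypothetical. It estimates the conditional probability of each next word by counting how often it follows the previous one.

```python
from collections import Counter, defaultdict

# Invented toy query log; real search data is vastly larger and richer.
queries = [
    "under armour", "under armour shoes", "under armour",
    "under world", "under world movie", "under pressure",
]

def build_follower_counts(queries):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for query in queries:
        words = query.split()
        for prev_word, next_word in zip(words, words[1:]):
            counts[prev_word][next_word] += 1
    return counts

def suggest(counts, prev_word, k=3):
    """Return the k most likely next words with estimated P(next | prev)."""
    followers = counts.get(prev_word, Counter())
    total = sum(followers.values())
    return [(word, n / total) for word, n in followers.most_common(k)]

counts = build_follower_counts(queries)
for word, probability in suggest(counts, "under"):
    print(f"{word}: {probability:.2f}")
```

In this made-up log, “armour” follows “under” in 3 of 6 cases, so it is suggested first with an estimated conditional probability of 0.5.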

Naturally, as more information is provided, the conditional probabilities change.

A fascinating, and perhaps surprising, application of a powerful mathematical idea!

## Joe Girardi, Probability, and Expected Value

During last night’s Yankees-Twins baseball game, the commentators were discussing the Yankees’ increased use of defensive shifts.

A “shift” is a defensive realignment of the infield to guard against a particular player’s hitting tendencies. For example, if a player is much more likely to hit the ball to the right side of the infield (as, say, a strong left-handed hitter might be), a team may move an infielder from the left side to the right side to increase the chance of defensive success.

Dramatic infield shifting was once a rarity in the game, employed against only a few hitters in the league. It is now being used with increasing frequency. “All the data is out there,” said the announcers when discussing Yankees manager Joe Girardi’s explanation of why he was using it more. (This sounded remarkably like what Rays manager Joe Maddon, a pioneer in increased defensive shifting, had to say when asked about it some time ago.)

The essential idea is that, given the reams of data now recorded on player performance, teams have a much more refined understanding of what a player will do. No longer is the projection “The player has a 30% chance of getting a hit”; now, it’s “The player pulls 83% of ground balls to the left side of the infield”. Naturally, teams try to use such information to their advantage.

It’s good that Joe Girardi is demonstrating an increased appreciation for, and understanding of, probability. But as last night’s game suggests, he may need to learn more about the principle of **expected value.**

Early in the game, the bases were loaded with two outs, and a left-handed batter came to the plate. Girardi put the defensive shift on, responding to data on this player that suggested he was extremely likely to ground out to the right side of the infield. But probability considerations should be only one part of the analysis. By leaving so much of the left side of the infield undefended, a situation was created where a weakly hit ground ball that would usually be an easy out actually produced two runs for the Twins.

In short, although the probability of that event (ground ball to the left side) was low, the risk (giving up two runs) was high. Considering both the probability and the payoff is essential to long-term success.
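A rough sketch of the idea in Python follows. Every number here is invented purely for illustration; none of it comes from real baseball data, and `expected_runs` is a hypothetical helper. Each defensive alignment is described by a list of (probability, runs allowed) pairs.

```python
def expected_runs(outcomes):
    """Expected value: the sum of probability * runs over every outcome."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * runs for p, runs in outcomes)

# Hypothetical (probability, runs allowed) pairs for a bases-loaded grounder.
with_shift = [
    (0.88, 0),  # ball is fielded for the third out
    (0.12, 2),  # weak grounder to the vacated left side, two runs score
]
without_shift = [
    (0.80, 0),  # ball is fielded for the third out
    (0.20, 1),  # ball gets through, one run scores (an assumption)
]

print(expected_runs(with_shift))
print(expected_runs(without_shift))
```

With these made-up numbers, the shift gives up runs less often (12% of the time versus 20%) yet costs more on average (0.24 expected runs versus 0.20), which is exactly the trap of looking at probability alone.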

I’d be surprised if the Yankees employ the shift again in that situation. And if the Yankees need a special quantitative consultant, I am available during the summer.

## Leap Day Birthdays

In my Leap Day contribution to the New York Times Learning Network, “10 Activities for Learning About Leap Year and Other Calendar Oddities,” I calculated the odds of a person having a Leap Day birthday.

Assuming each day of the year is an equally likely birthday, and noting that there is one Leap Day every four calendar years, I calculated the probability to be

$$P(\text{Leap Day Birthday}) = \frac{1}{4 \cdot 365 + 1} = \frac{1}{1461} \approx 0.00068$$

or around 0.07%.

So how many people with Leap Day birthdays do **you** know?

## A One-in-a-Million Baseball Play

As the **2011 MLB season** winds down, there is a slim chance of something very unusual happening: a three-way tie for the wild card playoff berth!

It seems highly unlikely that the **Red Sox**, **Rays**, and **Angels** will actually all finish in a dead heat, but if they do, it will pose a lot of problems for playoff scheduling.

This is a fun, if complicated, math question to think about: what are the chances that after a 162-game season, three of the eleven teams ultimately vying for the wild card end up with identical records?

To investigate, the first thing I’d do is simplify the situation. I’d reduce the number of teams and the number of games, give every team a 50/50 chance to win every game, and then see what happens. After I’d explored a bit, I’d then consider complicating matters by using more teams, more games, and more realistic winning percentages.
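Here is one way that first simplified exploration might look as a Monte Carlo simulation. The simplifications are mine: each team's games are treated as independent coin flips (teams don't actually play each other), and a “tie” means at least three teams share the best record. The function name `simulate_top_tie` and its parameters are hypothetical.

```python
import random

def simulate_top_tie(num_teams, num_games, tie_size=3, trials=10_000, seed=0):
    """Estimate P(at least `tie_size` teams share the best record).

    Assumes every game is an independent 50/50 coin flip for each team.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    ties = 0
    for _ in range(trials):
        # Each team's win total is the number of coin flips it wins.
        wins = [
            sum(rng.random() < 0.5 for _ in range(num_games))
            for _ in range(num_teams)
        ]
        if wins.count(max(wins)) >= tie_size:
            ties += 1
    return ties / trials

# Start small, then scale up toward more teams and more games.
print(simulate_top_tie(num_teams=3, num_games=1))
print(simulate_top_tie(num_teams=4, num_games=10))
```

The smallest case can be checked by hand: with 3 teams and 1 game each, all three tie only when they all win or all lose, which happens 2 times in 8, so the first estimate should land near 0.25.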

A math challenge that any **Strat-o-matic** player could love!

## Un-Random Shufflers

This is a great story about how statisticians at Stanford audited a new automatic shuffling machine and determined that the cards weren’t distributed **randomly enough**.

If a deck of cards is dealt one at a time, a knowledgeable observer, in theory, should be able to **predict** the next card dealt around **4.5 times** per 52-card deck. For example, by remembering which cards have been dealt, the observer will definitely know the final card, as it’s the only one that **hasn’t** been dealt. Similarly, the observer will have a 1 in 2 chance of guessing the second-to-last card, and so on. Calculations involving **probability** and **expected value** will give you the theoretical result.
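A short sketch of that expected value calculation, assuming a perfectly shuffled deck and an observer with perfect memory who always guesses a card that hasn't appeared yet. When k cards remain, the guess is right with probability 1/k, so the expected number of correct guesses is the sum of 1/k for k from 1 to 52. The function name is my own.

```python
from fractions import Fraction

def expected_correct_guesses(deck_size=52):
    """Sum of 1/k over the number of cards k remaining before each deal."""
    return sum(Fraction(1, k) for k in range(1, deck_size + 1))

print(float(expected_correct_guesses()))  # about 4.54
```

The last two terms of the sum are the 1 in 2 chance on the second-to-last card and the certainty on the final card.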

For this particular shuffler, however, the statisticians from Stanford determined that an observer should be able to predict the next card **9.5 times** per 52-card deck! The shuffling machine manufacturer that hired them must have been pretty upset to hear this, but redesigning the machine is probably not as costly as selling casinos hundreds of predictable shufflers and then dealing with the consequences.

It should come as no surprise that **Persi Diaconis** is the lead author on the paper. Diaconis is a living legend in the world of mathematics, having left home at an early age to become a sleight-of-hand artist, then returning to earn a PhD from Harvard in mathematical probability. One of Diaconis’s first major results was proving that seven shuffles are necessary to “randomize” a standard 52-card deck.

The full paper from Stanford can be found here:

http://statistics.stanford.edu/~ckirby/techreports/GEN/2011/2011-08.pdf

## Paul the Octopus, 2008-2010

Paul the Octopus, whose prognosticating skills captured the imaginations of World Cup viewers everywhere, died this week at the age of 2.5. He died of natural causes.

Paul defied probability by correctly predicting the results of all seven of Germany’s World Cup matches. After making it through the tournament with a perfect record, during which he received death threats and had a stamp printed in his honor, Paul retired from predicting. Rediscovering his British roots, Paul was appointed an official ambassador for England’s 2018 World Cup bid, a post he held until his untimely demise.

Apparently there have been many copy-cats, so to speak, including “a saltwater crocodile named Dirty Harry, who predicted Spain’s World Cup final win and called the result of Australia’s general election by snatching a chicken carcass dangling beneath a caricature of Prime Minister Julia Gillard”. But Paul will always have a special place in our hearts.

## Are You Related to Confucius?

I recently read an interesting argument claiming that we are all descendants of Confucius. Basically, the argument goes like this:

No matter who you are, you came from a mother and a father (I won’t go into details). So, in your family tree, the part **behind you** has two branches, like this:

The same goes for your mother and father, and their mothers and fathers, and so on. Thus, continuing on back the line, you see a family tree like this

And it just keeps going and going and going. Now, an interesting mathematical feature of this tree is that, as you move backward in time, each generation has **twice as many branches** as the generation that precedes it (roughly speaking). Go back a hundred or so generations to the time of Confucius, and the number of branches in your family tree at that time is roughly **2 raised to the 99th power!**

Now, 2^99 = **633825300114114700748351602688** (thank you, **WolframAlpha**!). A reasonable estimate is that, at the time of Confucius, there were only something like 250,000,000 (250 million) people **total**. Each of those 2^99 spots in your family tree has to be filled by **someone**, which means, on average, each person in existence at that time had to fill roughly **2535301200456458802993** of the spots in your family tree. It’s virtually a **statistical impossibility** that Confucius wasn’t filling one of those spots.
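If you don't have WolframAlpha handy, the same arithmetic can be checked with Python's arbitrary-precision integers. The variable names are mine, and the population figure is just the rough estimate used above.

```python
generations_exponent = 99      # the exponent used in the argument above
population = 250_000_000       # rough estimate of people alive at the time

slots = 2 ** generations_exponent
slots_per_person = slots // population  # whole spots per person, on average

print(slots)             # 633825300114114700748351602688
print(slots_per_person)  # 2535301200456458802993
```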

I guess that makes us cousins?
