Today marks the running of the 2014 Boston Marathon. At last year's race, two bombs exploded near the finish line, killing three people and injuring hundreds, many of whom lost limbs. Because of the attack, 6,000 runners were unable to complete the marathon.
Richard Smith is a Triangle-based marathon runner and statistician. Race organizers approached him with an unusual request: could he somehow determine official finish times for the runners who could not complete the race?
This is more complicated than it might seem at first. A runner's pace fluctuates. She might go more slowly up Heartbreak Hill, or he might speed around the corner onto Boylston Street, just a few blocks from the finish line.
"Once I got their email," said Smith, "of course I knew I had to help them."
Smith assembled a team of researchers, who began with a large data set that included:
- all the runners who reached the halfway point of the 2013 race but did not finish
- all the runners from the 2010 Boston Marathon
- all the runners from the 2011 Boston Marathon
The data came in the form of "split times": each runner's time was recorded every 5 kilometers. From there, it came down to math. What were the missing split times?
The researchers experimented with several methods, including one used to estimate the ratings that "Netflix subscribers would have given to movies they had not seen."
In the end, the researchers settled on a method called "nearest neighbor." For each 2013 runner, they looked for runners from the 2010 and 2011 races whose split times were similar. They then used the information from those 200 "nearest neighbors" to estimate a finish time for the 2013 runner.
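To make the idea concrete, here is a minimal sketch of nearest-neighbor estimation in Python. It is not the research team's code. The function name `knn_finish_time`, the data layout (cumulative minutes at each 5 km mark), the use of Euclidean distance on the splits a runner actually recorded, and the rule for combining neighbors (averaging each neighbor's ratio of finish time to their time at the runner's last recorded split) are all assumptions made for illustration. Only the default of 200 neighbors comes from the article.

```python
import math
from statistics import mean


def knn_finish_time(partial_splits, history, k=200):
    """Hypothetical sketch: estimate a finish time from incomplete splits.

    partial_splits: cumulative times (minutes) at each 5 km mark the
                    runner reached before the race was stopped.
    history: list of (splits, finish) pairs for runners from earlier
             races who finished; splits are cumulative times at every
             5 km mark, finish is the official finish time.
    k: number of nearest neighbors to use (200 in the article).
    """
    if not partial_splits:
        raise ValueError("need at least one recorded split")
    n = len(partial_splits)

    # Rank past runners by how closely their first n splits match the
    # splits this runner actually recorded (Euclidean distance).
    ranked = sorted(history, key=lambda rec: math.dist(partial_splits, rec[0][:n]))
    neighbors = ranked[:k]

    # Assumed combination rule: how much longer each neighbor took to
    # finish, relative to their time at the same split, averaged and
    # applied to this runner's last recorded time.
    ratios = [finish / splits[n - 1] for splits, finish in neighbors]
    return partial_splits[-1] * mean(ratios)
```

For example, `knn_finish_time([26, 51, 76], history)` would estimate the finish time of a runner who was stopped after passing the 15 km mark. Using ratios rather than raw finish times lets a neighbor who ran a similar race shape, but at a slightly different speed, still contribute sensibly.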
Lead researcher Richard Smith is not resting on his laurels in his office today: he is out running the 2014 Boston Marathon. The race organizer, the Boston Athletic Association, offers real-time tracking of each runner's progress, so you can follow him along the course.
More information about the project is available in "Completing the Results of the 2013 Boston Marathon."