San Francisco Transit Prediction Accuracy

You’re waiting at the bus stop and the sign says the next vehicle will arrive in three minutes. Five minutes go by…then ten minutes…still no bus. You start wondering if the bus will ever show up.

Real-time arrival information is extremely important for transit riders. If you rely on this information, however, odds are you have felt the frustration of incorrect arrival times. As frequent transit riders, we’ve always been curious to know just how accurate this information truly is.

To answer this question, we collected real-time predictions from NextBus, the official transit information provider for San Francisco, during the month of August 2015 for the entire city of San Francisco every minute — that’s several hundred million data points. We then analyzed the data by comparing each predicted arrival time against the bus’ actual arrival time to measure the accuracy of the prediction.

We discovered that NextBus’s prediction algorithm has significant room for improvement, which is frustrating not only for transit riders but also for agencies whose goal is to provide better service for their customers. We also found that riders can avoid inaccurate predictions by accessing crowdsourced reports from fellow passengers. This post presents our findings and explores the benefits of working together to improve the daily commute.

The Results

Before diving into the data, let’s be precise about how we characterize accurate predictions. We categorize a given prediction as accurate if the actual arrival time of the vehicle is anywhere between 30 seconds earlier and 4 minutes later than the predicted arrival time. For example, if a NextBus prediction says the bus will arrive in 3 minutes, and the bus actually arrives anywhere between 2.5 minutes from now (30 seconds early) and 7 minutes from now (4 minutes late), we categorize this as accurate. We allow far less room for error on early arrivals, since in that case the rider will miss her bus and have to wait for the next one.
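To make the definition concrete, here is a minimal Python sketch of the accuracy window described above. The function and constant names are our own illustrative choices, not part of NextBus or of the analysis code; only the two tolerances (30 seconds early, 4 minutes late) come from the definition.

```python
from datetime import datetime, timedelta

# Tolerances from the definition above; the names are illustrative.
EARLY_TOLERANCE = timedelta(seconds=30)
LATE_TOLERANCE = timedelta(minutes=4)


def is_accurate(predicted_arrival: datetime, actual_arrival: datetime) -> bool:
    """Return True if the actual arrival falls inside the accuracy window."""
    # Negative error means the bus came earlier than predicted.
    error = actual_arrival - predicted_arrival
    return -EARLY_TOLERANCE <= error <= LATE_TOLERANCE


# Worked example: the sign says "3 minutes" at 8:00.
now = datetime(2015, 8, 3, 8, 0, 0)
predicted = now + timedelta(minutes=3)

print(is_accurate(predicted, now + timedelta(minutes=2, seconds=30)))  # 30 s early
print(is_accurate(predicted, now + timedelta(minutes=7)))              # 4 min late
print(is_accurate(predicted, now + timedelta(minutes=2)))              # 1 min early
```

The window is deliberately asymmetric: a bus that is one minute early counts as a miss, while a bus that is four minutes late still counts as accurate.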

Overall Accuracy

We found that when the bus is within 30 minutes of a desired stop, predictions are accurate 70% of the time on average. This means that for San Francisco, real-time predictions are only marginally better than the 60% accuracy of scheduled arrival information (as measured by the SFMTA). In addition, NextBus predictions become much less accurate during commute hours, when riders need them most. This is because NextBus makes transit predictions based on limited historical information, and historical information cannot account for the real-time variability that occurs during commute hours, caused by rapidly changing traffic conditions, overcrowded buses, and a variety of other issues.

Here’s an animated rendering that highlights when and where predictions are least accurate in San Francisco during the morning commute. Each dot in the picture below represents a transit stop at a particular time. Darker dots represent larger prediction errors. As you can see, NextBus is least accurate during peak commute times around 7:30am, and the areas that are the most affected are downtown in the Financial District, along Market Street, and across Geary Blvd. Visit http://bit.ly/1HAC0wz to see prediction accuracy over the whole day.

 Real-time transit prediction accuracy from NextBus in San Francisco during the morning commute. Each dot represents a Muni stop at a particular time. Darker dots represent stops where the NextBus prediction accuracy was lowest. See prediction accuracy over the entire day at  http://bit.ly/1HAC0wz . Map will render with local times.

Prediction Accuracy by Time to Destination

NextBus accuracy plummets as it tries to predict arrivals further out in time. When a vehicle is 5 minutes away from its stop, NextBus is accurate 91% of the time. However, when we’re looking at predictions 25–30 minutes out, NextBus prediction accuracy drops to 59%.

 NextBus prediction accuracy by time to destination (i.e. how far away the vehicle is from the current stop you are at). Prediction accuracy slips below scheduled arrival times once vehicles are 30 minutes away.
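A breakdown like this one can be sketched by grouping predictions into buckets according to how far out they were made and taking the share of accurate predictions in each bucket. The snippet below is only an illustration under assumptions: the function name, the 5-minute bucket width, and the sample records are hypothetical and are not the data or code behind the chart.

```python
from collections import defaultdict


def accuracy_by_horizon(records, bucket_minutes=5):
    """Group (horizon_minutes, was_accurate) pairs into time buckets.

    Returns a dict mapping each bucket's lower bound in minutes to the
    fraction of predictions in that bucket that were accurate.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for horizon_minutes, was_accurate in records:
        # Round the horizon down to the start of its bucket.
        bucket = int(horizon_minutes // bucket_minutes) * bucket_minutes
        totals[bucket] += 1
        hits[bucket] += int(was_accurate)
    return {bucket: hits[bucket] / totals[bucket] for bucket in sorted(totals)}


# Made-up sample records, for illustration only.
sample = [(2, True), (4, False), (26, False), (27, True), (29, False)]
for bucket, share in accuracy_by_horizon(sample).items():
    print(f"{bucket}-{bucket + 5} min out: {share:.0%} accurate")
```

The same grouping works for other breakdowns, for example by route, by swapping the bucket key.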

Prediction Accuracy by Route

The 82X, 28, Muni Metro Bus Shuttle, 81X, and 39 routes ranked lowest in prediction accuracy in San Francisco. Predictions for these routes are often worse than scheduled arrival information. Additionally, every express route, with the exception of the 8BX, has a prediction accuracy below 70%, the average accuracy for the city of San Francisco. Express routes only run during commute hours, so this further validates that NextBus prediction accuracy declines when the transit system is under higher stress levels during the commute.

 NextBus prediction accuracies for the five most and least accurate Muni routes.

Conclusions

Looking at the data, one thing is clear: while generally an improvement over scheduled timetables, NextBus is accurate only 70% of the time and struggles when we need it most, during the commute.

Working Together to Improve the Commute

Many of the problems with NextBus predictions appear to be related to changing real-world conditions, like traffic and transit delays, that get amplified during commute hours. To help improve the rider experience, we experimented with crowdsourcing reports of transportation issues, like delays and overcrowding, directly within Swiftly, our iOS app currently in public beta. Our hope was that if our users could find and report issues during their commute, we could help other riders avoid those problems. We were surprised by what came next. Our users began reporting dozens of transit issues, often even before official agency alerts were broadcast to the public!

 Screen shot from Swiftly iOS public beta app. Users can report a variety of transit issues to fellow riders.

The heatmap below shows the areas where Swiftly users most commonly reported issues. Regions with the highest density of reports are shown in red. The figure to its right is a snapshot of where NextBus is least accurate at 6:46 pm. It is clear that Swiftly users are reporting issues in the same neighborhoods where NextBus is least accurate. In other words, the Swiftly community is quite effective at finding and broadcasting transportation issues that can cause real-time predictions to be inaccurate. While this does not solve prediction accuracy problems, it demonstrates the potential for a broader community to work together to avoid common transit issues and delays.

 Left figure is a Heatmap showing where Swiftly users report transportation issues in San Francisco. Areas that are highlighted in red have the greatest density of user reports. Right figure is a snapshot of where NextBus is least accurate at 6:46 pm. Darker dots represent Muni stops where NextBus predictions are less accurate. The images demonstrate that Swiftly users are reporting transit issues in the same areas where NextBus suffers from low accuracy. A majority of issues are downtown in the Financial District, along Market Street, along Geary Blvd to Richmond, and along Judah Street to Ocean Beach.

At Swiftly, we’re on a mission to make getting around town fast, affordable, and environmentally friendly. At the core of that vision is a community of users working together to help fellow riders avoid transit delays and issues in real-time. We’re extremely thankful for all the support we have received to date, and we invite you to give Swiftly a try!

All the best,

The Swiftly Team

Questions

There are so many additional ways we could analyze predictions. If you have any questions regarding the data, please don’t hesitate to contact our CEO Jonny Simkin at jonny[at]goswift.ly.

The City and the SFMTA

While this report covers the ways in which NextBus is inaccurate, we should note that the city and the SFMTA are doing a lot to make transit work better, including adding new vehicles, dedicated bus lanes, and transit signal priority. There’s still a lot of work to do, but it is important to take a step back and look at how far we’ve come.

Acknowledgements

This report would not have been possible without the help of:

Jake Feldman: Assistant Professor at Olin Business School. Ph.D in Operations Research from Cornell University.

Bobby Nyotta: Ph.D Student in Decisions and Operations Management at the UCLA Anderson School of Management.

Mike Smith: former General Manager and Chief Technology Officer at NextBus, Inc.

and many others, including Chris Antaki, Stan Parkford, and Will Dayton.