Mapping MagicSeptember 26, 2004
My mother and stepfather, both professors of medicine, are relentlessly curious about the world and often ask me about technology that fascinates them. During one of our weekly phone calls, my mother alluded to my stepfather's fascination with MapQuest, one of the Web sites that will produce a map for just about any U.S. address. My stepfather thinks the mapping function is pretty neat, she said, but what really blows him away are the driving directions.
"It seems to him like there are infinitely many routes you can take, but it somehow figures out the fastest way to get there," she said. "It's like magic."
The topic came up again the next time they visited, and I relished the opportunity to make use of my limited knowledge of computer science-all gleaned from teaching the computer science department's "Algorithms" class as a math graduate student at Berkeley. I gave them my best guess of how MapQuest must work.
"There is this elegant algorithm called Dijkstra," I said. "It finds the shortest path between two points quite quickly."
"How does it do that?" my stepfather asked. "Well," I explained lamely, "it grows a cluster of points of known distances from the starting point by looking at all the points nearby and continually adding the one that's closest." At this point, my stepfather began shaking his head with authority. "Nope," he said. "That's not it."
After more thought, I realized he had a point: A lot of magic was missing from my explanation. MapQuest produces customized maps and driving directions in a few seconds, for tens of millions of users a day. For a graph of V vertices
and E edges, Dijkstra runs in time O(V log V + E), and these are potentially very large graphs. How does MapQuest produce directions so quickly? The software must be using tricks that let it look at only a portion of the graph, such as the shortest path to the closest major highway. And to keep that data available
for rapid access, in what form does MapQuest store it?
A nice thing about being a journalist is that when you or your parents have questions like these, there's often an efficient way to answer them. You call up your editor and ask if she'd like a story for SIAM News on the inner workings of MapQuest. If she says yes, you get to call up an engineer at MapQuest's parent company, AOL (a subsidiary of Time Warner), and ask.
After several weeks of back-and-forth with an AOL public-relations person, ("Yes, I really do want to talk to an engineer"), I finally set up an interview with Marc Smith, MapQuest's chief technical developer.
Dijkstra Does Directions
The first thing I learned from Smith was that, despite my stepfather's skepticism, MapQuest really and truly does use a variant of Dijkstra.
For SIAM News readers not familiar with this elegant and simple algorithm, which Edsger Dijkstra developed in 1956 while sipping coffee in a Dutch cafe, this is how it works:
Starting with a graph, directed or undirected, with positively weighted edges (distances) and a source node, Dijkstra computes the distances from the source to all the other nodes in the graph.
The do-loop of the program is quite simple: At any time, there is a cluster of nodes containing the source whose distance from the source is known. The algorithm considers the nodes adjacent to this cluster and computes a tentative shortest-path distance, which is the shortest path containing only nodes within the cluster. The algorithm then takes the node v for which the tentative distance is least and, concluding that this is the real distance, adds the node to the cluster.
To see why this works, suppose that the computed distance is not the real distance to v, but an overestimate. There is then some shortest path from the source to v that contains nodes outside the cluster, and there is a first vertex u outside the cluster on this path. The distance to u through the cluster would have to be less than the tentative distance computed to v---a contradiction.
The magic of Dijkstra is in the running time, which depends wholly on how the data about the graph is stored. The algorithm does best with a data structure called a "heap," which is like an inverted tree with some unfilled leaves at the bottom. Using the heap, the algorithm can continually pick the node with the shortest tentative distance, without having to sift through all the nodes each time. Instead, it updates the heap simply by following a path from top to bottom that has length logarithmic in the number of leaves. (A "Fibonacci heap" can be updated in constant time, amortized over all queries, but is apparently very difficult to program and is thus not often used in practice.)
Smith acknowledged that MapQuest uses a "double Dijkstra" algorithm for its driving directions, working backward from both the starting and ending points at once. "It's like the golden spike mentality," he said, in a reference to the building of the first transcontinental railroad in the U.S. from both coasts at once. Smith also conceded that the algorithm uses heuristic tricks to minimize the size of the graph that must be searched.
For edge weights, Smith said, the algorithm uses estimated driving times, rather than distances, based on classifications of the different types of roads. The MapQuest software also takes into account such factors as left-turns and long-term construction, and the newest version even incorporates weather conditions and time of day.
"Over time, we have done such radical things to the algorithm that it totally disresembles the original algorithm," Smith said.
MapQuest gets all its road data from about 30 different vendors, each of which concentrates on a certain region. These vendors hire people to drive around and explore new subdivisions and ongoing road construction projects to keep the data up to date. (See http://www.fastcompany.com/magazine/76/mapquest.html.) The site also has a user-feedback feature for identifying problems with vendor data.
The software, which runs on about three dozen Sun servers, is maintained and continually improved by a team of about 20 programmers.
Delighted with the answers I was getting, I rubbed my palms together, anticipating a definitive article that would lay it all out for my stepfather. But when I asked about the crucial issues---how MapQuest stores its data and optimizes its algorithm---I began to run into problems.
"Please describe the heuristics you use," I said. "Also, how do you splice together the map data? How do you store it?"
"Unfortunately," Smith replied, "the way we store the data is what makes MapQuest MapQuest." I got the hint.
Sara Robinson is a freelance writer based in Pasadena, California.