youtube.nixfred.com nixfred.com

Google Maps is unreasonably fast. Let me explain

Veritasium, with the animation channel 2swap, rebuilds the chain of algorithms that lets Google Maps route across a 64 million intersection continent in seconds when brute force would take longer than the age of the universe. It starts with Edsger Dijkstra inventing his shortest path algorithm in 20 minutes over coffee in Amsterdam in 1956, then climbs through breadth first search, Dijkstra's cheapest cost first method, A star with a straight line distance heuristic, bidirectional search, and the manual road hierarchies of 1990s GPS units. The payoff is contraction hierarchies, which rank every intersection by importance using graph cuts and nested dissection, add shortcut edges, and search only upward to hit roughly 200 microseconds on North America, about 35,000 times faster than Dijkstra. The throughline is that every modern routing method still has Dijkstra's 70 year old algorithm at its core.

Published May 30, 2026 29:54 video 29 min read Added Jun 16, 2026 Open on YouTube →

At a glance

You ask Google Maps for the route from New York to San Francisco, and four seconds later you have it. That is unreasonable. The North American road network has more than 64 million intersections, which gives on the order of ten to the 220 possible routes between two distant points. Checking a billion routes a second, brute force would take longer than the age of the universe many times over. This video, a collaboration between Veritasium and the animation channel 2swap, rebuilds the actual chain of ideas that beats that wall, starting from Edsger Dijkstra sketching his shortest path algorithm in his head over coffee in Amsterdam in 1956, and ending at the modern technique that runs the same search 35,000 times faster.

The arc is a ladder of algorithms, each one fixing the flaw of the last. Breadth first search finds the fewest hops but ignores distance. Dijkstra's algorithm handles weighted roads but searches outward in every direction, including away from the target. A star adds a geographic hint so the search leans toward the goal, which is great for shortest distance but stumbles when you optimize for travel time. Bidirectional search runs from both ends and meets in the middle. And finally contraction hierarchies, built by ranking every intersection by importance and stitching in shortcuts, deliver sub millisecond queries on a continent sized map. The punchline: the core of all of it is still Dijkstra, a 70 year old idea born of a 20 minute invention.

The problem: too many roads to count

Derek opens with the scale of the thing. Drive from New York to San Francisco, and you want the absolute shortest path. There are over 64 million intersections in the North American road system, and a simplified estimate gives around ten to the 220 possible routes. Even checking a billion routes per second, testing all of them would take over ten to the 200 years. Brute force is not slow, it is impossible.

And this is not only a maps problem. Any independent system that has to navigate, a robot crossing a room, a video game character solving a maze or an obstacle course, faces the exact same combinatorial explosion. Yet Google or Apple Maps finds you the route in about four seconds. The rest of the video is the story of how.

It traces back to a shopping trip in Amsterdam in 1956. Edsger Dijkstra was working on software for ARMAC, an early computer built at the Mathematical Center in the Netherlands, and it was almost ready to debut. The problem: the public considered computers practically useless. So useless that when Dijkstra tried to file his marriage license and stated his profession as "programmer," the municipal authorities of Amsterdam refused it on the grounds that no such profession existed. Under "profession," his marriage act ended up with the entry "theoretical physicist."

Dijkstra needed a demo to prove ARMAC's power even to a layperson. In his own words, the problem statement had to be something non mathematicians could understand, and they had to understand the answer too. On that shopping trip he found it: what is the shortest way to travel from Rotterdam to Groningen? In other words, a shortest path algorithm.

Breadth first search: fewest hops, wrong question

Picture a simplified graph of the Netherlands. The nodes are cities. The edges connecting them are roads. The simplest way to find the shortest path from Rotterdam to Groningen is to fan out. Start at Rotterdam, explore all of its neighbors looking for Groningen. It is not among them, so mark Rotterdam explored and keep going. Branch out from those four nodes to all of their neighbors, and keep branching, exploring every neighboring node, until you stumble on Groningen.

This is breadth first search. It always finds the shortest path in terms of number of steps, because it checks everything one step away, then everything two steps away, and so on. If a four step path from Rotterdam to Groningen existed, it would have surfaced on the fourth iteration. It did not, so the shortest must be five steps.

But there is a fatal flaw. Breadth first search treats every road as the same length, which is obviously false. Add weights to the edges to represent real distances, and suddenly "fewest hops" and "shortest drive" diverge. A five hop path down country lanes can be far longer than a six hop path that uses a straight highway. Dijkstra needed a method that respected distance.

Dijkstra's algorithm: cheapest path first

The idea came to Dijkstra over coffee. As he tells it: one morning he was shopping in Amsterdam with his young fiancee, and tired, they sat down on a cafe terrace to drink a cup of coffee, and he was just thinking about whether he could do this.

His scheme was this. Keep track of the shortest known distance from the source to each node. Call that the node's cost. If you can nail down the true shortest path to one node, you can build paths to its neighbors from there, then to the next, and on and on. Set the starting node's cost to zero. Since the rest of the graph is unexplored, set every other node's cost to infinity.

Then, like breadth first search, start at the source and look at each neighbor. At every edge ask one question: is the current path to this node the shortest we have seen so far? If an edge of weight one reaches a node whose current cost is infinity, then yes, one beats infinity, so update that node's cost to one. The same pass updates other neighbors to three, two, and four. Once all of Rotterdam's neighbors are checked, mark Rotterdam explored.

Now the crucial move: explore next the unexplored node with the lowest cost. Check each of its edges to try to reduce its neighbors' costs. You can pull one neighbor from infinity down to six, a path of length five plus one. But another neighbor already has cost three from a direct edge off the source, and the path through here would cost five, so you leave it alone. There is already something shorter. Continue: explore the node with cost two next, updating a neighbor to five. Then the node with cost three, updating its neighbors, and so on, always taking the cheapest unexplored node.

Sometimes a node's cost gets revised more than once. One node is first reached by a path of weights four then three, total seven. Later a different path turns out to cost six, so the cost drops to six. The algorithm keeps the best it has found.

1 3 4 2 3 5 6 2 7 S 0 1 3 5 5 8 T 7 source target
Figure 1. Dijkstra's algorithm in miniature. Numbers in the nodes are costs, the cheapest distance found so far from the source S. Numbers on the edges are road weights. The algorithm always expands the cheapest unexplored node next, so it discovers shortest paths in increasing order of cost. The target T is only finalized once it is itself the cheapest unexplored node, which guarantees nothing shorter was missed.

The first time Dijkstra's search reaches Groningen, the cost is 19, but Groningen is not yet the cheapest unexplored node. Another node sits at cost 16 and could hide a shorter route to the goal. So keep exploring in cost order, and a path of cost 18 to Groningen turns up. Now, among all unexplored nodes, Groningen has the lowest cost. That is the moment of certainty. Here is the why: if the cheapest remaining cost is 18, then every path of cost 16, 14, 13, every path under 18 has already been explored, and none reached Groningen by a shorter route. It is breadth first search again, only stepping outward by cost instead of by hop count. By exploring from lowest to highest cost, Dijkstra's algorithm guarantees the shortest path to any target.

Dijkstra on the invention: it was a 20 minute invention, designed without pencil and paper. He later said one advantage of designing without pencil and paper is that you are almost forced to avoid all avoidable complexities.

The first route planner, and a decade in a drawer

At ARMAC's official inauguration in 1956, Dijkstra asked the audience for two towns on a simplified map of the Netherlands. ARMAC printed the shortest route, town by town. It ran perfectly. The machine was such a hit that the center immediately started building its successor.

The algorithm itself, though, sat in his papers for years. Algorithms were not considered respectable enough for mathematics journals, and there were barely any computing journals yet. In 1959 he finally published it in Numerische Mathematik, a brand new German journal that needed help getting established. That was the beginning. Dijkstra recalled his amazement when the algorithm became one of the cornerstones of his fame, finding it in the early 1960s in a German management science book as "Das Dijkstra'sche Verfahren." Suddenly there was a method named after him. It is still regarded as one of the most elegant shortest path algorithms.

But it is not good enough for Google Maps. Take a short city trip: Newark Airport in New Jersey to the Central Park Zoo. Dijkstra checks all the 10 kilometer journeys, then all the 20 kilometer journeys, on up to the 28 kilometer journeys, which finally includes the zoo. The search frontier balloons to over 65,000 nodes and sweeps in places nowhere near the route, like Staten Island and large swaths of New Jersey. Even searching in those illogical directions, the runtime is about a tenth of a second, which is incredibly fast. For anyone who wants to try this, the team built a feel for these algorithms with Python scripts on real maps from OpenStreetMap.

A star: point the search at the goal

A tenth of a second sounds fast, so what is the problem? Bigger graphs. On the full North American network a well tuned Dijkstra averages around seven seconds per path, and for many queries it has to explore most of the 64 million nodes. One caveat on that average: most queries in it are long journeys, since two randomly picked points are unlikely to be close. Within a single city Dijkstra runs much faster, but the continent wide average is the fair way to compare algorithms.

Seven seconds you could wait. But you are not alone. You are in line behind Derek, Gregor, Casper, and everyone else, and there can be millions of people asking for directions at once. Google has many servers, but there is a ceiling. To answer in a couple of seconds at that scale, the algorithm itself needs to run in under a millisecond, something like 7,000 times faster than Dijkstra.

The first big idea: Dijkstra wastes effort searching the opposite direction from the target. If you know the rough direction of the goal, punish illogical routes. Back to Newark to the Central Park Zoo. You want to prioritize nodes closer to the zoo. Using longitudes and latitudes you can compute the straight line distance from any node to the target. Then order nodes not by cost alone, but by cost plus that straight line distance.

A great way to visualize this comes from the channel Polylog: treat each node's straight line distance to the target as its height, so the flat graph stretches up into 3D. The farther a node is from the zoo, the higher it sits, and the algorithm penalizes any move that climbs the graph. Nodes in the wrong direction are not explored early. So where Dijkstra's frontier spreads in all directions, this modified version, called A star, heads straight for the zoo. It checks only around 7,000 nodes, almost a tenfold improvement.

That straight line distance is called a heuristic, and tuning it changes how aggressively A star aims at the target. The steeper the slope, the less it spreads. It is a more human way to find a route, exploring mostly in the goal's direction, which is why A star is one of the most popular algorithms behind video game NPCs. Many Minecraft mobs use a version of A star with extra penalties for dangerous zones, and it is a popular maze solver. At worst A star checks as many nodes as Dijkstra, but on the right maze it can cut the search space by more than half.

source Dijkstra: spreads everywhere source target A star: leans toward the goal
Figure 2. Why A star wins on distance. Dijkstra has no idea where the target is, so its frontier is a circle growing outward in every direction. A star adds a straight line distance hint to each node's priority, stretching the frontier into a narrow ellipse aimed at the target. On the Newark to Central Park Zoo trip A star checked about 7,000 nodes against Dijkstra's 65,000, an order of magnitude fewer.

Shortest does not mean fastest

Here is the catch. When you actually drive, you do not care about minimizing distance, you want to arrive in the least time. To optimize for time, divide each road by its maximum allowed speed to get a minimum travel time, and use that as the heuristic. But this heuristic is an extreme underestimate, and that hurts A star badly.

One of the researchers in the video lays it out plainly. If you want the shortest geographic distance, A star gives you real advantages over Dijkstra. But if you optimize for travel time, A star, from his experience, loses against a well tuned Dijkstra. You explore fewer nodes, but not many fewer, and now you also have to evaluate a lot of heuristics you would not need for pure Dijkstra, and those heuristics contain a nasty square root computation.

The numbers make it concrete. On the New York City graph, Dijkstra explored about 9.5 times more nodes than A star to find the shortest path by distance. But switch the objective to shortest travel time, and the number of nodes A star explores triples, while Dijkstra only grows by 10 percent. And it gets worse on larger graphs. So A star, the clean win for distance, is the wrong tool for the thing people actually want, fastest arrival. Google Maps needs something better suited to travel time.

Bidirectional search: meet in the middle

What if you ran the search from both ends at once, source to target and target to source? If two points are a distance R apart, a single Dijkstra explores roughly the nodes inside a circle of area pi R squared. With two searches, each frontier only has to reach about R over two before they overlap, so together they cover about half pi R squared, half the area of the single search. Depending on the graph it can do even better than half.

The demo: a basic Dijkstra from Carnegie Hall to Wall Street explored over 7,200 nodes. A bidirectional Dijkstra explored a little over 2,600, close to a threefold improvement.

But even bidirectional search misses the logic baked into road networks. As one researcher puts it, these algorithms still do things you would never conceive of as a good idea. They will look at the drive through at In-N-Out and check whether driving halfway into it and back out is somehow a shorter path. Dijkstra and A star simply have no sense of the hierarchy of a road network.

Road network hierarchy

Think about how you really plan a drive. You meander on local roads, hop on a highway, ride it most of the way, then hop off and meander on local roads near the destination. There is a hierarchy to road networks. But these algorithms cannot exploit it: they are just as likely to search a twisty local road as an interstate, as long as the weights match. The question is how to encode that human intuition into an algorithm.

Industry's first answer, in 1990s in car GPS units, was to do it by hand. They precomputed a road class and annotated it onto every road. Surveyors went out and recorded a road's speed limit, width, whether it was a highway, and sent that to mapping companies, who manually sorted roads into hierarchies like express highway, local major road, or narrow road.

With those maps, a GPS ran a bidirectional Dijkstra, but with a twist. It first searched only the narrow roads within a candidate area around the start. If the target was not found, it stepped up a level to the major local roads and searched those within a larger candidate area. Still not found, it stepped up again to the highway level. The two searches eventually overlapped and returned a path. A little preprocessing shrank the search and let the algorithm use the road hierarchy.

The flaw was correctness. How do you compute the levels right, and prove you are not missing shortest paths? If the candidate area is too small, the algorithm might return a highway path when a chain of local roads is actually better. Make the candidate area large enough to always be correct, and you are barely better than Dijkstra. To map the whole world, Google needed a routing algorithm that was both fast and provably correct.

The tradeoff: preprocessing versus query time

Every speedup here is really a trade between work you do once, ahead of time, and work you do per query. The brute force lookup table is the extreme. If you precomputed and stored the shortest path between every pair of points, queries would be instant, just a table read. But building it is absurd: running Dijkstra from each of 64 million intersections is more than a decade of compute on a single core, and the full table would be over eight petabytes, like storing all of Wikipedia 16 times over. Worse, you would have to rebuild large chunks every time a road closes, a bridge opens, or traffic shifts.

At the other extreme, Dijkstra needs no preprocessing at all but is slow at query time. Google Maps wants to live between these poles: very fast queries with only moderate preprocessing.

Dijkstra no prep, slow query full lookup table 8 PB, decade of prep contraction hierarchy the sweet spot preprocessing time → query runtime →
Figure 3. The whole design problem on two axes. Plain Dijkstra costs nothing up front but is slow per query. A full lookup table answers instantly but needs a decade of compute and eight petabytes, and breaks on every road closure. Contraction hierarchies land in the low corner of both axes: heavy but one time preprocessing buys sub millisecond queries that survive traffic updates with only a cheap re weighting pass.

Contraction hierarchies: rank, then shortcut

The new idea echoes the manual hierarchy but changes two things. First, automatically preprocess the graph to rank nodes by importance, no more categorizing roads by hand. An exit linking two highways is important; the intersection into a tiny cul de sac is not. Second, run a bidirectional Dijkstra that only ever searches up the hierarchy from both ends, and if you build it cleverly enough, you can drop the candidate areas entirely and still guarantee the shortest path.

Phase one is building the hierarchy. How do you decide which nodes are important? Take two small towns connected by a single bridge. Strip one town down to a single house. Anyone leaving that house has to cross the intersection at the end of the bridge, so every shortest path to and from the house runs through that node. It is very important to that house, but not to the rest of the town, since the only reason to cross is to reach that one house. Add more houses on that side, and the bridge node grows more important: every trip between the two sides must use it. It matters most when the two towns are about the same size, splitting the graph in half, because that is when the maximum number of shortest paths run through it. Other sets of nodes also split the graph, like a main street, but a street has many intersections while the bridge is a single node. So a small cut of nodes that roughly splits the graph in half is important.

Bring back North America. A set of blue nodes roughly splits the continent in half: to get from anywhere on the east coast to anywhere on the west coast, you must pass through one of just 102 nodes. As a researcher notes, that is really small, and where the cut runs is essentially the Mississippi River, and bridges over the Mississippi are expensive to build, so there are few of them. Those 102 nodes are the biggest bottlenecks, so they get the highest rank.

Now find the next most important nodes by doing the same thing on each side. On the east coast the next cut follows the Ohio River, then loosely traces up the Appalachian Mountains before heading back south along the Potomac River. On the west coast the cut mostly follows the Rockies, then the Colorado River. Rank those nodes next, split each half again, rank again, and recurse until all 64 million nodes are ranked. This recursive splitting is called nested dissection.

The preprocessing has two goals at once: an algorithm that searches few nodes, so queries are fast, and one that always returns the true shortest path. The hierarchy has to serve both.

Why a search up the hierarchy needs shortcuts

Test it on a small graph, ignoring edge weights at first. Two nodes split the graph roughly in half, so they get the highest rank. On one side, the node that splits its subset in half gets the highest of the three there; the remaining two can be ranked in any order. Do the other side the same way. Interestingly, you could swap rankings within the same cut, or split the graph differently and get yet another valid ranking that even splits perfectly in half. All of these rankings are valid, and the reason will matter in a moment.

The clean way to picture a ranking is to contract the nodes in order, pulling node one furthest down, two next up, and so on, so the hierarchy literally stacks. Now bring in the search. To keep it fast, only move up the hierarchy. This is exactly why bidirectional search is essential: if you could only search one way, you would get stuck on many pairs; instead the two searches climb and meet at the top.

Add the edge weights back and ask for the shortest path from node four to node eight. From node four, search up and reach node nine at cost 10. From node eight, search up and reach node nine at cost five. Meeting there suggests a path of cost 15. Done, except it is wrong. The real shortest path costs only three, but it dips through lower ranked nodes, which an upward only search never checks.

The fix is a little more preprocessing. Start at the lowest ranked node and ask: does it connect to at least two higher ranked nodes? If node one connects to three and eight, that is a lower triangle. The path through node one might be the shortest between three and eight, but the upward search would never consider it, so add a shortcut edge directly between three and eight, weighted with the cost of that lower triangle. This shortcut does not correspond to any real road; it is purely bookkeeping for shortest paths. Move to the next lowest node and repeat. If there are several lower triangles, or an edge already exists between the two higher nodes, the shortcut keeps whichever path has the minimum cost.

A path that only uses lower ranked nodes is like a path that only uses small local roads. You do not want the algorithm returning a highway route when a local road is genuinely shorter. Building shortcuts from the bottom up guarantees you never miss those paths, while still ignoring them when they are not needed. So you both shrink the search space and never miss the shortest path.

top cuts (Mississippi) regional cuts local roads S T meet search up from S search up from T
Figure 4. A contraction hierarchy. Intersections are ranked by importance: local roads at the bottom, continent splitting cuts like the Mississippi at the top. A query searches only upward from the source S and the target T until the two climbs meet. Dashed shortcut edges, added once during preprocessing from each lower triangle, let the upward search leap over contracted local nodes without ever missing a true shortest path.

There is a bonus to shortcuts: you do not need a perfect ranking. Cut the graph roughly in half and rank nodes within a cut in any order; if a ranking overlooks a shortest path, a shortcut adds it back later. That is why the same instructions earlier produced three valid rankings. But while any ranking still finds the shortest path, a bad one runs slow or blows up the shortcut count. A straight line of nodes contracted left to right needs no shortcuts, but an end to end search then has to walk every node and edge, no better than Dijkstra. Another ranking needs only two edges end to end but demands many shortcuts, and still checks all the edges in some sections. So there is a real tradeoff between how many shortcuts you add and how much faster the query gets. In practice nested dissection gives a decent ordering without taking too long. The whole method is called a customizable contraction hierarchy.

The three phases, measured on North America

Ben ran the method on the North American network to get real numbers.

Phase one orders the nodes and adds the shortcuts. This only changes when the graph itself changes, a new bridge or a road closure, and it is by far the most expensive step, but it almost never runs. The build came in around an hour and 40 minutes; spend more time and you get better results, which is an argument for doing so.

Phase two computes the weights for all the shortcuts. This has to rerun on every traffic update, so it must be quick: about seven or eight seconds without parallelization, down to about a second with it, at which point memory becomes the constraint.

Phase three is the actual search. Recall a well tuned Dijkstra explores most of the 64 million nodes and takes around seven seconds per long North American path. The contraction hierarchy answered the same query in about 200 microseconds, and depending on configuration you can get down to roughly 100 microseconds, though below a millisecond is all you need in practice. That is 35,000 times faster than Dijkstra in this experiment, and on average it explored only 1,450 nodes, cutting the search space by a factor of around 44,000.

To picture it: a Dijkstra search from San Francisco to Montreal checks essentially every node in a vast region between them. On a customizable contraction hierarchy it searches just 1,236 nodes, clustered tightly around the source and target. Once the search climbs high enough up the hierarchy, it only needs the most important cuts, like the Mississippi.

AlgorithmIdea it addsNodes / time on the big graph
Breadth first searchfewest hopsignores road length entirely
Dijkstra (1956)cheapest cost first, exactmost of 64M nodes, about 7 s/path
A starstraight line distance heuristicabout 10x fewer for distance, poor for travel time
Bidirectional Dijkstrasearch from both ends, meet in middleroughly half the area, about 3x fewer in the NYC demo
Manual hierarchy (1990s GPS)hand classed road levels + candidate areasfaster, but hard to prove correct
Contraction hierarchyauto rank by cuts + shortcutsabout 1,450 nodes, near 200 microseconds, 35,000x faster
Figure 5. The ladder of ideas, each step fixing the flaw of the one before. Green marks where an approach wins, amber where it falls short. The core search underneath every row from Dijkstra down is still Dijkstra's algorithm; the later rows only change which nodes it is allowed to look at.

How map apps actually work

Nobody outside the companies knows exactly what Google Maps, Apple Maps, or other big routing apps run under the hood, but they all could be built on contraction hierarchies. Customizable contraction hierarchies are just one variation, and there are more routing algorithms the video did not cover. Yet at the core of all of them is still Dijkstra, a 70 year old algorithm.

Four decades after Dijkstra's article, Danish computer scientist Mikkel Thorup said that all theoretical developments in single source shortest paths have been based on Dijkstra's algorithm. Since 2000, many of the newest shortest path algorithms still use bits and pieces of Dijkstra, from variations on contraction hierarchies all the way to a recent theoretical breakthrough for an even faster shortest path algorithm. All of it descends from a 20 minute invention.

Simplicity is prerequisite for reliability

Part of Dijkstra's enduring success is how simple the algorithm is. As he said, simplicity is a prerequisite for reliability. He thought deeply about problems before ever touching a pen, because at the end of the day it would be a programmer, not the machine, who is accountable for the work.

He hoped for one lasting legacy. In his words: if 10 years from now, when you are doing something quick and dirty, you suddenly visualize that he is there looking over your shoulder and say to yourself "Dijkstra would not have liked this," that would be enough immortality for him. Even if you only control your little corner of the world, you can still make it as elegant and thoughtful as possible, for you and everyone who comes after.

The video closes, fittingly, with the voice every driver knows: you have arrived at your destination.

Key takeaways

Chapters

0:00 What is a shortest path algorithm? 3:30 Dijkstra's 20 minute algorithm 6:30 The first route planner 10:31 A star search algorithm 12:40 Shortest does not mean fastest 15:08 Road network hierarchy 18:29 Mapping North America, nested dissection 25:17 How do map apps work? 28:04 Simplicity is prerequisite for reliability

Notable quotes

"It was a 20 minute invention. I designed it without pencil and paper. I learned later that one of the advantages of designing without pencil and paper is that you are almost forced to avoid all avoidable complexities." Dijkstra, 5:30

"The municipal authorities of the town of Amsterdam didn't accept it on the ground that there was no such profession. And under the heading profession, my marriage act shows the ridiculous entry, theoretical physicist." Dijkstra, 2:00

"If you go for travel time, A star, from my experience, loses against a well tuned Dijkstra. You explore fewer nodes, but not that many. But you now also have to evaluate quite a few heuristics, and those heuristics contain a nasty square root computation." 13:40

"They still look at the drive through at In-N-Out and see if there isn't perhaps a shorter path if you drive halfway into it, and then some clever way out of it." 15:30

"That's essentially the Mississippi River. And bridges over the Mississippi are expensive to build." 19:50

"That's 35,000 times faster than Dijkstra in just this experiment. And on average, it only needed to explore 1,450 nodes, cutting the search space by a factor of around 44,000." Henry, 25:00

"If 10 years from now, when you're doing something quick and dirty, you suddenly visualize that I'm here, looking over your shoulders, and say to yourself, Dijkstra would not have liked this. Well, that'd be enough immortality for me." Dijkstra, 28:40

Resources mentioned

Full transcript
- [Derek] Say you wanna drive from New York to San Francisco. How could you calculate the absolute shortest path? There are over 64 million intersections in the North American road system, so a simplified estimate gives around 10 to the 220 possible routes. Even if you could check a billion routes per second, testing all of them would still take over 10 to the 200 years. And this isn't just a problem for navigation software. Any independent system like a robot or a video game character that needs to navigate a maze, or obstacle course faces this same problem. And yet, Google or Apple Maps will find you the route in only four seconds. So how is that possible? Well, we teamed up with the channel 2swap, who creates incredible simulations and animations to show you step by step, how these algorithms work. They trace their origins back to a shopping trip in Amsterdam in 1956. (jaunty music) Edsger Dijkstra needed a problem. He was working on software for ARMAC, an early computer developed at the mathematical center in the Netherlands. And it was almost ready to debut. But at the time, the general public considered computers to be practically useless. So useless, in fact, that Dijkstra ran into issues filing his marriage license. - [Casper's Dijkstra Impression] Dutch marriage rights require you to state your profession, and I said that I was a programmer. But the municipal authorities of the town of Amsterdam didn't accept it on the ground that there was no such profession. And believe it or not, that under the heading "profession," my marriage act shows the ridiculous entry, "theoretical physicist." - [Derek] And so Dijkstra was looking for a specific demo to prove just how powerful ARMAC was, even to the layperson. - ['Dijkstra'] The problem of course, is that for a demonstration for non-computing people, you have to have a problem statement that non-mathematicians can understand. And they have to understand the answer. - [Derek] And during that shopping trip, he thought of the perfect problem. - ['Dijkstra'] What's the shortest way to travel from Rotterdam to Groningen? - [Derek] In other words, a shortest path algorithm. Here's a simplified graph of the Netherlands. The nodes represent the cities, and the edges connecting them are the roads. The simplest way to find the shortest path from Rotterdam to Groningen looks like this. Starting from Rotterdam, we're first going to explore all its neighbors looking for Groningen. Since Groningen isn't one of these neighboring nodes, we'll mark Rotterdam as explored, and keep searching. So now we'll branch out from these four nodes to explore all of their neighbors. And we keep branching out this way, exploring all neighboring nodes, until we stumble upon Groningen. This algorithm is known as Breadth-First Search. It will always find the shortest path because it checks all nodes one step away, and then two steps away, and so on. So if there were a path from Rotterdam to Groningen in four steps, we would've found it on the fourth iteration, but we didn't. So the shortest path must be five steps. But there's a problem with this approach. I mean, the algorithm considers all these roads to be the same length, which is obviously not true. So let's add weights to represent distances between these nodes. Now the shortest path is much harder to figure out. So Dijkstra needed a better method. - ['Dijkstra'] One morning I was shopping in Amsterdam with my young fiance, and tired, we sat down on the cafe terrace to drink a cup of coffee. And I was just thinking about whether I could do this. - [Henry] His idea was this. First he wanted to keep track of the shortest distance from the source to each node. This is also known as its cost. His goal, well, if he could find the shortest path from the source to one node, then he could build up other shortest paths starting from this new node, and then the next, and on and on. Now, the cost of the starting node was zero. But since he hadn't actually explored the rest of the graph yet, he set all the other nodes to a cost of infinity. Then just like breadth-first search, Dijkstra started from the source, and explored each of its neighboring nodes. At every step he'd ask, "Is the current path to this node the shortest we've seen so far?" For example, this edge only has a weight of one, while the node's current cost is infinity. So this path is the shortest yet to that node. So Dijkstra updated its cost to one. Likewise, these nodes were updated to three, two, and four. After checking all of Rotterdam's neighboring nodes, Dijkstra marked it as explored. Then since this node had the lowest cost out of all the unexplored nodes, Dijkstra explored it next. Like before, he checked each of its edges to reduce the neighbor's costs. Dijkstra could update this neighbor from infinity down to six, a total path length of five plus one, but he couldn't update this one. The current path has a cost of five, but the nodes' cost was three. That meant there was already a shorter path, this one directly from the source. So there was no need to update the cost. The next node to explore is this one with a cost of two, which would update its neighbor to five. Then this one with a cost of three would update its neighbors, and so on. Sometimes Dijkstra had to update a node's cost a few times. For example, he first reached this node through this path with edge weights four, and then three, for a total cost of seven. But later he found this path was cost six. So he updated the nodes cost again. (gentle music) The first time Dijkstra reached Groningen, it had a cost of 19. But it didn't have the lowest cost out of all the unexplored nodes. There was still this node with a cost of 16, which could be hiding a shorter path to our goal. So he continued exploring nodes in cost order, and found this path with cost 18. Now, out of all unexplored nodes, Groningen had the next lowest cost. So Dijkstra was confident that he'd found the shortest path. Here's why. If the lowest cost among the unexplored nodes is 18, then Dijkstra must have explored all the paths with cost 16, 14, 13, every path less than 18. And none of those paths reached Groningen. It's just like breadth-first search, searching every step away from the source. By exploring from lowest to highest cost, Dijkstra's algorithm guaranteed the shortest path to any target. - ['Dijkstra'] It was a 20-minute invention. I designed it without pencil and paper. I learned later that one of the advantages of designing without pencil and paper is that you are almost forced to avoid all avoidable complexities. - [Derek] During ARMAC's official inauguration in 1956, Dijkstra asked the audience members for two towns on a simplified map of the Netherlands. ARMAC then printed the shortest route town by town. It ran perfectly. The computer was such a success that the center immediately started the next generation. And as for his algorithm, well, it just sat in his papers for years. Algorithms weren't respectable enough for mathematics journals, and there weren't very many computing journals yet. But in 1959, he published his work in Numerische Mathematik, or Numerical Mathematics, a brand new German journal that needed some help getting established. And that was the beginning. - ['Dijkstra'] Eventually, that algorithm became, to my great amazement, one of the cornerstones of my fame. I found it in the early '60s in a German book on management science, Das Dijktra'sche Verfahren. Suddenly there was a method named after me. Dijkstra's algorithm is still regarded as one of the most elegant, shortest path algorithms. But it isn't good enough for Google Maps. Let's say I want to get from Newark Airport in New Jersey over to the Central Park Zoo. First, Dijkstra's algorithm checks all the 10-kilometer journeys, and then all the 20 kilometer journeys, and so on until it reaches all the 28 kilometer journeys, which includes the zoo. The search frontier covers over 65,000 nodes, and includes places that are way off, like Staten Island, and large swaths of New Jersey. But even though it searched in a illogical directions, the runtime was about a 10th of a second, which is incredibly fast. And for anyone wanting to set up their own pathfinding tests, we were able to get a feel for these algorithms using some Python scripts and actual maps from OpenStreetMaps. Solving problems using code like this is as much about the journey as it is about the destination. You get to understand how Dijkstra works under the hood. Along the way, you also brush up on your coding skills. If that's something you wanna try out, I strongly recommend today's sponsor, Boot.dev. Instead of blinding you with code, their courses drill in the fundamentals by giving you actual problems to solve. Designing an RPG game, building a payment app, or yes, running Dijkstra's algorithm. Their courses start with overview videos and steadily feed in terminology to make sure that you actually understand it. They even cover pronunciation. You ever heard of SQL? Okay, it's SQL. Well, it's a programming language that's become a bottleneck for certain jobs. Take data science. 60% of US job listings last year required you to know SQL. That's almost 80% for Python. Boot.dev is geared up to teach you the languages and tools that you'll need for backend and DevOps roles. From Go in Docker to Linux and AWS. You can start from scratch, or jump into more advanced topics. See, DevOps engineers in the US are still some of the highest paid tech roles. Because these skills are genuinely hard to learn. And that's the point. To get started, go to Boot.dev and use our code "Veritasium" to get 25% off your entire first year on the annual plan. I wanna thank Boot.dev for sponsoring this part of the video. Now less than 100 milliseconds seems pretty fast. So what's the problem? Well, things slow down a lot with bigger and bigger graphs. Like the North America network. One way we can compare search algorithms is by their average query speed. First we pick lots of random points. Then we run Dijkstra's algorithm on each pair of points and average all the runtimes. On the North American network, a well-tuned Dijkstra takes around seven seconds per path. And for a lot of these searches, the algorithm has to explore most of those 64 million nodes. One note, most of the queries in this average are for very long journeys, since it's unlikely that two randomly picked points will be close together. Now we've seen that Dijkstra runs much faster within one city, but this average is a good way to compare pathfinding algorithms. But still, I can wait seven seconds. I mean it took about four seconds on my phone. But the thing is, it's not just me. I'm waiting in a line behind Derek, Gregor, Casper, and all of you. There can be millions of people asking for directions all at once. And while Google has a lot of servers, there's a limit. If we want Google Maps to run in a couple of seconds, we actually need the algorithm to run in less than a millisecond, something 7,000 times faster than Dijkstra's. (whooshing sound) Well, one problem with Dijkstra's is it searches in the opposite direction of the target. So if we know the rough direction, is there a way that we can punish a illogical route? Let's go back to our problem, getting from Newark to the Central Park Zoo. We'd like to prioritize nodes that are closer to the zoo. So using longitudes and latitudes, we can easily calculate the straight line distance between any node and our target. We'll order the nodes by their cost, plus this straight line distance. A great way to visualize this comes from the channel Polylog. We can represent each node's straight line distance to the target as its height, so the graph stretches into 3D space. The further a node is from the zoo, the higher it is. So the algorithm penalizes moves that climb up the graph. That way nodes in the opposite direction aren't explored early on. So while Dijkstra's search frontier spreads out in all directions, this modified Dijkstra's, also called A-star, immediately heads toward the zoo. It only checks around 7,000 nodes, which is almost a 10 times improvement over Dijkstra's. And by changing this penalty, also called a heuristic, we can tweak how aggressively A-star targets the zoo. The steeper the slope, the less A-star spreads out. It's a more human approach to finding the shortest path, since we're only exploring in the rough direction of the target. That's why A-star is one of the most popular algorithms behind video game NPCs. For example, many of the mobs in Minecraft use a version of A-star with extra penalties for dangerous zones. It's also a popular maze-solving algorithm. At worst, A-star checks as many nodes as Dijkstra. But on the right maze, it can cut down the search space by more than half. But when I'm trying to get somewhere, I don't care about minimizing the distance. I just wanna get there the fastest. (Henry laughing) To optimize for time, we divide by the maximum speed allowed on the graph. This gives us the minimum travel time. But this heuristic is an extreme underestimate. - If you wanna go for the shortest path in the sense that shortest geographic distance, you will gain some advantages using A-star, compared to Dijkstra. If you go for travel time, A-star, from my experience, loses against a well-tuned Dijkstra. You explore fewer nodes, but not that many. But you now also have to evaluate quite a few heuristics that you would not need to evaluate when doing pure Dijkstra. And those heuristic contain a nasty square root computation. - [Henry] On our New York City graph, Dijkstra's algorithm explored around 9.5 times more nodes than A-star, to find the shortest path by distance. But if we switch that to the shortest travel time, the number of nodes A-star explores triples, while Dijkstra only increases by 10%. And it only gets worse on larger graphs. So Google Maps need something better that works well for travel time. What if we ran the search in both directions? From the source to the target, and from the target to the source? If two points are a distance R away, Dijkstra searches most of the nodes roughly within a circle of area pi R-squared. But with two searches, the frontiers will overlap around R over two, and cover an area of half pi R-squared, half that of the single search. And depending on the graph, it can sometimes be better than half. A basic Dijkstra search from Carnegie Hall to Wall Street explored over 7,200 nodes. But a bi-directional Dijkstra explored a little over 2,600 nodes, close to a three times improvement. But even a bi-directional search doesn't take advantage of the logic built into road networks. - They still do things that you wouldn't necessarily conceive as a good idea, right? Like they still look at the drive-through at In-N-Out and see if there isn't perhaps a shorter path if you drive halfway into it, and then some clever way out of it, right? So maybe that is the intuition between why kind of Dijkstra and A-star itself don't suffice. Both of them still kind of don't have a good sense of the hierarchy of the road network. - When I think about how to get from one place to another, I know that I'm probably gonna meander around some local roads, then hop on a highway, and then when I'm close to my destination, I'm gonna hop off the highway and meander around some more local roads. There's a hierarchy to road networks, but these algorithms can't take advantage of them. So they're as likely to search a twisty local road as an interstate highway, as long as they have the same weight. So how do we actually encode that human intuition into an algorithm? - If you look at kind of industry implementations in the '90s, for instance, like in these early in-car GPS systems, they had like a pre-computed road class that they annotated on the roads. - So here was the basic idea. Surveyors would go out and take notes about a road. Like its speed limit, how wide it was, whether it was a highway. And then they would send that to the mapping companies who would manually categorize it into hierarchies. An example of a hierarchy might be express highway, local major road, or narrow road like this. Once it had the maps, GPS would run a bi-directional Dijkstra, but there was a catch. It first searched all the narrow roads within a candidate area. If it didn't find the target, it mapped the path to the next hierarchy, the major local roads. Then search all of the major local roads within the new candidate area. If it still didn't find the target, it moved up again to the highway level. The two searches would overlap at some point, and return the final path. This way, a little pre-processing could reduce the search area and take advantage of the road hierarchy. - One obvious downside here is that how do you correctly compute those levels of hierarchy? And how can you prove to yourself that they're correct in the sense that you are not missing any shortest paths? - For example, if the candidate area is too small, there's a chance the algorithm returns a highway path, when a series of local roads really is better. A much larger candidate area might always find the shortest path, but at some point, that's not much better than Dijkstra. If Google wanted to map the world, they need a better routing algorithm. (upbeat music) Using a pre-processed hierarchy gives us way faster query runtimes. But, there's a trade-off. Our brute force algorithm from earlier where we just calculated all the paths to get the shortest one is a perfect example. If we had a massive lookup table of all the shortest paths from any two points, then we would get insanely fast query runtimes. But building that list would take way too long. Here I have a graph. On the vertical axis is query runtime. And on the horizontal axis is pre-processing time. In the case of our massive lookup table, it would be right about here. Even running Dijkstra from every one of those 64 million intersections, that's more than a decade of compute on a single core. And the entire table would be more than eight petabytes of raw data. That's like storing all of Wikipedia, 16 times over. And even if we had the table, we'd have to rebuild entire portions whenever a road closes, a new bridge opens, and for every small traffic adjustment. Now in reality it would probably not even fit on this chart. But for now, we're just gonna put it here. On the other hand, Dijkstra's algorithm requires no pre-processing at all. But it takes a long time to run. So on our graph, it would fit about here. Google Maps need something in between. Very fast query run times, but not too much pre-processing. In that case, we want to be in this region. Now the idea is very similar to the manual hierarchies, but with two major changes. First, we'll automatically pre-process our graph to order nodes by how important they are. An exit connecting two highways, quite important. The intersection to a tiny cul-de-sac, well, not as much. No more categorizing roads by hand. We'll still use a bi-directional Dijkstras that only searches up the hierarchy from both directions. But if we build it clever enough, we can get rid of candidate areas and still guarantee the shortest path. So phase one is building this hierarchy. How do we pick the most important nodes though? Let's take a look at these two small towns. We're gonna throw away everything in one town except for this house for right now. If someone in that house wants to get anywhere, they have to pass through this intersection at the end of the bridge. All of the shortest paths to and from their house include that node, so it's pretty important to them. But that node isn't as important to the rest of the town. After all, the only reason to cross the bridge is to get to that one house. But with more and more houses on the other side, the bridge node gets more and more important. Any trip from one side to the other needs to use it. It's the most important when the two towns are roughly the same size, when it splits the graph in half. That's when the maximum number of shortest paths include that bridge. There are other sets of nodes that split the graph in half, like this main street, but there's far more intersections along that street than just the one bridge node. So a small set of nodes, or a small cut of nodes is important when it roughly splits the graph in half. Let's bring back the North American network. These blue nodes roughly split the graph in half. If you wanted to get from anywhere on the east coast, to anywhere on the west coast, you'd have to go through one of these 102 nodes. It's just like the bridge connecting the two towns. - That's really small. But if you look at where does this thing go, that's essentially the Mississippi River. And bridges over the Mississippi are expensive to build. - [Henry] These are the biggest bottlenecks. Out of all the 64 million plus nodes, these 102 get the highest rank, so the biggest numbers. Now let's find the next most important nodes. - Well, what can we do on both sides? Well, we can do the same thing. We again, search for small cuts. - The next cut on the east coast follows the Ohio River. And then it loosely traces up the Appalachian Mountains before heading back south along the Potomac River. On the west coast, the cut mostly follows the Rockies, and then the Colorado River. These two sets get the next highest ranks for their sides, and we can split these in half again and rank those nodes. And again, and on and on, until we've ranked all 64 million nodes. This is called nested dissection. This pre-processing step has two goals. We want an algorithm that doesn't search a lot of nodes, so the runtime is really fast, but also will always return the shortest path. The hierarchy we build now needs to support both of these goals. Let's test this out on a smaller graph. We ignore the edge weights for the first step. These two nodes split the graph roughly in half, so we give them the highest rank. Looking at one side, this node splits the subset in half, so it has the highest rank of the three. We can rank the remaining two nodes in any order. Now, on the other side, these two nodes split this subset in half. So they have the next highest rank. And then again, we rank the final two nodes with the remaining numbers. But what about this ranking? We split the graph the same, but the rankings within the same cut are all swapped around. Or if we split the graph differently, we could get this ranking. It even splits the graph perfectly in half. Isn't that better? And these rankings are all valid. We'll see in a bit why we could use any one of them. But for now, we'll just take this one. The best way to visualize this ranking is to pull down, or contract the nodes in order. One is the furthest down, two is next up, and so on. Now we can bring in a search algorithm. To speed up the runtime, we limit the search space by only moving up the hierarchy. And that's why it's so important to use a bi-directional Dijkstra. If the algorithm couldn't search in both directions, it would get stuck on a lot of pairs. So instead, the two searches meet in the middle at the top of the hierarchy. Now let's add the edge weights back. Okay, what's the shortest path from node four to node eight? Well, starting from node four, we search up, and reach node nine with a cost of 10, From node eight, we also search up, and get to node nine with a cost of five. So that must be the shortest path with a cost of 15. Done. Except, there's a problem. That's not the shortest path. Using this path only costs three. This search doesn't check paths that go through lower rank nodes. But we can fix it with a little more pre-processing. Start from the lowest rank. Does this node connect to at least two higher nodes? On this graph, node one connects to three and eight. This is called a lower triangle. And while there's a chance this path is actually the shortest between node three and node eight, our search will never consider it. So we'll add a shortcut between nodes three and node eight. That's just the cost of this lower triangle. Now, this edge doesn't correspond to anything real on the road network. It's just a way to keep track of shortest paths. We move on to the next lowest node and run the same test. But there could be multiple lower triangles, or there might already be an edge between the two higher nodes. In those cases, the shortcut represents whichever path has the minimum cost. A path that only uses lower ranked nodes is like a path that only uses smaller local roads. We don't want the algorithm to return a highway route if a local road is shorter. Building shortcuts from the bottom up makes sure we don't miss these paths, while also ignoring them if they aren't needed. This way, we both limit the search space, and never miss the shortest path. But there's another benefit to shortcuts. We don't need to worry too much about getting a perfect ranking. We can roughly cut the graph in half, and rank nodes in the same cut in any order. If the ranking overlooks the shortest path, well, we'll add it back later with a shortcut. That's why we were able to get three valid rankings using the same instructions earlier. But while any ranking will still find the shortest path, a bad one will go really slow, or it'll quickly blow the graph up. For example, with a series of nodes in a straight line, we don't need to add any shortcuts if we just contract the nodes from left to right. But when we search between the two ends, we have to explore every node and edge. And that's no improvement over Dijkstra's. Here's another possible ranking with the necessary shortcuts. End-to-end, we only need to search two edges, but we need a lot of shortcuts. And in some sections, we still have to check all the edges. Manageable, but not very efficient. - And so you want to have this trade-off between the number of shortcuts that you introduce and how much faster the query gets. - [Henry] In practice, the nested dissection method gets a decent node ordering and doesn't take too long. This entire algorithm is called a customizable contraction hierarchy. And Ben tested it out on the North American network to give us a sense of the runtimes. In phase one, we order the nodes and add shortcuts. This doesn't change unless the graph itself changes, like a new bridge, or a road closure. It's by far the most expensive part of the process. But once it's done, it hardly ever changes. - I think it would ended up at an hour 40 or something like that. You can spend more time, you get better results. So it's an argument to make that that's a good idea. - [Derek] In phase two, we calculate the weights for all the shortcuts. Whenever there's traffic updates, we'll have to rerun this step. So it needs to be relatively quick, - I think seven, eight seconds without parallelization and take it down to about a second, and then that's where the memory starts setting. - [Derek] Finally, phase three is the actual search. Remember, for a long path on the North American network, a well-tuned Dijkstra needs to explore most of the 64 million nodes and takes around seven seconds per path. For this contraction hierarchy, the query runtime was around 200 microseconds. - Depending on how you configure stuff, you can get down to like 100 microseconds or so. But for practical purposes, below a millisecond is all you need. - [Henry] That's 35,000 times faster than Dijkstra in just this experiment. And on average, it only needed to explore 1,450 nodes, cutting the search space by a factor of around 44,000. A Dijkstra search from San Francisco to Montreal would need to check all the nodes roughly in this area. On a customizable contraction hierarchy, it only needs to search these 1,236 nodes. Most of the nodes cluster around the source and target. Once it gets far enough up the hierarchy, it only needs to check the most important cuts, like the Mississippi. While we don't know exactly what Google maps, Apple maps, or any of these big routing applications actually use under the hood, they all could be built with contraction hierarchies. And customizable contraction hierarchies is just one variation, and there are even more routing algorithms that we haven't covered. But if you think about it, its core search is still Dijkstras, a 70-year-old algorithm. Four decades after Dijkstra publishes article, Danish computer scientist Mikkel Thorup said that all theoretical developments in single source shortest paths have been based on Dijkstra's algorithm. And since 2000, many of the newest shortest path algorithms still use bits and pieces of Dijkstras. From variations on contraction hierarchies, all the way to a recent theoretical breakthrough for an even faster shortest path algorithm. All of this, based on a 20-minute invention. Part of Dijkstra's enduring success comes down to just how simple it is. As he said, "Simplicity is prerequisite for reliability." Dijkstra thought deeply about problems before ever touching a pen. Because at the end of the day, it would be a programmer who is accountable for their work, not the machine. (speaking foreign language) - So with those two ideas, he hoped to have one lasting legacy. He said, "If 10 years from now, when you're doing something quick and dirty, you suddenly visualize that I'm here, looking over your shoulders, and say to yourself, 'Dijkstra would not have liked this.' Well, that'd be enough immortality for me." So even if you can only control your little corner of the world, you can still make it as elegant and thoughtful as possible. For you, and all the people who come after. (bright music) Before we go, I just wanted to say thanks again to Boot.dev for sponsoring this video. If you guys wanna check 'em out, you can check the link in our description, or scan this QR code right here, and make sure to use that checkout code "Veritasium" for 25% off. So thanks to Boot.dev, and I also wanted to thank you for watching. - [Speaker] You have arrived at your destination.