
David Bombal Networking Hardware AI Homelab DevOps

My Dream "home lab"

David Bombal tours Cisco's 20 million dollar AI networking lab to answer the question most AI coverage skips: once you have the GPUs, how do you wire hundreds of thousands of them into one computer. Three Cisco engineers walk through scale up, scale out, and scale across, the new G300 100 terabit and P200 routing switches, why optics are often the most expensive line on the bill, and how a 128 GPU scalable unit plus simulated flows proves designs at much larger scale. The recurring lesson is economic: a 10,000 GPU cluster rents for 175 million dollars a year, so a 5% efficiency loss or one bad cable costs millions. It also settles the InfiniBand versus Ethernet debate (Ethernet wins at scale) and shows security moving off the edge firewall and into the fabric on DPUs and switches.

Published May 17, 2026 · 28:27 video · 28 min read · Added Jun 16, 2026

At a glance

David Bombal calls this his dream home lab, and then immediately admits the joke: it is a tour of Cisco's AI networking lab, a room of GPUs, switches, fiber, optics, and storage that costs about 20 million US dollars and would need its own power plant to run. The framing is a hook, but the substance is real. The video answers one question that most AI coverage skips entirely: once you have the fastest GPUs, how do you actually wire hundreds of thousands of them together so they behave as a single computer.

Bombal walks the lab with three Cisco engineers. Rakesh runs the physical lab and explains the economics of failure, the H200 compute tray, the difference between scaling up and scaling out, and how a 128 GPU "scalable unit" plus a 512 node CPU cluster is used to prove designs that will run at tens of thousands of GPUs. Will gives the product and industry view: the new G300 100 terabit switch, the P200 routing family for connecting data centers, the Nvidia silicon partnership, a real neo cloud customer in Australia, the InfiniBand versus Ethernet verdict, and where speeds go next. Richard opens up the switch itself, the silicon, the operating system, the optics, and the management layer, and lands the point that visibility and automation matter as much as raw speed.

The thread tying it together is a number Bombal keeps circling back to. At AI scale a tiny problem (a bad cable, a hot optic, a 5% efficiency loss) is not a nuisance; it costs millions of dollars. A 10,000 GPU cluster rents for 175 million dollars a year, so 5% wasted is 8 million dollars a year set on fire. That is why the lab exists, and why this page rebuilds the whole tour in order, so you get the full technical story without watching first.

Figure 1. The whole argument on one canvas. Inside a server, NVLink scales up across eight GPUs. Across racks, the G-series Ethernet fabric scales out. Across buildings, the P200 routing family scales across. The speeds are extreme because the GPU count is extreme, and the cost strip is why a bad cable matters: at 175 million dollars a year to rent the cluster, a 5% efficiency loss burns 8 million dollars.

The cold open: a 20 million dollar room and an 8 GPU server

The video opens mid sentence, on the line that sets the scale. Each accelerator in the server, Bombal is told, is roughly 1,000 times more powerful than an RTX 5090, and there are eight of them in a single server. That is the dream lab: GPUs, switches, fiber, storage. The only problem, he says, is that it costs about 20 million US dollars, and he would need a power plant just to run it.

The premise of the whole tour follows immediately. You cannot train a large-scale AI model with a single GPU. You need thousands or hundreds of thousands of them, and they have to work together, connected by some kind of high-speed, low-latency network. So Bombal went inside Cisco's AI lab to see what powers modern AI and how to optimize the clusters, asking Rakesh and his team to show him around and explain the components they use.

The economics of failure: why a bad cable costs millions

Bombal asks Rakesh the obvious first question: how much does the lab cost? The answer: close to 20 million dollars spent so far, and, in Rakesh's words, "not really cheap equipment, to be honest with you."

But Bombal's point is sharper. The most surprising number is not the cost of the lab, it is the cost of a small failure or even a delay. The math Rakesh lays out is the spine of the video:

A GPU costs about 2 dollars per hour to rent.
A 10,000 GPU cluster therefore costs about 175 million dollars per year to rent.
A 5% efficiency loss on that is roughly 8 million dollars per year wasted.
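The arithmetic is easy to check. This sketch assumes the cluster is rented 24 hours a day, 365 days a year (8,760 hours) at Rakesh's 2 dollars per GPU-hour. Under that assumption the exact 5% figure is 8.76 million dollars a year, which Rakesh rounds to 8 million.

```python
# Rental economics from the tour. The $2/GPU-hour rate is Rakesh's figure;
# 8,760 hours/year assumes the cluster is rented around the clock.
HOURS_PER_YEAR = 24 * 365  # 8,760


def annual_rental_cost(gpus: int, usd_per_gpu_hour: float) -> float:
    """Yearly cost of renting `gpus` GPUs continuously."""
    return gpus * usd_per_gpu_hour * HOURS_PER_YEAR


def annual_waste(gpus: int, usd_per_gpu_hour: float, efficiency_loss: float) -> float:
    """Dollars per year spent on GPU time lost to inefficiency."""
    return annual_rental_cost(gpus, usd_per_gpu_hour) * efficiency_loss


cost = annual_rental_cost(10_000, 2.0)
waste = annual_waste(10_000, 2.0, 0.05)
print(f"cluster: ${cost / 1e6:.1f}M/year, 5% loss: ${waste / 1e6:.2f}M/year")
# prints "cluster: $175.2M/year, 5% loss: $8.76M/year"
```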

That is the thing that can make or break these businesses, and it is why the lab's job matters. At this scale any small problem can wreck performance, whether it is a bad cable, a hot optic, a congested link, packet loss, or even a tiny delay.

Rakesh gives the war story. When they build clusters, sometimes a cable goes bad, or optics go bad, and people do not necessarily notice because the network has grown large. One time someone plugged in the wrong cable size, and it took the team almost two days to find it, because performance was dropping by almost 75%. That one example is why the lab exists: modern AI is not about buying the fastest GPUs, it is about making sure thousands of GPUs can communicate. That depends on the servers, the NICs, the switches, the optics, the storage, and the software, all working together.

Scale up, scale out, scale across

Before talking about 100 terabit switches, Bombal frames the basic problem: how do you connect GPUs at this scale. There are three answers, and the video uses all three terms precisely.

Scale up is inside a single server. GPUs talk to each other over extremely fast internal connections.
Scale out is across racks in one data center. You need multiple racks, multiple GPUs per rack, all working in unison. The network fabric that connects them starts to behave like part of the computer itself.
Scale across is the newest idea: connecting multiple data centers together, so a single job can run across buildings.

Rakesh (Bombal also calls him Ramesh once) shows the H200 compute tray. The clarification matters: what is on the bench is not the whole server, it is just the compute tray with eight GPUs. The other components are not visible here, including the NICs that let the GPUs reach out and talk to other servers in the cluster, which is the scale-out fabric. Inside the server, the GPUs are connected by NVLink, the scale-up domain, an extremely fast set of connections that let the GPUs exchange gradients while training.
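"Exchanging gradients" means that every GPU ends up holding the sum of all the GPUs' gradients. The tour does not say which collective algorithm the lab runs. One widely used approach is ring all-reduce (NCCL, for example, implements ring and tree variants). The pure-Python simulation below uses plain lists to stand in for GPU memory and involves no real NVLink. It shows the pattern: a reduce-scatter phase followed by an all-gather phase, each lasting n-1 steps, with every GPU sending one chunk to its neighbour at each step.

```python
from typing import List, Tuple


def ring_allreduce(grads: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Simulate ring all-reduce; return each rank's result and the elements it sent."""
    n = len(grads)
    size = len(grads[0])
    if size % n:
        raise ValueError("gradient length must divide evenly into n chunks")
    c = size // n
    buf = [list(g) for g in grads]  # each simulated GPU's working copy
    sent = [0] * n

    def chunk(rank: int, idx: int) -> List[float]:
        return buf[rank][idx * c:(idx + 1) * c]  # slicing copies, so sends are snapshots

    # Phase 1, reduce-scatter: after n-1 steps rank r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r - step) % n, chunk(r, (r - step) % n)) for r in range(n)]
        for src, (dst, idx, data) in enumerate(sends):
            sent[src] += len(data)
            for i, v in enumerate(data):
                buf[dst][idx * c + i] += v

    # Phase 2, all-gather: pass finished chunks around the ring, overwriting stale copies.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n, chunk(r, (r + 1 - step) % n)) for r in range(n)]
        for src, (dst, idx, data) in enumerate(sends):
            sent[src] += len(data)
            buf[dst][idx * c:(idx + 1) * c] = data
    return buf, sent


# Eight simulated GPUs (one H200 compute tray), 16-element gradients.
grads = [[float(100 * r + i) for i in range(16)] for r in range(8)]
result, sent = ring_allreduce(grads)
expected = [float(sum(100 * r + i for r in range(8))) for i in range(16)]
print(all(row == expected for row in result))  # True
print(sent[0])  # 28 elements, i.e. 2 * (8-1)/8 * 16
```

Each GPU sends 2(n-1)/n of its gradient buffer per step, which is almost twice the buffer size. That is why link bandwidth, not just GPU speed, sets the pace of training.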

That is the key idea. Inside the server, NVLink connects the GPUs. Outside the server, the network fabric connects many GPU servers. Once you scale to thousands of GPUs, the network becomes one of the most critical parts of the whole system.

The speeds that make it work

The numbers are deliberately extreme. The lab has 100 terabit per second switches, and each port runs at 1.6 terabits per second, because the GPUs need that much bandwidth to communicate. The big concept is hundreds, thousands, or hundreds of thousands of GPUs in one training cluster, all behaving as a single system. That is why 100 Tbps switches and 1.6 Tbps ports are not overkill. These switches are not moving ordinary internet traffic; they connect GPU to GPU in a way that keeps AI jobs running without packet loss or delay. Hence Cisco is building 100 Tbps switches, 1.6 Tbps interfaces, new Ethernet fabrics, and technology designed specifically for AI clusters.

The product news: G300, P200, and the Nvidia silicon partnership

Bombal turns to Will for what has changed since they last met at Cisco Live in Amsterdam in February, and what came out of Nvidia GTC. The announcements:

The G300, the new 100 terabit switch, announced at Cisco Live Amsterdam. It is the leaf-spine core of the data center family: 64 ports each running at 1.6 terabits per second.
A Cisco switch built on Nvidia Spectrum-4 silicon, which puts Nvidia silicon into Cisco switches alongside the ones that use Cisco's own silicon.
The P200 family, aimed at the routing functions and at scale across.
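The headline G300 numbers are consistent with each other: 64 ports at 1.6 Tbps is 102.4 Tbps, and "100 terabit" is that figure rounded. The sketch does the arithmetic in integer Gbps. The breakout counts assume each 1.6 Tbps port can be split into 800 Gbps or 400 Gbps links. That is common for switches in this class, but the video does not state it for the G300.

```python
# G300 figures from the tour: 64 ports at 1.6 Tbps each.
# Work in integer Gbps to avoid floating-point rounding surprises.
G300_PORTS = 64
PORT_GBPS = 1_600


def aggregate_tbps(ports: int, port_gbps: int) -> float:
    """Total switching capacity in Tbps."""
    return ports * port_gbps / 1_000


def breakout_ports(ports: int, port_gbps: int, link_gbps: int) -> int:
    """Logical links if every physical port is split into link_gbps links (assumed breakout)."""
    if port_gbps % link_gbps:
        raise ValueError("port speed must be a multiple of the link speed")
    return ports * (port_gbps // link_gbps)


print(aggregate_tbps(G300_PORTS, PORT_GBPS))       # 102.4
print(breakout_ports(G300_PORTS, PORT_GBPS, 800))  # 128
print(breakout_ports(G300_PORTS, PORT_GBPS, 400))  # 256
```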

Will draws the same two-problem picture Rakesh did. Inside the data center you need scale out, and that is where the G-series switches (the G300) come in. But scale up is not enough and scale out is not enough, so you need scale across, combining multiple data centers, and that is the P200's job. The P200 has routing capabilities, deep buffers, larger tables, security, and the ability to connect AI factories together across very high-speed, long-distance links.

Routing itself is not new, Will notes, with the usual affection for BGP, the Border Gateway Protocol. What is new about scale across is wanting to run a single job across data centers, where the GPUs on the east-west back-end network have to talk between buildings. He still does not see many customers doing it yet, but a lot are starting to investigate. The P200 is positioned for exactly that: strong routing (deep buffers, large IP prefix tables, ACLs) plus the ability to run a job across.

There is one more chip in the pipeline: Spectrum 6, the 100 terabit switch from Nvidia, in development and orderable soon, expected to ship around late summer. Will marvels at how fast the cycles condense, from first touching a chip to shipping it, driven by customers who need the performance.

The neo cloud customer: Sharon AI in Australia

Will makes the point that this is not theory: it matters for companies building AI infrastructure now, especially neo clouds and GPU-as-a-service providers. They need huge GPU clusters but often run small teams, so the infrastructure has to be repeatable, manageable, and fast to deploy.

The real-world example is Sharon AI, a newly funded neo cloud in Australia that ramped up funding and strategy very fast and joined Will for a GTC talk on architectures and strategies. The specifics of what they used:

Hyperfabric, Cisco's technology to simplify operations so a small team can manage large clusters.
Cisco Silicon One on the front end.
A Spectrum switch on the back end, because they want to be a full Nvidia Cloud Partner Program member. Cisco supports either Cisco Silicon One or Nexus to meet those goals.
The full Cisco reference architecture, including VAST Data for storage, which let them move from concept to deployment very fast, plotting week by week as equipment arrived.

The takeaway: it is not just about fast switches. The neo cloud moves fast precisely because the whole validated stack, silicon plus OS plus optics plus storage plus management, is ready to deploy.

Inside the switch: the switching complex

Bombal then talks to Richard, who opens up one of the switches Cisco positions for AI networking. The phrase Richard uses, "switching complex", is important: an AI switch is not just a fast box with a lot of parts. For AI networking Cisco thinks about the whole system: the silicon, the operating system, the optics, the automation, and the management layer.

The silicon is the foundational piece, the heart of the switch, where packet forwarding and programmable pipelines live. This is not the CPU in your laptop; it is a switching ASIC, custom silicon built to move massive amounts of traffic. That matters in an AI cluster because the traffic is not normal office traffic. The GPUs constantly exchange huge amounts of data across the fabric and cannot tolerate packet loss. As Richard puts it, lost packets can force the entire training run to stop and roll back to a saved snapshot.
The operating system has to take advantage of what is in the custom silicon, with congestion mechanisms that can be reactive or proactive.

Richard makes the operational point with a number. Quality of service (QoS) for a typical box might be 50 or 60 lines of CLI config on a single switch. Now multiply that across 16 scalable units plus more leaf switches, and you need it fabric wide. The answer is unified management via the Nexus Dashboard: single, consistent, API first, with workflows that guide you.

That is an operational problem, not a speed problem. One switch is manageable, but an AI fabric has many switches, links, optics, and thousands of GPUs. You cannot have engineers hand-typing config across the whole fabric and hoping every box is consistent. You need automation, and when something breaks you need to find it fast.
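Richard's answer to this is the Nexus Dashboard, but the video does not show its API. The sketch below is therefore a hypothetical, vendor-neutral illustration of the underlying idea: declare the QoS intent once, render it for every switch, and flag any box whose running config has drifted. The field names, the pseudo-config syntax, and values such as priority 3 and the ECN thresholds are invented for illustration. They are not NX-OS commands.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QosIntent:
    """Hypothetical fabric-wide QoS intent; field names are illustrative only."""
    lossless_priority: int  # traffic class reserved for RDMA
    ecn_min_kb: int         # lower ECN marking threshold
    ecn_max_kb: int         # upper ECN marking threshold


def render(intent: QosIntent) -> List[str]:
    """Render the intent into pseudo-config lines (not real NX-OS syntax)."""
    return [
        f"qos lossless-priority {intent.lossless_priority}",
        f"qos ecn min {intent.ecn_min_kb}kb max {intent.ecn_max_kb}kb",
    ]


def find_drift(intent: QosIntent, running: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each switch to the desired lines missing from its running config."""
    desired = render(intent)
    drift: Dict[str, List[str]] = {}
    for switch, lines in running.items():
        missing = [line for line in desired if line not in lines]
        if missing:
            drift[switch] = missing
    return drift


intent = QosIntent(lossless_priority=3, ecn_min_kb=150, ecn_max_kb=1500)
fabric = {f"leaf{i:02d}": render(intent) for i in range(1, 33)}  # 32 leaves, one source of truth
fabric["leaf17"] = ["qos lossless-priority 3"]                   # someone hand-edited one box
print(find_drift(intent, fabric))  # {'leaf17': ['qos ecn min 150kb max 1500kb']}
```

The design point is that the intent, not each box's config, is the source of truth. A drift check then scales to any number of switches without anyone reading 60 lines per box.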

Finding the needle in the haystack

When something breaks, Richard wants to see jobs, scheduling, and latency, and to find the needle in the haystack: not a fire hose of information but something pinpointed. When there is an anomaly or advisory, he wants to correlate it with something understandable. Unified management exposes what the silicon sees (microbursts, buffering, congestion, programmable pipeline capabilities). That matters because AI throws unique workloads at the network, from the converged front end to storage to the back end, and customers want specific load balancing strategies.

This connects straight back to Rakesh's cable story. The problem at scale is not just that something can go wrong, it is finding exactly where. Congestion, buffering, a bad optic, a misconfiguration, a link problem, a workload pattern. There are so many possibilities, and in an AI data center every bit of downtime costs money. That is why management and visibility matter as much as raw switch speed.
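Rakesh's two-day hunt for one wrong cable is the classic needle-in-a-haystack case. The video does not describe how Cisco's tooling pinpoints a bad link. One simple, hypothetical approach is to compare every link's throughput with the fabric median and flag links that fall far below it. The sketch measures spread with the median absolute deviation, so the bad link itself does not distort the baseline. The link names, the 5.0 threshold, and the sample numbers are all made up.

```python
from statistics import median
from typing import Dict, List


def suspect_links(gbps: Dict[str, float], threshold: float = 5.0) -> List[str]:
    """Return links whose throughput is far below the fabric median.

    Spread is the median absolute deviation (MAD), floored at 1% of the
    median so a very uniform fabric does not flag tiny wobbles.
    """
    values = list(gbps.values())
    med = median(values)
    spread = max(median(abs(v - med) for v in values), 0.01 * med)
    return sorted(name for name, v in gbps.items() if (med - v) / spread > threshold)


# 32 leaf-to-spine links near 790 Gbps; one of them runs about 75% slow.
links = {f"leaf1-spine{i}": 790.0 + (i % 5) for i in range(1, 33)}
links["leaf1-spine17"] = 198.0
print(suspect_links(links))  # ['leaf1-spine17']
```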

The optics: the most expensive line on the bill

Richard raises something Bombal says most people underestimate: the optics. When you connect thousands of GPUs, the optics can be the single most expensive part of the entire AI network.

Bombal had not mentioned them, and Richard's reason is simple: in heavy quantities, optics are one of the most expensive line items on the bill of materials, because there are so many of them. Customers take them for granted, but you want quality, integrity, investment protection, and sometimes forward and backward compatibility. Cisco offers two form factors, OSFP and QSFP. The whole system has to be very low power while still giving customers a high radix, so they can flatten the network and reduce the number of tiers. That matters because in AI networking, latency and jitter stretch job completion time and can destabilize jobs. The nightmare he names: a job scheduled for 14 days that fails on day 13.
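Richard's point about high radix and fewer tiers can be made concrete with textbook fat-tree arithmetic. Assume a non-blocking folded Clos with one switch port per GPU. Switches of radix k can then attach 2 × (k/2)^t endpoints in t tiers: k^2/2 for leaf-spine, and k^3/4 for three tiers. The 128-port case assumes the 64 × 1.6 Tbps box is broken out into 128 × 800 Gbps links. That is common for this class of switch, but the tour does not state it.

```python
def max_endpoints(radix: int, tiers: int) -> int:
    """Endpoints a non-blocking folded-Clos (fat-tree) fabric can attach.

    Textbook result: 2 * (k/2)**tiers for switch radix k, i.e. k for one
    switch, k**2 / 2 for leaf-spine, k**3 / 4 for three tiers.
    """
    if radix % 2:
        raise ValueError("radix must be even")
    return 2 * (radix // 2) ** tiers


def tiers_needed(gpus: int, radix: int) -> int:
    """Smallest number of tiers that can attach `gpus` endpoints."""
    tiers = 1
    while max_endpoints(radix, tiers) < gpus:
        tiers += 1
    return tiers


# 64 x 1.6T ports, or the same box broken out as 128 x 800G (assumed).
for radix in (64, 128):
    print(radix, max_endpoints(radix, 2), tiers_needed(8_000, radix))
# 64 2048 3
# 128 8192 2
```

Doubling the radix turns an 8,000-GPU build from three tiers into two. That removes a whole layer of switches, plus the optics that would have connected it, which means fewer hops and a smaller bill.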

The specific hardware shown:

800 gig optics, the OSFP form factor with an integrated heat sink.
The same optic as an LPO (linear pluggable optic), which removes the DSP, the digital signal processing, from the optic to make it more efficient for unique use cases such as switch-to-GPU connections. These are popular because they save a lot of power.

A 14 day training job failing on day 13 means days of wasted time, huge GPU costs, and engineers trying to work out whether the culprit was the workload, the network, the optics, the storage, or the configuration.
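To put a number on the day-13 failure, this sketch reuses the tour's 10,000 GPUs at 2 dollars per GPU-hour. It compares rerunning from zero with rolling back to a recent snapshot. The 4-hour snapshot interval is a hypothetical value chosen for illustration. The sketch also ignores restart time and the time spent finding the fault, so the real cost is higher.

```python
def lost_work_usd(gpus: int, usd_per_gpu_hour: float, hours_lost: float) -> float:
    """GPU rental dollars spent on work that has to be redone."""
    return gpus * usd_per_gpu_hour * hours_lost


GPUS, RATE = 10_000, 2.0  # the tour's 10,000 GPUs at $2 per GPU-hour

# Failure on day 13 with no usable snapshot: all 13 days must be rerun.
from_scratch = lost_work_usd(GPUS, RATE, 13 * 24)
# Hypothetical 4-hour snapshot interval: at worst, the last 4 hours are lost.
from_snapshot = lost_work_usd(GPUS, RATE, 4)

print(f"rerun from zero: ${from_scratch / 1e6:.2f}M")      # $6.24M
print(f"rerun from snapshot: ${from_snapshot / 1e3:.0f}K")  # $80K
```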

Component	What it is in the lab	Why it matters at AI scale
GPU server	H200 compute tray, 8 GPUs, each about 1,000x an RTX 5090	The compute, but only one piece of the system
Scale up link	NVLink inside the server	Lets 8 GPUs exchange gradients at full speed
NIC	North-south super NICs, BlueField 3 and 4	Beefy Arm cores, programmable, now run security too
Scale out switch	G300, 100 Tbps, 64 ports at 1.6 Tbps	The leaf-spine core, GPU server to GPU server
Scale across switch	P200 routing family	Deep buffers, large tables, runs a job across data centers
Silicon	Custom switching ASIC, plus Nvidia Spectrum option	Packet forwarding, programmable pipelines, congestion control
Optics	800 gig OSFP with heat sink, and LPO variant	Often the most expensive line on the bill
Storage	VAST Data, validated with partners	Was the hidden bottleneck in a DLRM test
Management	Nexus Dashboard, API first, unified	QoS and policy fabric wide, not box by box

Figure 2. Every piece of gear in the tour and the reason it is there. The recurring lesson is that the GPUs (top row) are only one part. The network, the optics, the storage, and the management layer each have a number attached, and each can quietly cost millions if it is wrong.

Testing the whole system: why the lab exists

This brings the story back to why Cisco built the lab. At this scale you cannot just trust a spec sheet. You have to test the whole system together, the GPUs, NICs, optics, switches, storage, software, automation, and the actual AI workloads customers will run.

Rakesh describes the work. The team brings in all the equipment and all the components, and works closely with the storage partner, because the AI infrastructure is a huge, complex beast. The job is to make sure that when a customer deploys a Cisco validated design, it works and performs well. So they run a lot of performance benchmarking, testing the networking layer and the RDMA layer. As clusters scale to tens of thousands of GPUs, the network is critical: without the latest load balancing techniques, you can have plenty of network capacity yet still suffer suboptimal performance from inefficiency. They test storage too, with the storage partner, and validate everything together, giving customers real visibility and benchmark numbers, not just talk. They work with partners AMD and Nvidia to meet the performance requirements.

When the bottleneck is storage

A standout lesson: the bottleneck is not always where you expect. Not always the GPU, not always the switch. In one test, training the DLRM model (deep learning recommendation model), performance was very poor. Profiling tools pointed at storage as the bottleneck. The team spent a couple of weeks fine-tuning with their partner, and ended up performing even better than what the competition had shown on DLRM. That is the point of the lab work: customers inherit the result.

Proving it scales: the 128 GPU scalable unit

Bombal poses the obvious challenge: if the lab only has a 128 GPU cluster, how do they prove designs scale to tens of thousands of GPUs?

Rakesh's answer is a method. The lab has 16 H100 nodes, each with 8 GPUs, so 128 GPUs, used as a scalable unit. Working closely with Nvidia, who provide a performance benchmarking document specifying the test tools, the KPI matrix to collect, and the target numbers, the team must not only follow the methodology but beat those numbers. Once a scalable unit passes, you can horizontally scale it into very large clusters of hundreds of thousands of GPUs. That horizontal scaling is the Spectrum-X solution, built with Nvidia.

There is a second trick. Even without a giant GPU cluster, the lab has a large CPU cluster of 512 nodes. Using RDMA and a home-grown tool, the team simulates tens of thousands of flows to understand how flows behave, how they load balance, and where congestion appears. So a small 128 GPU cluster plus simulation lets them mathematically and scientifically prove a design works at tens of thousands of GPUs.

Spectrum-X, packet spray, and Jain's fairness index

Rakesh explains what Spectrum-X actually does. When traffic runs from a GPU, the leaf switch must equally distribute traffic to all the spine switches. If it does not, some links are over utilized and some under utilized, which causes artificial congestion: you have enough bandwidth, you are just not using it efficiently. The fix Cisco works on with Nvidia is packet spray and Spectrum-X, making sure all traffic is equally distributed.
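The artificial congestion Rakesh describes is easy to reproduce in a toy model. Classic ECMP hashes each flow onto one uplink, so a handful of large, long-lived flows (typical of GPU collectives) can pile onto the same spine. Packet spraying instead spreads individual packets across all uplinks. The sketch uses a seeded random choice as a stand-in for the header hash and round-robin as the spray policy. Neither is how Spectrum-X actually chooses paths, which the video does not detail.

```python
import random
from typing import List


def per_flow_hash(flows: List[int], spines: int, seed: int = 1) -> List[int]:
    """ECMP-style: every packet of a flow follows one spine (random stand-in for a hash)."""
    rng = random.Random(seed)
    load = [0] * spines
    for packets in flows:
        load[rng.randrange(spines)] += packets
    return load


def packet_spray(flows: List[int], spines: int) -> List[int]:
    """Spray-style: successive packets rotate across all spines, regardless of flow."""
    load = [0] * spines
    nxt = 0
    for packets in flows:
        for _ in range(packets):
            load[nxt] += 1
            nxt = (nxt + 1) % spines
    return load


def max_over_mean(load: List[int]) -> float:
    """1.0 means perfectly even; 2.0 means the busiest link carries twice its fair share."""
    return max(load) / (sum(load) / len(load))


flows = [1000] * 8  # eight elephant flows, 1,000 packets each, over 4 spines
for name, load in [("hash", per_flow_hash(flows, 4)), ("spray", packet_spray(flows, 4))]:
    print(name, load, round(max_over_mean(load), 2))
```

Spraying means packets from one flow can arrive out of order, so the receiving side must tolerate or repair the reordering. The toy model ignores that cost.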

They measure it with Jain's fairness index. Running a workload, they compute how traffic is distributed across the links. A Jain's fairness index of 1 means 100% perfect distribution. So Cisco is not just shipping a fast switch, it is testing a scalable unit, simulating massive traffic patterns, measuring congestion and load balancing, and using that data to prove behavior at much larger scale.
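Jain's fairness index for per-link loads x1 … xn is J = (x1 + … + xn)^2 / (n · (x1^2 + … + xn^2)). It equals 1 when every link carries the same load and falls to 1/n when one link carries everything. A minimal sketch follows. The link loads are illustrative numbers, not lab measurements.

```python
from typing import Sequence


def jain_index(loads: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2); 1.0 means perfectly even."""
    n = len(loads)
    total = sum(loads)
    squares = sum(x * x for x in loads)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero load")
    return total * total / (n * squares)


print(jain_index([2000, 2000, 2000, 2000]))              # 1.0, perfect spread over 4 spines
print(jain_index([8000, 0, 0, 0]))                       # 0.25, the worst case for 4 links
print(round(jain_index([3000, 3000, 1000, 1000]), 3))    # 0.8
```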


Figure 3. How Cisco proves even load balancing. The leaf switch sprays GPU traffic equally across all four spine switches, so no link is over or under utilized (which would cause artificial congestion). A Jain's fairness index of 1.0 means perfect distribution. The 128 GPU scalable unit is the building block, simulated up to tens of thousands of flows on a 512 node CPU cluster.

The big debate: InfiniBand or Ethernet

This leads to one of the biggest questions in AI networking: if you are using RDMA and building massive GPU clusters, do you use InfiniBand or Ethernet?

Rakesh gives the lab answer first: his lab is only Ethernet. Bombal wanted the industry answer, so he asked Will. Will's verdict is clear:

There is still a lot of InfiniBand out there, but in the last 12 months, even Nvidia is pushing and supporting Ethernet much more.
Customers want choice, so multi-vendor is a key asset.
InfiniBand hits a scaling limit around 30,000 GPUs where you top out. As customers think about hundreds of thousands of GPUs, that is an Ethernet-based network, not InfiniBand.
Most customers, even those with a lot of InfiniBand, arrive saying they are heading towards Ethernet.

The one gap: InfiniBand has some multi-tenancy features that are not native to Ethernet. Beyond operational simplicity, Cisco is working to carry some of those features over from InfiniBand to Ethernet, with more to come in the next few months.

Security moves into the fabric

Will closes with a structural shift. If these clusters are worth hundreds of millions of dollars, and most traffic is east-west between servers, security cannot just sit at the edge of the data center. It has to move deeper into the cluster.

The vehicle is hardware already in the servers: the DPUs, or super NICs, which split into two roles. The east-west NICs connect GPUs. The north-south NICs connect servers to the internet and to other servers. Those north-south NICs, the BlueField 3 and BlueField 4, are very sophisticated, with a large number of beefy Arm cores you can program.

Cisco's GTC announcement: security services running on those north-south NICs, so you can get firewall-type functions without buying more hardware, just by enabling them. These services interconnect with Isovalent (the company Cisco acquired, makers of Cilium), which provides a distributed policy enforcement point. The goal: define policy once, then deploy enforcement across many points, including inside the AI cluster, on hardware you already own. You can enforce policy on the DPU in the server, and on the switch too.
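The "define policy once, enforce everywhere" model can be illustrated with a toy. The video does not show Isovalent's or Cisco's policy format, so everything here is hypothetical. A single allow-list is pushed unchanged to several enforcement points, which stand in for BlueField DPUs and a leaf switch, and each point defaults to deny. The labels, point names, and port numbers are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """One allow rule between workload labels (hypothetical format)."""
    src: str
    dst: str
    port: int


class EnforcementPoint:
    """Hypothetical stand-in for a DPU or switch that enforces pushed policy."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.allowed: Set[Tuple[str, str, int]] = set()

    def load(self, rules: List[Rule]) -> None:
        self.allowed = {(r.src, r.dst, r.port) for r in rules}

    def permits(self, src: str, dst: str, port: int) -> bool:
        return (src, dst, port) in self.allowed  # anything not listed is denied


# Defined once...
policy = [Rule("training", "training", 4791), Rule("training", "storage", 4420)]
# ...pushed unchanged to every enforcement point (names are made up).
points = [EnforcementPoint(n) for n in ("dpu-node01", "dpu-node02", "leaf01")]
for point in points:
    point.load(policy)

print(all(p.permits("training", "training", 4791) for p in points))  # True
print(any(p.permits("inference", "storage", 4420) for p in points))  # False
```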

Bombal sums it up: the traditional firewall is no longer all you have, you are putting the firewall everywhere. Will agrees, with a caveat: there is still a place for a firewall in your DMZ for internet traffic, but the phase of putting large beefy firewalls all over inside the AI cluster or for server-to-server traffic is coming to an end. Cisco sees that as a critical architecture transition.

The firewall does not disappear, but in an AI data center security cannot be one big box at the edge. Traffic moves inside the cluster between servers, GPUs, NICs, DPUs, and switches, so security has to move closer to the workload. That is the whole story: the future AI data center is not just faster GPUs, it is faster networking, better optics, better visibility, better automation, validated designs, and security built into the fabric.

Figure 4. The architecture shift. A firewall stays in the DMZ for internet traffic, but inside the cluster the east-west traffic between GPU servers is policed where it lives. Enforcement runs on the BlueField DPUs in each server and on the switches themselves, using policy defined once via Isovalent and pushed everywhere, on hardware the customer already owns.

Where speeds go next

If 100 Tbps switches and 1.6 Tbps interfaces are not insane enough, Will says the pace is not slowing down. Asked whether we will reach 3.2 terabits per second, he answers with what came out of a multi-year co-design session with a hyperscaler the previous week:

400 gig SerDes (the serializer/deserializer analog technology that packs data into a single pair of wires).
3.2 terabit interfaces.
A coming limit on pluggable optics, beyond which they may no longer be supportable.
Liquid cooling becoming required, not just desirable.
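Will's list ties SerDes speed to interface speed. This sketch assumes eight electrical lanes per port, as used by OSFP-class optics; the video gives no lane counts. Under that assumption, each SerDes generation doubles the port speed: 100G lanes give 800G, 200G lanes give 1.6T, and 400G lanes give 3.2T. The whole-chip line further assumes 200G SerDes on a 102.4 Tbps switch ASIC.

```python
def lanes_needed(interface_gbps: int, serdes_gbps: int) -> int:
    """Electrical lanes needed to carry one interface (ceiling division)."""
    return -(-interface_gbps // serdes_gbps)


for iface, serdes in [(800, 100), (1_600, 200), (3_200, 400)]:
    print(f"{iface}G port over {serdes}G SerDes: {lanes_needed(iface, serdes)} lanes")

# Whole-chip view, assuming 200G SerDes on a 102.4 Tbps switch ASIC.
print(lanes_needed(102_400, 200))  # 512
```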

Will does not see an end yet, and is still amazed at how far the analog SerDes technology has come.

Bombal's closing takeaway

Bombal's biggest takeaway: AI is not just about GPUs. The GPUs are the part everyone talks about, but the network is what makes it work together. The switches, the optics, the storage, the software, the automation, the security, and the validation all matter, because when one small problem can cost millions, you do not want to guess, you want the whole system tested before you deploy. That is what Cisco does in the lab, testing AI infrastructure as a complete system so customers can build clusters faster, run them reliably, and avoid wasting expensive GPU capacity.

And yes, he would still love a home lab like this. He just cannot afford it, and he would need a power plant.

Key takeaways

The GPUs are the smallest part of the story. A dream AI lab is GPUs plus switches, optics, fiber, storage, and software, and the network is what turns thousands of GPUs into one computer.
Small failures cost millions. Renting 10,000 GPUs runs about 175 million dollars a year, so a 5% efficiency loss burns roughly 8 million. A single wrong cable once dropped performance 75% and took two days to find.
Three scaling domains. Scale up (NVLink inside one server), scale out (G300 Ethernet fabric across racks), scale across (P200 routing across data centers). Running one job across buildings is the new frontier.
The speeds are extreme on purpose. 100 Tbps switches, 1.6 Tbps ports, 64 ports per G300, because GPUs cannot tolerate packet loss, which can force a whole training run to roll back to a snapshot.
Optics are often the most expensive line item. There are so many of them. The lab uses 800 gig OSFP optics with integrated heat sinks, and LPO variants that drop the DSP to save power.
Visibility beats raw speed. A switching complex is silicon plus OS plus optics plus automation plus management. QoS that takes 50 or 60 lines per box becomes unmanageable fabric wide without unified management like the Nexus Dashboard.
Prove it small, deploy it big. A 128 GPU scalable unit plus a 512 node CPU cluster simulating tens of thousands of flows, scored by Jain's fairness index, mathematically proves a design works at hundreds of thousands of GPUs.
Ethernet is winning. InfiniBand tops out around 30,000 GPUs, customers want multi-vendor choice, and even Nvidia is pushing Ethernet for hundreds-of-thousands-of-GPU clusters.
Security moves into the fabric. Firewalls stay in the DMZ, but east-west traffic gets policed on BlueField DPUs and switches via Isovalent policy, not one big edge box.
The bottleneck is rarely where you expect. In one DLRM test it was storage, not the GPU or the switch, which is exactly why you test the whole system before deploying.

Chapters

00:00 The power of AI data centers
00:51 Cost and challenges of AI data centers
02:35 How to connect GPUs in an AI cluster
05:23 The future of data centers
08:45 NeoCloud infrastructure in Australia
10:28 The right components matter
16:32 Testing to avoid failures
21:43 Ethernet vs InfiniBand
23:27 Cisco security services
26:14 Future of speeds in the data center
27:23 Conclusion

Notable quotes

"Each of these is 1,000 times more powerful than say a 5090 RTX, right? And there's eight in here." (00:00)
"This is the type of home lab that I'd love to have. The only problem is that it costs about 20 million US dollars, and I'd probably need a power plant just to run it." (00:20)
"The GPU cost is like 2 dollars per hour per GPU. And if you rent a 10,000 GPU cluster, that costs 175 million dollars to rent per year. And imagine that if you have 5% efficiency loss, that's like 8 million dollars per year." (01:10)
"They plug that wrong cable size, and it took us almost 2 days to figure it out, because performance was dropping almost 75%. Can you imagine that?" (01:55)
"We have 100 terabit per second switches, and each port is 1.6 terabits per second. These GPUs need huge bandwidth to communicate together." (04:35)
"Believe it or not, one of the most expensive line items on the bill of materials in a heavy quantity are the optics." (13:35)
"Imagine you have a job that's scheduled for 14 days. It fails in day 13." (14:25)
"If your Jain's fairness index is one, that means you have 100% perfect distribution." (20:50)
"So my lab is only Ethernet." (21:50)
"InfiniBand does hit, what is it, 30,000 GPU, there's a scaling limit where you do top out. As customers start to think about hundreds of thousands of GPUs, that is an Ethernet-based network." (22:20)
"Your traditional firewall no longer is all you have. Now you're putting the firewall everywhere basically." (25:30)
"I don't see an end yet. I see this is coming fast." (27:00)
"AI is not just about GPUs. The GPUs are the part that a lot of people talk about, but the network is what makes it work together." (27:25)

Resources mentioned

David Bombal (YouTube channel), the host, networking and security education.
Cisco, builder of the AI networking lab and all the switches in the tour.
Cisco Live, where the G300 was announced in Amsterdam.
Nvidia GTC, the conference behind the Spectrum and security announcements.
Nvidia, GPU and silicon partner (H100, H200, NVLink, Spectrum, BlueField).
Nvidia H200 and H100, the GPUs used in the lab's compute trays and nodes.
Nvidia NVLink, the scale-up interconnect inside a server.
Nvidia Spectrum-X, the load balancing and packet spray solution Cisco builds with Nvidia.
Nvidia BlueField DPUs, the north-south super NICs that now run security services.
Nvidia Cloud Partner Program, the program Sharon AI is pursuing.
RTX 5090, the consumer card used as a power reference point.
Cisco Silicon One, Cisco's own switching silicon.
Cisco Nexus switches, the data center switching line.
Cisco Nexus Dashboard, the unified, API first management layer.
Cisco Hyperfabric, simplifies large cluster operations for small teams.
Sharon AI, the Australian neo cloud customer example.
VAST Data, the storage in the reference architecture.
AMD, a compute and performance partner.
Isovalent, the acquisition behind distributed policy enforcement.
Cilium, the eBPF networking and security project from Isovalent.
RDMA (Remote Direct Memory Access), the transport tested in the lab and used to simulate flows.
InfiniBand and Ethernet, the two fabric choices debated.
BGP (Border Gateway Protocol), the routing protocol behind scale across.
Quality of Service (QoS), the config example that motivates fabric wide automation.
Jain's fairness index, the metric for even traffic distribution.
OSFP and QSFP, the two optics form factors offered.
Linear pluggable optics (LPO), the DSP-free, lower power optic variant.
SerDes, the serializer/deserializer technology driving interface speeds.
DLRM, the recommendation model that exposed a storage bottleneck.

Where it stands

Worth stating plainly, since the page rebuilds the tour in its own frame: this is a vendor walkthrough, three Cisco engineers showing Cisco products in a Cisco lab, and the framing as Bombal's "dream home lab" is affectionate marketing. The numbers, the 20 million dollar build, the 175 million dollar annual rental, the 5% to 8 million dollar loss, the 75% performance drop from a bad cable, the beat-the-competition DLRM result, are figures the team supplied, illustrative of real dynamics but not independent benchmarks you could reproduce off a shelf. That does not make the engineering less real. The underlying ideas are mainstream and vendor neutral: GPUs are network bound at scale, packet loss forces costly training rollbacks, optics dominate the bill of materials, load balancing fairness can be measured, InfiniBand has a practical ceiling while Ethernet scales with multi-vendor choice, and security is shifting from an edge box toward distributed enforcement on DPUs and switches. The honest read: the architecture and the principles are sound and broadly applicable across the industry, while the specific products, integration polish, and figures are Cisco's and shown to their best advantage. The durable lesson outlives the brand. In modern AI the network is not plumbing around the computer, it is part of the computer, and the cost of getting it wrong is measured in millions.

Full transcript

And so, each of these is 1,000 times more powerful than say a 5090 RTX, right? And there's eight in here. Yes. Okay, you got to show me this. Now, this is the type of home lab that I'd love to have. We've got GPUs, switches, fiber, storage, and a whole bunch more. The only problem is that it costs about 20 million US dollars, and I'd probably need a power plant just to run it. I went inside Cisco's AI lab to see what powers modern AI and how to optimize AI clusters, because you can't train large-scale AI models with just a single GPU. You need thousands or hundreds of thousands of GPUs, and they need to work together connected via some kind of high-speed, low-latency network. I asked Rakesh and his team to show me around this amazing lab and also explain some of the components that they are using to optimize for AI clusters. Rakesh, you've just shown me your amazing lab. How much does that cost? It's close to 20 million dollars we already spent on this lab. So, this is not really cheap equipment, to be honest with you. The most surprising number wasn't the cost of the lab, it's the cost of a small failure or even a delay. Even if just some packets go missing, it can cost a lot of money. The GPU cost is like 2 dollars per hour per GPU. And if you rent a 10,000 GPU cluster, that costs 175 million dollars to rent per year. And imagine that if you have 5% efficiency loss, that's like 8 million dollars per year. So, that's a very big thing that can make or break some of these businesses. So that's the reason the job we do is very important. A bad cable, a hot optic, a congested link, packet loss, even a tiny delay at this scale, even a small problem can hurt performance. Rakesh gave me an example of just that situation. I'll tell you, sometimes the kind of challenge we have seen is very interesting. 
So one time, when we build our clusters, sometimes some cable goes bad, optics goes bad, and sometimes people, they don't necessarily know, because over time the network becomes large. So let's say somebody finds whatever the cable size is there, and they plug that wrong cable size, and it took us almost 2 days to figure it out, because performance was dropping almost 75%. Can you imagine that? Now, that one example is why this lab exists, because modern AI is not just about buying the fastest GPUs. It's about making sure thousands of GPUs can communicate together. This relies on the servers, the NICs, the switches, the optics, the storage, and the software controlling the system to work together to optimize performance. So before we can even talk about 100 terabits per second switches and the future of AI clusters, we need to understand the basic problem. How do you connect GPUs together at this kind of scale? Inside a single server, GPUs can talk to each other using extremely fast internal connections. That's called scaling up. But even a single rack of GPUs is nowhere near enough in today's world. You need to scale out, which means you need multiple racks running multiple GPUs within each rack. They all need to work in unison to train AI models. But even that's not enough these days. Now, we have to connect multiple data centers together in what's called scale across. In scale out, we're connecting many GPUs together across a network fabric in a data center, and that network starts to behave like part of the computer itself. Rakesh showed me the H200 compute tray in the lab. He also showed me how that works inside an H200 server. So, this is only a single server, right? With eight GPUs in. But surely if you're training some big model, you're going to have a whole bunch of these. Yes. So, this one is actually not the whole server. This is just the compute tray. There are other components to this which you don't see here.
There are the NICs which are actually helping the GPUs go out and talk to other servers that are part of the cluster, which is the scale-out fabric that you might have heard that term. So the one within the server is connected by something called NVLink, which is a scale-up domain. And these are extremely fast connections, and they help the GPUs talk and exchange gradients when they are training. Now, that's the key idea. Inside the server, NVLink helps the GPUs talk to each other, but outside the server, the network fabric connects many GPU servers together. And once you scale up to thousands of GPUs, the network becomes one of the most critical parts of an AI system. The speeds that they have these days is insane. We have 100 terabit per second switches, and each port is 1.6 terabits per second. These GPUs need huge bandwidth to communicate together. The big concept here is you could have hundreds or thousands or hundreds of thousands of GPUs in an AI training cluster all working as one system. And that's why we need 100 terabits per second switches and ports that are 1.6 terabits per second each. Once you understand that kind of requirement, the hardware in this lab doesn't look like overkill, because these switches are not just moving internet traffic. They're connecting GPUs to other GPUs in a way that keeps the AI jobs running. They need to be fast enough for these AI jobs to continue running without packet loss and without delay. That's why Cisco's building 100 terabits per second switches, 1.6 terabits per second interfaces, new Ethernet fabrics, and technology designed specifically for AI clusters. I asked Will what's changed since I last spoke to him at Cisco Live, and what's happening at NVIDIA GTC, and where he sees data centers going next. So, we saw each other in Amsterdam in February at Cisco Live. I wanted to ask you, because there's a whole bunch of stuff been happening. 
So perhaps you could just recap what we spoke about at Cisco Live, and then give us a rundown, because GTC's happened. There's a whole bunch of stuff that's happened since we spoke. Yeah, so the biggest announcement at Cisco Live Amsterdam was announcing the G300, which is the new 100 terabit switch. We were also just coming off having announced at the GTC DC things like the Spectrum-4 based switch from Cisco, where we use Nvidia silicon and put it in our switches, parallel to switches with our own silicon. We also announced the P200 family, which is relevant for these more routing functions and scaling across. So there are really two problems here. Inside the data center, you need scale out. GPU servers talking to other GPU servers across the fabric, and that's where the G-series switches come in, including the G300, which are 100 terabits per second switches with 64 ports each running at 1.6 terabits per second. But that's once again not enough. Scale up is not enough. Scale out is not enough. So we have to do scale across, where we combine multiple data centers together. So we are connecting AI workloads across multiple data centers, and that's where the P200 switch is used. It has routing capabilities, deep buffers, larger tables, security, and the ability to connect these AI factories together across very high-speed, long-distance links. To some degree, routing has been there for years. We love our BGP, Border Gateway Protocol. But what is new in this concept of scale across is that you might want to run a job across data centers and have that really work, where the GPUs on the east-west back-end network have to talk across data centers. That's new. I still don't see too many customers doing it, but a lot are starting to investigate it.
And so that's where we've positioned this P200, which not only is very good for routing (deep buffers, larger tables, they can hold large IP prefix tables and ACLs and all those kind of things), but also supports running a job across. So that's the P200 architecture. And then the G-series with G300 is really that leaf-spine core of the data center family for us. And that's the 64 times 1.6 terabits per second interfaces on the switch. And then we announced we've been in development on the Spectrum-6, which is the 100 terabit switch from Nvidia. And that is going to be orderable soon. That's something by, let's say, late summer that we expect to be shipping. So these cycles, it's just amazing how quickly they condense, how quickly we have to go from first touching a chip to being able to ship it. But the customers need these performances. And so we are very happy to announce the progress on that Nvidia partnership side with their silicon. So it's important to note that this is not just theory. These designs matter for companies building AI infrastructure right now, especially neo clouds and GPU as a service providers. They need huge GPU clusters, but often only have small teams. So the infrastructure has to be repeatable, manageable, and faster to deploy. Will gave me a real world example of a neo cloud in Australia. So Sharon AI is a newly funded neo cloud in Australia who has ramped up funding and strategy very quickly. And we've been very excited to partner with them from day one. So they joined me for a talk at GTC about architectures and strategies. And some of the things that we hit together were everything from operationally, they formed like many of these neo clouds very quickly. Small teams. So they are leveraging HyperFabric, which is a new technology we have to really simplify and enable small teams to manage large clusters. So they're using HyperFabric. So that's been interesting to see.
They also are a customer where they're using Cisco Silicon One on the front end. But what they are looking for is that Spectrum switch on the back end, because they're looking to be a full NVIDIA Cloud Partner Program. And so that is again in this model of common operational, but being able to support them on either Cisco Silicon One or Nexus in order to support their end goals. We've also seen that putting together the full software stack, so they're using our full reference architecture that includes the VAST Data, all those pieces, and that has enabled them to move very quickly from concept, which feels like it was just the end of 2025. They were like, okay, the funding is almost here. Like, we're going to have to move fast. And then really plotting week by week when equipment arrives and how they go from arrival to deployment, and then getting this in the hands of their end customers. So I'm always amazed how quickly the Neo Cloud industry moves. It's very exciting to see. That example matters because it shows why this is not just about fast switches. Now I also spoke to Richard, who explained this in more detail. Speed alone is not enough. These switches also need the right silicon, the right operating system, the right optics, and the right management layer, so customers can operate these AI clusters in the real world. This is actually one of our switches that we position for AI networking. And what we're going to show you here today is talk about the internals of that switching complex and really what drives this communication within the AI cluster that we've already been talking about. Now that phrase is important, switching complex, because this is not just a fast box with a lot of parts. For AI networking, Cisco's thinking about the whole system: the silicon, the operating system, the optics, the automation, and the management layer. That matters because the switch has to move massive amounts of GPU to GPU traffic. 
But it also has to deal with congestion, latency, jitter, quality of service, visibility, and troubleshooting when something goes wrong. We start out here with the silicon. This is the foundational piece. That's the component here that really runs everything, and it's the heart. It's really the main vital organism that's inside the switch itself. And this is where you're going to see things like the packet forwarding, programmable pipelines. But before that works, I need to make sure I have the right type of software to implement the features for this AI networking kind of paradigm that we're seeing for solutions. So there's things like congestion mechanisms, whether it's reactive or proactive. What I need to make sure is, how can I have the operating system take advantage of what's in the custom silicon? So this isn't like the CPU in your laptop. This is a switching ASIC, custom silicon built to move massive amounts of traffic through the network. And in an AI cluster, that matters because the traffic is not like normal office traffic. These GPUs are constantly exchanging huge amounts of data across the fabric, and they cannot have packet loss, because if packets are lost, the entire training run has to stop and go back to a point where they've taken a snapshot, as an example, and continue. The operating system is very, very key, because when you start handling tons of these AI networking unique traffic loads, this is where you need to configure. And David, you probably remember quality of service, QoS. How many lines of code do you think that is? For a typical box, it could be anywhere from 50, 60. You mean in the CLI? On the CLI. So what's important about that is, imagine you're running your engineers, you have one box, it could be 60 lines of QoS config only. Now I multiply this. We saw earlier 16 scalable units. I could have more leaves. How can I get that to be fabric wide? I can do all that with unified management here with the Nexus dashboard. 
So it's single, it's consistent, it's an API-first mentality. I can have different workflows to help guide me to those principles. That is an operational problem. One switch is manageable, but an AI fabric can have many switches, many links, many optics, and thousands of GPUs depending on the deployment. You don't want engineers manually typing configuration across the whole fabric and hoping every box is consistently configured. You need automation. And more importantly, when something breaks, you need to find the issue as quickly as possible. But what if something breaks? I need to see different jobs, scheduling, latency, and this is where I can find that needle in the haystack. That's why I don't want a fire hose of information. I want something that's pinpointed. So when there's an anomaly, an advisory, can I correlate that information to something that I can understand and relate to? This is where unified management comes into play. So I can take all this information, I can expose information that's in the silicon, whether it's microbursts, whether it's buffering, whether it's some kind of congestion, programmable pipeline capabilities. We have that ability with our unified management to really configure that. Because AI is giving you tons of unique workloads right now. And we're seeing that from the front end, converged front end to storage, the back end, customers want specific load balancing strategies and techniques. That connects us directly back to Rakesh's cable story. At this scale, the problem is not just that something can go wrong. The problem is finding exactly where it went wrong. Is it a congestion issue? Is it a buffering problem? A bad optic, a misconfiguration, a link problem, a workload pattern perhaps? There are so many things that can go wrong. And in an AI data center, every bit of downtime can cost huge amounts of money. That's why management and visibility matter as much as raw switch speed.
Richard then brought up something that I think most people underestimate, the optics. Because when you're connecting thousands of GPUs together, the optics can become really expensive. They can actually be the most expensive part of the entire AI network. Now, you didn't mention the optics. Good question, because, believe it or not, one of the most expensive line items on the bill of materials in a heavy quantity are the optics. Because there's so many of them, right? There's so many of them, and that's something customers do take for granted, because you want quality in these optics. You need integrity. You want investment protection. Sometimes you need something that's forward and backward compatible. So we give you two different flavors, whether it's OSFP or QSFP. So you want to make sure this entire system is going to be very, very low power, but you want to make sure you're giving customers enough of a radix, a high radix to flatten that level of network, the number of tiers. Because remember with AI networking, job completion time, latency, jitter, those are all very important factors that can really destabilize your jobs. Imagine you have a job that's scheduled for 14 days. It fails in day 13. You're looking at some of the coolest 800 gig optics. This one particular, this one is OSFP form factor, and it's the integrated heat sink. But we also have the same LPO optic. So, LPO optics, we technically remove the DSP or digital signal processing out of the optic and will make it more efficient for unique use cases if you have connections between your switch and your GPUs. Those are very popular because you save a lot on the power. Now, that's brutal. A training job failing on day 13 can mean days of wasted time, huge GPU costs, and engineers trying to figure out whether the problem was the workload, the network, the optics, the storage, or the configuration. 
And that brings us back to why Cisco built this lab, because at this scale, you can't just trust a spec sheet. You need to test the whole system, the GPUs, the NICs, the optics, the switches, the storage, the software, the automation, and the actual AI workloads that customers are going to run. And that's what Rakesh's team is doing. So, what we do in our lab is previewing all the equipment. I'll walk you through. I'll take you to my lab and I'll walk you through everything. We bring all this equipment, we bring all the components, and we also work very closely with the storage partner, because this AI infrastructure is a very, very big, complex piece of beast. It has all the components. So part of my job and part of what Cisco does is, how do I make sure my customer, when they deploy a Cisco solution, it will work and perform very well. So we put all this together, and we make sure we test. We do a lot of different kinds of performance benchmarking. We test the networking layer, the RDMA layer. We look at everything from the RDMA perspective. When we look at the network perspective, as you scale these clusters to tens of thousands of GPUs, the network plays a very, very critical role. If you're not using the latest and greatest technology like load balancing techniques, then even though you have network capacity, your inefficiencies can lead to suboptimal performance. Then we also look at the storage. We talk to the storage partner. We bring all these products. We put everything together. We want to make sure when customers deploy our validated design, it works, it works very well, and we give them all that visibility, performance benchmark numbers. It's not just talk. We give them all the numbers, and we work with our partners AMD and NVIDIA to make sure that we meet the performance requirements. And sometimes the bottleneck is not where you'd expect it to be. It's not always the GPU. It's not always the switch. 
In one test, Rakesh's team found the problem was actually storage. So, one time we were training the DLRM model, and our performance was very, very poor. So, when we analyzed that, there are a lot of tools available so that we can analyze and profile, and we found out the storage was the bottleneck. So we sat with our partner. We spent almost a couple weeks fine-tuning that. And once we did that, you won't believe that we were able to do even better than what the competition has shown on the DLRM. So, that's the point we are trying to make. This is very important, the kind of work we do, because our customers can benefit from the work we do in the lab. But that raises another question. If Cisco only has a 128 GPU cluster in the lab, how do they prove that these designs can scale to tens of thousands of GPUs? Rakesh explained that they use 128 GPUs as a scalable unit, and then use RDMA testing and a 512 node CPU cluster to simulate tens of thousands of flows. We have the AI cluster. Like I was telling earlier, we have 16 H100 nodes. Each one is eight GPU. That's 128 GPU. And 128 GPU is used as a scalable unit. So what we do is we work very closely with our partner Nvidia. They have a very nice performance benchmarking document. They specify all the test tools you have to run and all the KPI metrics you have to collect, and they also give the numbers. So if you say that, hey, my scalable unit, which is 128 GPU, it performs very well as per Nvidia's guideline. We have to not only follow the test methodology, but the numbers. We have to beat those numbers. And once we test that scalable unit, you can create very, very large clusters, hundreds of thousands of GPU. Because of our partnership with Nvidia, we're doing a Spectrum-X solution that allows us to horizontally scale their scalable units. There's one more thing I want to tell you. Even though we don't have such a large cluster, we have a very large number of CPU nodes, our CPU cluster, that's like 512.
So what we do is we use RDMA and we have written our own home-grown tool where we simulate tens of thousands of flows and we understand how these flows work, how they get load balanced, what congestion. So these are the methodologies, and this is a scientific approach, that even though I have a small 128 GPU cluster working, we can mathematically and scientifically prove to you that the solution we at Cisco are proposing, it works when deploying tens of thousands of GPU scale. So what Spectrum-X does is, when the traffic is run from a GPU, how does the leaf make sure that I'm equally distributing traffic to all the spines? If you do not equally distribute that, some link would be overutilized, some link would be underutilized, and that can cause artificial congestion. And the reason I'm saying artificial congestion is, you have enough bandwidth, but you're not efficiently using the resource of the bandwidth. That is very important, and that's what we are working very closely with Nvidia on, packet spray and Spectrum-X, we call the solution, where we make sure that all the traffic is equally distributed. We calculate a Jain's fairness index, which is very important. So when we train this model, when we run any workload, we compute all the metrics. We calculate how much traffic is distributed across the links. So if your Jain's fairness index is one, that means you have 100% perfect distribution. So, Cisco is not just saying here's a very quick switch, a fast switch. They're testing a scalable unit, simulating massive traffic patterns, measuring congestion and load balancing, and using that data to prove how the design behaves at a much larger scale. That also leads to one of the biggest questions in AI networking. If you're using RDMA and building massive GPU clusters, should you use InfiniBand or Ethernet? So, are you running InfiniBand or Ethernet or both in your lab? No, very good question. So my lab is only Ethernet. Rakesh gave me the lab answer.
Cisco's AI lab uses Ethernet, but I wanted the bigger industry answer, so I asked Will directly. InfiniBand or Ethernet? What's winning in the data center? So, we still see a lot of InfiniBand out there, but I think in the last 12 months, clearly the position even from Nvidia, you see a lot more push and support towards Ethernet. I think that's for a few different reasons. First of all, customers want choice, and so with multi-vendor as a key asset. I think also we're getting to a point where InfiniBand does hit, what is it, 30,000 GPU, there's a scaling limit where you do top out. And so as customers start to think about hundreds of thousands of GPUs, even if that's not today, that is an Ethernet-based network rather than InfiniBand. And so for multiple reasons, I think it's very clear that it's all Ethernet. We don't see that being a major topic. Most customers come, even if they've had a lot of InfiniBand, with an okay, I'm heading towards Ethernet. Now, one of the things that we have had in those discussions is there's some features in InfiniBand, some multi-tenancy features that they've had that aren't native in Ethernet. So beyond operational simplicity, which we've put a lot of effort into, we are looking at pulling some of those feature functions out of InfiniBand and adding that to Ethernet. And so that's something you'll hear more from us in the next few months. So, Ethernet gives customers scale and choice. It's also very familiar to all of us. We have Ethernet almost everywhere these days. But there's another problem that you need to be aware of. If these AI clusters are worth hundreds of millions of dollars, and most of the traffic is moving east-west between servers, security cannot just sit at the edge of the data center anymore. It has to move deeper into the cluster. Another major announcement we talked about this week is around services. We're in many regards a hardware vendor at heart. We love shipping news. 
But many times customers are already, if you think about a cluster, they already come with very sophisticated chips in the servers. And so the DPUs, these super NICs, there's the ones that go in the east-west, so connecting GPUs. But the ones that go in the north-south, so connecting servers to the internet, connecting servers to other servers, which we call north-south NICs, those, for instance, BlueField-3, BlueField-4, those are actually very sophisticated. They have a very large number of beefy Arm cores. You can do a lot of programming in those. And so one of the announcements at GTC has been we're now providing security services, our first entry point onto those NICs, on the north-south, so that we can provide, for instance, firewall-type function without having to buy more hardware. So you just enable it, and that's something we plan to keep extending. And that interconnects with Isovalent, which is the company we acquired, which allows us to do distributed policy enforcement points. And so our goal is, you should be able to have policy defined with the Isovalent controller and then deploy the enforcement across all of these different points, including in your AI cluster with hardware you already have. So you can enforce policy on the DPU in the server. Yes. You can enforce it on the switch as well, because it's doing something similar, right? So your traditional firewall no longer is all you have. Now you're putting the firewall everywhere basically. Absolutely. There's still a place for a firewall in your DMZ, for internet, but within these architectures, trying to put large beefy firewalls all over within whether it's your AI clusters or in your server to server communication within a data center, I think that phase is going to be coming to an end, and we at Cisco believe that's something that's a very critical architecture transition. That's a big shift.
The firewall doesn't disappear, but in these AI data centers security cannot be one big box at the edge. The traffic is moving inside the cluster between the servers, the GPUs, NICs, DPUs, and switches. So the security has to move closer to the workload. And that brings the whole story together. The future AI data center is not just faster GPUs, it is faster networking, better optics, better visibility, better automation, validated designs, and security built into the fabric itself. And if a 100 terabits per second switch and 1.6 terabits per second interfaces is not insane enough, Will told me that it's not slowing down. Expect the speeds just to continue ramping up. Speeds in the data center. Yes. So we've got a 100 terabits per second switch, 1.6 terabits per second interfaces. Is it just going to keep on going? Are we going to get like 3.2 terabits per second at some point, stuff like that? Yeah, it's interesting, I was visiting with a hyperscaler last week and we were very closely co-designing where it's multi-years out. We were talking about 400 gig SerDes. We were talking about 3.2 terabit interfaces. We were talking about where is the limit where we will no longer be able to support pluggables. Liquid cooling being required instead of just desirable. So it is, I don't see an end yet. I see this is coming fast. And some of these like the SerDes, the analog technology where you actually serialize data into a single pair of wires, I'm still amazed how far that has come. So my biggest takeaway from Cisco's AI lab is this. AI is not just about GPUs. The GPUs are the part that a lot of people talk about, but the network is what makes it work together. The switches matter, the optics matter, the storage matters, the software matters, the automation matters, the security matters, and the validation matters, because when one small problem can cost millions of dollars, you don't want to guess. You want the whole system tested before you deploy it. 
That's what Cisco is doing in this lab. Testing AI infrastructure as a complete system, so customers can build these clusters faster, operate them more reliably, and avoid wasting expensive GPU capacity. And yes, I'd still love to have a home lab like this. Can't afford it. I mean, I need a power plant. I need lots and lots of money. I want to thank the entire team for showing me around, showing me the cool technologies that they get to work on. What do you think about this? What do you think about the technologies? Did you learn something new in this video? Hopefully you did. I'm David Bombal and I want to wish you all the very best.