Delos Data offers AI chip startups a fast track to rack scale
COMPUTEX 2026 It’s hard enough for startups to compete with AMD and Nvidia on chip design. The rise of rack-scale architectures has only made things harder. Companies have to invest not only in chip design but also in the mechanical, thermal, and power engineering necessary to pack six dozen or more AI accelerators into a single rack that functions as one enormous GPU.
At Computex last week, Delos Data, a startup funded by former Intel and Barefoot Networks execs, showed off a modular server platform aimed at giving chip startups a shortcut to rack scale.
One of the challenges with the move to rack scale is the sheer amount of networking that needs to be enabled at the box. A typical eight-GPU HGX node only needs one or two ports per GPU. By comparison, a GB300 NVL72 needs 18 ports per GPU, each running at 400 Gbps.
Nvidia and AMD have developed custom racks with integrated backplanes, power delivery, and cooling. Delos, by comparison, is keeping things relatively simple by designing a chassis that, at least from the front, looks more like a switch than a GPU server.
It features 36 OSFP ports, nine for each of the four OAM sockets at the heart of the system. OAM (OCP Accelerator Module), if you’re not familiar, is an open socket commonly used by high-performance accelerators requiring more interconnect bandwidth and power delivery than standard PCIe cards can manage. Assuming 200 Gbps SerDes, that works out to 3.6 TB/s of bidirectional interconnect bandwidth per chip, the same as Nvidia's new Rubin GPUs.
Sticking with OSFP means that customers can use standard direct-attach copper cables (DACs) or pluggable transceivers, along with switches, depending on how large they want their scale-up domain to be. And while OSFP ports are usually associated with Ethernet, you can run just about anything you want through them, whether it be UALink, Ultra Ethernet, PCIe, or something else.
From a deployment standpoint, these systems would be wired up like any other hyperscale system, just a whole lot denser.
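The 3.6 TB/s figure can be sanity-checked with a little arithmetic. The sketch below assumes each OSFP port carries eight electrical lanes (the standard OSFP lane count) and that the headline number counts both directions of every lane. Delos states neither assumption, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of per-accelerator scale-up bandwidth.
# Assumptions (not stated by Delos): 8 lanes per OSFP port, and the
# headline number counts both directions of each lane.

LANES_PER_OSFP_PORT = 8  # assumed: standard OSFP lane count


def per_chip_bandwidth_tbytes(ports_per_chip: int,
                              serdes_gbps: int,
                              bidirectional: bool = True) -> float:
    """Return aggregate interconnect bandwidth per chip in TB/s."""
    lanes = ports_per_chip * LANES_PER_OSFP_PORT
    gbps_one_way = lanes * serdes_gbps
    gbps_total = gbps_one_way * (2 if bidirectional else 1)
    return gbps_total / 8 / 1000  # bits -> bytes, giga -> tera


if __name__ == "__main__":
    ports_per_chip = 36 // 4  # 36 OSFP ports shared by four OAM sockets
    print(per_chip_bandwidth_tbytes(ports_per_chip, 200))
    print(per_chip_bandwidth_tbytes(ports_per_chip, 200, bidirectional=False))
```

Under those assumptions, nine ports of eight 200 Gbps lanes give 14.4 Tbps in each direction, which is where the 3.6 TB/s bidirectional total comes from.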
Delos isn’t the only option out there for chip startups looking for a scale-up reference design. AWS, for example, appears to be repurposing Nvidia’s MGX form factor for its Trainium 3 rack systems, while AMD’s Helios rack is now an OCP standard. Both designs would, in theory, be easier to service, but Delos argues that its modular design offers greater flexibility.
“It makes it a little bit more flexible in terms of, maybe you want a scale up domain of 100 or maybe you want it a scale up domain of one,” CTO Dan Daly told El Reg. “It just depends on how many cables you want to plug in. This also allows you to go plug into different types of switches… it could be simpler switches, maybe even optical circuit switches (OCS).”
Using existing packet switches from Broadcom or Marvell, such a design could support 512 to 1,024 accelerators in a single-layer fabric, depending on whether you're using 200 Gbps or 100 Gbps SerDes, respectively. Using multi-layer fabrics, OCS, and/or 2D/3D toruses, the compute domain could scale even further, all while using off-the-shelf components.
While OSFP keeps things simple, it also means power consumption could become problematic for larger compute domains requiring pluggable optics. In fact, this is why Nvidia has taken so long to embrace optical scale-up. Copper may not have the reach, but it uses a fraction of the power.
Delos CEO Ed Doe tells us the company is already exploring versions of the system that will use near-package or co-packaged optics out to MPO-style connectors rather than OSFP.
The startup isn't just doing hardware. As anyone who's done large-scale networking knows, the physical and logical topologies (that is, the way devices communicate with one another on the network) can look very different depending on the workload.
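Circling back to the 512 to 1,024 accelerator figure: it is what you would expect if each switch in a single-layer fabric gives every accelerator one lane. The sketch below assumes a 102.4 Tbps switch ASIC. The article does not name a specific part, so treat that capacity and the function name as assumptions.

```python
# Rough single-layer fabric sizing: each switch dedicates one lane to
# every accelerator, so the domain size equals the switch's lane count.
# Assumption (not from the article): a 102.4 Tbps switch ASIC.

SWITCH_CAPACITY_GBPS = 102_400  # assumed switch bandwidth


def max_accelerators_single_layer(
        serdes_gbps: int,
        switch_capacity_gbps: int = SWITCH_CAPACITY_GBPS) -> int:
    """Return how many accelerators one switch can reach at one lane each."""
    return switch_capacity_gbps // serdes_gbps


if __name__ == "__main__":
    for rate in (200, 100):
        count = max_accelerators_single_layer(rate)
        print(f"{rate} Gbps SerDes -> {count} accelerators")
```

Halving the lane rate doubles the number of lanes a fixed-capacity switch can expose, which is why the slower SerDes yields the larger domain.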
Delos has developed a software orchestration platform designed to configure and monitor these switched fabrics or meshes, and to dynamically reroute traffic in the event of a link failure. At Computex, this software platform, which Delos has dubbed its Nonstop AI network, was on display, allowing attendees to pull links at random and see the network react and correct itself automatically.
The company's ambitions don't stop at network orchestration and systems. We're told Delos has additional products in the works. We don't know for sure what they are, but a high-radix switch design built atop merchant silicon would certainly complement its Nonstop AI systems. ®