We Are Scaling Heterogeneous Compute

We are scaling heterogeneous compute to unlock a new era of AI infrastructure: one that delivers systems faster, cheaper, and far more capable than anything a homogeneous stack can offer.

Today we announce three early milestones in service of that mission: deployment of heterogeneous intelligence across all major cloud providers, partnerships with four next-generation compute companies spanning entirely different physical paradigms, and a $2.9 million ARIA grant to pioneer R&D on the first co-located heterogeneous compute cluster for multi-agent intelligence.

Heterogeneity is a new dimension of scale. We have formally proven that systems composed of diverse models and compute substrates outperform any homogeneous system. These milestones chart our path to delivering that advantage - each building on the last, each pushing deeper into the hardware stack.

Heterogeneous intelligence across all major cloud providers

Our orchestration frameworks are cloud-agnostic by design, executing across all major cloud providers and selecting among instances within and between them: TPUs, AMD accelerators, and multiple generations of NVIDIA GPUs, each at a different performance and price point. Where the hyperscalers build homogeneous clusters, we treat the full landscape as a single heterogeneous system and route each workload to the compute best suited to it.
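As a rough illustration of that routing decision - the names, fields, and scoring rule below are hypothetical, not our production API - consider a scheduler that picks the cheapest instance across providers that still clears a workload's requirements:

```python
from dataclasses import dataclass


@dataclass
class Instance:
    provider: str         # e.g. "aws", "gcp", "azure"
    accelerator: str      # e.g. "tpu-v5e", "mi300x", "h100"
    usd_per_hour: float
    tokens_per_sec: float


@dataclass
class Workload:
    min_tokens_per_sec: float   # throughput floor for this workload
    max_usd_per_hour: float     # cost ceiling


def route(workload: Workload, fleet: list[Instance]) -> Instance:
    """Cheapest instance, across every provider, that still meets
    the workload's throughput floor and cost ceiling."""
    candidates = [
        i for i in fleet
        if i.tokens_per_sec >= workload.min_tokens_per_sec
        and i.usd_per_hour <= workload.max_usd_per_hour
    ]
    if not candidates:
        raise RuntimeError("no instance satisfies this workload")
    return min(candidates, key=lambda i: i.usd_per_hour)
```

The real scoring is richer than a single cost floor, but the shape of the decision is the same: one pool, many substrates, per-workload selection.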

Cross-instance networking is typically the constraint. We have made significant breakthroughs in cross-vendor GPU networking that overcome it, breakthroughs we will detail in a forthcoming post. For our enterprise partners, this means greater capability, faster execution, and lower cost, without lock-in to any single provider or generation of hardware. But existing cloud hardware is only the starting point.

Accelerating the next generation of compute

For our vision to fully materialise, we need companies with new compute architectures to succeed. Their success creates variety in the market, and variety unlocks new corners of the algorithm-hardware co-design space: systems-level architectures that are impossible when every chip looks the same. Our interests are directly aligned with theirs, so we are giving next-generation compute companies the quickest route into the stack.

We are already co-evolving with new silicon. Working with AWS, we built custom on-die capabilities on Inferentia2 for multi-agent tool calling, results we have already detailed and are deploying for customers. Our multi-endpoint technology makes this straightforward: any provider that exposes an access endpoint can be integrated into our orchestration layer and deployed alongside the hyperscaler clouds today.
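A minimal sketch of what that integration contract could look like - the interface and names here are illustrative, not our actual SDK:

```python
from typing import Protocol


class ComputeEndpoint(Protocol):
    """Anything a provider exposes as an inference endpoint.
    Implementing these two methods is enough to join the
    orchestration layer alongside the hyperscaler clouds."""

    def health(self) -> bool: ...
    def invoke(self, payload: bytes) -> bytes: ...


class Orchestrator:
    """Keeps a registry of live endpoints, hyperscaler or otherwise."""

    def __init__(self) -> None:
        self.endpoints: dict[str, ComputeEndpoint] = {}

    def register(self, name: str, endpoint: ComputeEndpoint) -> None:
        # New silicon joins the pool the moment its endpoint is reachable.
        if endpoint.health():
            self.endpoints[name] = endpoint
```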

But we are not waiting for this hardware to come to the cloud. Today we announce partnerships with four next-generation compute companies, each representing a fundamentally different physical paradigm:

Normal Computing - a new class of physics-based ASICs designed for diffusion-based generative AI workloads, including image and video generation, unlocking orders-of-magnitude more energy-efficient compute and lower latency.

Mixx - silicon photonic interconnects integrating optics directly with ASICs, enabling switchless clusters with dramatically lower power and latency.

Cortical Labs - biological computing that fuses lab-grown human neurons with silicon, creating adaptive networks that learn from minimal data at a fraction of the energy cost.

Great Sky - superconducting optoelectronic networks approaching physical limits of neural computation, combining semiconductors, superconductors, and photonics.

Thermodynamic, biological, photonic, superconducting. No other company orchestrates across this range. The system that can select and compose across these paradigms, matching workload to substrate, will define the next era of compute.
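To make "matching workload to substrate" concrete, here is an illustrative sketch. The mapping and names are hypothetical, drawn loosely from the partnership descriptions above rather than from any partner's actual chip:

```python
# Hypothetical mapping from workload class to preferred substrate,
# loosely following the partnership descriptions above.
SUBSTRATE_PREFERENCE = {
    "diffusion-generation":     ["thermodynamic-asic", "gpu"],
    "switchless-fanout":        ["photonic-interconnect", "gpu"],
    "few-shot-adaptation":      ["biological", "gpu"],
    "energy-limited-inference": ["superconducting-optoelectronic", "gpu"],
}


def pick_substrate(workload_class: str, online: set[str]) -> str:
    """Return the first preferred substrate that is actually online,
    falling back to commodity GPUs."""
    for substrate in SUBSTRATE_PREFERENCE.get(workload_class, []):
        if substrate in online:
            return substrate
    return "gpu"
```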

Setting the new trajectory for heterogeneous AI infrastructure

The workloads that matter most are fundamentally heterogeneous. Multi-agent intelligence demands different computation at every layer: reasoning, retrieval, tool use, and adaptation, each with a different latency, cost, and capability profile. No single chip handles all of them well. The infrastructure to run these systems must be heterogeneous from the ground up, and co-located so that hardware and intelligence can co-evolve, each shaping the other.

The Advanced Research + Invention Agency (ARIA) has awarded Callosum a $2.9 million grant to pioneer R&D on exactly this: the first co-located heterogeneous compute cluster for scaling multi-agent intelligence, in collaboration with CommonAI.

[Image: Callosum and ARIA logos side by side. Caption: ARIA has awarded Callosum $2.9M to pioneer R&D on the first co-located heterogeneous compute cluster for multi-agent intelligence - accelerating next-generation compute technologies into the stack and unlocking orders-of-magnitude gains in performance, cost, and speed.]

Breaking the hardware lottery to unlock the next era of discovery

As Sara Hooker argued in The Hardware Lottery, the algorithms that win are not necessarily the best; they are the ones that fit the hardware available. When the only chips on offer are homogeneous GPU clusters, entire families of approaches go undiscovered - not because they lack merit, but because the compute to run them does not exist at scale. This not only holds back the delivery of our existing solutions at scale; it also keeps research teams all over the world from discovering new ones.

Today's hardest problems - the ones that demand deep reasoning, real-time adaptation, and operation at the extremes of cost and latency - will not be solved by scaling the same chips further. They will be solved by systems that bring the right computation to bear at every level: faster, cheaper, and far more capable than anything a homogeneous stack can deliver. A plurality of chips and compute primitives, orchestrated together, giving rise to systems of intelligence we haven't yet imagined.