The Cost Problem Nobody Wants to Say Out Loud
Building AI infrastructure at scale is expensive in ways that most public conversations still understate. A single hyperscale AI data center now costs approximately $50 billion. The silicon budget alone, dominated by the cost of the most in-demand GPUs, runs approximately $30 to $35 billion per gigawatt. Energy, power delivery, networking, and cooling add another $10 to $15 billion on top of that, depending on geography.
The industry, in short, cannot build fast enough to meet demand.
Raja Koduri addressed this directly in a recent conversation about OXMIQ, the company he is now part of. His framing was precise: faster chips alone will not fix this. What matters is utilization, getting more useful work out of every transistor. He argued that silicon costs need to come down by a third or more for AI to reach broader adoption globally, citing mobile internet in India as a reference point for what happens when cost barriers fall and a billion people gain access to something previously out of reach. That, in his telling, is part of the reason OXMIQ exists.
Raja Koduri and the Architecture at the Center of It All
OXMIQ’s core technical offering is something called OxCore, which Raja Koduri describes as “the new computing core.” The idea behind it addresses a fragmentation problem that has existed in silicon design for years. Most systems today treat three types of computation as separate abstractions:
- Scalar processing, the kind associated with traditional CPUs
- Vector processing, the parallel compute model associated with GPUs and frameworks like CUDA
- Matrix math, the tensor-style operations associated with TPUs
OxCore unifies all three into a single architecture. As he put it, “You can think of it as a single core that encapsulates CPU, GPU, and TPU into one unified architecture.” The practical benefit is improved utilization and a simpler mental model for developers thinking about how execution works.
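The split between those three abstractions can be made concrete with a small sketch. The snippet below is illustrative only, not OXMIQ code; it uses NumPy to stand in for the three models of computation that OxCore is described as unifying: scalar loops (CPU-style), elementwise vector operations (GPU-style), and dense matrix math (TPU-style).

```python
import numpy as np

# Scalar: one value at a time, the traditional CPU abstraction.
def scalar_sum(xs):
    total = 0.0
    for x in xs:          # one add per loop iteration
        total += x
    return total

# Vector: one operation applied across many lanes at once (GPU/SIMD style).
def vector_scale(xs, k):
    return np.asarray(xs) * k   # a single elementwise multiply

# Matrix: dense tensor math, the TPU-style abstraction.
def matrix_layer(x, w):
    return x @ w                # one matmul covers many multiply-adds

xs = [1.0, 2.0, 3.0]
print(scalar_sum(xs))                       # 6.0
print(vector_scale(xs, 2.0))                # [2. 4. 6.]
print(matrix_layer(np.eye(3), np.eye(3)))   # 3x3 identity matrix
```

In most systems today, each of these three styles maps to a different device and programming model; the utilization argument is that a workload mixing all three leaves some of that silicon idle at any given moment.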
Chiplets Sound Simple Until You Actually Build Them
Chiplets have attracted significant attention as a path to scalable silicon. Raja Koduri has a more grounded take: “Everyone who is super excited about chiplets are the ones who haven’t done them yet.”
That perspective comes from direct experience. He drove chiplet integration from AMD's HBM1 work all the way through Intel's Ponte Vecchio, a 47-chiplet data center GPU and one of the most complex chiplet designs ever attempted. The lessons from that work are embedded in how OXMIQ approaches the problem.
The core issue is that standards, while necessary, are not sufficient. As he explained, “You doing a chiplet and me doing a chiplet and expecting them to come together and work, if you just have a standard? No, no, no. Even within my own team, there were challenges.” High-bandwidth chiplet systems introduce power, thermal, and validation complexity that scales with the bandwidth itself. The higher the bandwidth, and therefore the heat, across a chiplet-to-chiplet connection, the harder that interface becomes to standardize reliably.
OXMIQ’s chiplet quilting approach is built around that reality. It requires standardization, disciplined execution, and continual validation across architecture, power, thermals, and testing throughout the full stack.
From Agents to Atoms
The most forward-looking part of the conversation centered on what Raja Koduri called “probably the most visionary or most profound thing” in the discussion.
The argument is this: between an AI agent generating work and the silicon executing it, there are many layers of abstraction. Programming languages, frameworks, drivers, runtimes. Those layers were built by humans, for humans. But agents are not humans.
“They don’t need to talk through Python, C, all these intermediate languages. We created them for humans to program. But when it’s an agent generating work, there will be new, more efficient forms of communication, where the agent can talk to what I call nano-agents in silicon directly.”
He pushed the idea further: “You can express an entire inference model in a single page of math equations. Why am I breaking that down into tens of thousands of lines of code, through all the layers of the stack, just to ask the atoms to wiggle and perform a math computation? What if the future hardware just talks math?”
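As a concrete illustration of that “single page of math” claim, the core of a standard transformer inference step fits in a few equations. These are the textbook attention and feed-forward formulas, shown here only to make the point vivid; nothing OXMIQ-specific is implied:

```latex
% Scaled dot-product attention over an input sequence X
Q = X W_Q, \qquad K = X W_K, \qquad V = X W_V
\mathrm{Attn}(X) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
% Position-wise feed-forward block
\mathrm{FFN}(x) = \max(0,\; x W_1 + b_1)\, W_2 + b_2
```

Everything below those equations (kernels, graph compilers, drivers, runtimes) exists to translate them into machine instructions; hardware that “just talks math” would collapse that translation.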
Fewer translation layers mean lower latency, less wasted energy, and higher efficiency. When asked what one thing listeners should take away about OXMIQ, his answer was clean: “Think agents to atoms.”
The ARM Analogy and Why GPU IP Licensing Matters
OXMIQ does not design and sell its own chips. It licenses GPU IP so that other companies can build chips suited to their specific workloads. The reasoning connects directly to a gap in the current market.
“There is ARM for CPUs. But there is not ARM for GPUs. Anyone can license our IP and build a chip. That’s the problem we’re trying to solve.”
The analogy he used to explain why this matters: ARM’s existence in the mobile space made it possible for Apple to design its own chip for the iPhone. Without that, the product likely never happens. OxCore is built to scale across edge devices, data centers, robotics, and automotive applications, all within a shared architecture and software ecosystem.
OxCapsule and the Developer Experience
On the software side, OXMIQ launched OxCapsule in public beta in November 2025, with version 1.0 supporting Windows and Mac. Version 2.2, which added Linux client support, followed in December 2025, with monthly updates continuing from there.
The product connects developers to remote GPU environments through a single interface, allowing them to shift between hardware platforms without rebuilding their workflows. Early beta participants include ARM, AMD, Intel, Infineon, Global Foundries, Tenstorrent, and Radisys, along with universities such as Boston University, NYU, Texas A&M, IIT Hyderabad, and the University of Utah.
A Career Built Across the Full Stack
Raja Koduri’s perspective on these problems is shaped by experience across Apple, AMD, and Intel. He described his Apple years as “almost like going to a university,” a period that shifted how he approached problems from a user-first rather than purely technical standpoint. His time at Intel he called “the PhD phase,” a period that brought together everything from transistor manufacturing through packaging, systems, and software.
That combination of depth and breadth is what he brings to OXMIQ, and it comes through in how precisely he frames the challenges the industry is facing.
About Raja Koduri
Raja Koduri is a semiconductor and computing architect with career experience at Apple, AMD, and Intel. He currently works at OXMIQ, where he contributes to the company’s architecture strategy across its OxCore platform, chiplet integration work, and GPU IP licensing model. His background spans the full computing stack, from transistor-level manufacturing to systems, software, and product development.