Google used its Cloud Next 2026 keynote this week to show off the eighth generation of its home-grown AI silicon, and this time it's two chips instead of one. TPU 8t is built for training, TPU 8i is built for inference, and both are designed around one assumption: the next big leap in AI won't come from bigger single models but from armies of smaller AI agents working together.
"Today at Google Cloud Next, we are introducing the eighth generation of Google's custom Tensor Processor Unit," the company wrote in a blog post announcing the chips. "TPU 8t and TPU 8i are designed to power our custom-built supercomputers, to drive everything from cutting-edge model training and agent development, to massive inference workloads."
Two chips, one strategy
Splitting training and inference is a notable shift. For years, Google's TPUs (like Nvidia's GPUs) have tried to be good at both. But the workloads are starting to pull apart:
- Training rewards raw compute, enormous memory bandwidth, and tight, uniform communication between chips.
- Inference — especially for chatty AI agents that call each other and external tools — rewards low latency, big on-chip caches, and the ability to keep huge numbers of requests in flight at once (the sketch after this list makes that tradeoff concrete).
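To make the inference side of that split concrete, here is a minimal sketch of the pattern agent serving optimizes for. Everything in it is illustrative: the `call_tool` coroutine and the 50 ms latency are hypothetical stand-ins, not anything Google has published.

```python
import asyncio
import time

# Hypothetical stand-in for one small agent step: a short, latency-bound
# call to a model or an external tool.
async def call_tool(step: int) -> str:
    await asyncio.sleep(0.05)  # pretend 50 ms of network/model latency
    return f"result-{step}"

async def main() -> None:
    # One agent workload decomposed into ten small calls, run one
    # after another: the cost is the sum of the latencies (~0.5 s).
    t0 = time.perf_counter()
    for step in range(10):
        await call_tool(step)
    sequential = time.perf_counter() - t0

    # The same ten calls kept in flight at once cost roughly a single
    # latency (~0.05 s). Inference hardware tuned for agents is built
    # around this shape: huge numbers of short requests overlapping.
    t1 = time.perf_counter()
    await asyncio.gather(*(call_tool(step) for step in range(10)))
    concurrent = time.perf_counter() - t1

    print(f"sequential: {sequential:.2f}s, concurrent: {concurrent:.2f}s")

asyncio.run(main())
```

The gap between those two numbers is why latency and in-flight request capacity, not peak FLOPs, dominate this class of workload.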
TPU 8t, codenamed Sunfish, is the training workhorse, designed in partnership with Broadcom and reportedly built on TSMC's 2-nanometer process. TPU 8i, codenamed Zebrafish and designed with MediaTek, is the inference specialist. Google says TPU 8i connects 1,152 chips in a single pod, dramatically reducing communication latency, and packs 3× more on-chip SRAM than the previous generation: the kind of capacity you need when millions of agents are all thinking at the same time.
Both chips, Google says, deliver up to 2× better performance-per-watt than the seventh-generation Ironwood TPUs, thanks to liquid cooling, custom interconnects, and power management that dynamically adjusts to real-time demand.
Why efficiency is the whole story
Ironwood — the seventh-generation TPU first teased last year — hit general availability at the same event, giving cloud customers their first broad access to it. But the efficiency story is really where the eighth-gen chips flex. AI workloads are now a non-trivial chunk of global electricity demand, and hyperscalers are under growing pressure to keep the curve from turning vertical. A chip that does the same amount of useful work on half the power isn't a nice-to-have; it's the only way the industry's current growth math works out.
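The back-of-envelope version of that growth math, with invented numbers (only the 2× performance-per-watt ratio comes from Google's claim):

```python
# All absolute figures here are invented for illustration; only the 2x
# performance-per-watt ratio is from Google's announcement.
demand_growth = 4.0        # hypothetical growth in AI work over some period
perf_per_watt_gain = 2.0   # claimed generational efficiency gain

power_growth = demand_growth / perf_per_watt_gain
print(f"power draw grows {power_growth:.0f}x instead of {demand_growth:.0f}x")
# -> power draw grows 2x instead of 4x: the efficiency gain absorbs
#    one doubling of demand that would otherwise hit the grid.
```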
Google also used the event to announce a $750 million fund aimed at helping enterprises adopt AI more broadly, and a batch of new tools for building and operating AI agents across multi-cloud environments. The through-line: as models get cheaper and more modular, the constraint shifts to infrastructure.
What it means for everyone else
For customers, the practical effect is simple. Gemini and other foundation models will keep running on TPUs, but enterprises building their own agents on Google Cloud will now have a dedicated inference platform tuned for the way agents actually work — lots of small, low-latency calls rather than one giant query.
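A rough sketch of that call pattern, with every name a hypothetical stand-in rather than a real Google Cloud API:

```python
# Illustrative agent loop: one user task fans out into many small,
# dependent model and tool calls. All names are hypothetical.
def model_call(prompt: str) -> str:
    """Pretend model endpoint; in reality, a network round trip."""
    return f"response to: {prompt[:40]}"

def search_tool(query: str) -> str:
    """Pretend external tool the agent consults mid-task."""
    return f"results for: {query[:40]}"

def run_agent(task: str, max_steps: int = 5) -> str:
    context = task
    for step in range(max_steps):
        # Each iteration is a short, latency-bound call, and the next
        # step depends on this one's output, so per-call latency adds
        # up linearly across the task.
        thought = model_call(f"Step {step}: given '{context}', what next?")
        context = search_tool(thought)
    return model_call(f"Summarize the findings: {context}")

print(run_agent("compare TPU generations"))
```

If each round trip costs on the order of 100 ms, a ten-step agent spends a full second on latency alone, which is why silicon tuned for small, fast calls matters more here than raw throughput.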
For the broader industry, the TPU 8t/8i split is a signal that specialization is winning. Nvidia has dominated AI compute with general-purpose GPUs, but the sheer variety of AI workloads is opening room for purpose-built silicon. Google is betting that by co-designing its chips, software, and data centers together — a decade-long effort that started with the first TPU back in 2015 — it can keep delivering performance gains even as transistor-level improvements slow.
The eighth-generation TPUs will ship to Google Cloud customers later this year. Given how tightly the chips are tied to Gemini and to Google's internal agent platform, you'll probably feel their effects indirectly before you ever see one: in faster responses from AI assistants, longer context windows, and agents that can handle genuinely multi-step tasks without falling over.

