Cloud TPUs optimize performance and cost for all AI workloads, from training to inference. Using world-class data center infrastructure, TPUs offer high reliability, availability, and security.
Not sure if TPUs are the right fit? Learn about when to use GPUs or CPUs on Compute Engine instances to run your machine learning workloads.
Overview
Google Cloud TPUs are custom-designed AI accelerators optimized for training and inference of AI models. They are well suited to a variety of use cases, such as agents, code generation, media content generation, synthetic speech, vision services, recommendation engines, and personalization models. TPUs power Gemini and all of Google's AI-powered applications, including Search, Photos, and Maps, each of which serves over 1 billion users.
Cloud TPUs are designed to scale cost-efficiently across a wide range of AI workloads, including training, fine-tuning, and inference. They accelerate workloads built on leading AI frameworks, including PyTorch, JAX, and TensorFlow. Cloud TPU integration with Google Kubernetes Engine (GKE) lets you orchestrate large-scale AI workloads. Dynamic Workload Scheduler improves scalability by scheduling all the accelerators a workload needs at the same time. Customers looking for the simplest way to develop AI models can also use Cloud TPUs in Vertex AI, a fully managed AI platform.
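Scheduling every accelerator a workload needs at the same time is an all-or-nothing policy, often called gang scheduling. A job either receives its whole set of chips or keeps waiting, so a partially started job never holds chips idle. The sketch below illustrates only that idea, using a hypothetical in-memory pool. It is not the Dynamic Workload Scheduler API, and the names `ChipPool`, `try_allocate`, `schedule`, and the job sizes are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ChipPool:
    """Hypothetical pool of identical accelerator chips (illustration only)."""
    free_chips: int
    # Maps job name -> number of chips currently held by that job.
    allocations: dict = field(default_factory=dict)

    def try_allocate(self, job: str, chips_needed: int) -> bool:
        """All-or-nothing: grant every requested chip at once, or none."""
        if chips_needed <= self.free_chips:
            self.free_chips -= chips_needed
            self.allocations[job] = chips_needed
            return True
        # Not enough capacity: the job keeps waiting and holds no chips.
        return False

    def release(self, job: str) -> None:
        """Return a finished job's chips to the pool."""
        self.free_chips += self.allocations.pop(job, 0)


def schedule(pool: ChipPool, queue: list) -> list:
    """Try each (job, chips) request in order and return the jobs started.

    A request that does not fit is skipped (it waits) rather than started
    with fewer chips, so a later, smaller request may start first.
    """
    started = []
    for job, chips in queue:
        if pool.try_allocate(job, chips):
            started.append(job)
    return started


if __name__ == "__main__":
    pool = ChipPool(free_chips=16)
    queue = [("train-a", 8), ("train-b", 16), ("serve-c", 4)]
    print(schedule(pool, queue))  # → ['train-a', 'serve-c']
    print(pool.free_chips)        # → 4
```

In this run `train-b` asks for 16 chips while only 8 remain. It therefore receives none instead of starting on a partial slice.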
Cloud TPUs are optimized for training large, complex deep learning models that involve many matrix calculations, such as large language models (LLMs). Cloud TPUs also include SparseCores, dataflow processors that accelerate models that rely on embeddings, such as recommendation models. Other use cases include healthcare applications such as protein folding modeling and drug discovery.
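Embedding-based models such as recommenders repeat one core operation many times. They gather a few rows from a large embedding table by ID and then combine those rows into a single vector. The memory access here is irregular and data-dependent, unlike a dense matrix multiply. The pure-Python sketch below shows that gather-and-reduce step, similar in spirit to PyTorch's `nn.EmbeddingBag`. The function name `embedding_bag` and the tiny table are hypothetical, and real embedding tables are far larger.

```python
from typing import List, Sequence


def embedding_bag(table: Sequence[Sequence[float]], ids: Sequence[int],
                  combiner: str = "sum") -> List[float]:
    """Gather rows of `table` for the given IDs and reduce them to one vector."""
    if not ids:
        raise ValueError("need at least one id")
    dim = len(table[0])
    out = [0.0] * dim
    for i in ids:                 # gather: irregular, data-dependent reads
        row = table[i]
        for d in range(dim):      # reduce: element-wise accumulation
            out[d] += row[d]
    if combiner == "mean":
        out = [v / len(ids) for v in out]
    elif combiner != "sum":
        raise ValueError(f"unknown combiner: {combiner}")
    return out


# Hypothetical 5-row, 3-dimensional item-embedding table.
item_table = [
    [0.1, 0.0, 0.2],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5],
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 0.3],
]

# A user's recent interactions with items 1 and 3 become one feature vector.
print(embedding_bag(item_table, [1, 3]))          # → [1.0, 1.0, 1.0]
print(embedding_bag(item_table, [1, 3], "mean"))  # → [0.5, 0.5, 0.5]
```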
A GPU is a specialized processor originally designed for manipulating computer graphics. Its parallel structure makes it well suited to algorithms that process large blocks of data, which are common in AI workloads.
A TPU is an application-specific integrated circuit (ASIC) designed by Google for neural networks. TPUs have specialized features, such as the matrix multiply unit (MXU) and a proprietary interconnect topology, that make them well suited to accelerating AI training and inference.
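A hardware matrix unit such as the MXU completes a fixed-size block of multiply-accumulate operations in each step. For that reason, a large matrix multiplication is split into tiles that fit the unit. On TPUs the XLA compiler performs this tiling, so you do not write it by hand. The pure-Python sketch below shows the general tiling idea. The tile size is an illustrative parameter, kept small so the example stays readable, and is not a statement of the MXU's actual dimensions. The helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul_tiled(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Compute a @ b by accumulating products of tile x tile sub-blocks."""
    n, k = len(a), len(a[0])
    k2, m = len(b), len(b[0])
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = [[0.0] * m for _ in range(n)]
    # Loop over output tiles (i0, j0) and over the shared dimension in tiles (p0).
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # One "tile step": multiply-accumulate a tile x tile block.
                # min() handles edge tiles when sizes aren't multiples of tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference triple loop, used to check the tiled version."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(matmul_tiled(a, b) == matmul_naive(a, b))  # → True
```

The 3x3 inputs deliberately do not divide evenly into 2x2 tiles, so the example also exercises the partial edge tiles.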
Cloud TPU versions
| Cloud TPU version | Description | Availability |
|---|---|---|
| Ironwood | Our most powerful and efficient TPU yet, for the largest-scale training and inference | Ironwood TPU will be generally available in Q4 2025 |
| Trillium | Sixth-generation TPU. Improved energy efficiency and peak compute performance per chip for training and inference | Trillium is generally available in North America (US East region), Europe (West region), and Asia (Northeast region) |
| Cloud TPU v5p | Powerful TPU for building large, complex foundational models | Cloud TPU v5p is generally available in North America (US East region) |
| Cloud TPU v5e | Cost-effective and accessible TPU for medium-to-large-scale training and inference workloads | Cloud TPU v5e is generally available in North America (US Central/East/South/West regions), Europe (West region), and Asia (Southeast region) |
How It Works
Get an inside look at the magic of Google Cloud TPUs, including a rare view inside the data centers where it all happens. Customers use Cloud TPUs to run some of the world's largest AI workloads, and that power comes from much more than just a chip. In this video, take a look at the components of the TPU system, including data center networking, optical circuit switches, water cooling systems, biometric security verification, and more.
Common Uses
High-performance, scalable, cost-efficient inference
Accelerate AI inference with vLLM and MaxDiffusion. vLLM is a popular open-source inference engine designed for high-throughput, low-latency inference of large language models (LLMs). Powered by tpu-inference, vLLM TPU brings this engine to Cloud TPUs. It unifies JAX and PyTorch, providing broader model coverage (Gemma, Llama, Qwen) and enhanced features. MaxDiffusion optimizes diffusion model inference on Cloud TPUs for high performance.
Run optimized AI workloads with platform orchestration
A robust AI/ML platform considers the following layers: (i) infrastructure orchestration that supports accelerators such as GPUs and TPUs for training and serving workloads at scale, (ii) flexible integration with distributed computing and data processing frameworks, and (iii) support for multiple teams on the same infrastructure to maximize resource utilization.
Pricing
Cloud TPU pricing: all prices are per chip-hour.

| Cloud TPU version | Evaluation price (USD per chip-hour) | 1-year commitment (USD per chip-hour) | 3-year commitment (USD per chip-hour) |
|---|---|---|---|
| Trillium | Starting at $2.7000 | Starting at $1.8900 | Starting at $1.2200 |
| Cloud TPU v5p | Starting at $4.2000 | Starting at $2.9400 | Starting at $1.8900 |
| Cloud TPU v5e | Starting at $1.2000 | Starting at $0.8400 | Starting at $0.5400 |
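Every rate in the pricing table is per chip-hour, so a rough estimate is chips × hours × rate. The sketch below applies the table's "starting at" rates. Actual prices vary by region and these rates are lower bounds, so treat the results as rough figures. The short keys ("v5p", "v5e") and the 730-hours-per-month figure (365 × 24 / 12) are assumptions made for illustration.

```python
# "Starting at" rates in USD per chip-hour, copied from the pricing table.
# Actual prices vary by region; treat these as lower bounds.
RATES = {
    "Trillium": {"evaluation": 2.70, "1yr": 1.89, "3yr": 1.22},
    "v5p": {"evaluation": 4.20, "1yr": 2.94, "3yr": 1.89},
    "v5e": {"evaluation": 1.20, "1yr": 0.84, "3yr": 0.54},
}

HOURS_PER_MONTH = 730  # Assumption: 365 days * 24 hours / 12 months.


def monthly_cost(version: str, plan: str, chips: int,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Estimated cost in USD: chips * hours * per-chip-hour rate."""
    return round(chips * hours * RATES[version][plan], 2)


def discount(version: str, plan: str) -> float:
    """Fractional saving of a commitment plan versus the evaluation price."""
    base = RATES[version]["evaluation"]
    return round(1 - RATES[version][plan] / base, 3)


if __name__ == "__main__":
    # Hypothetical workload: 8 v5e chips running for a full month.
    for plan in ("evaluation", "1yr", "3yr"):
        print(plan, monthly_cost("v5e", plan, chips=8))
    print("Trillium 3-year saving:", discount("Trillium", "3yr"))
```

Under these rates, a 1-year commitment comes to about 30% below the evaluation price for all three versions. A 3-year commitment comes to roughly 55% below.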
Pricing calculator: Estimate your monthly Cloud TPU costs, including region-specific pricing and fees.
Custom quote: Connect with our sales team to get a custom quote for your organization.
Start your proof of concept
Try Cloud TPUs for free
Get a quick intro to using Cloud TPUs
Run TensorFlow on Cloud TPU VM
Run JAX on Cloud TPU VM
Run PyTorch on Cloud TPU VM