tflop.dev · interactive TPU performance tools

TPUs are compute monsters built around one thing — matrix multiplication.

Understanding when they actually hit peak FLOPs — and when they don't — usually means dense roofline math on paper.

tflop.dev turns that math into three interactive widgets: a roofline plotter, an op-cost calculator, and a systolic-array simulator you can scrub cycle by cycle.

/tpu/roofline

Roofline plotter

Drop a workload onto a chip's roofline; sweep one shape dimension across orders of magnitude and watch the point cross the knee from memory-bound to compute-bound. Compare two chips side-by-side.

open →
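
The sweep described for the roofline plotter can be sketched in a few lines of Python. This is a minimal sketch, not the site's implementation: the chip numbers are hypothetical round values, the workload is assumed to be a bf16 matmul of shape [B, D] x [D, F], and names such as `matmul_intensity` and `regime` are made up for illustration.

```python
# Hypothetical chip, chosen for round arithmetic (not a real spec sheet).
PEAK_FLOPS = 2.0e14          # peak compute, FLOP/s
HBM_BW = 1.0e12              # memory bandwidth, bytes/s
KNEE = PEAK_FLOPS / HBM_BW   # intensity at the roofline knee, FLOPs/byte


def matmul_intensity(b, d, f, bytes_per_elem=2):
    """Arithmetic intensity of [b, d] x [d, f], assuming each operand and
    the output cross memory exactly once."""
    flops = 2 * b * d * f
    bytes_moved = bytes_per_elem * (b * d + d * f + b * f)
    return flops / bytes_moved


def attainable_flops(intensity):
    """Roofline: the lower of the compute ceiling and the bandwidth slope."""
    return min(PEAK_FLOPS, intensity * HBM_BW)


def regime(intensity):
    return "compute-bound" if intensity >= KNEE else "memory-bound"


# Sweep the batch dimension across orders of magnitude with D = F = 8192.
for b in [1, 8, 64, 128, 256, 1024, 8192]:
    ai = matmul_intensity(b, 8192, 8192)
    print(f"B={b:5d}  intensity={ai:8.1f}  "
          f"attainable={attainable_flops(ai):.2e}  {regime(ai)}")
```

With these assumed numbers the knee sits at 200 FLOPs/byte, so the point crosses from memory-bound to compute-bound between B=128 and B=256.
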
/tpu/calc

Op-cost calculator

Pick a chip, dtype, op, and shape; get FLOPs, bytes moved, arithmetic intensity, T_math, T_comms, and a lower-bound runtime. Loads worked-problem presets from the scaling-book literature.

open →
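
The quantities the calculator reports can be written out for a single matmul. The sketch below is illustrative only: the `Chip` values are hypothetical, T_comms is assumed to be time spent moving bytes over HBM, and the lower bound assumes compute and data movement overlap perfectly (the upper bound assumes they do not overlap at all).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    peak_flops: float  # FLOP/s
    hbm_bw: float      # bytes/s


@dataclass(frozen=True)
class OpCost:
    flops: int
    bytes_moved: int
    intensity: float    # FLOPs per byte
    t_math: float       # seconds spent on arithmetic at peak
    t_comms: float      # seconds spent moving bytes at full bandwidth
    lower_bound: float  # perfect overlap: max of the two
    upper_bound: float  # no overlap: sum of the two


def matmul_cost(chip, b, d, f, bytes_per_elem=2):
    """Cost of [b, d] x [d, f], assuming each operand and the output
    cross memory exactly once."""
    flops = 2 * b * d * f
    bytes_moved = bytes_per_elem * (b * d + d * f + b * f)
    t_math = flops / chip.peak_flops
    t_comms = bytes_moved / chip.hbm_bw
    return OpCost(
        flops=flops,
        bytes_moved=bytes_moved,
        intensity=flops / bytes_moved,
        t_math=t_math,
        t_comms=t_comms,
        lower_bound=max(t_math, t_comms),
        upper_bound=t_math + t_comms,
    )


# Hypothetical chip with round numbers, not a real part.
chip = Chip(name="example-chip", peak_flops=2.0e14, hbm_bw=1.0e12)
cost = matmul_cost(chip, b=512, d=8192, f=8192)
print(cost)
```

For this shape T_math exceeds T_comms, so the lower bound is set by compute; shrinking `b` far enough flips that.
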
/tpu/mxu

Systolic array

A cycle-by-cycle MXU simulator compiled from C++ to WASM. Press play and watch weights load, activations stream in from the left, and partial sums flow down through the array.

open →
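
The dataflow described for the systolic array can be mimicked with a toy model in pure Python. This is a rough sketch, not the site's C++ simulator: it assumes a weight-stationary array with weights already loaded, one processing element (PE) per weight, activations skewed by one cycle per row, and a hypothetical function name `simulate_systolic`.

```python
def simulate_systolic(X, W):
    """Compute Y = X @ W on a K x N weight-stationary array, cycle by cycle.

    PE (k, n) holds W[k][n]. Activations enter from the left and move one PE
    to the right per cycle; partial sums move one PE down per cycle.
    """
    M, K, N = len(X), len(W), len(W[0])
    a_reg = [[0] * N for _ in range(K)]  # activation latched in each PE
    p_reg = [[0] * N for _ in range(K)]  # partial sum latched in each PE
    Y = [[0] * N for _ in range(M)]
    total_cycles = M + K + N - 2

    for t in range(total_cycles):
        new_a = [[0] * N for _ in range(K)]
        new_p = [[0] * N for _ in range(K)]
        for k in range(K):
            for n in range(N):
                if n == 0:
                    # Left edge: row k is fed X[m][k] at cycle m + k (skew).
                    m = t - k
                    a_in = X[m][k] if 0 <= m < M else 0
                else:
                    a_in = a_reg[k][n - 1]  # from the left neighbour
                p_in = p_reg[k - 1][n] if k > 0 else 0  # from the PE above
                new_a[k][n] = a_in
                new_p[k][n] = p_in + a_in * W[k][n]
        a_reg, p_reg = new_a, new_p

        # Finished sums leave the bottom row: Y[m][n] at cycle m + (K-1) + n.
        for n in range(N):
            m = t - (K - 1) - n
            if 0 <= m < M:
                Y[m][n] = p_reg[K - 1][n]
    return Y, total_cycles


X = [[1, 2], [3, 4], [5, 6]]   # 3 x 2 activations
W = [[1, 0, 2], [0, 1, 3]]     # 2 x 3 weights
Y, cycles = simulate_systolic(X, W)
print(Y, cycles)
```

In this model the array takes M + K + N - 2 cycles to drain, so the fill and drain overhead matters most when M is small relative to the array size.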

What's this for? Building intuition about TPU performance — the kind that's hard to extract from roofline equations on a page. These widgets are the sandbox.

Audience. ML researchers, infra engineers, and students working through Google DeepMind's How to Scale Your Model who want to feel the math instead of just reading it.