
GPU (Graphics Processing Unit)

Definition

Specialized hardware that runs AI models much faster than a CPU by performing many arithmetic operations in parallel. Common data-center examples include the NVIDIA A100 and H100.

Why It Matters

AI inference repeats the same matrix multiplications millions of times. CPUs are general-purpose and built around a small number of fast sequential cores; GPUs have thousands of small cores running the same operation in parallel, so they often finish the same workload 10–100× faster. Without GPUs, serving a 70B model could mean waiting up to around 10 seconds per token, depending on the CPU and its memory bandwidth.

Key Points

  • H100 SXM5: 80 GB HBM3, 3.35 TB/s memory bandwidth, ~0.99 petaFLOPS dense BF16 (about 1.98 petaFLOPS with structured sparsity), NVLink 900 GB/s inter-GPU bandwidth. Cloud spot pricing ~$2–4/hour per GPU.
  • A100 80GB: 80 GB HBM2e, ~2 TB/s memory bandwidth, ~312 TFLOPS dense BF16. Among the most widely deployed GPUs for production inference as of 2024.
  • NVLink is a high-bandwidth link between GPUs. It lets a model be sharded across several cards that exchange data quickly, which is necessary for models whose weights exceed a single card's memory.
  • Consumer RTX 4090: 24 GB GDDR6X, ~330 TFLOPS FP16 on tensor cores. Fast for prototyping; cannot fit a 70B model in full precision.
  • Memory bandwidth is the binding constraint for inference at typical batch sizes. H100's 3.35 TB/s vs. A100's ~2 TB/s (about 1.7×) is why H100 delivers roughly 2× the token throughput rather than the ~3× that its dense BF16 FLOPS advantage would suggest.
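The memory-fit points above come down to simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below is illustrative only. The function names are hypothetical, 1 GB is taken as 10^9 bytes, and the estimate covers weights alone (no KV cache, activations, or framework overhead), so real deployments need extra headroom.

```python
import math

# Bytes per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billions: float, precision: str) -> float:
    """Approximate weight memory in GB, assuming 1 GB = 1e9 bytes."""
    return params_billions * BYTES_PER_PARAM[precision]


def min_gpus(params_billions: float, precision: str, vram_gb: float) -> int:
    """Lower bound on the number of cards needed just to hold the weights."""
    return math.ceil(weights_gb(params_billions, precision) / vram_gb)


if __name__ == "__main__":
    # A 70B model at FP16 needs about 140 GB for weights alone.
    print(weights_gb(70, "fp16"))
    # At least two 80 GB cards are needed to hold those weights.
    print(min_gpus(70, "fp16", 80))
    # Even at 4-bit, 70B weights (about 35 GB) exceed one 24 GB card.
    print(min_gpus(70, "int4", 24))
```

This is a lower bound, not a sizing guide: it shows why a 70B model has to be sharded across cards, and says nothing about how much extra memory a given serving stack needs.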

Example

An NVIDIA H100 has 80 GB of VRAM and performs roughly 1,000 trillion dense BF16 operations per second. Generating one image with FLUX.1 takes about 3 seconds on an H100. The same workload on a high-end CPU would take 5+ minutes.

Common Misconception

GPU FLOPS rating does not predict inference token throughput. Most LLM serving is memory-bandwidth-bound, not compute-bound, because batch sizes are small and the bottleneck is moving model weights from VRAM to compute units. A higher-bandwidth GPU produces more tokens per second even at similar FLOPS.
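A rough way to see this: at batch size 1, generating each token requires reading every weight from VRAM about once, so memory bandwidth divided by weight size gives an upper bound on tokens per second. The sketch below applies that bound under stated assumptions: a hypothetical 13B model at FP16 (about 26 GB of weights), the bandwidth figures quoted above, and no allowance for KV cache traffic or kernel overhead. The function name is illustrative.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Bandwidth-bound ceiling on batch-1 decode speed.

    Assumes every weight is read from VRAM once per generated token.
    """
    return bandwidth_gb_s / weights_gb


# Hypothetical 13B-parameter model stored at 2 bytes per parameter.
WEIGHTS_GB = 13 * 2.0

h100 = max_decode_tokens_per_s(3350.0, WEIGHTS_GB)  # 3.35 TB/s
a100 = max_decode_tokens_per_s(2000.0, WEIGHTS_GB)  # ~2 TB/s

if __name__ == "__main__":
    print(f"H100 ceiling: {h100:.1f} tokens/s")
    print(f"A100 ceiling: {a100:.1f} tokens/s")
    print(f"Ratio: {h100 / a100:.3f}")
```

The ratio of the two ceilings equals the ratio of the bandwidths and does not depend on either card's FLOPS rating. Measured throughput will be lower than these ceilings and varies with batch size and serving software.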

Related Terms

  • Inference: The process of running an AI model to generate a response. When you send a message to ChatGPT, the model performs inference.
  • VRAM: Video RAM, the memory on a GPU used to store AI model weights during inference.
  • Parameter: A trainable weight in an AI model. Larger models have more parameters (7B, 70B, 400B).

GPU (Graphics Processing Unit) on Rewind.ai

Rewind.ai's self-hosted GPU pool runs every open-source model on the platform. Token pricing reflects GPU-second cost, which is why a long video render burns more tokens than a one-line chat reply.
