API Pricer

Compare AI API Costs Across Every Major Provider

Enter your estimated token usage and instantly see side-by-side pricing for Claude, GPT-4o, Gemini, Llama, Mistral, DeepSeek, and more. No signup required — just real numbers from official provider pricing pages.

Models


Pricing varies by prompt length. Rates shown are for prompts up to 200K tokens.

Pricing varies by prompt length and thinking mode. Rates shown are for standard (non-thinking) prompts up to 200K tokens.

Llama 4 is open-source. Pricing reflects Together.ai inference. Rates vary by provider.

DeepSeek offers reduced pricing for cache hits. Rates shown are standard (cache miss) pricing.

Cost Comparison

Model               Provider
Claude Sonnet 4.6   Anthropic   $0.00   $0.00   $0.00   $0.00
GPT-4o              OpenAI      $0.00   $0.00   $0.00   $0.00
GPT-4o mini         OpenAI      $0.00   $0.00   $0.00   $0.00
Gemini 2.5 Flash    Google      $0.00   $0.00   $0.00   $0.00
Mistral Large       Mistral     $0.00   $0.00   $0.00   $0.00
Cost of Switching

Compare the monthly and annual cost difference between two models at your current usage level.

Select both models above to see the cost difference.
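The switching comparison is simple arithmetic: the difference in monthly cost, projected over a year. A minimal sketch (the dollar figures below are placeholders, not real model rates):

```python
def switching_delta(monthly_cost_current: float, monthly_cost_new: float):
    """Positive values mean the new model saves money at your usage level."""
    monthly = monthly_cost_current - monthly_cost_new
    return monthly, monthly * 12  # (monthly savings, annual savings)

# e.g. moving a $100/month workload to a model costing $60/month:
monthly_saved, annual_saved = switching_delta(100.0, 60.0)  # 40.0, 480.0
```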

Frequently Asked Questions

How much does it cost to use AI APIs?

AI APIs are priced on a per-token basis, billed separately for input tokens (your prompts) and output tokens (the model's response). Costs vary widely by provider and model tier. Budget models like GPT-4o mini or Gemini 2.5 Flash start as low as $0.15 per million input tokens, while premium frontier models like Claude Opus can reach $15 per million input tokens. Your actual monthly cost depends entirely on how many tokens you send and receive — use this calculator to estimate from your own usage numbers.
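The per-token billing described above can be sketched in a few lines. The helper below is illustrative; the $0.60/M output rate in the example is an assumption not stated on this page:

```python
def api_cost(input_tokens: int, output_tokens: int,
             input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in USD, with rates quoted per million tokens."""
    return (input_tokens / 1_000_000 * input_rate_per_m
            + output_tokens / 1_000_000 * output_rate_per_m)

# Example: 10M input + 2M output tokens in a month on a budget model
# priced at $0.15/M input and (assumed) $0.60/M output:
monthly = api_cost(10_000_000, 2_000_000, 0.15, 0.60)  # $1.50 + $1.20 = $2.70
```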

Which AI API is the cheapest?

It depends on your use case. For high-volume, cost-sensitive workloads, GPT-4o mini ($0.15/M input), Gemini 2.5 Flash ($0.15/M input), and Llama 4 via Together.ai ($0.27/M input) are among the most affordable options. For tasks requiring frontier intelligence, you'll pay more — Claude Sonnet 4.6 ($3/M input) and GPT-4o ($2.50/M input) offer a balance of capability and cost. Open-source models like Llama 4 can also be self-hosted to eliminate per-token fees entirely if you have the infrastructure.
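Using the input rates quoted in this answer, a cheapest-model ranking is a one-line sort. Treat the table as a snapshot of these quoted figures, not live pricing data:

```python
# USD per million input tokens, as quoted in the answer above.
INPUT_RATES = {
    "GPT-4o mini": 0.15,
    "Gemini 2.5 Flash": 0.15,
    "Llama 4 (Together.ai)": 0.27,
    "GPT-4o": 2.50,
    "Claude Sonnet 4.6": 3.00,
}

def rank_by_input_cost(tokens: int) -> list:
    """Models ordered from cheapest to priciest for a given input volume."""
    return sorted(((m, tokens / 1_000_000 * r) for m, r in INPUT_RATES.items()),
                  key=lambda pair: pair[1])
```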

What is the difference between input and output tokens?

Input tokens are the tokens in the message you send to the model: your system prompt, conversation history, and user message. Output tokens are the tokens in the model's response. Providers charge different rates for each, and output tokens are consistently more expensive, often three to five times the input rate. A typical API call might use 500 input tokens and generate 300 output tokens. For long-context or retrieval-augmented workloads where you embed large documents in every prompt, input costs can dominate your bill; many providers offer prompt caching that steeply discounts repeated context.
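A worked version of the 500-in / 300-out example, under assumed rates of $3/M input and $15/M output (a 5x spread typical of frontier tiers; both numbers are assumptions for illustration):

```python
IN_RATE, OUT_RATE = 3.00, 15.00  # assumed USD per million tokens

input_cost = 500 / 1_000_000 * IN_RATE    # $0.0015 for 500 input tokens
output_cost = 300 / 1_000_000 * OUT_RATE  # $0.0045 for 300 output tokens
call_cost = input_cost + output_cost      # $0.0060 per call

# Despite fewer output tokens, output is 75% of this call's cost:
output_share = output_cost / call_cost
```

Note how the higher output rate dominates even when the call generates fewer tokens than it consumes.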

Do AI providers offer batch pricing discounts?

Yes. Anthropic and OpenAI both offer a Batch API that processes requests asynchronously (with a turnaround time of up to 24 hours) at roughly 50% off standard pricing. This is ideal for workloads that are not latency-sensitive: data labeling, offline document processing, bulk classification, and similar tasks. Google's Gemini models do not currently have a formal batch discount tier in the pricing data tracked by API Pricer. When batch pricing is available for a model, you can toggle between standard and batch rates directly in the calculator above.
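The standard-versus-batch toggle described above amounts to applying a discount multiplier. A sketch under assumed GPT-4o-style rates ($2.50/M in, $10/M out; both are assumptions here):

```python
BATCH_DISCOUNT = 0.5  # "roughly 50% off", per the Batch API terms above

def monthly_cost(in_tok: int, out_tok: int,
                 in_rate: float, out_rate: float, batch: bool = False) -> float:
    """USD cost; rates are per million tokens. Batch halves the standard rate."""
    base = in_tok / 1_000_000 * in_rate + out_tok / 1_000_000 * out_rate
    return base * (1 - BATCH_DISCOUNT) if batch else base

standard = monthly_cost(1_000_000, 1_000_000, 2.50, 10.00)             # $12.50
batched = monthly_cost(1_000_000, 1_000_000, 2.50, 10.00, batch=True)  # $6.25
```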

How often do AI API prices change?

AI API pricing has trended downward significantly since 2023 as providers compete for market share and infrastructure costs decline. Price drops can happen at any time — OpenAI, Anthropic, and Google have each cut prices multiple times in the past year. The pricing data on this site is updated manually on a monthly schedule by reviewing each provider's official pricing page.

What is a token in AI API pricing?

A token is the basic unit of text that a language model reads and produces. Tokens are not the same as words — a single word may be one token or several, and punctuation marks and spaces are often their own tokens. As a rough rule of thumb, one token equals about four characters of English text, and 100 tokens equals roughly 75 words. A short paragraph like this one is approximately 80–100 tokens. Most providers publish tokenizer tools you can use to get an exact count for your specific content before estimating costs.
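The four-characters-per-token rule of thumb above makes for a quick back-of-envelope estimator. This is only a heuristic sketch; for exact counts use the provider's published tokenizer (e.g. OpenAI's tiktoken):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate using the ~4 characters-per-token rule of thumb.
    Real tokenizers give exact, model-specific counts."""
    return max(1, round(len(text) / 4))

# 400 characters of English is roughly 100 tokens (about 75 words)
# under this heuristic.
```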

About This Data

Pricing data is sourced directly from each provider's official pricing page: Anthropic, OpenAI, Google, Meta (via Together.ai), Mistral, and DeepSeek. No third-party aggregators are used.

Data is updated manually on a monthly schedule. Every update is verified directly against each provider's official documentation before publishing.