Penny-Pincher Provider
A curated list of affordable (or almost free) LLM providers for people who are unwilling to pay premium prices for AI.
Note: Claude is arguably the best AI coding assistant out there — but it’s expensive. If you want to try it before committing, you can use a guest pass for 1 week of free Claude Pro (includes Claude Code).
Guest passes are limited and available on a first-come, first-served basis. New users only — you’ll need to enter payment info to activate, but you can cancel before the trial ends to avoid charges. Learn more.
My passes:
https://claude.ai/referral/ZkoAngod1A (out of stock as of 2026-04-30)
Friends' passes:
Got a spare Claude Code guest pass? You can help others try Claude Code for free by sharing it here: open a pull request adding your link under the Friends' passes section, or open an issue and I'll add it for you. Passes are first-come, first-served, so the more we pool together, the better.
If you know of other providers, feel free to create a pull request or open an issue here. I will review and add them when possible. Thank you!
If you find this helpful, you can support me by donating or registering with my referral links. Thank you!
Providers with Coding Plans
These providers offer coding plan subscriptions. You can prepay a monthly fee and use their LLM APIs with usage limits.
Z.ai
Offers 3 plans (starting from $18/month):
- Lite: 3x usage of the Claude Pro plan
- Pro: 5x Lite plan usage
- Max: 4x Pro plan usage
Prices vary based on plan duration (monthly, quarterly, or yearly) and occasional promotional offers, so check the website for current pricing.
Notice (Apr 30, 2026): Auto-renewal on legacy plans (the version without weekly limits) is being disabled. Affected users receive a 2-month gift of the equivalent new plan.
Models: GLM-5.1, GLM-5-Turbo. OpenAI-compatible API + Anthropic-compatible endpoint.
Source: https://z.ai/subscribe, https://docs.z.ai/devpack/overview [1]
Homepage: https://docs.z.ai/devpack/overview
My referral:
🚀 You’ve been invited to join the GLM Coding Plan! Enjoy full support for Claude Code, Cline, and 20+ top coding tools — starting at just $18/month. Subscribe now and grab the limited-time deal! Link: https://z.ai/subscribe?ic=PLKIAYEIPW
MiniMax
Offers a Token Plan — a unified subscription for multimodal AI (text, speech, video, image, music). Pricing is based on API calls, not tokens — very generous!
Plans (monthly):
- Starter: $10/month — 1,500 M2.7 requests per 5-hour rolling window
- Plus: $20/month — 4,500 M2.7 requests per 5-hour window + speech, image, music generation
- Max: $50/month — even higher quotas across all models
- Highspeed tiers ($40–$150/month) — dedicated M2.7-highspeed access, up to 30,000 requests per 5-hour window
Includes access to M2.7 language model, Speech 2.8, Image-01, Hailuo video, and Music-2.5. Yearly plans available with ~17% discount.
Source: https://platform.minimax.io/docs/token-plan/intro, https://platform.minimax.io/docs/guides/pricing-token-plan [2]
Homepage: https://platform.minimax.io
My referral:
🎁 MiniMax Token Plan New Year Mega Offer! Invite friends and earn rewards for both. Referred users get an exclusive 10% off their subscription and join the dev ambassador community; referrers earn 10% back in ready-to-use API vouchers per paid referral, usable across all MiniMax models, plus priority access to events and model previews. The Token Plan Referral Program ends May 1, 2026. 👉 Get your referral link: https://platform.minimax.io/subscribe/token-plan?code=CAQ5sxHAq6&source=link
Kimi Code
A coding-focused perk included with Kimi membership — drops into any dev workflow (terminal, IDE, or Kimi CLI) and is backed by Moonshot’s Kimi K-series models, which are sharply priced per token.
Tiers:
- Adagio — free tier, unlimited basic conversations
- Andante — paid, higher K2.5 quotas
- Presto — paid, top quotas
Usage quotas are tracked over a rolling 5-hour window and scale with membership tier. A pay-as-you-go API is also available via platform.moonshot.ai.
Source: https://www.kimi.com/code, https://www.kimi.com/membership/pricing [3]
Homepage: https://www.kimi.com/code
Alibaba Cloud Model Studio — Coding Plan
Monthly subscription for AI coding tools — top Qwen/Kimi/GLM/MiniMax models at fixed, predictable pricing.
Pro plan: $50/month
- 6,000 requests per 5-hour sliding window
- 45,000 requests per week (resets Monday 00:00 UTC+8)
- 90,000 requests per month (resets on subscription anniversary)
Models include qwen3.5-plus, qwen3-max, qwen3-coder, kimi-k2.5, glm-5, and MiniMax-M2.5.
Supported tools: Claude Code, Cursor, Cline (VS Code), OpenCode, Qwen Code, Kilo Code, Kilo CLI, OpenClaw, Codex, and more.
Note: the Lite plan stopped accepting new subscriptions on Mar 20, 2026.
Source: https://www.alibabacloud.com/help/en/model-studio/coding-plan [4]
Homepage: https://www.alibabacloud.com/product/modelstudio
BytePlus ModelArk — Coding Plan
ByteDance’s ModelArk coding subscription — flat monthly fee, works with mainstream coding tools, models swappable per task.
Standard plans:
- Lite: $5/month
- Pro: $25/month
Models include latest ByteDance-Seed-2.0-pro/lite, DeepSeek-V3.2, GLM-5.1, GLM-4.7, Kimi-K2.5, and GPT-OSS variants.
Supported tools: Claude Code, Cursor, Cline (VS Code), Kilo Code, Roo Code, OpenCode, TRAE, and more.
Note: new-user first-purchase promo pricing was suspended on Mar 17, 2026 — everyone now pays the list price.
Source: https://www.byteplus.com/en/activity/codingplan, https://docs.byteplus.com/en/docs/ModelArk/1925114 [5]
Homepage: https://console.byteplus.com/ark
opencode — Go
A subscription tier for the open-source opencode CLI that pools access to ~10 open-source coding models behind one flat price — aimed at developers who want generous request limits without premium-provider fees.
Pricing:
- $5 first month, $10/month thereafter
- Top up extra credit as needed; cancel anytime
Models include GLM-5.1, GLM-5, Kimi K2.6 (3× quotas through Apr 27), Kimi K2.5, MiMo-V2-Pro/Omni, Qwen3.5/3.6 Plus, MiniMax M2.5/M2.7.
Per-5-hour request limits vary by model tier (≈200 to 10,200).
API portability: The opencode-go API key is portable — works with Claude Code (via oc-go-cc or LiteLLM proxy), Cline (OpenAI-compatible), and any OpenAI-API-compatible client. Model IDs use opencode-go/<model-id> format.
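A quick sketch of that naming convention. This helper is hypothetical (not part of opencode), and the model IDs are illustrative examples:

```python
# Hypothetical helper illustrating the opencode-go/<model-id> naming
# convention used by the portable API key. Model IDs here are examples,
# not an authoritative list.
def qualify(model_id: str) -> str:
    """Prefix a bare model ID for use with an opencode-go API key."""
    prefix = "opencode-go/"
    return model_id if model_id.startswith(prefix) else prefix + model_id

print(qualify("glm-5.1"))                # opencode-go/glm-5.1
print(qualify("opencode-go/kimi-k2.5"))  # already qualified, unchanged
```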
Source: https://opencode.ai/go, https://opencode.ai/docs/go/ [6]
Homepage: https://opencode.ai
Synthetic
Runs open-source AI models for you in private, secure datacenters.
Privacy-first inference: Synthetic never trains on your data and doesn’t store API prompts or completions.
Pricing:
- Subscription: $30/month (app-based access)
- Usage-based: pay-as-you-go, no charge for unused capacity
Models include Kimi K2.5, MiniMax M2.5, GLM 5.1, GLM 4.7 Flash, plus any vLLM-compatible open-source LLM.
OpenAI-compatible — works with Roo, Cline, Octofriend, and any other OpenAI-API-compatible client.
Source: https://synthetic.new/ [7]
Homepage: https://synthetic.new
Free Providers
Sorted by attractiveness — biggest recurring free quota, model quality, and lowest friction first.
LongCat AI
Meituan’s open-source LongCat models. API platform in public beta — no paid tier yet.
Free quota (resets daily 00:00 Beijing Time, no rollover):
- LongCat-Flash-Lite: 50M tokens/day (no upgrade path — uniformly free)
- LongCat-Flash-Chat: 500K tokens/day
- LongCat-Flash-Thinking / Thinking-2601: 500K tokens/day each
- LongCat-Flash-Omni-2603 (multimodal): 500K tokens/day
- LongCat-2.0-Preview: 10M tokens / 2 hours (invite-only, 1M context)
Both OpenAI-compatible (https://api.longcat.chat/openai) and Anthropic-compatible (https://api.longcat.chat/anthropic) endpoints. 256K context on most models.
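A minimal sketch of targeting the two endpoints above with a stdlib-built OpenAI-style request body (the base URLs come from the docs; the exact path suffix to append under each base, e.g. /v1/chat/completions, is an assumption — check the platform docs):

```python
# Sketch: choose a LongCat base URL and build an OpenAI-style
# chat-completions body. Nothing is sent over the network here.
import json

BASES = {
    "openai": "https://api.longcat.chat/openai",
    "anthropic": "https://api.longcat.chat/anthropic",
}

def chat_request(flavor: str, model: str, prompt: str) -> tuple:
    """Return (base_url, JSON body) for a minimal chat call."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return BASES[flavor], json.dumps(body)

url, body = chat_request("openai", "LongCat-Flash-Chat", "hello")
```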
Source: https://longcat.chat/platform/docs/ [8]
Homepage: https://longcat.chat/platform
Mistral La Plateforme
Mistral’s developer platform. Free Experiment plan — full model access with rate-limited prototyping quotas.
Free tier:
- Up to ~1B tokens/month for prototyping (third-party reports; specific RPM not published)
- All models incl. Mistral Large 3, Medium 3, Small 3.1, Codestral, Pixtral, embeddings
- No credit card required — phone verification only
- Up to $30K startup credits via separate Startup Program
OpenAI-compatible. Upgrade to Scale plan for production rate limits.
Source: https://docs.mistral.ai/deployment/ai-studio/tier, https://mistral.ai/pricing [9]
Homepage: https://console.mistral.ai
Cerebras Cloud
The world's fastest LLM inference (wafer-scale chip), OpenAI-compatible. Free tier — no credit card required.
Free tier limits:
- 1M tokens/day shared cap across free models
- 30 RPM on most models (10 RPM for zai-glm-4.7)
- 60K TPM per model
- Free models: gpt-oss-120b, llama3.1-8b, qwen-3-235b-a22b-instruct-2507, zai-glm-4.7
Pay-as-you-go tier removes daily and per-minute caps for higher throughput.
Source: https://inference-docs.cerebras.ai/support/rate-limits [10]
Homepage: https://www.cerebras.ai/inference
xAI Grok API
xAI’s frontier Grok models — Grok 4, Grok 4.1 Fast (2M context), Grok Code Fast. OpenAI + Anthropic compatible at https://api.x.ai/v1.
Free tier (combined up to $175 in month one):
- $25 in free signup credits (one-time)
- +$150/month via Data Sharing Program (recurring, eligible countries)
- Min $5 API spend required before opting in to data sharing
Pricing (Grok 4.1 Fast): $0.20/M input, $0.50/M output. Server-side tools (web search, code execution) +$5/1K calls.
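As a back-of-envelope check on those rates (illustrative arithmetic only; confirm current prices at x.ai):

```python
# Cost model for the Grok 4.1 Fast rates quoted above:
# $0.20/M input tokens, $0.50/M output tokens,
# $5 per 1K server-side tool calls.
def grok_fast_cost(input_tokens: int, output_tokens: int, tool_calls: int = 0) -> float:
    """Approximate USD cost for one workload."""
    token_cost = (input_tokens * 0.20 + output_tokens * 0.50) / 1_000_000
    return token_cost + tool_calls * 5 / 1000

# 1M input + 1M output tokens, no tools: about $0.70
print(round(grok_fast_cost(1_000_000, 1_000_000), 2))
```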
⚠️ Privacy caveat: Data Sharing opt-in lets xAI train future models on your prompts and responses. Opt-in is irreversible at the team level.
Source: https://docs.x.ai/developers/models, https://x.ai/api [11]
Homepage: https://x.ai/api
OpenRouter
Free usage limits: If you’re using a free model variant (with an ID ending in :free), you can make up to 20 requests per minute. The following per-day limits apply:
- If you have purchased less than 10 credits, you’re limited to 50 :free model requests per day.
- If you purchase at least 10 credits, your daily limit is increased to 1000 :free model requests per day.
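The limits above can be sketched as a small helper (hypothetical code; the example model ID is illustrative):

```python
# Sketch of OpenRouter's free-variant limits: :free model IDs get
# 20 requests/minute, and a daily cap that depends on whether the
# account has purchased at least 10 credits.
def is_free_variant(model_id: str) -> bool:
    """Free model variants have IDs ending in :free."""
    return model_id.endswith(":free")

def free_daily_limit(credits_purchased: float) -> int:
    """Daily request cap for :free model variants."""
    return 1000 if credits_purchased >= 10 else 50

print(is_free_variant("deepseek/deepseek-r1:free"))  # True
print(free_daily_limit(0), free_daily_limit(10))     # 50 1000
```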
Source: https://openrouter.ai/docs/api/reference/limits [12]
Homepage: https://openrouter.ai
Groq
Fast LPU inference, OpenAI-compatible. Free tier with no credit card.
Free tier (per-model, organization-level):
- llama-3.1-8b-instant: 30 RPM, 14.4K RPD, 6K TPM, 500K TPD
- llama-3.3-70b-versatile: 30 RPM, 1K RPD, 12K TPM, 100K TPD
- whisper-large-v3 (audio): 20 RPM, 2K RPD
- Also free: gemma2-9b-it, allam-2-7b, gpt-oss variants
Upgrade to Developer plan for higher RPM/TPD, Batch and Flex processing.
Source: https://console.groq.com/docs/rate-limits [13]
Homepage: https://groq.com
GitHub Models
Single-API gateway to OpenAI, Anthropic, Llama, Mistral, DeepSeek, Grok, Phi, and more — free for any GitHub account. OpenAI-compatible at https://models.github.ai/inference.
Free tier:
- Rate-limited free access for all GitHub accounts (no extra signup)
- Per-model RPM/RPD vary (e.g. GPT-4o: 10 RPM / 50 RPD; DeepSeek-R1: 15 RPM / 150 RPD)
- Personal Access Token with models:read permission required
- Pay-as-you-go available beyond free tier; BYOK supported
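A stdlib-only sketch of a chat-completions request against the endpoint quoted above. The request is built but not sent; the path suffix and model ID are assumptions based on the docs, and the token is a placeholder:

```python
# Build (but do not send) an OpenAI-style request to the GitHub Models
# inference endpoint. The PAT must carry the models:read permission.
import json
import urllib.request

def build_request(pat: str, model: str, prompt: str) -> urllib.request.Request:
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://models.github.ai/inference/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {pat}",  # PAT with models:read
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("ghp_example", "openai/gpt-4o", "hello")
```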
⚠️ Note: Copilot Pro/Pro+ migrate to usage-based billing on Jun 1, 2026, and new Copilot Pro/Pro+ signups paused since Apr 20, 2026 — monitor changes if relying on Copilot tier limits.
Source: https://docs.github.com/github-models/prototyping-with-ai-models, https://docs.github.com/billing/managing-billing-for-your-products/about-billing-for-github-models [14]
Homepage: https://github.com/marketplace/models
NVIDIA NIM
NVIDIA-hosted inference for 50+ open models — free for NVIDIA Developer Program members, no credit card required. OpenAI-compatible API at https://integrate.api.nvidia.com/v1 works out of the box with Cline, Roo, OpenCode, and any OpenAI-compatible client.
Free access:
- Sign up for the free Developer Program, then generate an nvapi- API key on build.nvidia.com
- 1,000 inference credits on signup (some accounts report a rate-limit-only model since early 2025)
- Personal-account rate limits shown in dashboard top-right — typically ~40 RPM and 1,000 requests/month, resetting on the 1st
- Models include Kimi K2.5, GPT-OSS, DeepSeek-V3.2, Llama 3.x, Mistral, Phi, and NVIDIA’s own Nemotron family
Paid self-hosted NIM containers and pay-as-you-go API are available for higher throughput; the hosted free tier is fine for evaluation and light coding use.
Source: https://build.nvidia.com, https://developer.nvidia.com/nim [15]
Homepage: https://build.nvidia.com
Cloudflare Workers AI
Serverless inference on Cloudflare’s global edge network. Free tier on both Free and Paid Workers plans.
Free allowance:
- 10,000 Neurons/day (resets 00:00 UTC); requests fail with HTTP 429 once the allowance is exhausted
- 50+ models: LLMs (Llama 3.3 70B, Gemma 3, Qwen2.5-Coder, DeepSeek R1), embeddings (BGE), Whisper transcription, image generation
- Example mileage: ~150 LLM responses/day on Llama 3.3 70B, or ~500 audio-seconds Whisper
- Overage billed at $0.011 per 1,000 Neurons on paid plan
Requires Cloudflare account API token + Account ID.
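The numbers above work out to a simple daily budget (illustrative arithmetic; confirm current rates on the pricing page):

```python
# Neuron budgeting for Cloudflare Workers AI: 10,000 free Neurons/day,
# overage billed at $0.011 per 1,000 Neurons on the paid plan.
FREE_NEURONS_PER_DAY = 10_000
OVERAGE_USD_PER_1K = 0.011

def daily_overage_cost(neurons_used: int) -> float:
    """USD billed for one day's usage beyond the free allowance."""
    over = max(0, neurons_used - FREE_NEURONS_PER_DAY)
    return over / 1000 * OVERAGE_USD_PER_1K

print(daily_overage_cost(9_000))             # 0.0 (within free tier)
print(round(daily_overage_cost(25_000), 3))  # 0.165
```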
Source: https://developers.cloudflare.com/workers-ai/platform/pricing/ [16]
Homepage: https://developers.cloudflare.com/workers-ai/
Hugging Face Inference Providers
Routes requests across multiple inference backends (Together, Fireworks, Novita, Cerebras, Replicate, DeepInfra, Scaleway, etc.) behind a single API. OpenAI-compatible.
Free tier:
- 100,000 monthly Inference Provider credits (recurring, free account)
- PRO ($9/month): 2M monthly credits + $2 inference credits/month — 20× free tier
- Free models include open-source LLMs across all routed backends
- OpenAI-compatible endpoint at https://router.huggingface.co/v1 (chat completions only)
Sign-up is free, no credit card required. Other tasks (text-to-image, embeddings, speech) use the HF inference clients.
Source: https://huggingface.co/docs/inference-providers/pricing, https://huggingface.co/changelog/inference-providers-openai-compatible [17]
Homepage: https://huggingface.co/docs/inference-providers
Google Cloud Vertex AI (free trial credits)
Not a free-forever tier — but new GCP customers get $300 in free credits valid for 90 days, usable across Vertex AI for Gemini 3 Pro/Flash, Anthropic Claude on Vertex, and Vertex Partner models (DeepSeek, GLM, Qwen via MaaS).
Setup:
- Sign up at https://cloud.google.com/free (credit card required for verification, not charged unless you upgrade)
- Enable Vertex AI API in your GCP project
- $300 expires after 90 days; account does not auto-convert to paid
- Model access varies by region; Gemini 3 Pro/Flash available in most regions
Vertex AI Express Mode (no billing required): New users can sign up for an Express-mode account with limited free quotas — no credit card needed for evaluation. See https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview.
Practical for short-term heavy evaluation; not a long-term free path.
Source: https://cloud.google.com/free, https://cloud.google.com/vertex-ai/generative-ai/pricing [18]
Homepage: https://console.cloud.google.com/vertex-ai
DeepSeek Platform
DeepSeek’s official API — flagship V4 / V3.2 / R1 models direct from the source. Notoriously cheap, no credit card to sign up.
Free tier:
- 5M free tokens at signup (no promo code, applied automatically)
- Approx. 2,500 standard API calls or ~10M characters processed
- No credit card required
Pricing (PAYG):
- DeepSeek V4 Flash: $0.14/M input, $0.28/M output
- DeepSeek V4 Pro: 75% off until May 5, 2026 ($0.435/M input, $0.87/M output)
- Cached input: $0.03/M (90% discount)
OpenAI + Anthropic compatible at https://api.deepseek.com.
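A rough cost model for the V4 Flash rates quoted above (illustrative only; applying the cached-input rate to Flash is an assumption, and the promo pricing changes over time):

```python
# Cost model for DeepSeek V4 Flash per the rates above:
# $0.14/M fresh input, $0.03/M cached input, $0.28/M output.
def v4_flash_cost(fresh_in: int, cached_in: int, out: int) -> float:
    """Approximate USD cost for one workload."""
    return (fresh_in * 0.14 + cached_in * 0.03 + out * 0.28) / 1_000_000

# 1M fresh input + 1M output is about $0.42;
# a fully cached input drops it to about $0.31.
print(round(v4_flash_cost(1_000_000, 0, 1_000_000), 2))
print(round(v4_flash_cost(0, 1_000_000, 1_000_000), 2))
```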
Source: https://api-docs.deepseek.com/quick_start/pricing [19]
Homepage: https://platform.deepseek.com
Scaleway Generative APIs
EU/GDPR-compliant inference hosted in Paris, France. Privacy-first — provider does not log or train on inputs/outputs.
Free tier:
- 1,000,000 free tokens for every new customer (no time limit advertised)
- Models: Qwen3 (235B / 397B / coder-30B), Llama 3.3 70B, Mistral Small 3.2 24B, DeepSeek R1 distill, Pixtral, Gemma, embeddings
- Higher rate limits unlocked after KYC + payment method on file
- Beyond free tokens, paid pricing typically €0.20–€0.90 per 1M tokens
Source: https://www.scaleway.com/en/generative-apis/, https://www.scaleway.com/en/pricing/model-as-a-service/ [20]
Homepage: https://console.scaleway.com
Kilo Code — Gateway
Open-source agentic coding extension for VS Code, JetBrains, and CLI. Its built-in Kilo Gateway routes LLM requests to any provider and ships with a genuine free path — no subscription required.
Free access:
- Free models (IDs ending in :free) cost nothing — usage is tracked but not billed, rate-limited to 200 requests/hour per IP
- kilo-auto/free auto-routes among available free models (e.g. GLM 4.7, MiniMax M2.1)
- First top-up grants $20 in bonus credits (expires after 60 days), usable on paid models
- Kilo Pass first-time subscribers get a 50% welcome bonus in month 1
- Bring-your-own-keys works for any provider — no Kilo subscription required
Paid Kilo Pass tiers are available for higher throughput on premium models (Starter $19, Pro $49, Expert $199/month), but the free path covers most casual coding use.
Source: https://kilo.ai/pricing, https://kilo.ai/docs/getting-started/using-kilo-for-free, https://kilo.ai/docs/gateway/usage-and-billing [21]
Homepage: https://kilo.ai
Pollinations AI
Open-source Gen-AI platform (Berlin) for text, image, audio, and video generation. OpenAI-compatible endpoints.
Free access (post-2026 key migration):
- Publishable key (free, beta): 1 Pollen/IP/hour — for client-side use, demos, and prototypes
- Secret key: server-side only, no rate limit listed (still free during beta)
- Sign up at https://enter.pollinations.ai; ~$1 ≈ 1 Pollen for paid pay-as-you-go
- Models: DeepSeek V4 Flash/Pro, Flux, GPT Image, Seedream, Whisper, ElevenLabs voices, Veo (alpha)
Source: https://github.com/pollinations/pollinations [22]
Homepage: https://pollinations.ai
Together AI
Serverless inference for 200+ open-source models (Llama, Qwen, DeepSeek, Mixtral, etc.). OpenAI-compatible — drop-in via base URL change.
Free access:
- $25 in free credits at signup (one-time)
- No permanent free tier — beyond credits, pay-per-use ($0.06/M tokens for small models)
- Startup Accelerator program: $15K–$50K credits for eligible startups
- OpenAI-compatible API: change base URL + model name, keep the SDK
Source: https://www.together.ai, https://www.together.ai/startup-accelerator [23]
Homepage: https://www.together.ai
DeepInfra
Pay-per-token inference for 100+ open-source models. OpenAI-compatible endpoint at api.deepinfra.com/v1/openai.
Free access:
- Sign-up credits (one-time, no permanent free tier)
- Drop-in OpenAI SDK compatibility — swap base URL and API key
- Pricing from $0.02/M tokens for small models, $0.06/M for mid-tier
Best for low-cost production traffic, not free-forever.
Source: https://deepinfra.com/docs/deep_infra_api [24]
Homepage: https://deepinfra.com
Fireworks AI
Fast inference for 50+ open-source models, plus tooling (function calling, MCP support, response API). OpenAI-compatible.
Free access:
- $1 in free starter credits at signup (small but enough to evaluate)
- OpenAI-compatible — initialize OpenAI client with Fireworks base URL + key
- Pay-per-use beyond starter credit
Source: https://fireworks.ai/pricing, https://docs.fireworks.ai/tools-sdks/openai-compatibility [25]
Homepage: https://fireworks.ai
Modal Labs — Self-Host
Serverless GPU platform for deploying your own LLMs (vLLM, TGI, custom models). A different paradigm: not a pre-hosted LLM API; you bring and deploy the model yourself.
Free tier (Starter plan):
- $30/month recurring credits (free, no credit card to start)
- 3 workspace seats, 100 containers, 10 concurrent GPUs
- Pay-per-use beyond credits — only pay for actual compute
Use case: deploy any open-source LLM as your own OpenAI-compatible endpoint, full control over model + privacy.
Source: https://modal.com/pricing, https://modal.com/blog/how-to-deploy-vllm [26]
Homepage: https://modal.com
Footnotes
[1] Checked on Apr 30, 2026
[2] Checked on Apr 30, 2026
[3] Checked on Apr 30, 2026
[4] Checked on Apr 30, 2026
[5] Checked on Apr 30, 2026
[6] Checked on Apr 30, 2026
[7] Checked on Apr 22, 2026
[8] Checked on Apr 28, 2026
[9] Checked on Apr 30, 2026
[10] Checked on Apr 28, 2026
[11] Checked on Apr 30, 2026
[12] Checked on Mar 25, 2026
[13] Checked on Apr 28, 2026
[14] Checked on Apr 30, 2026
[15] Checked on Apr 25, 2026
[16] Checked on Apr 28, 2026
[17] Checked on Apr 30, 2026
[18] Checked on Apr 30, 2026
[19] Checked on Apr 30, 2026
[20] Checked on Apr 28, 2026
[21] Checked on Apr 30, 2026
[22] Checked on Apr 28, 2026
[23] Checked on Apr 30, 2026
[24] Checked on Apr 30, 2026
[25] Checked on Apr 30, 2026
[26] Checked on Apr 30, 2026