Penny-Pincher Provider
A curated list of affordable (or almost free) LLM providers for people who are unwilling to pay premium prices for AI.
Note: Claude is arguably the best AI coding assistant out there — but it’s expensive. If you want to try it before committing, you can use a guest pass for 1 week of free Claude Pro (includes Claude Code).
Guest passes are limited and available on a first-come, first-served basis. New users only — you’ll need to enter payment info to activate, but you can cancel before the trial ends to avoid charges. Learn more.
My passes:
https://claude.ai/referral/ZkoAngod1A (out of stock as of 2026-04-30)
Friends' passes:
Got a spare Claude Code guest pass? You can help others try Claude Code for free by sharing it here: open a pull request adding your link under the Friends' passes section, or open an issue and I'll add it for you. Passes are first-come, first-served, so the more we pool together, the better.
If you know of other providers, feel free to create a pull request or open an issue here. I will review and add them when possible. Thank you!
If you find this helpful, you can support me by donating or registering with my referral links. Thank you!
Providers with Coding Plans
These providers offer coding plan subscriptions. You can prepay a monthly fee and use their LLM APIs with usage limits.
Z.ai
Offers 3 plans (starting from $18/month):
- Lite: 3x usage of the Claude Pro plan
- Pro: 5x Lite plan usage
- Max: 4x Pro plan usage
Prices vary based on plan duration (monthly, quarterly, or yearly) and occasional promotional offers, so check the website for current pricing.
Notice (Apr 30, 2026): Auto-renewal on legacy plans (the version without weekly limits) is being disabled. Affected users receive a 2-month gift of the equivalent new plan.
Models: GLM-5.1, GLM-5-Turbo. OpenAI-compatible API + Anthropic-compatible endpoint.
Source: https://z.ai/subscribe, https://docs.z.ai/devpack/overview [1]
Homepage: https://docs.z.ai/devpack/overview
My referral:
🚀 You’ve been invited to join the GLM Coding Plan! Enjoy full support for Claude Code, Cline, and 20+ top coding tools — starting at just $18/month. Subscribe now and grab the limited-time deal! Link: https://z.ai/subscribe?ic=PLKIAYEIPW
MiniMax
Offers a Token Plan — a unified subscription for multimodal AI (text, speech, video, image, music). Pricing is based on API calls, not tokens — very generous!
Plans (monthly):
- Starter: $10/month — 1,500 M2.7 requests per 5-hour rolling window
- Plus: $20/month — 4,500 M2.7 requests per 5-hour window + speech, image, music generation
- Max: $50/month — even higher quotas across all models
- Highspeed tiers ($40–$150/month) — dedicated M2.7-highspeed access, up to 30,000 requests per 5-hour window
Includes access to M2.7 language model, Speech 2.8, Image-01, Hailuo video, and Music-2.5. Yearly plans available with ~17% discount.
Source: https://platform.minimax.io/docs/token-plan/intro, https://platform.minimax.io/docs/guides/pricing-token-plan [2]
Homepage: https://platform.minimax.io
My referral:
🎁 MiniMax Token Plan New Year Mega Offer! Invite friends and earn rewards for both. Referred users get an exclusive 10% off their subscription and join the dev ambassador community; referrers earn 10% back in ready-to-use API vouchers per paid referral, usable across all MiniMax models, plus priority access to events and model previews. The Token Plan Referral Program ends May 1, 2026. 👉 Get your referral link: https://platform.minimax.io/subscribe/token-plan?code=CAQ5sxHAq6&source=link
Kimi Code
A coding-focused perk included with Kimi membership — drops into any dev workflow (terminal, IDE, or Kimi CLI) and is backed by Moonshot’s Kimi K-series models, which are sharply priced per token.
Tiers:
- Adagio — free tier, unlimited basic conversations
- Andante — paid, higher K2.5 quotas
- Presto — paid, top quotas
Usage quotas are tracked over a rolling 5-hour window and scale with membership tier. A pay-as-you-go API is also available via platform.moonshot.ai.
Source: https://www.kimi.com/code, https://www.kimi.com/membership/pricing [3]
Homepage: https://www.kimi.com/code
Alibaba Cloud Model Studio — Coding Plan
Monthly subscription for AI coding tools — top Qwen/Kimi/GLM/MiniMax models at fixed, predictable pricing.
Pro plan: $50/month
- 6,000 requests per 5-hour sliding window
- 45,000 requests per week (resets Monday 00:00 UTC+8)
- 90,000 requests per month (resets on subscription anniversary)
Models include qwen3.5-plus, qwen3-max, qwen3-coder, kimi-k2.5, glm-5, and MiniMax-M2.5.
Supported tools: Claude Code, Cursor, Cline (VS Code), OpenCode, Qwen Code, Kilo Code, Kilo CLI, OpenClaw, Codex, and more.
Note: the Lite plan stopped accepting new subscriptions on Mar 20, 2026.
Source: https://www.alibabacloud.com/help/en/model-studio/coding-plan [4]
Homepage: https://www.alibabacloud.com/product/modelstudio
BytePlus ModelArk — Coding Plan
ByteDance’s ModelArk coding subscription — flat monthly fee, works with mainstream coding tools, models swappable per task.
Standard plans:
- Lite: $5/month
- Pro: $25/month
Models include latest ByteDance-Seed-2.0-pro/lite, DeepSeek-V3.2, GLM-5.1, GLM-4.7, Kimi-K2.5, and GPT-OSS variants.
Supported tools: Claude Code, Cursor, Cline (VS Code), Kilo Code, Roo Code, OpenCode, TRAE, and more.
Note: new-user first-purchase promo pricing was suspended on Mar 17, 2026 — everyone now pays the list price.
Source: https://www.byteplus.com/en/activity/codingplan, https://docs.byteplus.com/en/docs/ModelArk/1925114 [5]
Homepage: https://console.byteplus.com/ark
opencode — Go
A subscription tier for the open-source opencode CLI that pools access to ~10 open-source coding models behind one flat price — aimed at developers who want generous request limits without premium-provider fees.
Pricing:
- $5 first month, $10/month thereafter
- Top up extra credit as needed; cancel anytime
Models include GLM-5.1, GLM-5, Kimi K2.6 (3× quotas through Apr 27), Kimi K2.5, MiMo-V2-Pro/Omni, Qwen3.5/3.6 Plus, MiniMax M2.5/M2.7.
Per-5-hour request limits vary by model tier (≈200 to 10,200).
API portability: The opencode-go API key is portable — works with Claude Code (via oc-go-cc or LiteLLM proxy), Cline (OpenAI-compatible), and any OpenAI-API-compatible client. Model IDs use opencode-go/<model-id> format.
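A quick sketch of that naming convention. This helper is hypothetical (not part of opencode), and the model IDs are illustrative examples:

```python
# Hypothetical helper illustrating the opencode-go/<model-id> naming
# convention used by the portable API key. Model IDs here are examples,
# not an authoritative list.
def qualify(model_id: str) -> str:
    """Prefix a bare model ID for use with an opencode-go API key."""
    prefix = "opencode-go/"
    return model_id if model_id.startswith(prefix) else prefix + model_id

print(qualify("glm-5.1"))                # opencode-go/glm-5.1
print(qualify("opencode-go/kimi-k2.5"))  # already qualified, unchanged
```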
Source: https://opencode.ai/go, https://opencode.ai/docs/go/ [6]
Homepage: https://opencode.ai
Synthetic
Runs open-source AI models for you in private, secure datacenters.
Privacy-first inference: Synthetic never trains on your data and doesn’t store API prompts or completions.
Pricing:
- Subscription: $30/month (app-based access)
- Usage-based: pay-as-you-go, no charge for unused capacity
Models include Kimi K2.5, MiniMax M2.5, GLM 5.1, GLM 4.7 Flash, plus any vLLM-compatible open-source LLM.
OpenAI-compatible — works with Roo, Cline, Octofriend, and any other OpenAI-API-compatible client.
Source: https://synthetic.new/ [7]
Homepage: https://synthetic.new
Free Providers
Sorted by attractiveness — biggest recurring free quota, model quality, and lowest friction first.
LongCat AI
Meituan’s open-source LongCat models. API platform in public beta — no paid tier yet.
Free quota (resets daily 00:00 Beijing Time, no rollover):
- LongCat-Flash-Lite: 50M tokens/day (no upgrade path — uniformly free)
- LongCat-Flash-Chat: 500K tokens/day
- LongCat-Flash-Thinking / Thinking-2601: 500K tokens/day each
- LongCat-Flash-Omni-2603 (multimodal): 500K tokens/day
- LongCat-2.0-Preview: 10M tokens / 2 hours (invite-only, 1M context)
Both OpenAI-compatible (https://api.longcat.chat/openai) and Anthropic-compatible (https://api.longcat.chat/anthropic) endpoints. 256K context on most models.
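A minimal sketch of targeting the two endpoints above with a stdlib-built OpenAI-style request body (the base URLs come from the docs; the exact path suffix to append under each base, e.g. /v1/chat/completions, is an assumption — check the platform docs):

```python
# Sketch: choose a LongCat base URL and build an OpenAI-style
# chat-completions body. Nothing is sent over the network here.
import json

BASES = {
    "openai": "https://api.longcat.chat/openai",
    "anthropic": "https://api.longcat.chat/anthropic",
}

def chat_request(flavor: str, model: str, prompt: str) -> tuple:
    """Return (base_url, JSON body) for a minimal chat call."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return BASES[flavor], json.dumps(body)

url, body = chat_request("openai", "LongCat-Flash-Chat", "hello")
```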
Source: https://longcat.chat/platform/docs/ [8]
Homepage: https://longcat.chat/platform
Mistral La Plateforme
Mistral’s developer platform. Free Experiment plan — full model access with rate-limited prototyping quotas.
Free tier:
- Up to ~1B tokens/month for prototyping (third-party reports; specific RPM not published)
- All models incl. Mistral Large 3, Medium 3, Small 3.1, Codestral, Pixtral, embeddings
- No credit card required — phone verification only
- Up to $30K startup credits via separate Startup Program
OpenAI-compatible. Upgrade to Scale plan for production rate limits.
Source: https://docs.mistral.ai/deployment/ai-studio/tier, https://mistral.ai/pricing [9]
Homepage: https://console.mistral.ai
Cerebras Cloud
The world's fastest LLM inference (wafer-scale chip), OpenAI-compatible. Free tier — no credit card required.
Free tier limits:
- 1M tokens/day shared cap across free models
- 30 RPM on most models (10 RPM for zai-glm-4.7)
- 60K TPM per model
- Free models: gpt-oss-120b, llama3.1-8b, qwen-3-235b-a22b-instruct-2507, zai-glm-4.7
Pay-as-you-go tier removes daily and per-minute caps for higher throughput.
Source: https://inference-docs.cerebras.ai/support/rate-limits [10]
Homepage: https://www.cerebras.ai/inference
xAI Grok API
xAI’s frontier Grok models — Grok 4, Grok 4.1 Fast (2M context), Grok Code Fast. OpenAI + Anthropic compatible at https://api.x.ai/v1.
Free tier (combined up to $175 in month one):
- $25 in free signup credits (one-time)
- +$150/month via Data Sharing Program (recurring, eligible countries)
- Min $5 API spend required before opting in to data sharing
Pricing (Grok 4.1 Fast): $0.20/M input, $0.50/M output. Server-side tools (web search, code execution) +$5/1K calls.
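As a back-of-envelope check on those rates (illustrative arithmetic only; confirm current prices at x.ai):

```python
# Cost model for the Grok 4.1 Fast rates quoted above:
# $0.20/M input tokens, $0.50/M output tokens,
# $5 per 1K server-side tool calls.
def grok_fast_cost(input_tokens: int, output_tokens: int, tool_calls: int = 0) -> float:
    """Approximate USD cost for one workload."""
    token_cost = (input_tokens * 0.20 + output_tokens * 0.50) / 1_000_000
    return token_cost + tool_calls * 5 / 1000

# 1M input + 1M output tokens, no tools: about $0.70
print(round(grok_fast_cost(1_000_000, 1_000_000), 2))
```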
⚠️ Privacy caveat: Data Sharing opt-in lets xAI train future models on your prompts and responses. Opt-in is irreversible at the team level.
Source: https://docs.x.ai/developers/models, https://x.ai/api [11]
Homepage: https://x.ai/api
OpenRouter
Free usage limits: If you’re using a free model variant (with an ID ending in :free), you can make up to 20 requests per minute. The following per-day limits apply:
- If you have purchased less than 10 credits, you’re limited to 50 :free model requests per day.
- If you purchase at least 10 credits, your daily limit is increased to 1000 :free model requests per day.
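The limits above can be sketched as a small helper (hypothetical code; the example model ID is illustrative):

```python
# Sketch of OpenRouter's free-variant limits: :free model IDs get
# 20 requests/minute, and a daily cap that depends on whether the
# account has purchased at least 10 credits.
def is_free_variant(model_id: str) -> bool:
    """Free model variants have IDs ending in :free."""
    return model_id.endswith(":free")

def free_daily_limit(credits_purchased: float) -> int:
    """Daily request cap for :free model variants."""
    return 1000 if credits_purchased >= 10 else 50

print(is_free_variant("deepseek/deepseek-r1:free"))  # True
print(free_daily_limit(0), free_daily_limit(10))     # 50 1000
```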
Source: https://openrouter.ai/docs/api/reference/limits [12]
Homepage: https://openrouter.ai
Groq
Fast LPU inference, OpenAI-compatible. Free tier with no credit card.
Free tier (per-model, organization-level):
- llama-3.1-8b-instant: 30 RPM, 14.4K RPD, 6K TPM, 500K TPD
- llama-3.3-70b-versatile: 30 RPM, 1K RPD, 12K TPM, 100K TPD
- whisper-large-v3 (audio): 20 RPM, 2K RPD
- Also free: gemma2-9b-it, allam-2-7b, gpt-oss variants
Upgrade to Developer plan for higher RPM/TPD, Batch and Flex processing.
Source: https://console.groq.com/docs/rate-limits [13]
Homepage: https://groq.com
GitHub Models
Single-API gateway to OpenAI, Anthropic, Llama, Mistral, DeepSeek, Grok, Phi, and more — free for any GitHub account. OpenAI-compatible at https://models.github.ai/inference.
Free tier:
- Rate-limited free access for all GitHub accounts (no extra signup)
- Per-model RPM/RPD vary (e.g. GPT-4o: 10 RPM / 50 RPD; DeepSeek-R1: 15 RPM / 150 RPD)
- Personal Access Token with models:read permission required
- Pay-as-you-go available beyond free tier; BYOK supported
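A stdlib-only sketch of a chat-completions request against the endpoint quoted above. The request is built but not sent; the path suffix and model ID are assumptions based on the docs, and the token is a placeholder:

```python
# Build (but do not send) an OpenAI-style request to the GitHub Models
# inference endpoint. The PAT must carry the models:read permission.
import json
import urllib.request

def build_request(pat: str, model: str, prompt: str) -> urllib.request.Request:
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        "https://models.github.ai/inference/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {pat}",  # PAT with models:read
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("ghp_example", "openai/gpt-4o", "hello")
```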
⚠️ Note: Copilot Pro/Pro+ migrate to usage-based billing on Jun 1, 2026, and new Copilot Pro/Pro+ signups paused since Apr 20, 2026 — monitor changes if relying on Copilot tier limits.
Source: https://docs.github.com/github-models/prototyping-with-ai-models, https://docs.github.com/billing/managing-billing-for-your-products/about-billing-for-github-models [14]
Homepage: https://github.com/marketplace/models
NVIDIA NIM
NVIDIA-hosted inference for 50+ open models — free for NVIDIA Developer Program members, no credit card required. OpenAI-compatible API at https://integrate.api.nvidia.com/v1 works out of the box with Cline, Roo, OpenCode, and any OpenAI-compatible client.
Free access:
- Sign up for the free Developer Program, then generate an nvapi- API key on build.nvidia.com
- 1,000 inference credits on signup (some accounts report a rate-limit-only model since early 2025)
- Personal-account rate limits shown in dashboard top-right — typically ~40 RPM and 1,000 requests/month, resetting on the 1st
- Models include Kimi K2.5, GPT-OSS, DeepSeek-V3.2, Llama 3.x, Mistral, Phi, and NVIDIA’s own Nemotron family
Paid self-hosted NIM containers and pay-as-you-go API are available for higher throughput; the hosted free tier is fine for evaluation and light coding use.
Source: https://build.nvidia.com, https://developer.nvidia.com/nim [15]
Homepage: https://build.nvidia.com
Cloudflare Workers AI
Serverless inference on Cloudflare’s global edge network. Free tier on both Free and Paid Workers plans.
Free allowance:
- 10,000 Neurons/day (resets 00:00 UTC); requests fail with HTTP 429 once the allowance is exhausted
- 50+ models: LLMs (Llama 3.3 70B, Gemma 3, Qwen2.5-Coder, DeepSeek R1), embeddings (BGE), Whisper transcription, image generation
- Example mileage: ~150 LLM responses/day on Llama 3.3 70B, or ~500 audio-seconds Whisper
- Overage billed at $0.011 per 1,000 Neurons on paid plan
Requires Cloudflare account API token + Account ID.
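The numbers above work out to a simple daily budget (illustrative arithmetic; confirm current rates on the pricing page):

```python
# Neuron budgeting for Cloudflare Workers AI: 10,000 free Neurons/day,
# overage billed at $0.011 per 1,000 Neurons on the paid plan.
FREE_NEURONS_PER_DAY = 10_000
OVERAGE_USD_PER_1K = 0.011

def daily_overage_cost(neurons_used: int) -> float:
    """USD billed for one day's usage beyond the free allowance."""
    over = max(0, neurons_used - FREE_NEURONS_PER_DAY)
    return over / 1000 * OVERAGE_USD_PER_1K

print(daily_overage_cost(9_000))             # 0.0 (within free tier)
print(round(daily_overage_cost(25_000), 3))  # 0.165
```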
Source: https://developers.cloudflare.com/workers-ai/platform/pricing/ [16]
Homepage: https://developers.cloudflare.com/workers-ai/
Hugging Face Inference Providers
Routes requests across multiple inference backends (Together, Fireworks, Novita, Cerebras, Replicate, DeepInfra, Scaleway, etc.) behind a single API. OpenAI-compatible.
Free tier:
- 100,000 monthly Inference Provider credits (recurring, free account)
- PRO ($9/month): 2M monthly credits + $2 inference credits/month — 20× free tier
- Free models include open-source LLMs across all routed backends
- OpenAI-compatible endpoint at https://router.huggingface.co/v1 (chat completions only)
Sign-up is free, no credit card required. Other tasks (text-to-image, embeddings, speech) use the HF inference clients.
Source: https://huggingface.co/docs/inference-providers/pricing, https://huggingface.co/changelog/inference-providers-openai-compatible [17]
Homepage: https://huggingface.co/docs/inference-providers
Google Cloud Vertex AI (free trial credits)
Not a free-forever tier — but new GCP customers get $300 in free credits valid for 90 days, usable across Vertex AI for Gemini 3 Pro/Flash, Anthropic Claude on Vertex, and Vertex Partner models (DeepSeek, GLM, Qwen via MaaS).
Setup:
- Sign up at https://cloud.google.com/free (credit card required for verification, not charged unless you upgrade)
- Enable Vertex AI API in your GCP project
- $300 expires after 90 days; account does not auto-convert to paid
- Model access varies by region; Gemini 3 Pro/Flash available in most regions
Vertex AI Express Mode (no billing required): New users can sign up for an Express-mode account with limited free quotas — no credit card needed for evaluation. See https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview.
Practical for short-term heavy evaluation; not a long-term free path.
Source: https://cloud.google.com/free, https://cloud.google.com/vertex-ai/generative-ai/pricing [18]
Homepage: https://console.cloud.google.com/vertex-ai
DeepSeek Platform
DeepSeek’s official API — flagship V4 / V3.2 / R1 models direct from the source. Notoriously cheap, no credit card to sign up.
Free tier:
- 5M free tokens at signup (no promo code, applied automatically)
- Approx. 2,500 standard API calls or ~10M characters processed
- No credit card required
Pricing (PAYG):
- DeepSeek V4 Flash: $0.14/M input, $0.28/M output
- DeepSeek V4 Pro: 75% off until May 5, 2026 ($0.435/M input, $0.87/M output)
- Cached input: $0.03/M (90% discount)
OpenAI + Anthropic compatible at https://api.deepseek.com.
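A rough cost model for the V4 Flash rates quoted above (illustrative only; applying the cached-input rate to Flash is an assumption, and the promo pricing changes over time):

```python
# Cost model for DeepSeek V4 Flash per the rates above:
# $0.14/M fresh input, $0.03/M cached input, $0.28/M output.
def v4_flash_cost(fresh_in: int, cached_in: int, out: int) -> float:
    """Approximate USD cost for one workload."""
    return (fresh_in * 0.14 + cached_in * 0.03 + out * 0.28) / 1_000_000

# 1M fresh input + 1M output is about $0.42;
# a fully cached input drops it to about $0.31.
print(round(v4_flash_cost(1_000_000, 0, 1_000_000), 2))
print(round(v4_flash_cost(0, 1_000_000, 1_000_000), 2))
```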
Source: https://api-docs.deepseek.com/quick_start/pricing [19]
Homepage: https://platform.deepseek.com
Scaleway Generative APIs
EU/GDPR-compliant inference hosted in Paris, France. Privacy-first — provider does not log or train on inputs/outputs.
Free tier:
- 1,000,000 free tokens for every new customer (no time limit advertised)
- Models: Qwen3 (235B / 397B / coder-30B), Llama 3.3 70B, Mistral Small 3.2 24B, DeepSeek R1 distill, Pixtral, Gemma, embeddings
- Higher rate limits unlocked after KYC + payment method on file
- Beyond free tokens, paid pricing typically €0.20–€0.90 per 1M tokens
Source: https://www.scaleway.com/en/generative-apis/, https://www.scaleway.com/en/pricing/model-as-a-service/ [20]
Homepage: https://console.scaleway.com
Kilo Code — Gateway
Open-source agentic coding extension for VS Code, JetBrains, and CLI. Its built-in Kilo Gateway routes LLM requests to any provider and ships with a genuine free path — no subscription required.
Free access:
- Free models (IDs ending in :free) cost nothing — usage is tracked but not billed, rate-limited to 200 requests/hour per IP
- kilo-auto/free auto-routes among available free models (e.g. GLM 4.7, MiniMax M2.1)
- First top-up grants $20 in bonus credits (expires after 60 days), usable on paid models
- Kilo Pass first-time subscribers get a 50% welcome bonus in month 1
- Bring-your-own-keys works for any provider — no Kilo subscription required
Paid Kilo Pass tiers are available for higher throughput on premium models (Starter $19, Pro $49, Expert $199/month), but the free path covers most casual coding use.
Source: https://kilo.ai/pricing, https://kilo.ai/docs/getting-started/using-kilo-for-free, https://kilo.ai/docs/gateway/usage-and-billing [21]
Homepage: https://kilo.ai
Pollinations AI
Open-source Gen-AI platform (Berlin) for text, image, audio, and video generation. OpenAI-compatible endpoints.
Free access (post-2026 key migration):
- Publishable key (free, beta): 1 Pollen/IP/hour — for client-side use, demos, and prototypes
- Secret key: server-side only, no rate limit listed (still free during beta)
- Sign up at https://enter.pollinations.ai; ~$1 ≈ 1 Pollen for paid pay-as-you-go
- Models: DeepSeek V4 Flash/Pro, Flux, GPT Image, Seedream, Whisper, ElevenLabs voices, Veo (alpha)
Source: https://github.com/pollinations/pollinations [22]
Homepage: https://pollinations.ai
Together AI
Serverless inference for 200+ open-source models (Llama, Qwen, DeepSeek, Mixtral, etc.). OpenAI-compatible — drop-in via base URL change.
Free access:
- $25 in free credits at signup (one-time)
- No permanent free tier — beyond credits, pay-per-use ($0.06/M tokens for small models)
- Startup Accelerator program: $15K–$50K credits for eligible startups
- OpenAI-compatible API: change base URL + model name, keep the SDK
Source: https://www.together.ai, https://www.together.ai/startup-accelerator [23]
Homepage: https://www.together.ai
DeepInfra
Pay-per-token inference for 100+ open-source models. OpenAI-compatible endpoint at api.deepinfra.com/v1/openai.
Free access:
- Sign-up credits (one-time, no permanent free tier)
- Drop-in OpenAI SDK compatibility — swap base URL and API key
- Pricing from $0.02/M tokens for small models, $0.06/M for mid-tier
Best for low-cost production traffic, not free-forever.
Source: https://deepinfra.com/docs/deep_infra_api [24]
Homepage: https://deepinfra.com
Fireworks AI
Fast inference for 50+ open-source models, plus tooling (function calling, MCP support, response API). OpenAI-compatible.
Free access:
- $1 in free starter credits at signup (small but enough to evaluate)
- OpenAI-compatible — initialize OpenAI client with Fireworks base URL + key
- Pay-per-use beyond starter credit
Source: https://fireworks.ai/pricing, https://docs.fireworks.ai/tools-sdks/openai-compatibility [25]
Homepage: https://fireworks.ai
Modal Labs — Self-Host
Serverless GPU platform for deploying your own LLMs (vLLM, TGI, custom models). A different paradigm: not a pre-hosted LLM API; you bring and deploy the model yourself.
Free tier (Starter plan):
- $30/month recurring credits (free, no credit card to start)
- 3 workspace seats, 100 containers, 10 concurrent GPUs
- Pay-per-use beyond credits — only pay for actual compute
Use case: deploy any open-source LLM as your own OpenAI-compatible endpoint, full control over model + privacy.
Source: https://modal.com/pricing, https://modal.com/blog/how-to-deploy-vllm [26]
Homepage: https://modal.com
Footnotes
[1] Checked on Apr 30, 2026
[2] Checked on Apr 30, 2026
[3] Checked on Apr 30, 2026
[4] Checked on Apr 30, 2026
[5] Checked on Apr 30, 2026
[6] Checked on Apr 30, 2026
[7] Checked on Apr 22, 2026
[8] Checked on Apr 28, 2026
[9] Checked on Apr 30, 2026
[10] Checked on Apr 28, 2026
[11] Checked on Apr 30, 2026
[12] Checked on Mar 25, 2026
[13] Checked on Apr 28, 2026
[14] Checked on Apr 30, 2026
[15] Checked on Apr 25, 2026
[16] Checked on Apr 28, 2026
[17] Checked on Apr 30, 2026
[18] Checked on Apr 30, 2026
[19] Checked on Apr 30, 2026
[20] Checked on Apr 28, 2026
[21] Checked on Apr 30, 2026
[22] Checked on Apr 28, 2026
[23] Checked on Apr 30, 2026
[24] Checked on Apr 30, 2026
[25] Checked on Apr 30, 2026
[26] Checked on Apr 30, 2026