In Progress · Social Impact · AI/LLM · RAG · Next.js 14

YojanaKhoj — AI Government Benefits Finder

An AI-powered platform that matches Indian citizens to 200+ government welfare schemes in 3 minutes — hybrid rule-based + GPT-4o eligibility matching, personalised application guidance in Hindi and English.

200+ govt schemes · 3-min quiz · EN + HI languages

The Problem

Every year, ₹1.7 lakh crore in government welfare benefits goes unclaimed in India. Not because people don't qualify, but because they don't know they qualify. Over 60% of eligible citizens never claim Ayushman Bharat. MGNREGA has over ₹8,000 crore in unpaid wages due to application failures.

There are 400+ central and state schemes, spread across 30+ separate portals, each with its own logins, jargon, and application process. The people who need benefits most (BPL families, small farmers, rural women, senior citizens) are the least equipped to navigate this system. A 60-year-old farmer in Rajasthan shouldn't need to know what "PM-KISAN" stands for to receive the ₹6,000/year he legally qualifies for.

The government's own solution, myscheme.gov.in, is a static directory. It lists all schemes, not *your* schemes: no personalisation, no guidance, no plain-language explanation.

The Solution

YojanaKhoj (योजना खोज, "scheme search") is a conversational AI platform that asks 10–12 plain-language questions about a user's life situation, matches their profile against a database of 200+ government schemes, and returns a personalised benefits report.

The core user flow:

→ 3-minute conversational quiz (one question at a time, branching logic)
→ "Matching your profile...": the hybrid rule + AI eligibility engine runs in 3–5 seconds
→ Results page: schemes ranked by confidence (Definitely Qualifies / Likely Qualifies)
→ Each scheme: what you get, how to apply step by step, your document checklist, where to go
→ Download a PDF report or share on WhatsApp

The one-sentence pitch: "Tell us about yourself, we tell you every rupee the government owes you."
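The confidence ranking in the results step can be sketched as a simple bucketing pass. The 0.9 cutoff mirrors the rule engine's definite-match threshold; the 0.6 floor for "Likely Qualifies" and all names here are illustrative assumptions, not the product's actual values:

```typescript
// Illustrative bucketing of matched schemes by confidence score.
interface Match {
  schemeName: string;
  confidence: number; // 0..1, from the rule engine or GPT-4o
}

function bucketMatches(matches: Match[]) {
  // Sort highest-confidence first, then split into the two display buckets.
  const ranked = [...matches].sort((a, b) => b.confidence - a.confidence);
  return {
    definitelyQualifies: ranked.filter((m) => m.confidence >= 0.9),
    likelyQualifies: ranked.filter((m) => m.confidence >= 0.6 && m.confidence < 0.9),
  };
}
```

Matches below the lower floor are simply not shown, which keeps the results page short enough for a low-literacy audience.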

Why It Was Hard

The naive approach (asking GPT-4o "does this user qualify for PM-KISAN?" for every scheme) fails at scale for three reasons:

1. Cost. 200 schemes × N users × GPT-4o calls is an API bill that kills the product before it helps anyone. At ₹0.02–0.05 per LLM call, you need to be surgical about when you invoke the model.
2. Inconsistency. Government eligibility rules have hard boundaries (land holding ≤ 2 hectares) and soft edge cases (what if the wife owns the land, not the husband?). Pure LLM matching is inconsistent on the hard cases, yet it is the only reasonable approach for the soft ones.
3. Data quality. Government scheme documents are PDFs with inconsistent formatting, contradictory clauses, and outdated information. Building a reliable database meant AI-assisted extraction plus manual verification.

The distribution problem is equally hard: my target users (rural, low-literacy, low-data) don't discover things via Google. The primary channel is WhatsApp. A website-first strategy, however well built, misses 80% of the target population.
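A quick back-of-envelope, using the per-call range quoted above and the funnel numbers from the pre-filtering fix (200 schemes naively vs ~8 LLM-routed edge cases), shows why point 1 forces a hybrid design. The numbers are illustrative only:

```typescript
// Rough per-user API cost at the upper bound of the quoted ₹0.02–0.05 range.
const COST_PER_CALL_INR = 0.05;

function costPerUser(llmCalls: number): number {
  return llmCalls * COST_PER_CALL_INR;
}

const naive = costPerUser(200); // every scheme through GPT-4o: ~₹10/user
const hybrid = costPerUser(8);  // only ambiguous scheme-user pairs: ~₹0.40/user
const reduction = naive / hybrid; // 200 → 8 calls is a 25x reduction
```

At any meaningful user volume, the naive path is an order of magnitude too expensive before the product has helped a single person.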

Architecture

Tech stack: Next.js 14 (App Router) + Node.js/Express API on AWS Lambda + MongoDB Atlas + OpenAI GPT-4o + LangChain + Pinecone.

The eligibility matching engine is the core of the system, a hybrid approach:

→ Rule engine (first pass): each scheme has structured eligibility rules stored in MongoDB, e.g. { field: "landOwnership", operator: "lte", value: 2 }. For a given user profile, the rule engine filters to definite matches (confidence ≥ 0.9) in milliseconds. No LLM is needed for the clear cases.
→ GPT-4o (edge cases only): the ~30% of scheme-user pairs that are ambiguous get routed to GPT-4o with a structured prompt: user profile + scheme eligibility text + "does this user qualify? return JSON { qualifies, confidence, reason, caveat }". This is the expensive path, used only when rules can't resolve it.
→ RAG over scheme embeddings: for the conversational Q&A feature ("do I qualify if my wife owns the land?"), user queries are embedded and matched against scheme-level vector embeddings in Pinecone. Retrieved chunks are injected into the LangChain prompt.
→ Document checklist generation: GPT-4o generates a personalised document list. It knows from the quiz what the user already has (Aadhaar confirmed: yes) and only lists what the user still needs to gather.

Quiz engine: the branching question flow is stored in MongoDB. Each question has a followUpLogic map: if the user says "farmer", the next question is "land holding"; if "student", the next is "education level". The backend tracks session state and serves the correct next question per answer.
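The first-pass rule engine can be sketched in TypeScript as follows, assuming the { field, operator, value } rule shape shown above. The operator set, types, and function names are illustrative, not the actual implementation:

```typescript
// Structured eligibility rules, as stored per scheme in MongoDB.
type Operator = "eq" | "lte" | "gte" | "in";

interface Rule {
  field: string;    // user-profile field, e.g. "landOwnership"
  operator: Operator;
  value: unknown;   // threshold or allowed value(s)
}

interface Scheme {
  id: string;
  name: string;
  rules: Rule[];
}

type Profile = Record<string, unknown>;

function ruleMatches(rule: Rule, profile: Profile): boolean {
  const v = profile[rule.field];
  switch (rule.operator) {
    case "eq":  return v === rule.value;
    case "lte": return typeof v === "number" && v <= (rule.value as number);
    case "gte": return typeof v === "number" && v >= (rule.value as number);
    case "in":  return Array.isArray(rule.value) && rule.value.includes(v);
    default:    return false; // unknown operator: never a definite match
  }
}

// Definite matches: every rule resolves to true. Ambiguous pairs (e.g. a
// missing profile field) would be routed to the GPT-4o path instead.
function definiteMatches(schemes: Scheme[], profile: Profile): Scheme[] {
  return schemes.filter((s) => s.rules.every((r) => ruleMatches(r, profile)));
}
```

Because this is pure in-memory filtering over pre-fetched rule documents, the clear cases resolve in milliseconds with zero LLM cost.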

What Failed First

First failure: the first version sent all 200 schemes through GPT-4o for every user profile. API latency was 25–40 seconds on the results page. Switching to rule-based pre-filtering (200 schemes → ~20 candidates via rules → ~8 edge cases sent to GPT-4o) dropped latency to 3–5 seconds.

Second failure: the quiz's branching logic was hardcoded in the frontend. Adding a new question branch for a new state-specific scheme required a frontend deploy. Moving all branching logic to the database (each question document stores its followUpLogic map) means new branches ship via the admin panel, with no code change.

Third issue: PDF report generation ran synchronously. For users with 15+ matched schemes, the PDF took 8–12 seconds to build while the user stared at a spinner. Moving report generation to a background Lambda function means the user sees results immediately; the PDF generates asynchronously and the download button activates when ready.

The biggest unresolved problem: the WhatsApp bot, the primary distribution channel for rural users, is not built. The website is a secondary channel serving literate, urban users. Until the WhatsApp integration ships, the product is solving a real problem for the wrong audience.
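The database-driven branching fix can be sketched like this. The shape of followUpLogic follows the description above; the defaultNext fallback field and all identifiers are assumptions for illustration:

```typescript
// Each question document stores a followUpLogic map (answer → next question id).
// Adding a branch is a data change via the admin panel, not a frontend deploy.
interface Question {
  id: string;
  text: string;
  followUpLogic: Record<string, string>; // answer → next question id
  defaultNext?: string; // fallback when no branch matches (assumed field)
}

function nextQuestionId(q: Question, answer: string): string | undefined {
  return q.followUpLogic[answer] ?? q.defaultNext;
}
```

The backend looks up the current question's document, resolves the next id from the user's answer, and persists it in session state, so the frontend only ever renders "the next question" and never encodes the flow itself.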

Results (Current Status: In Progress)

The MVP is functional end-to-end: quiz → AI matching → results → scheme detail → PDF download.

Current coverage:
• 20+ central government schemes seeded (target: 200+ at launch)
• English + Hindi UI working
• Hybrid eligibility matcher running at 3–5 s latency
• PDF report generation
• Admin panel for scheme management

What's left before launch:
• Expand the scheme database to 200+ central + 50+ state-specific schemes
• WhatsApp bot via the WhatsApp Business API (the critical distribution channel)
• Application tracker: let users mark which schemes they applied for
• SMS/WhatsApp alerts for new schemes matching a user's profile
• NGO dashboard for bulk beneficiary management

The product is technically ready to deploy. The distribution problem, reaching rural users who will never find a website, is the unsolved challenge that matters most.

Key Learnings

1. Hybrid AI > pure LLM for rule-heavy domains. Government eligibility rules have deterministic parts (income < ₹2L) and fuzzy parts (exclusions for "connected persons"). Route the deterministic cases through a rule engine and reserve LLM calls for genuine ambiguity: you cut costs ~10x and improve consistency.
2. Distribution is harder than the product in civic tech. The people this helps most won't Google it. WhatsApp-first, website-second is the right architecture for rural India, not an afterthought.
3. Data quality is the real moat. Anyone can build the AI layer. Building and maintaining a verified, structured database of 400+ government schemes with up-to-date eligibility rules, documents, and portal links is the defensible asset.
4. Build the admin panel on day one. A scheme database that requires a developer to update is a scheme database that goes stale in two weeks. The admin panel for non-technical scheme managers was not an afterthought; it was a launch requirement.
5. The user you're designing for can't give you feedback. Low-literacy, rural users won't file a GitHub issue. Testing with proxy users (NGO field workers, CSC operators) gave more actionable feedback than any analytics dashboard.