Production · AI/LLM · RAG · MongoDB · LangChain
ESG Analytics Chatbot
How I built a RAG-based conversational AI that cut manual ESG analysis time by 70% and now serves 10+ organizations with 95%+ query accuracy.
95%+ Query Accuracy · 70% Time Saved · 10+ Organizations
Under NDA — code and live demo cannot be shared.
The Problem
ESG (Environmental, Social, Governance) analysts at Planet Sustech's client organizations were spending 15+ hours per week manually querying sustainability data. The data lived in MongoDB across 10+ separate organizations, each with different schemas, different report structures, and different terminology for the same concepts.
A single analyst would open a spreadsheet, run 3–4 manual queries, cross-reference PDF reports, and then write a summary. It was slow, error-prone, and completely unscalable as the client base grew.
Why It Was Hard
The naive solution — a simple chatbot with a fixed prompt — failed immediately.
The core challenges were:
→ Multi-tenant data isolation: each organization had its own MongoDB collections and schemas. A query for "carbon emissions" in one org meant something different in another.
→ Natural language to MongoDB: significantly harder than natural language to SQL, because MongoDB's aggregation pipeline syntax is complex and deeply nested.
→ Accuracy requirements: ESG compliance data is regulatory. An answer that's 80% correct is worse than no answer, because it gives a false sense of confidence.
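To make the second challenge concrete, here is a hypothetical example of the same question expressed in SQL versus as a MongoDB aggregation pipeline. The collection and field names (`emissions`, `site`, `scope1_tonnes`) are illustrative, not the actual client schemas:

```python
# Question: "total Scope 1 emissions per site in 2023"

# SQL: one flat statement.
sql = "SELECT site, SUM(scope1_tonnes) FROM emissions WHERE year = 2023 GROUP BY site"

# MongoDB: a nested list of stage documents that the model must emit
# exactly, with correct "$"-prefixed operators and field references.
pipeline = [
    {"$match": {"year": 2023}},
    {"$group": {"_id": "$site", "total": {"$sum": "$scope1_tonnes"}}},
    {"$sort": {"total": -1}},
]

# One malformed key (e.g. "$grp") or a dropped "$" prefix can still
# parse but silently change semantics, which is why a validation step
# before execution matters.
```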
Architecture
I built a RAG (Retrieval-Augmented Generation) pipeline on top of LangChain with Groq LLaMA 3.3 as the inference engine.
The key architectural decisions:
→ Intent Router: Before any LLM call, an intent classifier determines which organization's schema is being queried and what type of ESG metric is requested (emissions, water usage, social metrics, etc.)
→ Schema-aware prompt construction: Each organization's MongoDB schema is stored as a vector embedding. When a query arrives, the most relevant schema fragments are retrieved and injected into the prompt context.
→ Query validation layer: Every generated MongoDB aggregation query is validated against a set of rules before execution. If validation fails, the system automatically retries with a corrected prompt.
→ Response grounding: The final answer always includes a citation back to the source document/collection, so analysts can verify.
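A minimal sketch of what the validation layer can look like, assuming generated queries arrive as Python lists of stage dicts. The stage whitelist and the required tenant filter are illustrative rules, not the production rule set:

```python
# Illustrative validator: the real system's rules are under NDA, so the
# specific checks here (stage whitelist, tenant $match) are assumptions.

ALLOWED_STAGES = {"$match", "$group", "$project", "$sort", "$limit", "$unwind"}

def validate_pipeline(pipeline: list, org_id: str) -> list:
    """Return a list of violations; an empty list means the query may run."""
    errors = []
    for i, stage in enumerate(pipeline):
        if len(stage) != 1:
            errors.append(f"stage {i}: expected exactly one operator")
            continue
        op = next(iter(stage))
        if op not in ALLOWED_STAGES:
            errors.append(f"stage {i}: operator {op!r} not allowed")
    # Tenant isolation: the first stage must scope to the caller's org.
    if not pipeline or pipeline[0].get("$match", {}).get("org_id") != org_id:
        errors.append("pipeline must begin with a $match on org_id")
    return errors

# A failing result would trigger the automatic retry with a corrected prompt.
violations = validate_pipeline([{"$group": {"_id": "$site"}}], "acme")
```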
What Failed First
My first approach used a single large prompt containing all 10+ organization schemas. This immediately blew past the model's context window, and the LLM started confusing schemas between organizations.
Second attempt: I tried fine-tuning a smaller model on MongoDB query syntax. This produced syntactically valid queries but semantically wrong ones — the model generated queries that ran but returned incorrect data.
The breakthrough was separating intent classification from query generation. By first determining WHICH organization and WHICH metric type, the subsequent query generation prompt could be much smaller and more precise.
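The two-stage flow described above can be sketched as follows. The keyword matcher stands in for the real LLM intent classifier, and the org names, metric taxonomy, and schema fragments are hypothetical placeholders, with the fragments normally retrieved from the vector store rather than a dict:

```python
# Stand-in intent classifier: the production system uses an LLM call here.
METRIC_KEYWORDS = {
    "emissions": ["carbon", "co2", "scope"],
    "water": ["water", "effluent"],
    "social": ["diversity", "headcount", "safety"],
}

# Per-(org, metric) schema fragments; illustrative names only.
SCHEMAS = {
    ("acme", "emissions"): "collection: emissions {site, year, scope1_tonnes}",
}

def classify(question: str) -> str:
    q = question.lower()
    for metric, words in METRIC_KEYWORDS.items():
        if any(w in q for w in words):
            return metric
    return "unknown"

def build_prompt(org: str, question: str) -> str:
    metric = classify(question)
    schema = SCHEMAS.get((org, metric), "")
    # Only the matched fragment is injected, so the generation prompt
    # stays small and cannot mix schemas across organizations.
    return f"Schema:\n{schema}\n\nWrite a MongoDB aggregation for: {question}"
```

The design point is that the generation prompt never sees more than one organization's schema, which is what eliminated the cross-org confusion from the first attempt.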
Results
After 3 months in production across 10+ organizations:
• 95%+ query accuracy (measured by manual spot-checking 200 queries/month against ground truth data)
• 70% reduction in manual ESG analysis time
• <2 second average response time (Groq's inference speed was the key enabler)
• 1K+ queries/day at peak, zero downtime
The system now handles BRSR (Business Responsibility and Sustainability Report) queries automatically — a regulatory reporting format that previously required a dedicated analyst.
Key Learnings
1. Separate concerns aggressively in multi-tenant LLM systems. Intent routing before query generation is not optional — it's foundational.
2. Groq was the right choice for inference. At 1K+ queries/day, latency matters more than marginal accuracy improvements from a larger model.
3. Don't fight MongoDB aggregations — embrace them. The pipeline syntax is actually LLM-friendly once you give the model enough schema context.
4. Build the validation layer before the happy path. I added query validation as an afterthought, and retrofitting it was painful.