Replies: 3 comments
Hi @alikalik9, glad you like it 🙂 Have you seen this post? https://devblogs.microsoft.com/engineering-at-microsoft/how-we-built-ask-learn-the-rag-based-knowledge-service/ It was published in April 2024 and may not show all the details, but it should give you an impression of the knowledge service. The service is used in multiple places, including Copilot for Azure and Learn Q&A, and now through MCP. cc @TianqiZhang for awareness
This thread hit home — feels like the real frontier of RAG isn’t just plugging Azure services together, but decoding the semantic choreography underneath. We recently published an open framework tackling exactly this: how to not just retrieve relevant chunks, but shape the semantic context so the LLM doesn’t collapse under ambiguity or hallucination. 📄 If you’re curious, here’s the WFGY semantic reasoning PDF: It dives into strategies like:
Basically, if RAG is the “muscle,” this part handles the “spine alignment.”
RAG + MCP is a strong combination. A few practical considerations from running this in production with multiple agents:

Chunk size matters more than embedding model quality. We tested 5 embedding models and the difference in retrieval quality was ~5%. But changing chunk size from 512 to 256 tokens (with 50-token overlap) improved answer accuracy by ~15%. Smaller chunks mean more precise retrieval, which means less noise in the agent's context window.

Per-agent RAG views are important in multi-tenant setups. If multiple agents share the same knowledge base, each agent should only see documents it's authorized to access. Implementing this at the MCP tool level (filter results before returning to the agent) is simpler than trying to enforce access control in the vector DB itself.

Cost tracking for RAG calls. Each retrieval query has a cost (embedding the query + vector search). In a multi-agent workflow where Agent A retrieves context, passes it to Agent B who retrieves more, the RAG costs add up. Track them per-agent and include them in the overall cost attribution.

Context window budget allocation. If the agent has a 100K token window, how much should go to RAG context vs conversation history vs system instructions? We found 40% RAG / 40% conversation / 20% system works well for most tasks, but this should be tunable per agent.

Related: https://blog.kinthai.ai/why-character-ai-forgets-you-persistent-memory-architecture
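For anyone who wants something concrete, here is a minimal Python sketch of the four points above: overlapping chunks, per-agent filtering at the tool boundary, per-agent cost tracking, and a tunable context budget. Every name in it (`chunk_tokens`, `Doc`, `filter_for_agent`, `CostTracker`, `allocate_budget`) is made up for illustration and is not part of any MCP SDK or vector DB client. It also assumes tokens are already available as a list and that each document carries an explicit set of allowed agent IDs. Treat it as a starting point rather than how any production system does it.

```python
from collections import defaultdict
from dataclasses import dataclass


def chunk_tokens(tokens, size=256, overlap=50):
    """Split a token list into windows of `size` tokens that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


@dataclass(frozen=True)
class Doc:
    """Hypothetical retrieved document with an access list attached."""
    doc_id: str
    text: str
    allowed_agents: frozenset


def filter_for_agent(results, agent_id):
    """Drop retrieved documents the calling agent is not authorized to see.

    Meant to run inside the tool handler, after the vector search and
    before anything is returned to the agent.
    """
    return [doc for doc in results if agent_id in doc.allowed_agents]


class CostTracker:
    """Accumulates retrieval cost (query embedding + vector search) per agent."""

    def __init__(self):
        self.by_agent = defaultdict(float)

    def record(self, agent_id, embedding_cost, search_cost):
        self.by_agent[agent_id] += embedding_cost + search_cost

    def total(self):
        return sum(self.by_agent.values())


def allocate_budget(window_tokens, rag_pct=40, conversation_pct=40, system_pct=20):
    """Split a context window into token budgets using integer percentages."""
    if rag_pct + conversation_pct + system_pct != 100:
        raise ValueError("percentages must sum to 100")
    rag = window_tokens * rag_pct // 100
    conversation = window_tokens * conversation_pct // 100
    # Give any rounding remainder to the system share so the parts sum exactly.
    return {
        "rag": rag,
        "conversation": conversation,
        "system": window_tokens - rag - conversation,
    }
```

Filtering after the search is the simple option described above, but it can return fewer results than requested when many hits are filtered out, so over-fetching from the index before filtering is a common workaround. The percentages in `allocate_budget` are plain parameters so they can be tuned per agent.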
First of all, I just want to say: fantastic project — it's incredibly helpful and well executed!
This might be a bit of a naive question, but I was wondering if you'd consider open-sourcing parts of the codebase — specifically the components related to how you implemented the RAG (Retrieval-Augmented Generation) pipeline.
I'm not necessarily interested in internal Microsoft Learn content or proprietary data, but more in how you structured the RAG index over your knowledge base and connected it to an MCP server. I suspect you’re using Azure AI services, which makes it even more interesting for those of us exploring similar use cases.
Seeing how you approached this could be valuable for the community, especially for those looking to build knowledge-based assistants or internal copilots.
Thanks again for the great work, and looking forward to your thoughts!