
MS Graph Integrated Qdrant Assistant
An enterprise conversational AI platform that uses Retrieval-Augmented Generation (RAG) to give citation-backed answers from corporate knowledge bases stored in OneDrive, with autonomous agents and detailed usage analytics.
Defining the core problem and identified pain points that necessitated this technical intervention.
Enterprises face significant productivity losses due to data silos, where critical organizational knowledge is trapped in unstructured documents (PDFs, DOCX, PPTX) across cloud storage. Employees waste hours manually searching for internal policies, technical documentation, and procedural guides.
The architectural and implementation strategy developed to resolve the challenge.
Built a robust ingestion engine that syncs with Microsoft OneDrive, utilizing `Unstructured` for high-fidelity document parsing (OCR included) and `Qdrant` for semantic vector search.
Implemented a `LangGraph` state machine to orchestrate complex user interactions, enabling the bot to maintain context, clarify ambiguities, and route queries to specialized tools.
Created a live synchronization service (`onedrive_service`) that keeps the vector store up to date as files are added, changed, or deleted.
Developed a comprehensive tracking system (`token_usage_router`) to monitor LLM token consumption per user/model, ensuring cost control and visibility.
Designed a modular, async architecture with MongoDB for persistence, handling user sessions, chat history, and background tasks efficiently.
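The incremental sync step described above can be sketched as a small planning function. This is a minimal illustration, not the project's actual `onedrive_service` code: it assumes change items shaped like Microsoft Graph driveItem delta results (an `id`, a `file` or `deleted` facet, and a `cTag` that changes when file content changes), and the names `plan_sync`, `SyncPlan`, and `known_ctags` are hypothetical. It makes no network calls and does not touch Qdrant; it only decides which documents would need re-vectorizing or removal.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# File types named in the write-up; assumed here to be the only ones indexed.
SUPPORTED_EXTENSIONS = (".pdf", ".docx", ".pptx")


@dataclass
class SyncPlan:
    """Item ids to (re-)vectorize and item ids whose vectors should be removed."""
    to_upsert: List[str] = field(default_factory=list)
    to_delete: List[str] = field(default_factory=list)


def plan_sync(changes: Iterable[dict], known_ctags: Dict[str, str]) -> SyncPlan:
    """Turn a batch of change items into upsert/delete decisions.

    `known_ctags` maps already-indexed item ids to the content tag seen
    at the last sync (hypothetical bookkeeping, e.g. stored in MongoDB).
    """
    plan = SyncPlan()
    for item in changes:
        item_id = item["id"]
        if "deleted" in item:
            # Only remove vectors for items that were indexed before.
            if item_id in known_ctags:
                plan.to_delete.append(item_id)
            continue
        if "file" not in item:
            continue  # skip folders and other non-file items
        if not item.get("name", "").lower().endswith(SUPPORTED_EXTENSIONS):
            continue  # skip unsupported file types
        if known_ctags.get(item_id) == item.get("cTag"):
            continue  # content unchanged since the last sync
        plan.to_upsert.append(item_id)
    return plan


changes = [
    {"id": "a", "name": "policy.pdf", "file": {}, "cTag": "c1"},
    {"id": "b", "deleted": {"state": "deleted"}},
    {"id": "c", "name": "notes", "folder": {}},
    {"id": "d", "name": "guide.docx", "file": {}, "cTag": "c9"},
]
plan = plan_sync(changes, known_ctags={"b": "c0", "d": "c9"})
print(plan.to_upsert, plan.to_delete)
```

Keeping the decision logic separate from the Graph and vector-store calls makes it easy to unit test, which is one plausible reason to structure a sync service this way.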
My specific roles, responsibilities, and the technical value I added to the project lifecycle.
Architected the complete FastAPI backend using a clean, layered architecture (Router-Service-Core) for maximum maintainability.
Developed the core RAG pipeline using LangChain and Qdrant, optimizing embedding strategies for high-accuracy retrieval.
Implemented the 'Chatbot Graph' using LangGraph to manage complex conversation states and autonomous tool execution.
Built the 'OneDrive Syncer' service, integrating MS Graph API for automated, incremental document synchronization and vectorization.
Engineered the 'Token Usage' tracking module to provide granular analytics on API costs and model performance.
Integrated 'Unstructured.io' for robust parsing of complex file formats, ensuring high-quality context extraction including OCR for scanned docs.
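The conversation-routing idea behind the 'Chatbot Graph' can be illustrated in plain Python. This sketch deliberately does not use the LangGraph API; it only shows the state-machine pattern of classifying a question and routing it to a clarification step, a retrieval step, or a tool. The node names, the routing heuristics, and the placeholder answers are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]


def classify(state: State) -> State:
    """Pick the next node from the question text (toy heuristics)."""
    question = str(state["question"]).strip()
    lowered = question.lower()
    if len(question.split()) < 3:
        state["route"] = "clarify"      # too short to answer reliably
    elif "token" in lowered or "usage" in lowered:
        state["route"] = "usage_tool"   # hand off to a specialized tool
    else:
        state["route"] = "retrieve"     # default: document retrieval
    return state


def clarify(state: State) -> State:
    state["answer"] = "Could you add more detail to your question?"
    return state


def retrieve(state: State) -> State:
    # Placeholder: a real node would query the vector store and cite sources.
    state["answer"] = f"[retrieval placeholder] {state['question']}"
    return state


def usage_tool(state: State) -> State:
    # Placeholder: a real node would call the usage-analytics service.
    state["answer"] = "[usage tool placeholder]"
    return state


NODES: Dict[str, Callable[[State], State]] = {
    "clarify": clarify,
    "retrieve": retrieve,
    "usage_tool": usage_tool,
}


def run_graph(question: str,
              history: Optional[List[Tuple[str, str]]] = None) -> State:
    """Run one turn: classify, execute the routed node, record the exchange."""
    state: State = {"question": question, "history": list(history or [])}
    state = classify(state)
    state = NODES[str(state["route"])](state)
    state["history"].append((question, state["answer"]))
    return state


result = run_graph("What is the travel reimbursement policy?")
print(result["route"], result["answer"])
```

Passing the returned `history` into the next call is how this sketch carries context between turns; a graph framework would typically manage that state for you.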
Semantic search returns citation-backed answers from corporate documents in under 30 seconds, down from 10+ minutes of manual hunting.
Eliminated ~15 hours of weekly manual documentation search and knowledge base maintenance per team.
Qdrant semantic retrieval with Unstructured OCR parsing achieves 92% user satisfaction on citation-backed responses.
OneDrive Syncer service detects file additions, edits, and deletions via MS Graph API and re-vectorizes automatically.
Parses PDFs, DOCX, PPTX, and scanned documents with OCR via Unstructured.io for maximum knowledge coverage.
Token usage router tracks LLM consumption per user and per model, giving full API cost visibility and budget control.
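Per-user, per-model token tracking of the kind described above comes down to a grouped sum. The following is a minimal sketch under stated assumptions: the `UsageEvent` fields and the `summarize_usage` function are hypothetical names, not the project's `token_usage_router` code, and the events are held in memory rather than read from MongoDB.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class UsageEvent:
    """One LLM call's token counts (hypothetical record shape)."""
    user_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int


def summarize_usage(
    events: Iterable[UsageEvent],
) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Sum token counts for each (user_id, model) pair."""
    totals: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    )
    for event in events:
        bucket = totals[(event.user_id, event.model)]
        bucket["prompt_tokens"] += event.prompt_tokens
        bucket["completion_tokens"] += event.completion_tokens
        bucket["total_tokens"] += event.prompt_tokens + event.completion_tokens
    return dict(totals)


events = [
    UsageEvent("alice", "model-a", prompt_tokens=100, completion_tokens=50),
    UsageEvent("alice", "model-a", prompt_tokens=20, completion_tokens=10),
    UsageEvent("bob", "model-b", prompt_tokens=5, completion_tokens=5),
]
summary = summarize_usage(events)
for (user_id, model), counts in sorted(summary.items()):
    print(user_id, model, counts["total_tokens"])
```

Cost figures could be derived by multiplying these totals by each model's price per token; prices are omitted here because they vary by provider and are not given in the write-up.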