This workflow automates PDF processing and Retrieval-Augmented Generation (RAG) for intelligent document querying, replacing manual PDF searches and basic OCR tools. It ingests PDFs from Google Drive, extracts their text with Mistral AI OCR, generates embeddings with OpenAI, summarizes content with Google Gemini, and stores vectors in Qdrant for semantic search and chat-based queries. Key nodes include Webhook for API intake, Switch for operation routing (process_pdfs, chat_query, check_status), Google Drive for file access, Mistral OCR for text extraction, LangChain nodes for summarization and embeddings, and Qdrant for vector storage. It is ideal for knowledge-intensive teams (50-500 employees) handling 100+ PDFs daily, such as legal or research departments, and reduces document query time from 15-20 minutes to seconds with accurate, context-aware responses.

ROI: saves 10-15 hours weekly for teams processing 500+ queries, improving productivity in document-heavy workflows. Use cases include legal contract analysis, academic research retrieval, and compliance auditing. Requirements: Google Drive ($6/user/month Workspace), Mistral AI API (pay-per-use, ~$0.05/PDF), OpenAI API (~$0.02/1K tokens), Google Gemini API (~$0.01/1K tokens), Qdrant instance (free community or cloud ~$30/month), n8n instance (free or cloud.n8n.io, ~$20/month), DEVHUB_RAG_API_KEY env var. Scalability: supports millions of document chunks, limited by Qdrant storage (~1M vectors on the free tier), the Google Drive API (~1,000 requests/day), and Mistral rate limits.

Install n8n from n8n.io or use cloud.n8n.io. Set up Google Drive OAuth2 via the Google Cloud Console (enable the Drive API). Obtain a Mistral AI key from api.mistral.ai, an OpenAI key from platform.openai.com, a Gemini key from Google Cloud, and a Qdrant key (self-hosted or cloud). Configure the DEVHUB_RAG_API_KEY env var and the Qdrant URL. Set n8n credentials: HTTP Header Auth, Google Drive OAuth2, Mistral Cloud, OpenAI, Gemini, Qdrant API.
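Before activating the workflow, it can help to confirm that the configuration described above is actually present. The following is a minimal sketch in Python and is not part of the workflow itself. DEVHUB_RAG_API_KEY comes from the requirements above; QDRANT_URL is a hypothetical variable name chosen for illustration, since the listing only says to configure a Qdrant URL.

```python
import os

# DEVHUB_RAG_API_KEY is named in the requirements above.
# QDRANT_URL is a hypothetical name; use whatever variable your setup reads.
REQUIRED_VARS = ("DEVHUB_RAG_API_KEY", "QDRANT_URL")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without touching the n8n host.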
Node setup: Webhook (POST, path: 'rag-pdf-system'), Google Drive (folderId from request), Mistral OCR (model: mistral-ocr-latest), Qdrant (collection: ocr_mistral_test, 1536 dimensions), Gemini (gemini-1.5-flash), OpenAI embeddings, LangChain nodes (chunk size 400, overlap 40). Expose the webhook via ngrok when running n8n locally.

Test with POST requests using Postman, e.g. {operation: 'process_pdfs', folderId: 'your-folder-id'} for processing and {operation: 'chat_query', query: 'Summarize contract terms'} for queries. Verify Qdrant vector counts and chat responses. Common errors: invalid API keys (401: check credentials), missing folderId (400: validate the request), rate limits (429: add delays). Deploy by activating the workflow and sharing the webhook URL. Maintenance: monitor Qdrant storage, rotate keys quarterly, check API quotas. Optimization: adjust chunk size (300-600), tune summarization prompts, cache frequent queries.

Business value: Saves 10-15 hours/week automating 500+ PDF queries for research or compliance teams
Setup time: 45-60 minutes
Difficulty: Advanced
Requirements: Google Drive Workspace, Mistral AI API key, OpenAI API key, Google Gemini API key, Qdrant instance, DEVHUB_RAG_API_KEY env var, n8n installation, API integration knowledge
Use case: Automating PDF processing and intelligent document querying for knowledge-intensive teams
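The chunk size and overlap settings in the node setup (400 and 40, with a suggested tuning range of 300-600) describe a sliding window over the extracted text. The sketch below illustrates that idea in Python under one loud assumption: it counts characters, whereas the listing does not say whether the LangChain splitter in this workflow counts characters or tokens, or whether it prefers to break on separators. The function name chunk_text is hypothetical.

```python
def chunk_text(text, chunk_size=400, overlap=40):
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters. Assumption: sizes are
    measured in characters; the real splitter node may behave differently.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the defaults, a 1,000-character text yields three chunks of 400, 400, and 280 characters. Larger chunks give the summarizer more context per vector, while smaller ones make retrieval more precise, which is the trade-off behind the 300-600 tuning range.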
$6.99
Workflow steps: 33
Integrated apps: webhook, if, respondToWebhook
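
Because the workflow is driven entirely through its webhook, the test requests from the description can also be scripted instead of sent from Postman. The following Python sketch builds those requests without sending them. Several details are assumptions rather than facts from this listing: the /webhook/ URL prefix is n8n's usual production prefix, the X-API-Key header name is hypothetical and must match the HTTP Header Auth credential, and the rule that chat_query needs a non-empty query is a guess.

```python
import json
import urllib.request

# Operations routed by the Switch node, as listed in the description.
OPERATIONS = ("process_pdfs", "chat_query", "check_status")


def build_payload(operation, **fields):
    """Build a request body, rejecting input the workflow would likely refuse."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    if operation == "process_pdfs" and not fields.get("folderId"):
        # The description lists a missing folderId as a 400 error.
        raise ValueError("process_pdfs requires folderId")
    if operation == "chat_query" and not fields.get("query"):
        # Assumption: the listing does not say how an empty query is handled.
        raise ValueError("chat_query requires query")
    return {"operation": operation, **fields}


def build_request(base_url, api_key, payload, header_name="X-API-Key"):
    """Prepare (but do not send) a POST to the 'rag-pdf-system' webhook path.

    header_name is hypothetical; match the HTTP Header Auth credential in n8n.
    """
    url = base_url.rstrip("/") + "/webhook/rag-pdf-system"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", header_name: api_key},
        method="POST",
    )
```

To send a prepared request, pass it to urllib.request.urlopen. A 401, 400, or 429 response raises urllib.error.HTTPError, whose code attribute can be matched against the common errors listed in the description.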