LLM DevHubConnect Agent: Web Scraping with AI Extraction and Google Sheets Storage

This workflow automates web scraping with AI-driven data extraction and stores the results in Google Sheets. Key nodes are Web Scraper Trigger (receives POST requests), Validate Scraper Input (checks that the URL is valid), Sanitize Scraper Input (cleans the input), DevHubConnect Web Fetcher (fetches page content via Jina AI), DevHubConnect Data Extractor (extracts structured data with OpenAI), Save to Google Sheets (stores the data), and Final Success Response (confirms completion). Dedicated error handlers (Input Validation Error, Web Fetch Error Handler, AI Extraction Configuration Error, Extraction Results Error) catch failures at each stage.

Setup Requirements and Configuration: Install n8n from n8n.io for self-hosting, or sign up at cloud.n8n.io. Obtain a Jina AI API key from jina.ai, an OpenAI API key from platform.openai.com, and Google Sheets OAuth2 credentials from console.developers.google.com. Import the workflow JSON into n8n. Give Web Scraper Trigger a unique path (e.g., 'scrape-data') and make sure your n8n instance is reachable at a public URL. Under ‘Credentials,’ add credentials for DevHubConnect Web Fetcher (Jina AI), OpenAI Extraction Model, and Save to Google Sheets. Set the environment variables JINA_API_KEY, ENABLE_JINA_SCRAPING, OPENAI_API_KEY, ENABLE_AI_EXTRACTION, GOOGLE_SHEETS_DOCUMENT_ID, and GOOGLE_SHEETS_SHEET_NAME. The n8n host needs outbound internet access for the API calls.

Testing and Deployment Steps: Activate the workflow to generate the webhook URL. Send a test POST request with a valid URL and a JSON content type, e.g. curl -X POST <webhook-url> -H 'Content-Type: application/json' -d '{"url":"http://books.toscrape.com"}'. Without the Content-Type header, curl sends the body as form data rather than JSON. Check the output in Google Sheets for fields such as title, price, and availability. Check the n8n logs for errors such as ‘Invalid Jina API key,’ ‘OpenAI API key missing,’ or ‘Invalid URL format’; Input Validation Error returns HTTP 400 for malformed URLs. Check the Final Success Response for the record count and processing time. Then test the error paths: send an invalid URL, or remove the Google Sheets credentials, to trigger the error handlers. Use the logs to debug credential or API issues, and confirm that the data in Google Sheets matches the source website's content.
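The listing does not show the exact logic inside Validate Scraper Input and Sanitize Scraper Input. As a rough illustration only, the Python sketch below reproduces the contract described for them: reject a missing or malformed `url` with HTTP 400 and otherwise pass on a cleaned URL. The function names, the http/https-only rule, and the cleaning rules (trimming whitespace, lowercasing scheme and host) are assumptions, not the workflow's actual expressions.

```python
from typing import Any, Dict, Tuple
from urllib.parse import urlparse, urlunparse

# Assumption: only ordinary web URLs are accepted.
ALLOWED_SCHEMES = ("http", "https")


def sanitize_url(raw: str) -> str:
    """Hypothetical cleaning step: trim whitespace, lowercase scheme and host."""
    parsed = urlparse(raw.strip())
    return urlunparse(parsed._replace(scheme=parsed.scheme.lower(),
                                      netloc=parsed.netloc.lower()))


def validate_scraper_input(body: Dict[str, Any]) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, payload); 400 mirrors the Input Validation Error branch."""
    raw = body.get("url")
    if not isinstance(raw, str) or not raw.strip():
        return 400, {"error": "Missing 'url' field"}
    parsed = urlparse(raw.strip())
    # A usable URL needs an allowed scheme and a host part.
    if parsed.scheme.lower() not in ALLOWED_SCHEMES or not parsed.netloc:
        return 400, {"error": "Invalid URL format"}
    return 200, {"url": sanitize_url(raw)}
```

Separating validation from sanitization keeps the 400 decision independent of how aggressively the URL is cleaned.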
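The listing names title, price, and availability as example output fields. A minimal sketch of a pre-save check that turns extracted records into sheet rows, and raises when nothing usable came back (roughly the role the Extraction Results Error handler plays), might look like this. The column set, the `ExtractionResultsError` class, and the rule that every column must be non-empty are assumptions, not the workflow's actual schema.

```python
from typing import Any, Dict, List

# Assumed sheet columns, taken from the example fields in the listing.
COLUMNS = ["title", "price", "availability"]


class ExtractionResultsError(ValueError):
    """Hypothetical stand-in for the workflow's Extraction Results Error branch."""


def to_sheet_rows(records: List[Dict[str, Any]]) -> List[List[str]]:
    """Convert extracted records to a header row plus one row per record."""
    if not records:
        raise ExtractionResultsError("AI extraction returned no records")
    rows = [list(COLUMNS)]  # copy so callers cannot mutate the module constant
    for i, rec in enumerate(records):
        missing = [c for c in COLUMNS if rec.get(c) in (None, "")]
        if missing:
            raise ExtractionResultsError(f"record {i} is missing {', '.join(missing)}")
        rows.append([str(rec[c]).strip() for c in COLUMNS])
    return rows
```

Failing before the write keeps partial or empty extractions out of the sheet, so the data there can be compared one-to-one with the source site.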
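A preflight check for the environment variables listed in the setup step can catch missing keys before the first request fails with errors such as ‘OpenAI API key missing.’ This sketch assumes that the ENABLE_* variables are boolean-like strings ('true', '1', 'yes') and that an API key is only required when its feature flag is on; how the workflow itself interprets these flags is not documented in the listing.

```python
import os
from typing import List, Mapping, Optional

# Always needed to write results (names from the setup instructions).
REQUIRED_VARS = ["GOOGLE_SHEETS_DOCUMENT_ID", "GOOGLE_SHEETS_SHEET_NAME"]
# Assumption: each feature flag gates the API key it depends on.
FLAG_TO_KEY = {
    "ENABLE_JINA_SCRAPING": "JINA_API_KEY",
    "ENABLE_AI_EXTRACTION": "OPENAI_API_KEY",
}


def is_enabled(value: Optional[str]) -> bool:
    """Treat common truthy strings as 'on' (assumed convention)."""
    return (value or "").strip().lower() in {"true", "1", "yes"}


def check_config(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks complete."""
    problems: List[str] = []
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is not set")
    for flag, key in FLAG_TO_KEY.items():
        if is_enabled(env.get(flag)) and not env.get(key, "").strip():
            problems.append(f"{flag} is enabled but {key} is missing")
    return problems


if __name__ == "__main__":
    # Check the current process environment and print any problems found.
    for problem in check_config(os.environ):
        print(problem)
```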
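The curl test can also be scripted, for example to repeat it against several URLs. This Python sketch uses only the standard library; `build_test_request` and `send` are hypothetical helper names, and the webhook URL you pass in should be the one n8n shows after activation. Returning the status code instead of raising on HTTP errors is a design choice so that the 400 from Input Validation Error can be inspected directly.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def build_test_request(webhook_url: str, target_url: str) -> urllib.request.Request:
    """Build a JSON POST equivalent to the curl command in the testing steps."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> Tuple[int, str]:
    """Send the request and return (status code, response body) for 2xx and HTTP errors alike."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as HTTPError; keep their status and body.
        return err.code, err.read().decode("utf-8")
```

For example, `send(build_test_request(webhook_url, "not-a-url"))` should come back with status 400 if the validation branch behaves as described, while a valid URL should return the Final Success Response body. Scraping and extraction can take a while, so the default timeout is deliberately generous.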

$6.99

Workflow steps: 24

Integrated apps: webhook, if, set
