Build a production AI chatbot for your website with n8n and OpenAI
A useful website chatbot needs retrieval, guardrails, and escalation. The stable architecture is crawler → chunks → embeddings → vector store → n8n webhook → OpenAI → response. Keep the model on a short leash: answer from approved content, say when it does not know, and hand high-value or risky conversations to a person.
Most website chatbots fail because they are either static scripts with canned answers or unrestricted language models with no source of truth. A production chatbot sits between those extremes. It searches your own content, gives concise answers, and creates a clean path to a human when the visitor is ready.
Architecture
| Part | Job | Common choice |
|---|---|---|
| Crawler | Fetch pages and docs | n8n schedule plus HTTP |
| Chunker | Split content into useful blocks | Code node |
| Embeddings | Convert text to vectors | OpenAI embeddings |
| Vector store | Search relevant chunks | Qdrant, Supabase, Pinecone |
| Responder | Answer with guardrails | OpenAI chat model |
| Escalation | Create lead or ticket | Slack, HubSpot, email |
The n8n webhook
The widget should send message, session ID, page URL, and consent state to an n8n webhook. n8n retrieves the prior conversation summary, runs a vector search against approved content, constructs a grounded prompt, calls OpenAI, logs the answer, and returns JSON to the widget. Keep the response shape boring: text, confidence, escalation flag, and suggested next action.
Latency targets
A credible target is under two seconds for normal answers. Vector search should be under 200 ms. Prompt assembly should be under 100 ms. The model call is the main cost and latency driver. If answers regularly exceed three seconds, shorten context, reduce chunk count, or use a faster model for first response with a deeper follow-up path.
Cost model
For an SMB site with 1,000 to 5,000 monthly conversations, monthly API cost is usually modest compared with implementation. The cost drivers are long prompts, unbounded history, and visitors who treat the bot like a general assistant. Summarize history, retrieve only top chunks, and refuse off-topic questions.
The NexFlow site uses this pattern: a small widget, a PHP endpoint, and an OpenAI-backed assistant scoped to NexFlow services. Production versions add vector retrieval, lead capture, and escalation rules matched to the client stack.
Guardrails
- Answer only from approved website, pricing, policy, and support content.
- Do not invent case studies, guarantees, or legal claims.
- Do not accept confidential data or regulated health information.
- Refuse financial, legal, medical, or tax advice.
- Escalate buying intent, complaints, security issues, and ambiguous risk.
Human escalation
Escalation is where the ROI appears. A good handoff includes visitor email if provided, page URL, transcript, summary, intent, urgency, and recommended next step. n8n can send that to Slack, create a HubSpot contact, update GoHighLevel, or email support. The visitor should see a plain confirmation, not a vague promise.
- RAG beats generic chatbot prompts for business sites.
- n8n is a strong orchestration layer for retrieval, logging, and escalation.
- Cost is controlled by short prompts and scoped answers.
- Guardrails need to be explicit and tested.
- Escalation is part of the product, not an afterthought.
Frequently asked questions
How to build a chatbot with n8n and OpenAI?
Send widget messages to an n8n webhook, retrieve relevant content from a vector store, call OpenAI with guardrails, return the answer, and log escalation events.
What does a RAG chatbot cost per month?
For many SMBs, API and vector costs are under US$50 per month at modest traffic. Implementation and maintenance are the larger cost.
Can n8n handle chatbot escalation to humans?
Yes. It can create leads, tickets, Slack alerts, or emails with transcript and summary.
How to prevent chatbot hallucinations?
Use approved retrieval sources, strict prompts, refusal rules, short context, and human escalation for risk.
Want a chatbot that books real calls?
Book a map and we will scope sources, guardrails, cost, and escalation before implementation.
Sources and method
- Architecture based on NexFlow chatbot and RAG workflow builds for SMB websites.
- Latency and cost targets are implementation estimates and must be validated under site traffic.
- Model behaviour and pricing should be checked against current OpenAI documentation before deployment.