System Architecture
High-Level Architecture
The system consists of eight components. The chat widget is the sole entry point for visitor traffic. All AI processing runs server-side; the widget receives a token stream. The Knowledge Retriever executes vector search when the LLM determines domain content is required. Two external notification channels (Slack and CRM) are exit points for lead data. The fallback form operates independently of the AI backend.
Component Responsibilities
| Component | Responsibility | Technology | References |
|---|---|---|---|
| Chat Widget | Embeds on the company website; renders the conversation UI with streaming token display; shows GDPR data notice on first interaction; falls back to the contact form if the AI backend is unavailable | Custom JS: `<growth-chat>` Web Component (React, Shadow DOM, Vite IIFE bundle) | ADR-005 |
| Fallback Contact Form | Captures visitor name and email when the AI service is unavailable; submits via a path independent of the AI backend | Static endpoint / third-party form service | EC-07 |
| Chat API | Authenticates the request, initiates or resumes a LangGraph session, pipes the token stream to the HTTP response | FastAPI (Python) | trd-api-specification.md |
| Conversation Orchestrator | Controls the full session lifecycle: qualification state updates, RAG triage routing, response generation, stall detection, escalation trigger | LangGraph (StateGraph) | ADR-002 |
| Knowledge Retriever | Receives `retrieve_knowledge` tool calls from the LLM; embeds the query; executes HNSW vector search against pgvector; returns chunks above the relevance threshold | Internal module: pgvector + OpenAI Embeddings | ADR-003 |
| Business Hours Detection | Determines whether the current timestamp falls within business hours (Mon–Fri 09:00–18:00 CET/CEST); DST-aware via IANA identifier `Europe/Madrid` | Python `zoneinfo` | EC-04 |
| Human Handoff Subsystem | Generates the context packet; dispatches to Slack and PostgreSQL leads table in parallel; records delivery outcome in `handoff_records`; handles partial failure; falls back to email on dual-channel failure | Internal module | ADR-009, EC-03, FR-19 |
| LLM: Claude Haiku 4.5 | Generates conversational responses; executes the three-stage conversation model; signals when domain retrieval is required via `retrieve_knowledge` tool call | Anthropic API | ADR-001 |
| OpenAI Embeddings | Converts query text to vectors at retrieval time; indexes document chunks at ingestion time | `text-embedding-3-small` | ADR-003 |
| PostgreSQL | Single storage backend: pgvector extension for document chunks and HNSW index; `langgraph-checkpoint-postgres` tables for session state | PostgreSQL + pgvector | ADR-003, ADR-004 |
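The Business Hours Detection rule in the table above can be illustrated with a short sketch. It uses only the standard-library `zoneinfo` module and the `Europe/Madrid` identifier named in the table. The function name `is_business_hours`, the constant names, and the decision to treat 18:00 itself as outside business hours are assumptions made for illustration; public holidays are not considered.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

# IANA identifier taken from the component table; zoneinfo applies CET/CEST rules.
BUSINESS_TZ = ZoneInfo("Europe/Madrid")
OPEN_AT = time(9, 0)    # 09:00 local time, inclusive
CLOSE_AT = time(18, 0)  # 18:00 local time, exclusive (assumption)


def is_business_hours(moment: datetime) -> bool:
    """Return True if `moment` falls within Mon-Fri 09:00-18:00 Madrid time."""
    if moment.tzinfo is None:
        # Naive timestamps are ambiguous, so reject them instead of guessing.
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(BUSINESS_TZ)
    if local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return OPEN_AT <= local.time() < CLOSE_AT
```

Converting to the local zone before comparing is what makes the check DST-aware: 16:30 UTC is 17:30 in Madrid in January (inside business hours) but 18:30 in July (outside).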
Data Flow — Happy Path
The following steps describe the primary data flow for a standard visitor turn that requires domain knowledge retrieval. Handoff and degradation flows are specified in Sections 3.4 and 10 respectively.
1. **Visitor sends a message.** The chat widget sends a chat session request with `{ session_id, message }` to the Chat API.
2. **Session load.** The checkpointer loads existing session state for `session_id` from PostgreSQL, or initialises a new session object if none exists. State includes qualification dimensions, maturity signals, turn counter, and conversation history (sliding window).
3. **Qualification node.** The orchestrator updates the qualification state based on the current message and conversation history. It sets `score` (HOT / WARM / COLD), updates maturity signal flags, and sets `handoff-trigger` if an explicit escalation request is detected.
4. **Response generation.** The orchestrator sends the full context (system prompt + sliding window + current message) to Claude Haiku 4.5 with the `retrieve_knowledge` tool available. The LLM decides per-turn whether to call the tool based on whether the question requires company domain content.
5. **Vector retrieval (conditional).** If the LLM calls `retrieve_knowledge`, the Knowledge Retriever embeds the query via `text-embedding-3-small`, runs HNSW vector search against pgvector, and returns chunks that exceed the configured relevance threshold. Below-threshold results are discarded. The orchestrator forwards the retrieved chunks to the LLM for final response generation.
6. **Token stream delivery.** The LLM streams the response token-by-token. The Chat API pipes the stream to the chat widget, which renders tokens as they arrive.
7. **State write.** The orchestrator writes updated session state to the PostgreSQL checkpointer. The `score?` router evaluates the new state and determines the next routing decision (return to USER REQUEST, PROPOSE HANDOFF, or stall path).
8. **Analytics event.** The backend emits the relevant analytics event (`qualification_state_change` if state changed). Event schema is defined in Section 9.3.
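The relevance filter in the vector retrieval step above can be sketched as follows. This is a minimal illustration, not the Knowledge Retriever's actual code: the `ScoredChunk` type, the `filter_by_relevance` function, and the threshold value of 0.75 are hypothetical. It also assumes scores have already been converted to a similarity where higher means more relevant; pgvector's cosine operator returns a distance, so a conversion such as `1 - distance` is assumed to happen upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChunk:
    """A retrieved document chunk with its similarity score (hypothetical type)."""
    text: str
    similarity: float  # assumed: higher means more relevant


def filter_by_relevance(chunks: list[ScoredChunk], threshold: float) -> list[ScoredChunk]:
    """Keep only chunks whose similarity exceeds the threshold, best first."""
    # Strict '>' mirrors "chunks that exceed the configured relevance threshold".
    kept = [chunk for chunk in chunks if chunk.similarity > threshold]
    return sorted(kept, key=lambda chunk: chunk.similarity, reverse=True)


# Example with a hypothetical threshold of 0.75.
candidates = [
    ScoredChunk("pricing overview", 0.82),
    ScoredChunk("unrelated blog post", 0.40),
    ScoredChunk("service catalogue", 0.91),
]
relevant = filter_by_relevance(candidates, threshold=0.75)
```

In this example `relevant` holds the service catalogue chunk followed by the pricing overview chunk, and the 0.40 result is discarded. An empty result is a valid outcome; how the orchestrator then prompts the LLM is not specified in this section.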