GPTWeb Two-Phase RAG Engine Routing

The GPTWeb RAG engine is a two-phase routing system, not a simple keyword lookup. It separates the question of where to look from the question of what to return, which makes responses more accurate and keeps them grounded in your actual content.

Phase 1, Semantic Routing, converts the visitor's query into a vector embedding and matches it against your configured knowledge collections. The engine scores each collection by semantic similarity to the query and routes to the best-matching knowledge namespace. The richer your collection description, the more precise this routing becomes.

Phase 2, Semantic Retrieval, then operates inside the matched collection. Documents are pre-chunked into overlapping segments (~1,500 characters) to preserve contextual meaning across chunk boundaries. The engine runs a cosine similarity search across all chunks in that collection and surfaces the top-K most relevant segments, which are passed to the LLM as a constrained context window.
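The two phases can be illustrated with a minimal sketch. This is not GPTWeb's implementation: the function names (`route`, `retrieve`), the dictionary layout, and the hand-written 3-dimensional vectors are all assumptions made for illustration. A real deployment would obtain vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def route(query_vec, collections):
    """Phase 1: pick the collection whose description embedding best matches the query."""
    return max(
        collections,
        key=lambda name: cosine_similarity(query_vec, collections[name]["description_vec"]),
    )


def retrieve(query_vec, chunks, top_k=3):
    """Phase 2: rank the chunks of one collection and keep the top_k (score, text) pairs."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy vectors stand in for real embeddings (assumption, for illustration only).
collections = {
    "pricing": {
        "description_vec": [0.9, 0.1, 0.0],
        "chunks": [
            ("Plans are billed monthly.", [0.8, 0.2, 0.0]),
            ("Annual billing is also available.", [0.7, 0.1, 0.2]),
        ],
    },
    "setup": {
        "description_vec": [0.1, 0.9, 0.1],
        "chunks": [
            ("Install the widget with one script tag.", [0.1, 0.95, 0.0]),
        ],
    },
}

query_vec = [0.85, 0.15, 0.05]  # hypothetical embedding of a billing question
best = route(query_vec, collections)
top_chunks = retrieve(query_vec, collections[best]["chunks"], top_k=1)
print(best, top_chunks)
```

Note that Phase 2 only scores chunks inside the routed collection, so the cost of a query depends on the size of that collection rather than on the whole knowledge base.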

Key Mechanics of the Two-Phase System

  • Collections act as semantic namespaces — richer descriptions improve Phase 1 routing accuracy
  • Vector embeddings (OpenAI or compatible model) power both phases — intent and meaning, not keywords
  • Documents are auto-chunked with overlapping context windows to preserve meaning at boundaries
  • Phase 2 returns top-K chunks ranked by cosine similarity — only the most relevant segments reach the LLM
  • The LLM is constrained to retrieved context, which limits hallucination beyond your knowledge base content
  • Multi-collection queries are supported — if two collections score similarly, both are queried and results merged
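The chunking and multi-collection mechanics listed above could look roughly like the following sketch. The ~1,500-character chunk size comes from this page; the 200-character overlap, the 0.05 score margin used to decide that two collections "score similarly", and all function names are assumptions, since the page does not state them.

```python
def chunk_text(text, size=1500, overlap=200):
    """Split text into windows of `size` characters.

    Each window shares `overlap` characters with the previous one
    (the overlap value is an assumed default, not a documented one).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def select_collections(scores, margin=0.05):
    """Return the best-scoring collection plus any within `margin` of it, best first.

    `margin` is a hypothetical threshold for "score similarly".
    """
    best = max(scores.values())
    close = [name for name, score in scores.items() if best - score <= margin]
    return sorted(close, key=lambda name: scores[name], reverse=True)


def merge_results(result_lists, top_k=3):
    """Merge per-collection (score, text) lists and keep the overall top_k."""
    merged = [pair for results in result_lists for pair in results]
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return merged[:top_k]


document = "".join(chr(97 + (i * 7) % 26) for i in range(4000))  # 4,000 filler characters
pieces = chunk_text(document)
routed = select_collections({"pricing": 0.82, "billing": 0.80, "setup": 0.41})
merged = merge_results([[(0.9, "a"), (0.5, "b")], [(0.7, "c")]], top_k=2)
print(len(pieces), routed, merged)
```

Because consecutive chunks share their boundary text, a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.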

GPTWeb Two-Phase RAG Engine Architecture

  • Chunk Size: ~1,500 chars
  • Routing Method: Cosine Similarity
  • Phases: 2 (Route + Retrieve)
  • Hallucination Control: Context-Constrained LLM
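One common way to constrain an LLM to retrieved context is to place the retrieved chunks in the prompt and instruct the model to answer only from them. The sketch below shows that general pattern. The prompt wording and the `build_prompt` helper are hypothetical, as GPTWeb's actual prompt is not documented here. An instruction like this reduces off-context answers but does not by itself guarantee that none occur.

```python
def build_prompt(question, chunks):
    """Assemble a prompt that tells the model to answer only from the retrieved chunks."""
    # Number each chunk so an answer can be traced back to its source segment.
    context = "\n\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt(
    "How is billing handled?",
    ["Plans are billed monthly.", "Annual billing is also available."],
)
print(prompt)
```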
The two-phase approach is what separates GPTWeb from simple chatbot bolt-ons. Because routing is decoupled from retrieval, each query searches only the chunks in the matched collection, so the engine scales gracefully as your knowledge base grows: adding more collections sharpens responses rather than slowing them down. Every answer your visitors receive is grounded, traceable, and scoped to what you've curated.

Learn more about Getting Started with your knowledge base setup, or explore Use Cases that leverage this engine for Discussion Qualified Leads and visitor engagement. GPTWeb is the future of engagement, websites, and marketing automation combined: built for the AI era, built for now.
