# Conversational Systems

A conversational system processes dialogue between users and machines. The system receives one or more conversational turns and generates a response. Unlike single-turn NLP tasks, dialogue systems must maintain context across multiple exchanges.

Example:

| Speaker | Text |
|---|---|
| User | `What is PyTorch?` |
| Assistant | `PyTorch is a deep learning framework developed by Meta.` |
| User | `Does it support GPUs?` |
| Assistant | `Yes. PyTorch supports CUDA and other accelerator backends.` |

The second user message contains the pronoun `it`. The system must remember that `it` refers to PyTorch. Dialogue therefore requires contextual reasoning across turns.

Modern conversational systems are usually based on transformers and large language models. Earlier systems relied heavily on rules, templates, and manually designed state machines.

### Components of a Conversational System

A conversational system often contains several modules.

| Component | Purpose |
|---|---|
| Tokenizer | Converts text into tokens |
| Dialogue state tracker | Maintains conversational context |
| Retriever | Retrieves relevant knowledge or memory |
| Response generator | Produces the next response |
| Safety and filtering | Blocks harmful or invalid outputs |
| Tool interface | Calls external systems or APIs |

Some systems integrate all behavior into one large model. Others use pipelines with separate modules.

A simple chatbot pipeline may look like:

```text id="7npv9t"
user message
-> tokenize
-> retrieve context or memory
-> language model
-> decode response
-> safety filtering
-> output
```

### Dialogue as Sequence Modeling

A conversation can be represented as a sequence of tokens across multiple turns.

Suppose a dialogue contains turns:

```text id="yn7zv8"
User: What is PyTorch?
Assistant: A deep learning framework.
User: Who created it?
```

The model may receive the full dialogue history:

```text id="vjlwmr"
<User> What is PyTorch?
<Assistant> A deep learning framework.
<User> Who created it?
```

The task is to predict the next assistant response:

```text id="3slo3l"
<Assistant> Meta developed PyTorch.
```

The probability of the response is modeled autoregressively:

$$
P(y \mid x) =
\prod_{t=1}^{T}
P(y_t \mid y_{<t}, x),
$$

where:

| Symbol | Meaning |
|---|---|
| $x$ | Dialogue history |
| $y$ | Response tokens |
| $y_t$ | Token at position $t$ |

The model predicts one token at a time conditioned on the previous dialogue context.
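
As a concrete sketch, the response log-probability can be computed from per-step logits. Shapes and values below are illustrative, not from any specific model.

```python id="k3logp"
import torch
import torch.nn.functional as F

# Illustrative shapes: T = 4 response tokens over a V = 100 vocabulary.
logits = torch.randn(4, 100)                  # per-step logits [T, V]
response = torch.tensor([11, 42, 7, 99])      # response tokens y_1..y_T

log_probs = F.log_softmax(logits, dim=-1)     # normalize each step
token_lp = log_probs[torch.arange(4), response]  # log P(y_t | y_<t, x)
sequence_lp = token_lp.sum()                  # log P(y | x)
```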

### Dialogue History Representation

Dialogue models must represent conversation history in a structured way.

A common approach uses special role tokens:

```text id="d9xetd"
<User>
<Assistant>
<System>
```

Example:

```text id="9nsm0d"
<System> You are a helpful assistant.
<User> Explain gradient descent.
<Assistant> Gradient descent updates parameters using gradients.
<User> Why does the learning rate matter?
```

These role markers help the model distinguish instructions, user inputs, and assistant responses.

The tokenized sequence is then passed into the transformer.
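
A minimal flattening sketch is shown below; the role-marker strings follow the format above, and `render_dialogue` is an illustrative helper, not a standard API.

```python id="rle4mk"
def render_dialogue(messages: list[dict]) -> str:
    # Prefix each message with its role marker and join turns with
    # newlines, matching the format shown above.
    markers = {"system": "<System>", "user": "<User>", "assistant": "<Assistant>"}
    return "\n".join(f"{markers[m['role']]} {m['content']}" for m in messages)

render_dialogue([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain gradient descent."},
])
```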

### Context Windows

Transformers process only a limited number of tokens. This limit is called the context window.

If the context window has length $L$, then the total conversation history, together with the response being generated, must fit inside $L$ tokens.

Long conversations therefore require truncation, summarization, retrieval, or external memory systems.

Common strategies:

| Strategy | Description |
|---|---|
| Truncation | Keep only recent turns |
| Summarization | Compress earlier dialogue |
| Retrieval | Retrieve relevant past turns |
| External memory | Store conversation state separately |

A simple truncation strategy keeps the most recent tokens:

```text id="whz8nv"
keep last N tokens
discard earlier history
```

This is computationally simple but may forget important information.
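
In code, token-level truncation is a one-liner; `max_len` here is a stand-in for the model's context budget.

```python id="trk9nv"
def truncate_history(token_ids: list[int], max_len: int) -> list[int]:
    # Keep the most recent max_len tokens and drop everything earlier.
    return token_ids[-max_len:]
```

Real systems often truncate at turn boundaries rather than at an arbitrary token so that no message is split mid-sentence.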

### Retrieval-Augmented Dialogue

Conversational systems often need external knowledge.

Example:

```text id="x0r8q9"
User: What is the latest version of PyTorch?
```

A frozen language model may not know current information. A retriever can fetch documents from a database or search engine.

The system becomes:

```text id="b2qq0t"
user query
-> retrieve documents
-> append retrieved text to prompt
-> generate response
```

This approach is called retrieval-augmented generation.

The retrieved context may include:

| Source | Example |
|---|---|
| Search engine | Web pages |
| Documentation | API references |
| Vector database | Similar embeddings |
| Conversation memory | Earlier turns |
| Knowledge base | Structured facts |

The language model conditions on retrieved evidence when generating the response.
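
A minimal prompt-construction sketch, assuming a `retrieve` function that returns relevant text snippets (the function and the numbered-context layout are illustrative conventions):

```python id="rag7qx"
def build_rag_prompt(query: str, retrieve, k: int = 3) -> str:
    # retrieve(query, k) is assumed to return the k most relevant
    # snippets from a search engine or vector database.
    snippets = retrieve(query, k)
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return f"Context:\n{context}\n\n<User> {query}\n<Assistant>"
```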

### Dialogue State Tracking

Task-oriented systems often maintain structured dialogue state.

Example restaurant-booking dialogue:

| Slot | Value |
|---|---|
| Cuisine | Italian |
| City | Paris |
| Party size | 4 |
| Time | 7 PM |

The user may provide information incrementally:

```text id="z1jzef"
User: Find an Italian restaurant.
User: In Paris.
User: For four people.
```

The dialogue state tracker accumulates constraints over turns.
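
A minimal tracker sketch: each turn's extracted slots (assumed to come from an upstream NLU step) are merged into a running state dictionary.

```python id="dst3kf"
state: dict[str, str] = {}

def update_state(state: dict[str, str], new_slots: dict[str, str]) -> dict[str, str]:
    # Later turns add constraints or overwrite earlier values.
    state.update(new_slots)
    return state

update_state(state, {"cuisine": "Italian"})
update_state(state, {"city": "Paris"})
update_state(state, {"party_size": "4"})
# state == {"cuisine": "Italian", "city": "Paris", "party_size": "4"}
```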

Older dialogue systems used explicit symbolic state tracking. Modern systems often encode dialogue state implicitly inside transformer hidden representations, though structured tracking remains useful for reliability.

### Intent Detection and Slot Filling

Task-oriented dialogue systems commonly separate two subtasks:

| Task | Goal |
|---|---|
| Intent detection | Identify user goal |
| Slot filling | Extract parameter values |

Example:

```text id="b1j41l"
Book a flight to Tokyo tomorrow morning.
```

Intent:

```text id="mjlwmr"
BOOK_FLIGHT
```

Slots:

| Slot | Value |
|---|---|
| Destination | Tokyo |
| Date | tomorrow |
| Time | morning |

Intent detection is usually sentence classification. Slot filling is usually sequence labeling similar to named entity recognition.

Modern large language models often unify both tasks within generative prompting.
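
A sketch of the generative formulation: the prompt wording and the expected JSON schema are illustrative, not a fixed standard.

```python id="isf8pw"
import json

prompt = (
    "Extract the intent and slots as JSON.\n"
    "Utterance: Book a flight to Tokyo tomorrow morning.\n"
    "Output: "
)

# One valid completion a tuned model might produce:
completion = (
    '{"intent": "BOOK_FLIGHT", '
    '"slots": {"destination": "Tokyo", "date": "tomorrow", "time": "morning"}}'
)
parsed = json.loads(completion)  # structured result for downstream use
```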

### Response Generation

There are two major approaches to dialogue response generation.

| Approach | Description |
|---|---|
| Retrieval-based | Select existing response |
| Generative | Produce new response tokens |

Retrieval-based systems choose responses from a database or candidate set. They are easier to control and safer but less flexible.

Generative systems synthesize responses token by token. They are more flexible but can hallucinate or generate unsafe outputs.

Modern conversational AI is mostly generative.

### Decoder-Only Dialogue Models

Many modern chat systems use decoder-only transformers.

The full conversation history is concatenated into one token sequence:

```text id="v5mtmv"
<System> ...
<User> ...
<Assistant> ...
<User> ...
```

The model predicts the next assistant tokens autoregressively.

If the input has shape:

```text id="72xuz8"
[B, T]
```

the transformer produces hidden states:

```text id="46u0q5"
[B, T, D]
```

The output projection maps hidden states to vocabulary logits:

```text id="qr6r0v"
[B, T, V]
```

During generation, the model repeatedly predicts:

$$
P(x_t \mid x_{<t}).
$$

The generated tokens are appended to the sequence and fed back into the model.
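
A minimal greedy generation loop, assuming `model` maps `[B, T]` token ids to `[B, T, V]` logits as above:

```python id="gen5lq"
import torch

@torch.no_grad()
def generate(model, input_ids, max_new_tokens: int, eos_id: int):
    # input_ids: [B, T]. Each step appends the argmax next token and
    # feeds the extended sequence back into the model.
    for _ in range(max_new_tokens):
        logits = model(input_ids)                                # [B, T, V]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # [B, 1]
        input_ids = torch.cat([input_ids, next_id], dim=1)
        if (next_id == eos_id).all():
            break
    return input_ids
```

In practice, the argmax step is usually replaced by one of the sampling strategies described next.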

### Sampling Strategies

Dialogue systems rarely use greedy decoding because it often produces repetitive or generic outputs.

Common decoding strategies include:

| Method | Description |
|---|---|
| Greedy decoding | Choose highest-probability token |
| Beam search | Track several candidate sequences |
| Temperature sampling | Adjust probability sharpness |
| Top-k sampling | Sample from top k tokens |
| Top-p sampling | Sample from smallest high-probability set |

Temperature scaling modifies logits:

$$
p_i =
\frac{\exp(z_i / T)}
{\sum_j \exp(z_j / T)}.
$$

Lower temperature produces more deterministic outputs. Higher temperature increases diversity.

Top-k sampling restricts generation to the $k$ highest-probability tokens.

Top-p sampling, also called nucleus sampling, chooses the smallest token set whose cumulative probability exceeds threshold $p$.

These methods balance fluency, diversity, and stability.
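
A combined temperature plus nucleus sampling sketch over a single logits vector (the hyperparameter values are illustrative defaults):

```python id="smp2dk"
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature: float = 0.8, top_p: float = 0.9):
    # logits: [V]. Temperature sharpens or flattens the distribution;
    # nucleus filtering then keeps the smallest high-probability set.
    probs = F.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep tokens until cumulative probability exceeds top_p
    # (the highest-probability token is always kept).
    keep = (cumulative - sorted_probs) < top_p
    filtered = sorted_probs * keep
    filtered = filtered / filtered.sum()
    choice = torch.multinomial(filtered, num_samples=1)
    return sorted_ids[choice]
```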

### Repetition Problems

Autoregressive dialogue models may repeat phrases or fall into loops.

Example:

```text id="uhzhy9"
The answer is very important because it is very important because it is very important...
```

Several factors contribute:

| Cause | Description |
|---|---|
| Exposure bias | Model conditions on its own outputs |
| Probability collapse | Repeated tokens dominate probabilities |
| Weak decoding constraints | Decoder allows loops |

Common mitigation methods:

| Method | Idea |
|---|---|
| Repetition penalties | Reduce probability of repeated tokens |
| N-gram blocking | Prevent repeated phrases |
| Temperature adjustment | Increase diversity |
| Better training data | Reduce repetitive patterns |

A repetition penalty modifies logits for previously generated tokens:

```python id="56r2h8"
generated_tokens = set(previous_ids)

for token_id in generated_tokens:
    logits[token_id] /= repetition_penalty
```

### Instruction Tuning

Base language models predict text continuations. Conversational systems require instruction-following behavior.

Instruction tuning trains the model on examples such as:

```text id="mjlwm2"
Instruction: Summarize this paragraph.
Response: ...
```

or:

```text id="h29m2n"
User: Explain backpropagation.
Assistant: ...
```

The model learns conversational formatting, helpfulness, and task-following behavior.

Instruction tuning datasets often contain:

| Example type | Description |
|---|---|
| Question answering | Direct factual answers |
| Summarization | Condensed explanations |
| Coding | Program synthesis |
| Reasoning | Step-by-step solutions |
| Dialogue | Multi-turn conversations |

Instruction tuning is one reason modern LLMs behave differently from older pretrained language models.

### Reinforcement Learning from Human Feedback

Many conversational systems use reinforcement learning from human feedback, abbreviated RLHF.

The process usually has three stages:

| Stage | Purpose |
|---|---|
| Supervised fine-tuning | Learn dialogue behavior |
| Reward modeling | Learn preference scores |
| RL optimization | Optimize responses for reward |

Human annotators compare model responses:

| Prompt | Preference judgment |
|---|---|
| `Explain transformers.` | Response A preferred over Response B |

A reward model predicts human preference scores. Reinforcement learning then updates the dialogue model to maximize reward.
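
A common reward-model objective is a Bradley-Terry style pairwise loss; the sketch below assumes scalar reward scores for the chosen and rejected responses.

```python id="rm4bt7"
import torch.nn.functional as F

def preference_loss(reward_chosen, reward_rejected):
    # Push the preferred response's score above the rejected one:
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```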

RLHF improves helpfulness, harmlessness, and instruction following, though it can also create overrefusal, verbosity, or reward hacking behaviors.

### Tool Use and Function Calling

Modern conversational systems increasingly interact with external tools.

Example:

```text id="n9o2n6"
User: What is the weather in Hanoi?
```

The system may:

```text id="h5j5t9"
-> detect weather intent
-> call weather API
-> format result
-> generate response
```

Tool use extends model capability beyond static parametric memory.

Common tools include:

| Tool type | Example |
|---|---|
| Search | Web retrieval |
| Calculator | Arithmetic |
| Code execution | Python runtime |
| Database query | SQL |
| Calendar | Scheduling |
| Email | Messaging |
| File retrieval | Document search |

The language model acts as a controller that decides when and how to use tools.
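
A minimal controller sketch: the model is assumed to emit a JSON tool call, which the runtime parses and dispatches. The `get_weather` stub and the call format are hypothetical.

```python id="tl6hz9"
import json

def get_weather(city: str) -> str:
    # Illustrative stub; a real implementation would call a weather API.
    return f"Clear skies in {city}."

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    # Expects e.g. {"tool": "get_weather", "args": {"city": "Hanoi"}}.
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["args"])

dispatch('{"tool": "get_weather", "args": {"city": "Hanoi"}}')
```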

### Safety and Moderation

Conversational systems must manage unsafe or harmful outputs.

Safety systems may address:

| Risk | Example |
|---|---|
| Toxicity | Harassment |
| Self-harm advice | Dangerous instructions |
| Misinformation | False claims |
| Privacy leakage | Personal information |
| Jailbreaking | Prompt attacks |
| Hallucinations | Unsupported facts |

Moderation systems may use:

| Method | Description |
|---|---|
| Rule filters | Keyword blocking |
| Classifiers | Toxicity prediction |
| Constitutional prompts | Instruction constraints |
| RLHF | Human preference optimization |
| Retrieval verification | Evidence grounding |

Safety systems must balance protection against overblocking useful responses.
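
As the simplest example from the table above, a keyword rule filter takes only a few lines; the blocklist entries here are placeholders.

```python id="saf1rk"
BLOCKLIST = {"placeholder_term_1", "placeholder_term_2"}

def passes_rule_filter(text: str) -> bool:
    # Block any output containing a listed keyword. Real stacks layer
    # learned classifiers on top, since keyword rules both overblock
    # and miss paraphrases.
    return not (set(text.lower().split()) & BLOCKLIST)
```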

### Evaluation of Conversational Systems

Dialogue evaluation is difficult because many valid responses may exist.

Example:

```text id="n1l8i0"
User: How are you?
```

Possible valid responses:

```text id="wr27yu"
I'm doing well.
I'm fine, thank you.
Doing great today.
```

Automatic metrics such as BLEU correlate weakly with conversational quality.

Modern evaluation often considers:

| Criterion | Meaning |
|---|---|
| Helpfulness | Solves the user problem |
| Relevance | Matches conversation context |
| Faithfulness | Grounded in evidence |
| Coherence | Logically consistent |
| Safety | Avoids harmful outputs |
| Latency | Responds quickly |
| User satisfaction | Human preference |

Human evaluation remains important.

### Memory Systems

Long-term conversational systems may require persistent memory.

Example:

```text id="wks0wf"
User: My favorite language is Rust.
```

A later conversation:

```text id="vpk5fa"
User: Recommend a systems programming book.
```

The assistant may use stored memory to personalize recommendations.

Memory systems may store:

| Memory type | Example |
|---|---|
| User preferences | Favorite languages |
| Past conversations | Dialogue history |
| Documents | Uploaded files |
| Tool outputs | Previous searches |

Memory retrieval must balance usefulness, privacy, and correctness.
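
A toy memory store using keyword overlap for retrieval; production systems typically use embedding similarity instead, and all names here are illustrative.

```python id="mem9ry"
memory: dict[str, list[str]] = {
    "user_1": ["favorite language is Rust"],
}

def recall(user: str, query: str) -> list[str]:
    # Naive keyword-overlap retrieval over stored facts.
    words = set(query.lower().split())
    return [fact for fact in memory.get(user, [])
            if words & set(fact.lower().split())]

recall("user_1", "Which language should I learn?")
# -> ["favorite language is Rust"]
```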

### Hallucinations in Dialogue

Generative dialogue systems may produce fluent but false statements.

Example:

```text id="fwn6tr"
PyTorch was released in 2008.
```

The statement is grammatically correct but factually wrong.

Hallucinations arise because language models optimize next-token prediction, not truth verification.

Methods to reduce hallucinations include:

| Method | Description |
|---|---|
| Retrieval augmentation | Use external evidence |
| Citation generation | Provide sources |
| Verification models | Check factual consistency |
| Tool use | Query trusted systems |
| Abstention | Refuse uncertain answers |

No current method eliminates hallucinations completely.

### Multi-Agent Systems

Some conversational architectures use multiple interacting agents.

Example pipeline:

```text id="xjl2s9"
planner agent
-> retrieval agent
-> coding agent
-> critic agent
-> final response
```

Different agents specialize in different tasks.

Examples:

| Agent | Role |
|---|---|
| Planner | Break tasks into steps |
| Retriever | Gather information |
| Coder | Write programs |
| Critic | Evaluate outputs |
| Memory manager | Retrieve relevant context |

Multi-agent systems may improve decomposition and reliability but increase orchestration complexity.
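
A toy orchestration sketch in which each agent is just a function from text to text; in a real system each would wrap its own model, prompt, and tools.

```python id="mas2xp"
def planner(task: str) -> str:
    return f"steps for: {task}"

def retriever(plan: str) -> str:
    return f"{plan} | retrieved notes"

def coder(context: str) -> str:
    return f"draft code for ({context})"

def critic(draft: str) -> str:
    return f"reviewed: {draft}"

def run_pipeline(task: str) -> str:
    # Agents run in sequence, each consuming the previous output.
    return critic(coder(retriever(planner(task))))
```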

### Practical Chat Data Format

Conversational datasets are often represented as message lists.

Example:

```python id="mhmz2x"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is PyTorch?"},
    {"role": "assistant", "content": "PyTorch is a deep learning framework."},
]
```

Tokenization converts these messages into one flattened token sequence with role markers.

Training usually masks the loss on system and user tokens and computes it only on assistant outputs.
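
A minimal masking sketch: `loss_mask` marks assistant tokens, and the cross-entropy is averaged only over those positions. Shapes and tensor names are illustrative.

```python id="lm8msk"
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, input_ids, loss_mask):
    # logits: [B, T, V]; input_ids, loss_mask: [B, T].
    # Shift so position t predicts token t+1, then keep loss only
    # where the (shifted) mask marks assistant tokens.
    logits = logits[:, :-1]
    targets = input_ids[:, 1:]
    mask = loss_mask[:, 1:].float()
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape)
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```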

### Summary

Conversational systems model multi-turn dialogue between users and machines. Modern systems are usually transformer-based autoregressive language models conditioned on dialogue history.

Important components include dialogue context management, retrieval, response generation, safety systems, memory, and tool use. Instruction tuning and RLHF help models follow user intent and conversational conventions.

Conversational AI extends beyond pure language modeling into retrieval systems, reasoning systems, planning systems, and external tool orchestration.

