Tolan's Voice AI Blueprint Using GPT-5.1

Overview

The development of truly natural voice AI requires solving problems far beyond simple prompt-response cycles. Tolan, a voice-first AI companion built by Portola, represents a significant leap in this space, demonstrating how advanced foundation models can be engineered for continuous, meandering dialogue rather than quick, transactional queries. The platform features a personalized, animated character that learns and adapts through sustained conversation, moving the needle away from the chatbot experience and toward a genuine digital companion.

Portola, a team with prior exits, recognized early that while the initial hype cycle focused on text-based LLMs, the next frontier was inherently auditory. Holding a live, open-ended conversation presents unique technical hurdles that text inputs simply do not replicate. Voice AI demands near-instantaneous response times, the ability to track topic shifts mid-sentence, and the maintenance of a consistent, believable personality over extended periods.

The integration of OpenAI's GPT-5.1 models proved critical to this architecture. The update delivered specific gains in steerability and latency that allowed Portola to finally unify its complex systems. The resulting platform is not merely an advanced chatbot; it is a character-driven universe built on the technical pillars of memory, personality consistency, and sub-second responsiveness.

Engineering for Conversational Flow

Engineering for Conversational Flow

The foundational requirement for any successful voice AI is the elimination of noticeable lag. Users expect conversational flow that mirrors human interaction, meaning the system must respond almost instantaneously, even when the conversation shifts abruptly. Tolan’s architecture is fundamentally shaped by these demands, requiring a technical stack capable of near-real-time processing.

The adoption of GPT-5.1 and the Responses API was pivotal in achieving this low latency. The technology reportedly cut the speech initiation time by over 0.7 seconds, a margin that drastically improves the perceived naturalness of the dialogue. This level of speed is non-negotiable for a voice product; any perceptible delay breaks the illusion of a continuous conversation.

Crucially, the system also abandoned the industry standard of caching prompts across multiple turns. Instead, Tolan employs a sophisticated, real-time context reconstruction process for every single turn. This method is technically intensive but necessary for natural dialogue. Each context window is rebuilt from scratch, pulling together a summary of recent messages, the core persona card, vector-retrieved memories, tone guidance, and real-time application signals. This ensures the AI can adapt to abrupt topic changes, a capability essential for maintaining the illusion of a fluid, human-like exchange.

Building Persistent Memory and Personality

Handling context is only the first step; maintaining coherence over long, nonlinear conversations requires a robust memory system. Tolan addresses this by building a memory layer that retains more than just factual data. It captures emotional "vibe" signals—subtle clues that guide the AI's tone and steer its subsequent responses, preventing the conversation from becoming factually accurate but emotionally flat.

These memories are embedded using the OpenAI text-embedding-3-large model and stored in Turbopuffer, a high-speed vector database. The choice of this infrastructure is deliberate, as it enables sub-50ms lookup times—a speed requirement that keeps the memory recall process invisible to the user. Each turn triggers a recall process, using the user's latest input alongside system-generated questions (for example, "Who is the user married to?") to retrieve relevant past context.

To prevent memory bloat and maintain quality, the system runs a nightly compression job. This process actively filters out low-value or redundant entries—such as simple statements like "the user drank coffee today"—while simultaneously resolving any contradictions found within the stored data. This rigorous maintenance cycle is what allows the AI to feel like it genuinely remembers the user, rather than just recalling a database entry.

The Science of Character Design

The most challenging element of the entire stack is the personality itself. A highly functional LLM can be generic; a successful companion must feel distinct. Portola tackles this by giving each Tolan instance a distinct character scaffold, authored by the team's in-house creative staff, including a science fiction writer.

The combination of the real-time context management system and the defined character scaffold allows the AI to maintain personality consistency even when the conversation deviates wildly from the established narrative. The GPT-5.1 model’s enhanced steerability was critical here. It provided the necessary guardrails, allowing the developers to express highly specific character traits and tones that previous models struggled to maintain under pressure.

This focus on character elevates the product from a utility into a piece of interactive entertainment. The AI is designed not just to answer questions, but to participate in a narrative, creating a persistent, evolving relationship with the user. This depth of character design is what transforms the technical feat of context management into a compelling user experience.

Tolan's Voice AI Blueprint Using GPT-5.1

Key Points

Overview

Engineering for Conversational Flow

Building Persistent Memory and Personality

The Science of Character Design

More stories

Anthropic discovers "functional emotions" in Claude that influence its behavior

GPT-5.4 Just Dropped: Is OpenAI's New Model the AI Powerhouse We've Been Waiting For?

Gemma 4 Brings Private Agentic AI to Smartphones