Blog · Dec 30, 2025 · Armakuni · 7 min read

Working With Amazon Connect for Modern Voice AI | Armakuni

Learn how Amazon Connect, AgentCore, and Nova 2 Sonic simplify real-time voice AI by reducing state management, SIP complexity, and operational overhead compared to older Twilio and Chime SDK patterns.


As AI systems move from analysis to interaction, architecture matters more than ever. Real-time systems place different demands on infrastructure than chat or batch workloads, especially when timing and state must stay aligned.

Voice makes this clear. A phone call cannot pause or retry quietly. It must connect cleanly, stay connected, and return control to the wider journey without losing context. When this coordination fails, the experience fails, regardless of model quality.

Early voice AI designs extended existing telephony using Twilio, external routing, Amazon Chime SDK, and ECS. These patterns worked, but they often required teams to work around Amazon Connect rather than with it.

We at Armakuni regularly help enterprises modernise contact centre experiences with Amazon Connect and other AWS services, and we have seen this friction firsthand. The "do-it-yourself" era of stitching together SIP trunks, containers, and custom state machines is ending.

Today, Amazon Connect has become the natural control plane for voice journeys, and AgentCore provides a managed runtime for AI agents that fits directly into those journeys. Combined with Nova 2 Sonic, voice AI can now be built as part of the flow instead of as an external system.

This article explains how the earlier pattern works, why it becomes hard to evolve, and how AgentCore with Amazon Connect offers a simpler path forward.

#The early pattern: Twilio, Connect, and ECS

How it actually works

A customer calls a phone number hosted in Amazon Connect.

Connect answers the call and then hands it to Twilio through an external transfer. From that point on, Twilio manages the SIP session and acts as a bridge to the ECS service.

During call setup, Lambda functions are invoked to capture call details. These details, such as session identifiers, tenant information, and routing context, are written to DynamoDB or Redis. This step is critical because the SIP session and the AI session are separate. A shared state store is needed to keep them aligned.
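To make the shared state store concrete, here is a minimal sketch of the idea. It assumes an in-memory dictionary as a stand-in for DynamoDB or Redis, and the field names, phase values, and function names are illustrative rather than taken from any real deployment.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional


@dataclass
class CallState:
    """Context captured at call setup (field names are illustrative)."""
    session_id: str
    tenant_id: str
    routing_context: str
    phase: str = "setup"  # hypothetical phases: setup, then ai_active


class InMemoryStateStore:
    """Stand-in for DynamoDB or Redis: a key-value store keyed by session id."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def put(self, state: CallState) -> None:
        self._items[state.session_id] = asdict(state)

    def get(self, session_id: str) -> Optional[CallState]:
        item = self._items.get(session_id)
        return CallState(**item) if item else None


def on_call_setup(store: InMemoryStateStore, session_id: str,
                  tenant_id: str, routing_context: str) -> None:
    """What a setup-time Lambda might do: persist the call details."""
    store.put(CallState(session_id, tenant_id, routing_context))


def on_ai_session_start(store: InMemoryStateStore, session_id: str) -> CallState:
    """What the ECS service might do: find the call it belongs to."""
    state = store.get(session_id)
    if state is None:
        raise LookupError(f"no state recorded for session {session_id}")
    state.phase = "ai_active"
    store.put(state)
    return state


store = InMemoryStateStore()
on_call_setup(store, "sess-123", "tenant-a", "billing-queue")
state = on_ai_session_start(store, "sess-123")
print(state.tenant_id, state.phase)
```

The lookup in `on_ai_session_start` is the fragile part: if the setup write is late or missing, the AI session has no way to know which call or tenant it is serving.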

When the AI voice interaction begins, an ECS service reads the stored state. This allows the service to determine which call it belongs to, which tenant it serves, and which policies apply. The live phone audio itself is not processed in ECS. Instead, the SIP connection links the call directly to Nova 2 Sonic for real-time speech interaction.

When the AI segment ends, the flow runs in reverse. The ECS service updates the state in DynamoDB or Redis, and the call is routed onward. This may mean returning control to Twilio logic or handing the call back into Amazon Connect so the customer journey can continue.

The important point is this: the hard part is not audio processing. The hard part is keeping the call state consistent as the call moves between Amazon Connect, Twilio, Lambda, ECS, and back again.

On paper, this looks manageable. In practice, it adds up quickly.

Teams must build and operate ECS services, maintain infrastructure definitions for telephony, networking, and storage, and handle call lifecycle events that do not always arrive in clean order. They must also deal with multiple phone numbers, duplicate routing logic, and context passing workarounds just to keep the experience smooth.
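One common way to cope with lifecycle events that arrive out of order is to accept only events that move the call forward. The sketch below assumes a simple linear set of phases, which are hypothetical names chosen for illustration; real call lifecycles usually have more branches.

```python
from dataclasses import dataclass

# Hypothetical lifecycle phases, ranked by how far the call has progressed.
PHASE_ORDER = {"setup": 0, "ai_active": 1, "ai_ended": 2, "returned": 3}


@dataclass
class CallRecord:
    session_id: str
    phase: str = "setup"


def apply_event(record: CallRecord, event_phase: str) -> bool:
    """Apply an event only if it moves the call forward.

    Returns True if the record changed, False if the event was stale
    or a duplicate and was ignored.
    """
    if event_phase not in PHASE_ORDER:
        raise ValueError(f"unknown phase: {event_phase}")
    if PHASE_ORDER[event_phase] <= PHASE_ORDER[record.phase]:
        return False
    record.phase = event_phase
    return True


record = CallRecord("sess-123")
# Events arrive out of order: "ai_ended" lands before "ai_active".
incoming = ["ai_ended", "ai_active", "returned", "returned"]
results = [apply_event(record, phase) for phase in incoming]
print(results, record.phase)
```

Here the late `ai_active` event and the duplicate `returned` event are both ignored. Guards like this are exactly the kind of code teams end up writing and maintaining in the older pattern.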

This approach gives flexibility, but every new feature touches several systems. Over time, the architecture becomes harder to change safely.

#The modern pattern: AgentCore with Amazon Connect

AgentCore changes the shape of the system by moving the AI agent closer to Amazon Connect instead of pushing the call out and pulling it back.

How the flow changes

In this model, the customer calls a phone number hosted directly in Amazon Connect. There is no need for a separate Twilio layer. The Connect contact flow invokes an AgentCore-powered voice agent as part of the normal flow logic.

This creates a more natural integration. The call does not need to leave Connect and return later just to reach the AI. As a result, there is far less custom code dedicated to passing context between the contact flow and the agent runtime.

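AgentCore provides a managed runtime for the AI agent and handles the management layer around the model: MCP configuration, multi-tenancy boundaries, and session and request control.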

Nova 2 Sonic is responsible for the real-time voice interaction. It generates speech, reasons during the conversation, and identifies when tools need to be invoked.

When the agent completes its task, the Connect flow continues cleanly. This may involve transferring the call, wrapping it up, or escalating to a human agent, without needing external SIP loops or state recovery logic.
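The hand-back can be pictured as a small mapping from the agent's result to the next step in the flow. This is a sketch under stated assumptions: `Outcome`, `AgentResult`, and the action names are hypothetical and do not correspond to Amazon Connect or AgentCore APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Hypothetical ways an agent segment can end."""
    RESOLVED = "resolved"
    NEEDS_HUMAN = "needs_human"
    NEEDS_TRANSFER = "needs_transfer"


@dataclass
class AgentResult:
    outcome: Outcome
    summary: str
    target_queue: str = ""


def next_flow_step(result: AgentResult) -> dict:
    """Translate the agent's result into the next contact-flow action."""
    attributes = {"summary": result.summary}
    if result.outcome is Outcome.RESOLVED:
        return {"action": "wrap_up", "attributes": attributes}
    if result.outcome is Outcome.NEEDS_HUMAN:
        return {"action": "escalate_to_agent", "attributes": attributes}
    if result.outcome is Outcome.NEEDS_TRANSFER:
        attributes["queue"] = result.target_queue
        return {"action": "transfer", "attributes": attributes}
    raise ValueError(f"unhandled outcome: {result.outcome}")


step = next_flow_step(
    AgentResult(Outcome.NEEDS_TRANSFER, "Wants to change plan", "sales")
)
print(step)
```

Because the result travels with the flow, there is no separate store to read back from before deciding what happens next.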

What becomes simpler

Context transfer is simpler because fewer systems are involved. Call flow continuation no longer depends on external SIP loops. This reduces the number of call legs, which can also reduce telephony charges.

Observability improves as well. Instead of correlating logs across Twilio, Chime SDK, Lambda, ECS, and a data store, teams can observe agent execution and MCP tool calls in one place.
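For a sense of what cross-system correlation involves in the older pattern, the sketch below merges log entries from several sources into one timeline per call. The `contact_id`, `ts`, and `msg` fields are assumed for illustration; real logs from these services use different shapes and often lack a shared identifier, which is part of the problem.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def correlate(log_sources: Dict[str, List[dict]],
              key: str = "contact_id") -> Dict[str, List[Tuple[int, str, str]]]:
    """Merge entries from every source into one ordered timeline per call."""
    timeline = defaultdict(list)
    for source, entries in log_sources.items():
        for entry in entries:
            timeline[entry[key]].append((entry["ts"], source, entry["msg"]))
    # Sort each call's events by timestamp so the sequence can be read top down.
    return {cid: sorted(events) for cid, events in timeline.items()}


logs = {
    "twilio": [{"contact_id": "c1", "ts": 2, "msg": "sip bridged"}],
    "lambda": [
        {"contact_id": "c1", "ts": 1, "msg": "state written"},
        {"contact_id": "c2", "ts": 5, "msg": "state written"},
    ],
    "ecs": [{"contact_id": "c1", "ts": 3, "msg": "state read"}],
}
timelines = correlate(logs)
for event in timelines["c1"]:
    print(event)
```

Even this toy version depends on every system logging the same identifier and comparable timestamps, which rarely holds without extra work.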

This does not remove all complexity, but it removes the complexity that comes from working around Connect rather than with it.

#Why Nova 2 Sonic matters

Voice that fits real conversations

Nova 2 Sonic is built for real-time voice. It responds quickly and keeps speech clear, which is critical in live calls where long pauses feel broken.

A key capability is support for asynchronous tool calls. This means the model can keep the conversation moving while backend work happens in parallel. For example, an order lookup can run without forcing silence on the call. This makes the experience feel more natural without custom concurrency logic.
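The general idea can be illustrated with plain `asyncio`. This is not Nova 2 Sonic's API; `lookup_order` and `handle_turn` are hypothetical names, and the sleep stands in for a real backend call.

```python
import asyncio
from typing import List


async def lookup_order(order_id: str) -> str:
    """Hypothetical backend call; the sleep stands in for network latency."""
    await asyncio.sleep(0.05)
    return f"Order {order_id} ships tomorrow"


async def handle_turn(order_id: str) -> List[str]:
    transcript = []
    # Start the tool call without waiting for it to finish.
    lookup = asyncio.create_task(lookup_order(order_id))
    # Keep the conversation moving while the lookup runs in the background.
    transcript.append("Let me check that order for you.")
    await asyncio.sleep(0)  # yield control so the lookup task can start
    transcript.append("While I look, is the delivery address still the same?")
    # Speak the result once the tool call completes.
    transcript.append(await lookup)
    return transcript


transcript = asyncio.run(handle_turn("A-42"))
print(transcript)
```

In a hand-built system, this scheduling is the custom concurrency logic a team would own. With asynchronous tool calls in the model, that responsibility moves out of application code.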

Clear separation of responsibilities with AgentCore

AgentCore runs the AI management layer around the model. This is where teams handle MCP configuration, multi-tenancy boundaries, and session and request control.

Nova 2 Sonic focuses on what the model should do: generating voice, reasoning during the conversation, and identifying when tools need to be invoked. This separation allows teams to change system behavior without changing the model, and vice versa.
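As a rough illustration of that separation, the sketch below keeps tenant policy in a management layer that sits between the model's tool requests and the tools themselves. The tool names, tenants, and `invoke_tool` function are hypothetical and are not AgentCore configuration.

```python
from typing import Callable, Dict, Set

# Hypothetical tools the model may ask for.
TOOLS: Dict[str, Callable[[str], str]] = {
    "order_lookup": lambda arg: f"order {arg}: shipped",
    "refund": lambda arg: f"refund issued for {arg}",
}

# Hypothetical per-tenant policy, owned by the management layer.
TENANT_POLICY: Dict[str, Set[str]] = {
    "tenant-a": {"order_lookup", "refund"},
    "tenant-b": {"order_lookup"},
}


def invoke_tool(tenant_id: str, tool_name: str, arg: str) -> str:
    """Run a tool the model requested, but only if the tenant allows it."""
    allowed = TENANT_POLICY.get(tenant_id, set())
    if tool_name not in allowed:
        raise PermissionError(f"{tool_name} is not enabled for {tenant_id}")
    return TOOLS[tool_name](arg)


print(invoke_tool("tenant-a", "refund", "A-42"))
try:
    invoke_tool("tenant-b", "refund", "A-42")
except PermissionError as exc:
    print("blocked:", exc)
```

Changing `TENANT_POLICY` changes what the system permits without touching the model, and swapping the model leaves the policy untouched.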

#Side-by-side view

At a high level, the difference looks like this:

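[Figure: side-by-side comparison of Twilio + ECS and AgentCore + Amazon Connect across call routing, voice path to Nova Sonic, state and context, and observability]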

Sources:

Call routing - Amazon Web Services, Inc. (Twilio + ECS) | AWS Documentation (AgentCore + Amazon Connect)

Voice path to Nova Sonic - Amazon Web Services, Inc. (AgentCore + Amazon Connect)

State and context - AWS Documentation (AgentCore + Amazon Connect)

Observability - Amazon Web Services, Inc. (AgentCore + Amazon Connect)

#What this shift means from a business perspective

The move from Twilio and Chime SDK based patterns to AgentCore with Amazon Connect is not only a technical change. It directly affects cost, speed, and risk in day-to-day operations.

Lower operational overhead

Earlier patterns require teams to coordinate across multiple systems: telephony providers, SIP routing, state stores, and container services. Each change to the call flow or agent logic often touches several components.

With AgentCore and Amazon Connect, fewer systems are involved in each call. This reduces the effort required to maintain the platform and lowers the chance that small changes introduce unexpected failures.

Faster change cycles

In the older model, improving a voice experience often meant updating infrastructure or redeploying services that exist only to move context around. That slows iteration.

A Connect-first design keeps call logic and agent behavior closer together. Business teams can adjust flows, routing, and escalation rules without waiting on complex backend changes.

More predictable costs

SIP-based designs often introduce extra call legs and external routing that are easy to overlook. Over time, these can increase telephony costs in ways that are hard to forecast.

Reducing external call transfers and keeping the journey inside Amazon Connect helps simplify cost tracking and reduces unnecessary telephony charges.
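A back-of-the-envelope model shows why extra call legs matter. It assumes, as a simplification, that every leg is billed per minute at the same rate; the leg counts and the rate below are placeholders for illustration, not real pricing from any provider.

```python
import math


def telephony_cost(call_minutes: float, legs: int,
                   rate_per_leg_minute: float) -> float:
    """Simplified model: every leg is billed for the full call duration."""
    if legs < 1:
        raise ValueError("a call has at least one leg")
    return call_minutes * legs * rate_per_leg_minute


# Placeholder inputs, not real pricing.
minutes = 5.0
rate = 0.01

looped = telephony_cost(minutes, legs=3, rate_per_leg_minute=rate)
direct = telephony_cost(minutes, legs=1, rate_per_leg_minute=rate)
ratio = looped / direct
print(f"cost ratio of 3 legs to 1 leg: {ratio:.1f}")
```

Under this model the cost scales linearly with the number of legs, so each external transfer that stays open for the rest of the call adds a full extra share of the per-minute charge.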

Clearer ownership and accountability

When a voice system spans Twilio, Chime SDK, ECS, Lambda, and multiple data stores, it becomes unclear who owns failures. Issues can bounce between teams before they are resolved.

A simpler architecture makes ownership clearer. Connect owns the call. AgentCore owns the agent execution. This shortens incident resolution and improves confidence in the system.

#Closing thoughts

The real shift here is not from containers to managed services. It is from working around Amazon Connect to working with Amazon Connect.

Older patterns solved real problems at the time, especially for teams tied to existing telephony. But they often made state management and routing the dominant challenge. AgentCore changes that balance by fitting the agent into the Connect flow instead of bending the flow around the agent.

For teams building voice AI today, the simplest path is usually the one that keeps the call journey, the agent runtime, and the voice model aligned from the beginning.

This is the kind of transition we support at Armakuni. Our work in GenAI and machine learning focuses on helping teams simplify architecture, reduce unnecessary complexity, and build systems that are easier to operate as they grow.
