AI Voice Agents with AWS

3h 1m 5s
English
Paid

You will build a real voice agent that speaks, listens, and reacts in real time. It can handle quick turns, deal with interruptions, and keep each conversation flowing smoothly.

What You Build

You create a full voice assistant with AWS Bedrock using the Nova Sonic model and Python with asyncio. You set up audio streams in both directions. You shape a clean event flow that keeps delays low. You also add tools so the agent can work with real data and outside systems.
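To make "audio streams in both directions" concrete, here is a minimal sketch of the pattern in plain asyncio. It does not call AWS Bedrock or Nova Sonic. `fake_model`, `send_audio`, `receive_audio`, and `run_session` are hypothetical names, and the "model" is a stand-in that simply uppercases each chunk it receives. The point is only that sending and receiving run as concurrent tasks, so output can arrive while input is still being sent.

```python
import asyncio


async def fake_model(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Hypothetical stand-in for the model: uppercases each chunk it receives."""
    while True:
        chunk = await inbound.get()
        if chunk is None:  # sentinel: the caller has finished sending
            await outbound.put(None)
            return
        await outbound.put(chunk.upper())


async def send_audio(chunks, inbound: asyncio.Queue) -> None:
    """Push input chunks to the model, yielding to the event loop between chunks."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # let the receiving side run
    await inbound.put(None)


async def receive_audio(outbound: asyncio.Queue) -> list:
    """Collect output chunks until the model signals the end of the stream."""
    received = []
    while True:
        chunk = await outbound.get()
        if chunk is None:
            return received
        received.append(chunk)


async def run_session(chunks) -> list:
    """Run sender and receiver concurrently against the stand-in model."""
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    model_task = asyncio.create_task(fake_model(inbound, outbound))
    _, received = await asyncio.gather(
        send_audio(chunks, inbound),
        receive_audio(outbound),
    )
    await model_task
    return received


if __name__ == "__main__":
    print(asyncio.run(run_session([b"ab", b"cd"])))
```

In a real agent, the two queues would be replaced by the microphone, the speaker, and the model's streaming connection. The shape of two independent tasks sharing one event loop would likely stay the same.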

How It Works

The course stays focused on clear steps and real code. You do more than make the agent speak. You learn how each part of an interactive voice system fits together. This includes async pipelines, audio events, model calls, and links to cloud services.
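One way these parts can fit together is sketched below, under stated assumptions. When the agent needs data from a cloud service, a blocking client call would stall the event loop and interrupt the audio. Running the call in a worker thread with `asyncio.to_thread` (Python 3.9+) keeps the loop free. `lookup_reservation`, `heartbeat`, and `handle_tool_call` are hypothetical names. `time.sleep` stands in for a network round trip and no real database is contacted.

```python
import asyncio
import time


def lookup_reservation(guest_name: str) -> dict:
    """Hypothetical blocking lookup; time.sleep stands in for a network call."""
    time.sleep(0.05)
    return {"guest": guest_name, "status": "confirmed"}


async def heartbeat(ticks: list) -> None:
    """Stands in for audio work that must keep running during the lookup."""
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.01)


async def handle_tool_call(guest_name: str):
    """Run the blocking lookup in a thread while the event loop stays responsive."""
    ticks = []
    result, _ = await asyncio.gather(
        asyncio.to_thread(lookup_reservation, guest_name),
        heartbeat(ticks),
    )
    return result, ticks


if __name__ == "__main__":
    print(asyncio.run(handle_tool_call("Ada")))
```

Calling `lookup_reservation` directly inside a coroutine would block the loop for the full duration of the call. Offloading it to a thread lets `heartbeat` keep running in the meantime.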

Who Should Join

This course fits backend developers, AI engineers, and anyone building cloud apps. You should enjoy learning how things work under the hood and want to shape better voice interfaces.

What You Take Away

When you finish, you have a full working voice agent. You also gain a clear plan for building your own system from the ground up and moving it toward real use.

Additional

https://github.com/patrikszepesi/voice-agents

About the Author: Zero To Mastery

Zero To Mastery (ZTM) is a Toronto-based online coding academy founded by Andrei Neagoie, originally a senior developer at large Canadian tech firms before turning to teaching full-time. The academy's signature is the cohort-based bootcamp track combined with a deep self-paced course library, all aimed at career-changers and self-taught developers preparing to land software-engineering roles at top companies.

The instructor roster has grown well beyond Andrei to include other senior practitioners: Daniel Bourke (machine learning), Aleksa Tešić (DevOps), Jacinto Wong, and others. Courses cover the full software-engineering career path: web development with React and Next.js, Python, machine learning and deep learning, DevOps and cloud, system design, mobile, and the algorithm / data-structure interview prep that gates engineering jobs.

The CourseFlix listing under this source carries over 120 ZTM courses spanning that full range. Material is paid; ZTM itself runs on a monthly / annual membership model. The teaching style favours long-form, project-based courses where students build complete portfolio-quality applications rather than disconnected feature tutorials.

All Course Lessons (39)
1. Course Introduction (Demo) - 01:17
2. What We Are Building - 04:54
3. Additional Course Information - 01:04
4. Setting Up AWS Access Keys - 08:36
5. Setting Up Files - 02:32
6. Understanding Speech-to-Speech Models - 02:47
7. Understanding Bidirectional Streaming - 06:04
8. Creating Audio Configurations - 03:25
9. Setting Up Debugging Functions - 04:25
10. Non-Blocking Asyncio Python - 05:12
11. Eventloop and Multithreads in Python - 09:28
12. Getting Guests, Dynamodb Call - 02:49
13. Getting Reservations, Dynamodb Call - 07:29
14. Updating Reservations, Dynamodb Call - 09:42
15. Event Templates Part 1 - 07:31
16. Event Templates Part 2 - 08:26
17. Exploring Tools Our Model Has Access To - 06:27
18. Tool Result Event - 01:38
19. Initialising the Bedrock Stream Manager Class - 06:15
20. Initialising the Bedrock Stream - 06:24
21. Sending Raw Events to Bedrock - 02:29
22. Processing Audio Input - 03:04
23. Sending Events to the Bedrock Stream - 07:29
24. Processing Incoming Responses From Bedrock - 07:11
25. Handling Tool Requests + Completions - 03:52
26. Executing Tools + Gracious Closing and Shutting Down - 02:51
27. Separate Input and Output Streams - 04:15
28. Finishing the Audio Streamer Class - 08:40
29. Ending the Stream Clarification - 00:56
30. Finishing Up Our Final Script - 03:07
31. AWS Quotas and Credentials - 02:00
32. Installing Necessary Libraries - 03:26
33. Setting up DynamoDB - 05:02
34. First Test of Our Agent - 04:47
35. Testing Reservation Updates - 03:07
36. Testing with the Debug Flag - 02:20
37. Testing the Final Product - 07:04
38. Cleaning Up - 02:11
39. Congratulations! - 00:49

Frequently asked questions

What are the prerequisites for enrolling in this course?
Prospective students should have a background in backend development or AI engineering. Familiarity with Python is essential, particularly with asynchronous programming using asyncio. The course involves working with AWS services, so knowledge of AWS, including access key setup, is recommended. Understanding cloud app development will be beneficial.
What will I build during the course?
During the course, you will build a fully functional voice assistant using AWS Bedrock and the Nova Sonic model. The assistant will handle bidirectional audio streams and maintain a smooth conversational flow. You will also add tools that connect the agent to real data sources and external systems.
Who is the target audience for this course?
The course is designed for backend developers, AI engineers, and individuals involved in building cloud applications. It is suitable for those interested in understanding the underlying mechanics of voice systems and who wish to create better voice interfaces. A passion for learning technical details and system architecture is important.
What specific tools and technologies will I learn to use?
You will learn to work with AWS Bedrock, specifically using the Nova Sonic model. The course also covers Python programming with asyncio for handling asynchronous operations. You'll engage with bidirectional streaming, event loops, and cloud services integration, including setting up and using DynamoDB for data management.
What topics are not covered in this course?
The course does not cover frontend development or user interface design for voice agents. It focuses exclusively on backend technologies and system integration, so students looking for courses on voice interface design or user experience might need to seek additional resources.
How much time will I need to commit to complete the course?
The course consists of 39 lessons with a total runtime of 3h 1m 5s. Beyond watching the lessons, students should expect to spend time on practical implementation and testing of the voice agent. Allocating several hours per week could be necessary, depending on your pace and prior experience.
How can the skills from this course be applied to other areas or careers?
The skills acquired in building a voice agent can be transferred to various fields, such as AI development, cloud architecture, and backend system design. Understanding asynchronous operations and cloud integrations is highly valuable in modern tech environments. These competencies can enhance roles in AI system development, cloud-based service creation, and interactive technology projects.