BOULDER, Colo. - ColoradoDesk -- Built on Vision Agents with Anam and Inworld to demonstrate emotionally aware, video-first AI
Stream released an open-source AI agent that responds to a user's facial expressions, gaze, and engagement in real time. The agent, called Crashout Buddy, is live at visionagents.ai.
The era of the floating orb is over. Most voice agents today are blind. They convert speech to text, run it through an LLM, and read the response back in a flat tone regardless of whether the user is laughing, frustrated, or close to tears. Built on Stream's Vision Agents framework in collaboration with Anam and Inworld, Crashout Buddy watches the user's face and shapes both what the agent says and how it says it. When the user goes quiet, it notices. When they look like they're about to lose it, it softens.
How It Works
The agent runs a multimodal perception stack on Stream's global edge network. MediaPipe tracks 52 facial blendshapes at 8 fps to classify emotion, gaze, and engagement. That signal is injected into the LLM (Gemini) on every turn, and the LLM then steers Inworld's TTS-2 voice model with natural-language directions such as [say warmly with light, easy energy]. Anam renders a photorealistic, lip-synced avatar, and Deepgram handles speech-to-text.
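As a rough illustration of the per-turn flow described above, the following Python sketch shows one way a facial-state summary could be prepended to the user's transcript and a bracketed voice direction pulled back out of the model's reply. This is not the Vision Agents API. `FaceState`, `build_turn_context`, and `split_direction` are hypothetical names, the `[face]`/`[user]` prompt layout is an assumption, and whether the voice model receives the direction inline or as a separate field is also assumed here.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FaceState:
    """Hypothetical per-turn summary of what the vision stack observed."""
    emotion: str       # e.g. "frustrated"
    gaze: str          # e.g. "on_screen" or "away"
    engagement: float  # assumed to be normalized to 0.0-1.0


def build_turn_context(state: FaceState, user_text: str) -> str:
    """Prepend the facial state to the transcript sent to the LLM this turn."""
    return (
        f"[face] emotion={state.emotion} gaze={state.gaze} "
        f"engagement={state.engagement:.2f}\n"
        f"[user] {user_text}"
    )


# Matches a leading "[...]" direction followed by the text to be spoken.
_DIRECTION = re.compile(r"^\s*\[([^\]]+)\]\s*(.*)$", re.DOTALL)


def split_direction(reply: str) -> Tuple[Optional[str], str]:
    """Split an LLM reply into (voice direction, spoken text).

    Returns (None, reply) when the reply carries no leading direction.
    """
    match = _DIRECTION.match(reply)
    if match is None:
        return None, reply.strip()
    return match.group(1).strip(), match.group(2).strip()
```

In this sketch, a reply such as `[say warmly with light, easy energy] Rough day?` splits into the direction `say warmly with light, easy energy` and the spoken text `Rough day?`, which a caller could then hand to whatever text-to-speech client it uses.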
The same pattern (facial state, rich agent context, expressive voice, lip-synced avatar) suits apps in dating, coaching, recruitment, tutoring, and customer support.
Key capabilities include:
- Emotion, gaze, and engagement classification with hysteresis to prevent flicker
- Natural-language voice steering in 100+ languages via Inworld TTS-2
- Photorealistic lip-synced avatar via Anam's CARA model
- Proactive re-engagement when the user drifts off-camera or goes quiet
- Composable processors running at independent frame rates
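The first capability above, hysteresis to prevent flicker, can be sketched in a few lines of Python. The release does not say how the project implements it, so this is an assumption: a consecutive-frame hold, one simple form of hysteresis, in which the reported label changes only after a new raw label has been seen for several frames in a row. `LabelHysteresis` and `hold_frames` are hypothetical names; the project may instead use separate enter and exit thresholds or a different scheme.

```python
from typing import Optional


class LabelHysteresis:
    """Report a stable label that changes only after a sustained disagreement."""

    def __init__(self, initial: str = "neutral", hold_frames: int = 4) -> None:
        if hold_frames < 1:
            raise ValueError("hold_frames must be at least 1")
        self.stable = initial                    # label currently reported
        self._candidate: Optional[str] = None    # challenger label, if any
        self._streak = 0                         # consecutive frames for the challenger
        self._hold = hold_frames                 # frames needed to switch

    def update(self, raw: str) -> str:
        """Feed one per-frame raw label and return the stabilized label."""
        if raw == self.stable:
            # The raw classifier agrees again, so drop any pending challenger.
            self._candidate = None
            self._streak = 0
            return self.stable
        if raw == self._candidate:
            self._streak += 1
        else:
            self._candidate = raw
            self._streak = 1
        if self._streak >= self._hold:
            # The challenger persisted long enough to become the stable label.
            self.stable = raw
            self._candidate = None
            self._streak = 0
        return self.stable
```

At the 8 fps rate mentioned earlier, `hold_frames=4` would mean a new expression has to persist for about half a second before the agent reacts to it, so a single misclassified frame does not change the agent's tone.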
Availability
The full project is open source. Try the demo at visionagents.ai, read the guide on the Stream blog, or explore the code at: https://github.com/GetStream/Vision-Agents
Source: Getstream.io