
Launch HN: Roark (YC W25) – Taking the pain out of voice AI testing

60 points by zammitjames, 3 months ago
Hey HN, we’re James and Daniel, co-founders of Roark (https://roark.ai). We built a tool that lets developers replay real production calls against their latest Voice AI changes, so they can catch failures, test updates, and iterate with confidence.

Here’s a demo video: https://www.youtube.com/watch?v=eu8mo28LsTc

We ran into this problem while building a voice AI agent for a dental clinic. Patients kept getting stuck in loops, failing to confirm insurance, or misunderstanding responses. The only way to test fixes was to manually call the agent or read through hundreds of transcripts, hoping to catch issues. It was slow, frustrating, and unreliable.

Talking to other teams, we found this wasn’t just a niche issue - every team building Voice AI struggled to validate performance efficiently. Debugging meant calling the agent over and over. Updates shipped with unknown regressions. Sentiment analysis relied only on text, missing key audio cues like hesitation or frustration, which often signal deeper issues.

That’s why we built Roark. Instead of relying on scripted test cases, Roark captures real production calls from VAPI, Retell, or a custom-built agent via API and replays them against your latest agent changes. We don’t just feed back text; we preserve what the user said, how they said it, and when they said it, mimicking pauses, sentiment, and tone up until the conversation flow changes. This ensures your agent is tested under real-world conditions, not just synthetic scripts.

For each replay that we run, Roark checks if the agent follows key flows (e.g. verifying identity before sharing account details). Our speech-based evaluators also detect sentiments such as frustration and confusion, long pauses, and interruptions - things that regular transcripts miss.

After testing, Roark provides Mixpanel-style analytics to track failures, conversation flows, and key performance metrics, helping teams debug faster and ship with confidence. Instead of hoping changes work, teams get immediate pass/fail results, side-by-side transcript comparisons, and real-world insights.

We’re already working with teams in healthcare, legal, and customer service who rely on Voice AI for critical interactions. They use Roark to debug AI failures faster, test updates before they go live, and improve customer experiences - without manually calling their bots dozens of times.

Our product isn’t *quite* ready yet for self-service, so you’ll still see the dreaded “book a demo” on our home page. If you’re reading this, though, we’d love to fast-track you, so we made a special page for HN signups here: https://roark.ai/hn-access. If you’re working on Voice AI and want to try us out, please do!

Would love any feedback, thoughts, or questions from the HN community!
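To make the replay-and-evaluate loop described above concrete, here is a minimal, hypothetical sketch of what such a harness could look like. The classes and functions below are illustrative assumptions, not Roark's actual API, and real speech evaluators would analyze the audio itself rather than keyword-match transcripts.

```python
# Hypothetical sketch of a replay-based test loop for a voice agent.
# None of these names come from Roark's SDK; they only illustrate the idea of
# replaying recorded production turns against a new agent build and checking
# key flows. Real evaluators would work on the audio, not keywords.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    text: str               # what the caller said
    pause_before_s: float   # silence preceding the turn, preserved from the recording


@dataclass
class RecordedCall:
    call_id: str
    turns: List[Turn]


def replay_call(call: RecordedCall, agent: Callable[[str], str]) -> List[str]:
    """Feed the recorded caller turns to the new agent build and collect its replies."""
    return [agent(turn.text) for turn in call.turns]


def verifies_identity_first(replies: List[str]) -> bool:
    """Flow check: the agent must verify identity before mentioning account details."""
    asked_identity = False
    for reply in replies:
        lower = reply.lower()
        if "verify your identity" in lower or "date of birth" in lower:
            asked_identity = True
        if "account balance" in lower and not asked_identity:
            return False
    return True


def has_long_pauses(call: RecordedCall, threshold_s: float = 3.0) -> bool:
    """Crude stand-in for a speech evaluator: flag unusually long caller hesitations."""
    return any(turn.pause_before_s > threshold_s for turn in call.turns)


def run_suite(calls: List[RecordedCall], agent: Callable[[str], str]) -> None:
    for call in calls:
        replies = replay_call(call, agent)
        flow_ok = verifies_identity_first(replies)
        hesitant = has_long_pauses(call)
        print(f"{call.call_id}: {'PASS' if flow_ok else 'FAIL'}"
              f"{' (long pauses detected)' if hesitant else ''}")
```

In a real setup the recorded calls would come from the production telephony provider and the flow checks would be defined per use case; the point is simply that each code change gets exercised against the same real conversations and scored pass/fail.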

9 comments

Closi, 3 months ago
It looks great! Although the demo shows horrible security practices...

Clearly authentication shouldn't rely on prompt engineering.

Particularly when at the end of the demo it says "we have tested it again and now it shows that the security issue is fixed" - No, it's not fixed! It's hidden! Still a gaping security hole. Clearly just a very bad example, particularly considering the context is banking.
mercurialsolo, 3 months ago
As someone who's building a personal work assistant for voice, I see the merit in automating test case generation and validation.

All products in this space by YC teams are targeted at scaled voice agent startups or teams:

- Roark (https://roark.ai/)

- Hamming (https://hamming.ai/)

- Coval (https://www.coval.dev/)

- Vocera (https://www.vocera.ai/)

How do you differentiate - who is this for? Voice agent devs paying $500/mo. for early stage software?
NewUser76312, 3 months ago
Looks somewhat useful for voice AI QA.

But I wonder: if a company is deploying voice AI, wouldn't they have their own testing and quality assurance flows?

Is this targeted at companies without an engineering department or something? In which case I find it surprising they're able to slot in some voice AI assistant in the first place.
jnovek, 3 months ago
I noticed this on your website regarding transcription --

"More accurate than Deepgram, supporting 50+ languages with a word error rate of just 8.6%."

Can you explain how this helps me? At the end of the day you are not my transcriber; wouldn't I want to test using transcriptions produced by the transcriber that I'm actually using in production?
aeternum, 3 months ago
This seems useful for issues early in the convo, but what if the AI responses diverge from the recorded convo prior to the issue being hit?
aantix, 3 months ago
Curious: who do you guys consider the leader in the space for real-time voice interactions, interruptions, etc.?
sunshinerag, 3 months ago
Why is it called Roark?
zachthewf, 3 months ago
Congrats on the launch!
Cilvic, 3 months ago
Super cool to see this just now. We are building in the space of computer screen analysis and have started to run into something similar, hence we want to build something like this for pixels instead of voice.

Would love to chat with you: jan@kontext21.com