Hello HN, Dawson and Ethan here from Martin (https://trymartin.com). We've been building an AI personal assistant (the elusive dream of a real-life Jarvis) for about a year now, and we recently launched Martin as a web app. Watch our latest demo here: https://youtu.be/ZeafVF8U7Ts.

We're starting with common agentic tasks for consumers/prosumers - Martin can read and draft emails, manage your calendar, text and call people for you, and use Slack. Like any personal assistant, it can also set reminders, track your to-dos, and send you daily briefings. The idea is to eventually tackle everything an on-call virtual assistant does.

Four months ago, we did a Launch HN for Martin's voice-first iOS app. A big piece of product feedback we got was "I don't trust AI to take actions like sending texts/emails on my behalf if it's not 100% reliable."

We're happy to report that Martin's failure rate is now a lot lower than before (though we still have a lot of work to do on more complex actions). We've tackled some pretty interesting problems since our last launch, so we thought we'd share a couple of them here (with a few rough code sketches further down):

First, building a testing suite to concretely measure and improve agent performance is no trivial task. (We're optimistic that someone might build an awesome system for this one day, but we haven't found one, so we're doing it ourselves.) Specifically, we want to run existing test cases against new implementations of our entire LLM processing flow - not just new prompts - and be able to say rigorously whether we've improved and where we've regressed. That means defining tests so they're resilient to major overhauls of code structure, and building a test execution context that mimics production behavior (e.g. a test user with calendar events, emails, and contact info). On top of that, every test case has to be written by hand, painstakingly, with expected outputs sometimes running to many tens of thousands of characters.

On the monitoring side, most of our reliability issues are soft errors that are very hard to catch programmatically. When malfunctions happen, we usually learn about them from customer feedback rather than from any conventional third-party monitoring system. The best we can do without manually sifting through tons of data is to implement rudimentary checks for behavior patterns that we know historically indicate errors (e.g. many similar API calls in quick succession, which implies a function call is failing and being retried).

Another problem we keep coming back to is the stateless nature of LLMs: nothing is stored latently between calls, so every piece of context has to be reintroduced at every invocation. Because of how much information Martin needs (product information, user memory, tool definitions, previous messages, platform-specific instructions, etc.), we have to carefully manage what we expose to Martin and how we balance broad context against specific information. Vanilla RAG can't handle the complexity, so we built custom retrieval and context injection systems for each LLM call. We abstract some information away behind function calls and organize certain tools into modules that share context and instructions. This strategy has helped a lot with reliability.
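To make the testing problem concrete, here's roughly the shape of harness we mean, heavily simplified (run_flow and judge are placeholders, not our real code). The important part is that a test case only touches a single boundary - one entry point into the whole LLM flow plus a seeded test user - so it keeps working across internal rewrites:

    from dataclasses import dataclass, field

    @dataclass
    class TestUserFixture:
        # Mimics a production user: seeded calendar, inbox, and contacts.
        calendar_events: list[dict] = field(default_factory=list)
        emails: list[dict] = field(default_factory=list)
        contacts: list[dict] = field(default_factory=list)

    @dataclass
    class AgentTestCase:
        name: str
        fixture: TestUserFixture
        user_message: str
        expected_output: str  # hand-written; sometimes tens of thousands of characters

    def run_suite(cases, run_flow, judge):
        # run_flow is the single entry point into the entire LLM processing flow,
        # so test cases survive internal refactors as long as that boundary holds.
        # judge compares actual vs. expected output however you trust it to
        # (exact match, structured diff, LLM grader, ...).
        return {c.name: judge(run_flow(c.fixture, c.user_message), c.expected_output)
                for c in cases}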
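Here's a simplified sketch of the retry-pattern check mentioned above (RetryStormDetector is a placeholder name, not our production code): keep a short sliding window of recent function calls per session and flag the session when too many near-identical calls land inside it.

    import time
    from collections import defaultdict, deque

    class RetryStormDetector:
        # Flags a session when many similar function calls (same name, same
        # argument hash) happen within a short window - a pattern that has
        # historically meant the agent is failing and retrying, not progressing.
        def __init__(self, window_seconds=60, threshold=5):
            self.window = window_seconds
            self.threshold = threshold
            self.calls = defaultdict(deque)  # (session, fn, args_hash) -> timestamps

        def record(self, session_id, fn_name, args_hash, now=None):
            now = time.time() if now is None else now
            q = self.calls[(session_id, fn_name, args_hash)]
            q.append(now)
            while q and now - q[0] > self.window:
                q.popleft()
            return len(q) >= self.threshold  # True -> flag for human review

In practice something like this would hang off the function-calling layer and feed an alert channel, since these soft errors rarely show up as exceptions.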
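And here's the rough shape of the tool-module and context-injection idea: related tools share one block of instructions and one retrieval hook, and each LLM call assembles only the always-on context plus whatever the relevant modules fetch for that user and request. Again a simplified sketch with placeholder names (ToolModule, build_context), not our actual implementation:

    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class ToolModule:
        # A group of related tools (e.g. everything calendar-related) that
        # share one set of instructions and one retrieval hook, instead of
        # each tool carrying its own copy of the context.
        name: str
        instructions: str
        tools: dict[str, Callable] = field(default_factory=dict)
        fetch_context: Callable = lambda user_id: ""  # module-specific retrieval

    def build_context(user_id, message, base_instructions, user_memory, modules):
        # Per-call context injection: broad info that is always present, plus
        # the specific context each relevant module retrieves for this request.
        sections = [base_instructions, user_memory]
        for m in modules:
            sections.append(m.name + "\n" + m.instructions + "\n" + m.fetch_context(user_id))
        sections.append("User message: " + message)
        return "\n\n".join(sections)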
Of course, we're still a long way from Jarvis. Whenever one of us struggles with a technical problem, the other will kindly remind him that "Tony Stark built this in a cave, with a box of scraps!"

We're super pumped about where software is headed. It feels like we're tinkering with ideas right at the edge of what's possible. You can try Martin on desktop and iOS at https://trymartin.com. There's a 7-day free trial, and if you find it useful, it's $35/month afterwards for unlimited usage.

We're very excited to hear your thoughts! If you have any ideas about reliability for agents or the future of consumer AI interfaces, we'd love to discuss and trade notes.