Hey HN! We're camelQA (https://camelqa.com/). We're building an AI agent that can automate mobile devices using computer vision. Our first use case is mobile app QA: we convert natural-language test cases into tests that run on real iOS and Android devices in our device farm.

Flaky UI tests suck. We want to build a solution where engineers don't waste time maintaining fragile scripts.

camelQA combines accessibility element data with an in-house, vision-only R-CNN object detection model, paired with Google's SigLIP for UI element classification (see a sample output here: https://camelqa.com/blog/sole-ui-element-detector.png). This lets us detect elements even when they have no accessibility data associated with them.

Under the hood, the agent uses Appium to interface with the device. We use GPT-4V to reason at a high level and GPT-3.5 to execute the high-level actions. Check out a gif of our playground here: https://camelqa.com/blog/sole-signup.gif

Since we're vision-based, we don't need access to your source code, and we work across all app types: SwiftUI and UIKit, React Native, Flutter.

We built a demo for HN where you can use our model to control Wikipedia on a simulated iPhone: https://demo.camelqa.com/. Try giving it a task like "Bookmark the wiki page for Ilya Sutskever" or "Find San Francisco in the Places tab". We only have 5 simulators running, so there may be a wait. You get 5 minutes once you enter your first command.

If you want to see what our front end looks like, we made an account with some test runs. Use this login (Username: hackerNews, Password: 1337hackerNews!) to view our sandboxed HN account: https://dash.camelqa.com/login

Last year we left our corporate jobs to build in the AI space. It felt like we were endlessly testing our apps, even for minor updates, and we still shipped a bug that caused one of our apps to crash on subscribe (the app in question: https://apps.apple.com/us/app/tldr-ai-summarizer/id6449304716). That was the catalyst for camelQA.

We're excited to hear what you all think!
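[Editor's note: for the curious, here is a rough sketch of what a screenshot → detect → classify → act loop like the one described above could look like, using the Appium Python client and a public SigLIP checkpoint from Hugging Face, with a dummy stand-in for the in-house detector. It is an illustration of the described architecture under those assumptions, not camelQA's actual code.]

    # Sketch of a vision-driven device loop: screenshot -> detect -> classify -> act.
    # The detector below is a placeholder for camelQA's in-house R-CNN.
    import io
    from PIL import Image
    from appium import webdriver
    from appium.options.ios import XCUITestOptions
    from transformers import pipeline

    # Zero-shot UI-element classification with a public SigLIP checkpoint.
    classify = pipeline("zero-shot-image-classification",
                        model="google/siglip-base-patch16-224")
    UI_LABELS = ["button", "text field", "checkbox", "tab bar item", "icon"]

    def detect_elements(screen):
        # Placeholder for the custom detector: a real one would return a
        # bounding box per visible UI element, accessibility data or not.
        return [(100, 400, 300, 460)]  # dummy (x1, y1, x2, y2) for illustration

    opts = XCUITestOptions().load_capabilities(
        {"platformName": "iOS", "appium:app": "/path/to/YourApp.app"})
    driver = webdriver.Remote("http://localhost:4723", options=opts)

    screen = Image.open(io.BytesIO(driver.get_screenshot_as_png()))
    for (x1, y1, x2, y2) in detect_elements(screen):
        crop = screen.crop((x1, y1, x2, y2))
        label = classify(crop, candidate_labels=UI_LABELS)[0]["label"]
        # A planner LLM would decide what to do with each labeled element;
        # here we simply tap the center of the first button found.
        if label == "button":
            driver.tap([((x1 + x2) // 2, (y1 + y2) // 2)])
            break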
As someone who worked on a mobile dev team, I can only applaud your effort! :)

I have a few questions:

As with all the new AI-based RPA and testing frameworks (there are quite a few in YC), I'm curious about cost and performance. Say I want to run a few smoke tests (5-10 end-to-end scenarios) on my app across multiple iOS and Android devices with different screen sizes and OS versions before going to production.

What would it cost, and how long would the tests take to complete?

Do you already have customers running real-world use cases like this?
Appium project starter here. Congrats on the launch! If you ever want to talk shop, let me know!

I'm glad to see more vision-first, AI-powered testing tools in the world.
Very cool! I don't have this pain point currently, but I can absolutely see the utility. I like the built-in demo tool (although it sadly means you have no need for DemoTime, lol).

The demo.camelqa.com page needs some styling. I would invest a few minutes here. Maybe a loading spinner too, if you're expecting 15-second latency.

Technically, is this doing clever things with markup, or literally just feeding the image into a multimodal LLM and getting function calls in response?
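[Editor's note: the second approach the commenter describes — sending the raw screenshot to a multimodal LLM along with tool definitions and letting it answer with a function call — looks roughly like the sketch below with the OpenAI Python SDK. The `tap` tool, the task prompt, and the model choice are assumptions for illustration; the post does not say this is camelQA's implementation.]

    # Feed a screenshot plus tool definitions to a multimodal model and get a
    # structured function call back. Tool name and prompt are hypothetical.
    import base64
    from openai import OpenAI

    client = OpenAI()

    tools = [{
        "type": "function",
        "function": {
            "name": "tap",
            "description": "Tap the screen at the given pixel coordinates.",
            "parameters": {
                "type": "object",
                "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
                "required": ["x", "y"],
            },
        },
    }]

    with open("screenshot.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",
        tools=tools,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Task: open the Settings tab. What is the next action?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.tool_calls)  # e.g. a tap(x=..., y=...) call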
Having worked on mobile infra for many years now, for a couple of very large iOS teams, I'm excited to learn more. Kudos for putting yourselves out there.
1. Integration tests are notoriously slow, and the demo seemed to take some time to do basic actions. Is it even possible to run these at scale?
2. >Flaky UI tests suck; they can be flaky but it's often due to bad code and architecture. Any data to backup your tool makes the tests less flaky? I could see a scenario where there are 2 buttons with the same text, but under the hood we'd use different identifiers in-code to determine which button should be tapped in UI.<p>Overall I'm a bit skeptical because most UI tests are pretty easy to write today with very natural DSLs that are close to natural language, but definitely want to follow and hear more production use cases.
Why is the branding/mascot a camel?

I'm reminded of Waldo, a mobile testing automation product that was acquired in 2023.

Their mascot is another camelid (not sure if alpaca or llama): https://www.waldo.com/
This is excellent. Definitely useful, and well communicated on your site. Curious where you want to expand it. It seems like it could be used to track not just your own apps but other people's apps too, monitoring information and new UX from competitors (I've seen apps like ChangeTower do this for the web). Is this the direction you're planning to take? More initial thoughts here: https://www.youtube.com/watch?v=FrLNG2vtxsA
Yeah, the two big issues with UI tests: they're flaky and slow.

Curious how using GPT and vision combats flakiness? I'd think the nondeterminism of GPT, plus anything less than 100% accuracy in the computer vision pieces, would lead to more flakiness, not less.

I also wonder about the speed and cost of running the tests, when E2E tests are already traditionally slow and expensive. The computer vision and GPT elements seem costlier and slower still.
Demo looks very slick.

How far off is this from being able to integrate into a CI/CD pipeline? I'd love it if this could trigger off a PR and then block merging when it isn't sure how to execute some regular user flow (even if that's because it couldn't figure out how to perform an action, since that might mean my flow doesn't make sense).
I LOVE this. I pitched something similar (albeit far less intelligent) to my last employer, only to get scoffed at, so it makes me really happy to see someone actually build and productize it. Wishing you success!
Seems similar to App Quality Copilot: https://www.mobile.dev/app-quality-copilot
I love this idea. Is there something similar for web apps?

I wonder if you can easily add AI-based fuzzing or AI-based sample workflows to a testing pipeline.