Hey HN! We're camelQA (https://camelqa.com/). We're building an AI agent that can automate mobile devices using computer vision. Our first use case is mobile app QA: we convert natural-language test cases into tests that run on real iOS and Android devices in our device farm.

Flaky UI tests suck. We want to build a solution where engineers don't waste time maintaining fragile scripts.

camelQA combines accessibility element data with an in-house, vision-only R-CNN object detection model, paired with Google's SigLIP for UI element classification (see a sample output here: https://camelqa.com/blog/sole-ui-element-detector.png). This lets us detect elements even when they have no accessibility data associated with them.

Under the hood, the agent uses Appium to interface with the device. We use GPT-4V to reason at a high level and GPT-3.5 to execute the high-level actions. Check out a gif of our playground here: https://camelqa.com/blog/sole-signup.gif

Since we're vision-based, we don't need access to your source code, and we work across all app types: SwiftUI and UIKit, React Native, Flutter.

We built a demo for HN where you can use our model to control Wikipedia on a simulated iPhone: https://demo.camelqa.com/. Try giving it a task like "Bookmark the wiki page for Ilya Sutskever" or "Find San Francisco in the Places tab". We only have 5 simulators running, so there may be a wait. You get 5 minutes once you enter your first command.

If you want to see what our front end looks like, we made an account with some test runs. Use this login (Username: hackerNews, Password: 1337hackerNews!) to view our sandboxed HN account: https://dash.camelqa.com/login

Last year we left our corporate jobs to build in the AI space. It felt like we were endlessly testing our apps, even for minor updates, and we still shipped a bug that caused one of our apps to crash on subscribe (the app in question: https://apps.apple.com/us/app/tldr-ai-summarizer/id6449304716). That was the catalyst for camelQA.

We're excited to hear what you all think!
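[Editor's note: for the curious, here is a rough sketch of what a screenshot → detect → classify → act loop like the one described above could look like, using the Appium Python client and a public SigLIP checkpoint from Hugging Face, with a dummy stand-in for the in-house detector. It is an illustration of the described architecture under those assumptions, not camelQA's actual code.]

    # Sketch of a vision-driven device loop: screenshot -> detect -> classify -> act.
    # The detector below is a placeholder for camelQA's in-house R-CNN.
    import io
    from PIL import Image
    from appium import webdriver
    from appium.options.ios import XCUITestOptions
    from transformers import pipeline

    # Zero-shot UI-element classification with a public SigLIP checkpoint.
    classify = pipeline("zero-shot-image-classification",
                        model="google/siglip-base-patch16-224")
    UI_LABELS = ["button", "text field", "checkbox", "tab bar item", "icon"]

    def detect_elements(screen):
        # Placeholder for the custom detector: a real one would return a
        # bounding box per visible UI element, accessibility data or not.
        return [(100, 400, 300, 460)]  # dummy (x1, y1, x2, y2) for illustration

    opts = XCUITestOptions().load_capabilities(
        {"platformName": "iOS", "appium:app": "/path/to/YourApp.app"})
    driver = webdriver.Remote("http://localhost:4723", options=opts)

    screen = Image.open(io.BytesIO(driver.get_screenshot_as_png()))
    for (x1, y1, x2, y2) in detect_elements(screen):
        crop = screen.crop((x1, y1, x2, y2))
        label = classify(crop, candidate_labels=UI_LABELS)[0]["label"]
        # A planner LLM would decide what to do with each labeled element;
        # here we simply tap the center of the first button found.
        if label == "button":
            driver.tap([((x1 + x2) // 2, (y1 + y2) // 2)])
            break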
As someone who worked on a mobile dev team, I can only applaud your effort! :)

I have a few questions:

As with all the new AI-based RPA and testing frameworks (there are quite a few in YC), I'm curious about cost and performance. Say I want to run a few smoke tests (5-10 end-to-end scenarios) on my app across multiple iOS and Android devices with different screen sizes and OS versions before going to production.

What would it cost, and how long would the tests take to complete?

Do you already have customers running real-world use cases like this?
Appium project starter here. Congrats on the launch! If you ever want to talk shop, let me know!

I'm glad to see more vision-first, AI-powered testing tools in the world.
Very cool! I don't have this pain point currently, but I can absolutely see the utility. I like the built-in demo tool (although it sadly means you have no need for DemoTime, lol).

The demo.camelqa.com page needs some styling. I would invest a few minutes here. Maybe a loading spinner too, if you're expecting 15-second latency.

Technically, is this doing clever things with markup, or literally just feeding the image into a multimodal LLM and getting function calls in response?
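[Editor's note: the second approach the commenter describes — sending the raw screenshot to a multimodal LLM along with tool definitions and letting it answer with a function call — looks roughly like the sketch below with the OpenAI Python SDK. The `tap` tool, the task prompt, and the model choice are assumptions for illustration; the post does not say this is camelQA's implementation.]

    # Feed a screenshot plus tool definitions to a multimodal model and get a
    # structured function call back. Tool name and prompt are hypothetical.
    import base64
    from openai import OpenAI

    client = OpenAI()

    tools = [{
        "type": "function",
        "function": {
            "name": "tap",
            "description": "Tap the screen at the given pixel coordinates.",
            "parameters": {
                "type": "object",
                "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
                "required": ["x", "y"],
            },
        },
    }]

    with open("screenshot.png", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",
        tools=tools,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Task: open the Settings tab. What is the next action?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.tool_calls)  # e.g. a tap(x=..., y=...) call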
Having worked on mobile infra for many years now, for a couple of very large iOS teams, I'm excited to learn more. Kudos for putting yourselves out there.
1. Integration tests are notoriously slow, and the demo seemed to take some time to do basic actions. Is it even possible to run these at scale?
2. >Flaky UI tests suck; they can be flaky but it's often due to bad code and architecture. Any data to backup your tool makes the tests less flaky? I could see a scenario where there are 2 buttons with the same text, but under the hood we'd use different identifiers in-code to determine which button should be tapped in UI.<p>Overall I'm a bit skeptical because most UI tests are pretty easy to write today with very natural DSLs that are close to natural language, but definitely want to follow and hear more production use cases.
Why is the branding/mascot a camel?

I'm reminded of Waldo, a mobile testing automation product that was acquired in 2023.

Their mascot is another camelid (not sure if alpaca or llama): https://www.waldo.com/
This is excellent. Definitely useful, and well communicated on your site. Curious where you want to expand it. It seems like it could be used to track not just your own apps but other people's apps too, monitoring information and new UX from competitors (I've seen apps like ChangeTower do this for the web). Is this the direction you're planning to take? More initial thoughts here: https://www.youtube.com/watch?v=FrLNG2vtxsA
Yeah, the two big issues with UI tests: they're flaky and slow.

Curious how using GPT and vision combats flakiness? I'd think the nondeterminism of GPT, plus anything less than 100% accuracy in the computer vision pieces, would lead to more flakiness, not less.

I also wonder about the speed and cost of running the tests, when E2E tests are already traditionally slow and expensive. The computer vision and GPT elements seem costlier and slower still.
Demo looks very slick.

How far off is this from being able to integrate into a CI/CD pipeline? I'd love it if this could trigger off a PR and then block merging when it isn't sure how to execute some regular user flow (even if that's because it couldn't figure out how to perform an action, since that might mean my flow doesn't make sense).
I LOVE this. I pitched something similar (albeit far less intelligent) to my last employer, only to get scoffed at, so it makes me really happy to see someone actually build and productize it. Wishing you success!
Seems similar to App Quality Copilot: https://www.mobile.dev/app-quality-copilot
I love this idea. Is there something similar for web apps?

I wonder if you can easily add AI-based fuzzing or AI-based sample workflows to a testing pipeline.