
Show HN: Checksum – generate and maintain end-to-end tests using AI

78 points by Bootstrapper909 about 2 years ago
Hey HN!

I'm Gal, co-founder at Checksum (https://checksum.ai). Checksum is a tool for automatically generating and maintaining end-to-end tests using AI.

I cut my teeth in applied ML in 2016 at a maritime tech company called TSG, based in Israel. While I was there, I worked on a cool product that used machine learning to detect suspicious vehicles. Radar data is pretty tough for humans to parse, but a great fit for AI, and it worked very well for detecting smugglers, terrorist activity, and that sort of thing.

In 2021, after a few years working in big tech (Lyft, Google), I joined a YC company, Seer (W21), as CTO. This is where I experienced the unique pain of trying to keep end-to-end tests in a good state. The app was quite featureful, and it was a struggle to get and maintain good test coverage.

Like the suspicious maritime vehicle problem I had previously encountered, building and maintaining E2E tests had all the markings of a problem where machines could outperform humans. Early user interviews also made it clear that this problem doesn't just go away as organizations grow past the startup phase; it gets even more tangled and unpleasant.

We've been building the product for a little over a year now, and it's been interesting to learn which problems were surprisingly easy and which were unusually tough. To get the data we need to train our models, we use the same underlying technology that tools like FullStory and Hotjar use, and it works quite well. We're also able to get good tests from relatively few user sessions (in most cases, fewer than 200).

Right now, the models are really good at improving test coverage for featureful web apps that don't have much coverage (i.e., generating and maintaining a bunch of new tests), but making existing tests better has been a tougher nut to crack. We don't have as much of a place in organizations where test coverage is great and test quality is medium-to-poor, but we're keen to develop in that direction.

We're still early and spend basically all of our time working with a small handful of design partners (mostly medium-sized startups struggling with test coverage), but it felt like time to share with the HN community.

Thanks so much, happy to answer any questions, and excited to hear your thoughts!
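For readers who haven't worked with E2E tooling, here is a rough sketch of the kind of Playwright-style test such a tool might emit from recorded sessions. This is an illustrative example only; the URL, selectors, and flow are hypothetical, not actual Checksum output:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical example of a generated end-to-end test; not actual Checksum output.
test('user can sign in and reach the dashboard', async ({ page }) => {
  await page.goto('https://app.example.com/login'); // placeholder URL

  // Drive the UI the way a recorded user session would.
  await page.getByLabel('Email').fill('qa@example.com');
  await page.getByLabel('Password').fill('correct horse battery staple');
  await page.getByRole('button', { name: 'Sign in' }).click();

  // Assert on user-visible outcomes rather than implementation details.
  await expect(page).toHaveURL(/\/dashboard/);
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});
```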

15 comments

therealrifath about 2 years ago
This is dope. A couple of feedback points from a technical person who is a potential customer (I could be wrong on these!):

- I think the name doesn't sell me, or most people, because "checksum" is more of a security/crypto term. When I saw the HN post say Checksum I didn't think it was going to be about end-to-end tests; I thought it was going to be some crypto thing. Maybe a name like "Tested" or "Covered" would click better with the potential customer.

- I don't feel like the demo video shows me what this product is doing (though I could also be misunderstanding the product). It might help more if the demo showed the following, ideally in less than 5-10 seconds or most users might tune out:

1. A quick setup step for Checksum
2. A set of generated tests
3. Passing tests

Seeing those steps would give me, as an end user, the reaction of "wow, this must be something I can set up quickly and that will give me test coverage out of the box."
ezekg about 2 years ago
> Our impact on performance is non-existent as we use battle-tested open source tools used by Fortune 500 companies

What does that mean, exactly? Just because it's open source and used by F500s doesn't mean it can't have performance issues.
satisfice about 2 years ago
Most tool companies making claims about their tools show a shocking lack of knowledge about testing. This generally guarantees that their tools are dismissed by serious professionals. That still leaves a pretty substantial market among credulous wishful thinkers, of course.

But as a tester, I would like to see a tool that isn't just more bullshit. For this to happen you will have to explain:

- What exactly is your product designed to do? What kind of products can it be applied to test?

- What do you mean by the word test? Humans test in many ways and at many levels. Do you simply mean "exercise code while detecting crashes"? Because that's a tiny part of testing.

- Code coverage is not the only kind of coverage. So how do you automatically achieve state and data coverage? I'm guessing you don't, but hoping you will surprise me.

- Test oracles come in all shapes and sizes. One of the reasons I say testing cannot be automated is that I can easily demonstrate that a human tester cannot fully specify their own oracles, and thus cannot write code to implement them, either. So, how does your product recognize a bug when it sees it?

- How much human handholding is needed to operate your product?

- Testers think critically about how users interact with the product as users attempt to fulfill their purposes. This guides practical testing. I haven't yet seen any product that thinks critically. ChatGPT can't. So how does your product cope?

- When the product under test changes, what does your product do?

- Can your product EXPLAIN its test coverage (other than reporting code coverage, which is a poor indicator of good test coverage)?

- Say I have a product that sends the user through a multimodal questionnaire (including the use of animated screens that guide the user through measuring heart rate) and then produces a diagnosis of possible illnesses. Can your product tell if the diagnosis was correct in relation to the original intent of the logic that is documented in Jira tickets and Slack conversations? Will it generate questions about any of that, the way a real tester does?
JohnFriel about 2 years ago
This is a really compelling idea, but I'm having a little trouble making the leap from the high-level description to what it would mean for my projects in more concrete terms. Would it be possible to show off some example tests that the model generated, and maybe even a story about how the generated tests caught a bug before the code made it to production?
johnsillings about 2 years ago
I know Gal has the link in plaintext above, but for folks who want to check out the homepage, it's here: https://checksum.ai
8organicbits about 2 years ago
I'm always suspicious of tests when test coverage is the main metric. I've seen developers write tests that don't really check anything but run all the code paths. I've also seen tests that check every bit of output, which end up being brittle.

How well do the tests hold up over time, and how well are the tests validating the contract of the code instead of just historical behavior and quirks?
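As an illustration of those two failure modes, here is a minimal hypothetical sketch in Playwright (not drawn from the thread or from Checksum; the page and selectors are invented):

```typescript
import { test, expect } from '@playwright/test';

// Vacuous: exercises the code path but asserts nothing, so coverage
// rises while confidence does not.
test('search page loads', async ({ page }) => {
  await page.goto('https://app.example.com/search?q=widgets');
});

// Brittle: pins incidental markup, so any copy or styling tweak breaks it.
test('search results exact markup', async ({ page }) => {
  await page.goto('https://app.example.com/search?q=widgets');
  expect(await page.locator('#results').innerHTML()).toBe(
    '<li class="row">Widget A</li><li class="row">Widget B</li>'
  );
});

// Contract-focused: asserts the behavior users actually rely on,
// not the rendering details.
test('search returns matching products', async ({ page }) => {
  await page.goto('https://app.example.com/search?q=widgets');
  await expect(page.getByRole('listitem').first()).toContainText(/widget/i);
});
```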
jtambunt about 2 years ago
Congratulations on all the progress you've made! We are all learning as we're building and talking to users. I know for my team, E2E/integration testing is our main priority (over unit tests), and maintaining E2E tests is definitely a struggle. I imagine this problem is even more of an issue for larger codebases, so I see why you're going after medium-size startups where the product isn't completely rebuilt every few months.
hiatus about 2 years ago
Noticed a couple small typos in the marketing copy:

> Our impact on pefromence is non-existant as we use battle-tested open source tools used by Fortune 500 companies
varunjain99 about 2 years ago
Congrats! I definitely think QA farms can be automated using AI! Can you explain more about which parts of Checksum use AI?

Is it the identification of user sessions that are good candidates to turn into tests? Is it the generation of test specifications in some DSL / Cucumber / Selenium / etc.?
BillSaysThis about 2 years ago
Nothing on pricing on your site. Makes any other question difficult to formulate.
artur_makly about 2 years ago
Congrats on the launch.

0 - Seriously rethink your branding. I can help.

1 - How does the AI know when a test is successful? Is it a visual comparison? If so, is there a threshold range that can be adjusted?

2 - How does this differ from https://www.meticulous.ai/?

3 - Would it work on highly complex UX/UI interactions like the ones here? https://youtu.be/WtglzRWQzVE
sachuin23 about 2 years ago
How is the product different from other test-generation tools? How do you check that they are testing the intended behavior? My experience with automated testing solutions has been lukewarm so far.
execore-1 about 2 years ago
Awesome idea! Excited to see where this goes
firedup about 2 years ago
Interesting. Unrelated to the product, but related to your intro: what are the best open-source datasets for maritime data?
mackeyja92 about 2 years ago
Does it only support web? What about React Native mobile apps?