科技回声 (Tech Echo)

A tech news platform built with Next.js, offering global tech news and discussion.


Introducing Qodo Cover: Automate Test Coverage

16 points by timbilt 5 months ago

3 comments

foundry27 5 months ago
First off, congratulations folks! It's never easy getting a new product off the ground, and I wish you the best of luck. So please don't take this as anything other than genuine constructive criticism as a potential customer: generating tests to increase coverage is a misunderstanding of the point of collecting code coverage metrics, and businesses that depend on getting verification activities right will know this when they evaluate your product.

A high-quality test passes when the functionality of the software under test is consistent with the design intent of that software. If the software doesn't do the Right Thing, the test must fail. It's why TDD is effective: you're essentially specifying the intent and then implementing code against it, like a self-verifying requirements specification. In the GitHub PRs you've linked, a high-quality Qodo test is defined as one that:

1. Executes successfully

2. Passes all assertions

3. Increases overall code coverage

4. Tests previously uncovered behaviors (as specified in the LLM prompt)

So, given source code for a project as input, a hypothetical "perfect AI" built into Qodo that always writes a high-quality test would (naturally!) *never fail* to write a passing test for that code; the semantics of the code would be perfectly encoded in the test. If the code had a defect, it follows logically that optimizing your AI for the metrics Qodo is aiming for will actually LOWER the probability of finding that defect! The generated test would have successfully managed to validate the code against itself, enshrining defective behavior as correct. It's easy to say that higher code coverage is good, more maintainable, etc., but this outcome is actually the exact opposite of maintainable, and it actively undermines confidence in the code under test and the ability to refactor.

There are better ways to do this, and you've got competitors who are already well on the way to doing them using a diverse range of inputs besides code. It boils down to answering two questions:

1. Can a technique be applied so that an LLM, with or without explicit specifications and understanding of developer intentions, will reliably reconstruct the intended behavior of code?

2. Can a technique be applied so that tests generated by an LLM truly verify the specific behaviors the LLM was prompted to test, as opposed to writing a valid test but not the one that was asked for?
m3kw9 5 months ago
Why can't I just use Cursor to "generate tests" instead?
Comment #42320873 not loaded.
swyx 5 months ago
congrats team! we just had Itamar back on the pod who reintroduced Qodo, AlphaCodium and teased Qodo Cover: https://www.latent.space/p/bolt