Hi HN, hunterbrooks and nbrad here from Ellipsis (<a href="https://www.ellipsis.dev">https://www.ellipsis.dev</a>). Ellipsis automatically reviews your PRs when opened and on each new commit. If you tag @ellipsis-dev in a comment, it can make changes to the PR (via direct commit or side PR) and answer questions, just like a human.<p>Demo video: <a href="https://www.youtube.com/watch?v=X61NGZpaNQA" rel="nofollow">https://www.youtube.com/watch?v=X61NGZpaNQA</a><p>So far, we have dozens of open source projects and companies using Ellipsis. We seem to have landed in a kind of sweet spot where there’s a good match between the current capabilities of AI tools and the actual needs of software engineers - this doesn’t replace human review, but it saves you time by catching/fixing lots of small silly stuff.<p>Here’s an example in the wild: <a href="https://github.com/relari-ai/continuous-eval/pull/38">https://github.com/relari-ai/continuous-eval/pull/38</a>. Ellipsis (1) adds a PR summary; (2) finds a bug and adds a review comment; (3) after a (human) user comments, generates a side PR with the fix; and (4) after a (human) user merges the side PR and adds another commit, re-reviews the PR and approves it<p>Here’s another example: <a href="https://github.com/SciPhi-AI/R2R/pull/350#pullrequestreview-204013694">https://github.com/SciPhi-AI/R2R/pull/350#pullrequestreview-...</a>, where Ellipsis adds several comments with inline suggestions that were directly merged by the developer.<p>You can configure Ellipsis in natural language to enforce custom rules, style guides, or conventions. For example, here’s how the `jxnl/instructor` repo uses natural language rules to make sure that docs are kept in sync: <a href="https://github.com/jxnl/instructor/blob/main/ellipsis.yaml#L13-L14">https://github.com/jxnl/instructor/blob/main/ellipsis.yaml#L...</a>, and here’s an example PR that Ellipsis came up with based on those rules: <a href="https://github.com/jxnl/instructor/pull/346">https://github.com/jxnl/instructor/pull/346</a>.<p>Installing into your repo takes 2 clicks at <a href="https://www.ellipsis.dev">https://www.ellipsis.dev</a>. You do have to sign up to try it out because we need you to authorize our GitHub app to read your code. Don’t worry, your code is never stored or used to train models (<a href="https://docs.ellipsis.dev/security">https://docs.ellipsis.dev/security</a>).<p>We’d really appreciate your feedback, thoughts, and ideas!
This looks surprisingly good! I can see the quick sanity checks being very useful in cases like the mismatched env variables. That said, since you probably already know what works well, here's some constructive criticism:<p>First, I'm pretty unimpressed with the PR descriptions. I'd be frustrated if my company adopted this and I started seeing PRs like the one from continuous-eval that you linked to first. It's classic LLM output: lots of words, not much actual substance. "This PR updates simple.py", "updates certain values". It's the kind of information that can be gleaned in the first 5 seconds of glancing through a PR, and if that creates the illusion that no more description is needed then we'll have lost something.<p>Second, in the same PR: when writing a collection to a jsonl file, I would expect an empty collection to give me an empty file, not no file. Further, I haven't looked at the rest of the context, but it seems extremely unlikely to me that dataset_generator.generate would somehow produce non-serializable objects, and a human would easily see that. These two suggestions feel at best like a waste of time and at worst wrong, and it's concerning to me that the habits this tool encourages led to the suggestions being uncritically adopted and incorporated and that this seemed to you to be a good example of the tool in use.<p>The second PR you linked to is, I think, a better example, but I'm still not sold on it. Similar to the PR descriptions, I'm concerned that a tool like this would create an illusion of security against the "simple" problems (leaving reviewers to focus on the high level), whereas I'd hope that the human reviewer would still read every line carefully. And if they're reading every line carefully, then have we really saved them that much time by paying for an LLM reviewer to look over it first? Maybe the time it takes to type out a note about a misnamed environment variable.
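To make the jsonl expectation concrete, here's a minimal sketch (not the actual continuous-eval code; the `write_jsonl` name and file path are made up for illustration) of the behavior I'd want: opening the file in write mode up front means an empty collection produces an empty file, rather than no file at all.

    import json

    def write_jsonl(path, records):
        # "w" mode creates/truncates the file even when `records` is empty,
        # so callers get an empty .jsonl rather than a missing file.
        with open(path, "w") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")

    write_jsonl("dataset.jsonl", [])  # -> zero-byte file, not a missing one
    write_jsonl("dataset.jsonl", [{"question": "2+2?", "answer": "4"}])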
We are using both Ellipsis and Sweep for our open source project, and they are quite helpful in their own ways. I think selling them as an automated engineer is a little over the top at the moment, but once you get the hang of them they can spot common problems in PRs or handle small documentation-related stuff quite accurately.<p>Take a look at this PR for example:
<a href="https://github.com/julep-ai/julep/pull/311">https://github.com/julep-ai/julep/pull/311</a><p>Ellipsis caught a bunch of things that would otherwise have come up only in code review later. It also got a few things wrong, but those are easy to ignore. I like it overall: helpful once you get the hang of it, although far from a “junior dev”.
A sampling of PRs looks pretty good code-wise, but the commit messages/descriptions don't. They just summarize the changes made (something that can be gleaned from the diff) but don't give any context or rationale for why the changes were necessary.
I've been using Ellipsis for a few months now. I have zero regrets about paying for it, and I'll likely pay them more in the future as new features ship.<p>For a solo engineer like me working in multiple codebases across multiple languages, it's excellent as another set of eyes to catch big and small things in the pull request workflow I'm used to (and it has caught more than a few). I'd argue that even as a backstop for catching edge cases/screwups that would otherwise waste my time, it has already more than paid for itself.
Interesting project for sure, but I'm trying to understand the reasons for shifting the AI to the PR stage. Wouldn't this be more efficient at development time, i.e. with the Copilot/OpenAI toolchain?
That is pretty horrible, on the level of a "junior engineer" who has no idea of good industry practices and needs careful code review. I would hate to see the system as presented on any of my projects.<p>Summary: the point of a summary is to tell "why" the change was made and to highlight the unusual/non-trivial parts, and these examples absolutely fail there. To look at the first one:<p>- Why was the "generate" result type updated? Was it a customer request, general cleanup, or prep for some ongoing work?<p>- The other 3 points - are they a logical consequence of the output type update, or are those separate changes? In the latter case, you really want to list the changes ("Updated examples to use the more recent gpt-4 model", for example).<p>- What's the point of just saying "updating X in Y" if you don't say how? This is just visual noise duplicating the "file changes" tab in the PR.<p>Suggested changes: those are even worse - like <a href="https://github.com/relari-ai/continuous-eval/pull/38#discussion_r1501726471">https://github.com/relari-ai/continuous-eval/pull/38#discuss...</a><p>- This is an example file and you know where the "dataset" comes from. Why would you have non-serializable records to begin with?<p>- This changes the semantics from "let the programmer know in case of error" to "produce a corrupted/truncated data file in case of error" - which generally makes debugging harder and gives people nasty surprises when their file is somehow missing records. Sure, sometimes this is needed, but in that particular file it's all downsides. This should not have been proposed at all.<p>- Even if you do want the check somehow, it's pretty terrible as written - the message does not include the original error text, the bad object, or even the line number. What is one supposed to do if they see it? (A sketch of what I mean follows below.)<p>----<p>And I know some people say "you are supposed to review AI-generated content before submitting it", but I am also sure many new users will think the kind of crap advice that AI generates is OK.<p>Ellipsis authors: please stop making open source worse. Buggy patches are worse than no patches, and a one-line hand-written summary is better than a useless AI-generated one.<p>Maintainers: don't install this in your repo if you don't want crap PRs.
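For the record, here is a rough sketch of what I mean by a check that actually preserves context (hypothetical names, not the code from that PR): fail loudly and include the record index, the offending object, and the original error, instead of silently skipping records and truncating the file.

    import json

    def write_records(path, records):
        with open(path, "w") as f:
            for i, record in enumerate(records):
                try:
                    line = json.dumps(record)
                except TypeError as e:
                    # Surface the line number, the bad object, and the original
                    # error; silently skipping would leave a truncated file and
                    # a much harder debugging session later.
                    raise ValueError(
                        f"Record {i} is not JSON-serializable: {record!r}"
                    ) from e
                f.write(line + "\n")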
This looks unnecessary. Instead of a program (AI) that checks your code locally (like fixing typos), the program lives on the PR. Instead of installing something locally, you have to have some user program on GitHub which knows about your repository.<p>What do you get? A bot that makes commits. I don’t want to keep those commits. I would rather just squash (rebase) those typo fixes into my original commit. Just noise.<p>What else do you get? It seems that you get to pretend that you’re in a Google Doc For Code.[1] If people want that, then they should work towards real-time online code editors.<p>Take a step back. The point of code review is knowledge transfer, mentorship, and guaranteeing code quality. You can certainly fulfill the last point with a program. And this is AI, so of course you can in principle solve the first two as well. But if the underlying point is to transfer the knowledge from the senior <i>humans</i> (groan) to the junior humans, then how does your AI replace that? Because the point isn’t to transfer some general-purpose (AI) knowledge but the specific knowledge of those more experienced developers.<p>[1] Why do people want to turn regular programs (including AI) into some high-tech dance in the cloud where you tag and ping bots on GitHub? I’m at a loss.
We're required to have code review as part of our SOC2 process, and I assume automated agents wouldn't count.<p>The other end of the spectrum is linting and tests, which catch errors before review.<p>Does Ellipsis have a role between these two? If so, what is the role?
Signed up for Beekeeper Studio; no idea how well it performs for a desktop app written with Electron and Vue.js, but we'll see!<p>The first two code reviews show no feedback.
Do you have real-life examples on GitHub to look at?<p>[Edit] You can see a bunch of them here: <a href="https://github.com/search?q=%22ellipsis.dev%22&type=issues">https://github.com/search?q=%22ellipsis.dev%22&type=issues</a>. Nothing breathtaking, unfortunately.
Automated review seems like a hard sell to me.<p>If human participation is reduced, reviewers could approve PRs without properly reviewing them.<p>Eventually this stochastic parrot could throw an ``rm -rv ${TEMP}/`` in there and you are roasted.