AI-powered conversion from Enzyme to React Testing Library

178 points by GavCo 11 months ago

21 comments

morgante 11 months ago

The Slack engineering blog[0] is more pragmatic, and shows more about how the approaches were actually combined.

This is basically our whole business at grit.io and we also take a hybrid approach. We've learned a fair amount from building our own tooling and delivering thousands of customer migrations.

1. Pure AI is likely to be inconsistent in surprising ways, and it's hard to iterate quickly. Especially on a large codebase, you can't interactively re-apply the full transform a bunch.

2. A significant reason syntactic tools (like jscodeshift) fall down is just that most codemod scripts are pretty verbose and hard to iterate on. We ended up open sourcing our own codemod engine[1] which has its own warts, but the declarative model makes handling exception cases much faster.

3. No matter what you do, you need to have an interactive feedback loop. We do two levels of iteration/feedback: (a) automatically run tests and verify/edit transformations based on their output, (b) present candidate files for approval / feedback and actually integrate feedback provided back into your transformation engine.

[0] https://slack.engineering/balancing-old-tricks-with-new-feats-ai-powered-conversion-from-enzyme-to-react-testing-library-at-slack/

[1] https://github.com/getgrit/gritql
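
The first feedback level in point 3, automatically running tests to verify a transformation, might look roughly like the loop below. This is only a sketch under stated assumptions: `convert_with_feedback`, `rule_based`, `model_stub`, and `passes` are hypothetical names invented here, the string replacements are toy stand-ins for a real codemod and a real model call, and nothing here reflects how grit.io or Slack actually implement it.

```python
from typing import Callable, Optional, Sequence, Tuple

Transform = Callable[[str], Optional[str]]


def convert_with_feedback(
    source: str,
    transforms: Sequence[Tuple[str, Transform]],
    passes: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return (transform name, converted source) for the first candidate that verifies."""
    for name, transform in transforms:
        candidate = transform(source)
        if candidate is not None and passes(candidate):
            return name, candidate
    return None  # nothing verified: queue the file for human review


# Hypothetical toy stand-ins so the sketch runs on its own.
def rule_based(source: str) -> Optional[str]:
    if "mount(" not in source:
        return None  # the rule does not apply to this file
    return source.replace("mount(", "render(")


def model_stub(source: str) -> Optional[str]:
    # Placeholder for a model-generated rewrite.
    return source.replace("shallow(", "render(")


def passes(candidate: str) -> bool:
    # Placeholder for "run the converted test and check that it passes".
    return "mount(" not in candidate and "shallow(" not in candidate


pipeline = [("rule", rule_based), ("model", model_stub)]
print(convert_with_feedback("mount(<App />)", pipeline, passes))
```

Trying the cheap deterministic rule first and falling back to the model only when verification fails keeps the fast path reproducible; files that return `None` are the ones that would go to the second, human-approval level.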

anymouse123456 11 months ago

The actual efficiency claim (which is also likely incorrect) is inverted from the original article:

"We examined the conversion rates of approximately 2,300 individual test cases spread out within 338 files. Among these, approximately 500 test cases were successfully converted, executed, and passed. This highlights how effective AI can be, leading to a significant saving of 22% of developer time."

Reading that leads me to believe that 22% of the conversions succeeded and someone at Slack is making up numbers about developer time.
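
The figures in the quoted passage can be checked directly. A minimal sketch, using only the approximate counts given in the quote (2,300 test cases, 500 passing conversions); the variable names are illustrative:

```python
# Approximate figures quoted from the Slack post.
total_test_cases = 2300
converted_and_passing = 500

conversion_rate = converted_and_passing / total_test_cases
print(f"conversion rate: {conversion_rate:.1%}")  # prints "conversion rate: 21.7%"
```

That ratio is the share of test cases converted; on its own it says nothing about how much developer time was saved.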

jmull 11 months ago

> saving considerable developer time of at least 22% of 10,000 hours

I wonder how much time or money it would take to just update Enzyme to support React 18? (fork, or, god forbid, by supporting development of the actual project).

Nah, let's play with LLMs instead, and retask all the frontend teams in the company to rewriting unit tests to a new framework we won't support either.

I guess when you're swimming in pools of money there's no need to do reasonable things.

AmalgatedAmoeba 11 months ago

The conversion is between two testing libraries for React. Not to be too cynical (this sort of work seems to me like a pretty good niche for LLMs), but I don’t think I’d be that far off of 80% with just vim macros…

muglug 11 months ago

For people unfamiliar with Enzyme and RTL, this was the basic problem:

Each test made assertions about a rendered DOM from a given React component.

Enzyme’s API allowed you to query a snippet of rendered DOM using a traditional selector, e.g. get the text of the DOM node with id=“foo”. RTL’s API requires you to say something like “get the text of the second header element”, but prevents you from using selectors.

To do the transformation successfully you have to run the tests, first to render each snippet, then have some system for taking those rendered snippets and the Enzyme code that queries them and converting the Enzyme code to roughly-equivalent RTL calls.

That’s what the LLM was tasked with here.

denys_potapov 11 months ago

It's a 2024 webdev summary, nothing can be added:

New React version made the lib obsolete, we used LLM to fix it (1/5 success rate)

jmartin2683 11 months ago

Sounds like a nightmare to be involved with anything that is written in React and requires 15,000 unit tests.

semanser 11 months ago

I’m working on a similar project (DepsHub) where LLMs are used to make major library updates as smooth as possible. While it doesn’t work in 100% of cases, it really helps to minimize all the noise while keeping your project up to date. I’m not surprised Slack decided to go this way as well.

dwringer 11 months ago

It feels to me that there may be even more potential in flipping this idea around: human coders write tests to exact specifications, then an LLM-using coding system evolves code until it passes the tests.

__jonas 11 months ago

Seems like a reasonable approach. I wonder if it took less time than it would have taken to build some rule-based codemod script that operates on the AST, but I assume it did.

azangru 11 months ago

We did this for our codebase (several hundred tests) manually, two or three years ago (the problems were already apparent with React 17). It helped that we never used Enzyme's shallow renderer, because that type of testing was already falling out of favor by the late 2010s.

The next frontier is ditching Jest and jsdom in favor of testing in a real browser. But I am not sure the path for getting there is clear yet in the community.

larodi 11 months ago

Another proof that this probabilistic, stochastic approach works on the prediction/token level, but not on the semantic level, where it needs a discrete system. This is essentially reminiscent of a RAG setup and is similar in nature.

Perhaps I'm reiterating my previous sentiment: such application of LLMs together with discrete structures brings/hides much more value than chatbots, which will soon be considered a mere console UI.

trescenzi 11 months ago

Slightly tangential but one of the largest problems I’ve had working with React Testing Library is a huge number of tests that pass when they should fail. This might be because of me and my team misusing it but regularly a test will be written, seem like it’s testing something, and pass but if you flip the condition, or break the component it doesn’t fail as expected. I’d really worry that any mass automated, or honestly manual, method for test conversion would result in a large percentage of tests which seem to be of value but actually just pass without testing anything.

viralpraxis 11 months ago

Can someone clarify whether the term “AST” is used correctly in the article?

I’ve been playing with a mutation-injection framework for my master’s thesis for some time. I had to use LibCST to preserve syntax information which is usually lost during AST serialization/deserialization (like whitespace, indentation and so on). I thought that the difference between abstract and concrete trees is that a CST is guaranteed not to lose any information, so it can be used for specific tasks where ASTs are useless. So, did they actually use a CST-based approach?
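
One way to see the distinction being asked about, using Python's standard `ast` module (this is not the tooling Slack used, and it is not LibCST): parsing to an AST and unparsing again drops comments and the original spacing while keeping the program's structure. A minimal sketch, assuming Python 3.9+ for `ast.unparse`:

```python
import ast

source = "x = 1  # keep me\ny   =   x+2\n"

tree = ast.parse(source)
round_tripped = ast.unparse(tree)  # regenerate source from the AST alone

print(round_tripped)

# The comment and the original spacing are gone...
comment_lost = "#" not in round_tripped
# ...but the regenerated code still parses to the same abstract tree.
same_structure = ast.dump(ast.parse(round_tripped)) == ast.dump(tree)
print(comment_lost, same_structure)
```

A concrete syntax tree keeps exactly the formatting detail that is lost here, which is why lossless tools tend to be preferred when rewritten files need to stay readable and produce small diffs.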

skywhopper 11 months ago

Pretty misleading summary, given that LLMs played only a tiny part in the effort, and probably took more time to integrate than it saved in what is otherwise a pretty standard conversion pipeline, although I’m sure it’s heavily in the Slack engineers’ interest to go along with the AI story to please the Salesforce bosses who have mandated AI must be used in every task. Just don’t fall for the spin here, and think this will actually save you time on a similar effort.

29athrowaway 11 months ago

Saving 22% of 15,000 tests is 3,300 tests.

While 22% sounds low, saving yourself the effort to rewrite 3,300 tests is a good achievement.
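
The extrapolation above, written out. This is a sketch that assumes the 22% rate measured on the sampled test cases carries over unchanged to all 15,000 tests, which the thread does not establish:

```python
total_tests = 15_000
success_percent = 22  # rate reported for the sampled test cases

# Integer arithmetic avoids floating-point rounding noise.
converted = total_tests * success_percent // 100
remaining = total_tests - converted
print(converted, remaining)  # prints "3300 11700"
```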

tiffanyh 11 months ago

Source article:

https://slack.engineering/balancing-old-tricks-with-new-feats-ai-powered-conversion-from-enzyme-to-react-testing-library-at-slack/

torginus 11 months ago

Just to shamelessly plug one of my old projects, I did something like this at a German industrial engineering firm: they wanted us to rewrite a huge base of old tests written in TCL into C#.

It was supposed to take 6 months for 12 people.

Using an AST parser, I wrote a program in two weeks that converted like half the tests flawlessly, with about another third needing minor massaging, and the rest having to be done by hand (I could've done better by handling more corner cases, but I kinda gave up once I hit diminishing returns).

Although it helped a bunch that most tests were brain dead simple.

Reaction was mixed: the newly appointed manager was kinda fuming that his first project's glory was stolen from him by an Assi, and the guys under him missed out on half a year of leisurely work.

I left a month after that, but what I heard is that they decided to pretend that my solution didn't exist on the management level, and the devs just ended up manually copypasting the output of my tool, and did a day's planned work in 20 minutes, with the whole thing taking 6 months as planned.

anymouse123456 11 months ago

Misleading title. Maybe try this one?

"Slack uses ASTs to convert test code from Enzyme to React with 22% success rate"

This article is a poor summary of the actual article, which is at least linked to Slack's engineering blog [0].

[0] https://slack.engineering/balancing-old-tricks-with-new-feats-ai-powered-conversion-from-enzyme-to-react-testing-library-at-slack/

[updated]

gjvc 11 months ago

infoq has gone to pure shit

Aurornis 11 months ago

This is from the actual Slack blog post:

> We examined the conversion rates of approximately 2,300 individual test cases spread out within 338 files. Among these, approximately 500 test cases were successfully converted, executed, and passed. This highlights how effective AI can be, leading to a significant saving of 22% of developer time. It’s important to note that this 22% time saving represents only the documented cases where the test case passed.

So the blog post says they converted 22% of tests, which they claim as saving 22% of developer time, which InfoQ interpreted as converting 80% of tests automatically?

Am I missing something? Or is this InfoQ article just completely misinterpreting the blog post it’s supposed to be reporting on?

The topic itself is interesting, but between all of the statistics games and editorializing of the already editorialized blog post, it feels like I’m doing heavy work just to figure out what’s going on.
