Continuing with my "old man yells at cloud" meme of late, here's my hot take:<p>So let me get this straight: we are going to train AI models to perform screen recognition of some kind (so they can ascertain layout and detect the "important" UI elements), and additionally ask that AI to OCR all text on the screen so it has some hope of being able to follow some natural language instructions (OCR being a task which, as an HN thread a day or two ago pointed out, AI is <i>exceedingly</i> bad at), and then we're going to be able to tell this non-deterministic prediction engine what we want to do with our software, and it's just going to do it?<p>Like Homer Simpson's button-pressing birdie toy? :smackshead:<p>Why do I have reservations about letting a non-deterministic AI agent run my software?<p>Why not expose hooks in some common format for our software to perform common tasks? We could call it an "application programming interface". We might even insist on some kind of common data interchange format. I hear all the cool people are into EBCDIC nowadays.<p>Then we could build a robust and deterministic tool to automate our workflows. It could even pass structured data between unrelated applications in a secure manner. Then we could be sure that the AI agent will hit the "save the world" button instead of the "kill all humans" button 100% of the time.<p>On a serious note, we should study various macro recording implementations, to at least have a baseline of what people have been successfully doing for 40-odd years to automate their workflows, and <i>then</i> come up with an idea that doesn't involve investing in a new computer and GPU and slowly boiling the oceans.<p>This reeks of a solution in search of a problem. And the solution has the added benefit of being inefficient and unreliable.
But people don't get billion-dollar valuations for macro recorders.<p>Is this what they meant by "worse is better"?<p>Edit: and for the love of FSM, please <i>do not</i> expose any new automation APIs to the network.