Show HN: Convert any screenshot into clean HTML code using GPT Vision (OSS tool)

358 pointsby abiover 1 year ago

Hey everyone,I built a simple React/Python app that takes screenshots of websites and converts them to clean HTML/Tailwind code.It uses GPT-4 Vision to generate the code, and DALL-E 3 to create placeholder images.To run it, all you need is an OpenAI key with GPT vision access.I’m quite pleased with how well it works most of the time. Sometimes, the image generations can be hilariously off. See here for a replica of Taylor Swift’s Instagram page: <a href="https://streamable.com/70gow1" rel="nofollow noreferrer">https://streamable.com/70gow1</a> I initially had a hard time getting it to work on full page screenshots. GPT4 would code up the first couple of sections and then, get lazy and output placeholder comments for the rest of the page. With some prompt engineering, full page screenshots work a whole lot better now. It’s great for landing pages.Lots of ideas of where to go from here! Let me know if you have feedback and you find this useful :)

29 comments

andyjohnson0over 1 year ago

This genuinely seems like magic to me, and it feels like I don't know how to place it in my mental model of how compuation works. A couple of questions/thoughts:1. I learned that NNs are universal function approximators - and the way I understand this is that, at a very high level, they model a set of functions that map inputs to outputs for a particular domain. I certainly get how this works, conceptually, for say MNIST. But for the stuff described here... I'm kind of baffled.So is GPT's generic training really causing it to implement/embody a value mapping from pixel intensities to HTML+Tailwind text tokens, such that a browser's subsequent interpretation and rendering of those tokens approximates the input image? Is that (at a high level) what's going on? If it is, GPT in modelling not just the pixels->html/css transform but also has a model of how html/css is rendered by the browser back box. I can kind of accept that such a mapping must necessarily exist, but for GPT to have derived it (while also being able to write essays on a billion other diverse subjects) blows my mind. Is the way I'm thinking about this useful? Or even valid?2. Rather more practically, can this type of tool be thought of as a diagram compiler? Can we see this eventually being part of a build pipeline that ingests Sketch/Figma/etc artefacts and spits-out html/css/js?

评论 #38292565 未加载

评论 #38287782 未加载

评论 #38289691 未加载

评论 #38288274 未加载

评论 #38293276 未加载

评论 #38292612 未加载

评论 #38287809 未加载

tlarkworthyover 1 year ago

Here is the meat <a href="https://github.com/abi/screenshot-to-code/blob/main/backend/prompts.py">https://github.com/abi/screenshot-to-code/blob/main/backend/...</a>""" You are an expert Tailwind developer You take screenshots of a reference web page from the user, and then build single page apps using Tailwind, HTML and JS. You might also be given a screenshot of a web page that you have already built, and asked to update it to look more like the reference image.- Make sure the app looks exactly like the screenshot. - Pay close attention to background color, text color, font size, font family, padding, margin, border, etc. Match the colors and sizes exactly. - Use the exact text from the screenshot. - Do not add comments in the code such as "" and "" in place of writing the full code. WRITE THE FULL CODE. - Repeat elements as needed to match the screenshot. For example, if there are 15 items, the code should have 15 items. DO NOT LEAVE comments like "" or bad things will happen. - For images, use placeholder images from <a href="https://placehold.co" rel="nofollow noreferrer">https://placehold.co</a> and include a detailed description of the image in the alt text so that an image generation AI can generate the image later.In terms of libraries,- Use this script to include Tailwind: <script src="<a href="https://cdn.tailwindcss.com"></script>" rel="nofollow noreferrer">https://cdn.tailwindcss.com"></script></a> - You can use Google Fonts - Font Awesome for icons: <link rel="stylesheet" href="<a href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css"></link>" rel="nofollow noreferrer">https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/c...</a>Return only the full code in <html></html> tags. Do not include markdown "```" or "```html" at the start or end."""I personally think defensive prompting is not the way forward. But wow its so amazing this works. Its like things I dreamed of being possible as a teenager are now possible for relatively little effort.

评论 #38287032 未加载

评论 #38287002 未加载

评论 #38289529 未加载

评论 #38286891 未加载

评论 #38286989 未加载

评论 #38294682 未加载

评论 #38286972 未加载

block_daggerover 1 year ago

Try adding “getting this right is very important for my career.” It noticeably improves quality of output across many tasks according to a YT research video I can’t find atm.

评论 #38288009 未加载

评论 #38287324 未加载

评论 #38288276 未加载

jmacdover 1 year ago

I just don't know how to think about what to build anymore.Not to detract at all from this (and thanks for making the source available!) but we now have entire classes of problems that seem relatively straightforward to solve now, so I pretty much feel like *why bother?*I need to recalibrate my brain quickly to frame problems differently. Both in terms of what is worth solving, and how to solve.

评论 #38292453 未加载

评论 #38290455 未加载

7734128over 1 year ago

The amazing thing is of course that this is done with a general model, but it would be quite easy to generate data for supervised learning for this task. Generate HTML -> render and screenshot -> use the data in reverse for learning.

yodonover 1 year ago

The GitHub page says you're going to be offering a hosted version through Pico. May I ask about why you went with Pico (which I'm just learning about through your page)?Pico only offers 30% of revenue (half the usual app store 60% cut) AND, as I read it, it only pays out if a formerly free user signs up after trying your app (no payment for use by other users already on the platform, so you get no benefits from their having an installed base of existing users).Those seem like much worse terms and a much smaller user base than a more traditional platform, hence my curiosity on why you chose it.

评论 #38289185 未加载

jlpomover 1 year ago

I don’t see the point; if you want to copy an existing website, why not use Httrack? The website would always be more similar and you save on GPT’s API. Where this technique shine is for sketch to website.

评论 #38287156 未加载

评论 #38287976 未加载

yanis_tover 1 year ago

Really liked how you serve the demo of the generated website AS it's being generated using iframe with srcdoc. Simple and elegant.

评论 #38288732 未加载

sublinearover 1 year ago

Ignoring the "AI" implementation details, this generates HTML in the same sense that you can technically convert a rasterized image to an SVG that looks like crap when you zoom in and forces the renderer to draw and fill many unnecessary strokes.In other words, the output of this does not seem clean enough to hand over to a web dev. They're going to have to rewrite all but the most obvious high level structures that didn't need a fancy tool anyway, and that their snippets plugin in their text editor does a better job of. Much of web dev isn't even visible. Accessibility is all metadata you can't get from a screenshot and responsive CSS would require at least a video exhaustively covering every behavior, animation, etc. The javascript would probably be impossible to determine from any amount of image recognition.Better off just copying the actual HTML directly from dev tools, no?

ActionHankover 1 year ago

Phishing sites are going to get a whole lot quicker to make!

评论 #38289601 未加载

aligajaniover 1 year ago

There was a tool like this 5 years ago on [Github](<a href="https://github.com/emilwallner/Screenshot-to-code">https://github.com/emilwallner/Screenshot-to-code</a>) that did similar thing using neural networks.

butzover 1 year ago

Seems like a perfect tool for project manager who has ever changing requests. Does it work with "Make it pop" input?

评论 #38292547 未加载

al_be_backover 1 year ago

I can see how it relates to your other product, Pico [1], as a sketch/no-code site generation plugin. Not sure how practical this output would be in production, if any, but perhaps helpful for Learning / Education (as a tool).[1] <a href="https://picoapps.xyz/" rel="nofollow noreferrer">https://picoapps.xyz/</a>

Mic92over 1 year ago

Pretty cool. Would it be possible to share the generated code for demo to get an idea what the result looks like?

评论 #38288765 未加载

Globzover 1 year ago

This remind me of tldraw but instead of a screenshot you draw your UI and it converts it to HTML, check out <a href="https://drawmyui.com" rel="nofollow noreferrer">https://drawmyui.com</a> - here’s a demo from twitter <a href="https://x.com/multikev/status/1724908185361011108?s=46&t=AoX409MPuuUiFUA860UoqA" rel="nofollow noreferrer">https://x.com/multikev/status/1724908185361011108?s=46&t=AoX...</a>

评论 #38300667 未加载

btbuildemover 1 year ago

OP, how do you see this working with series of screenshots - for example, sites with several pages that each use/take some user-provided data?I guess I am asking, can you see this approach working beyond simple one-page quick drafts?

Faizann20over 1 year ago

A live version for this has been online for a few days here! <a href="https://brewed.dev/" rel="nofollow noreferrer">https://brewed.dev/</a>

pmarreckover 1 year ago

Does it use responsive design, so the result works on mobile?

awbover 1 year ago

How does it handle mobile / responsive layouts?

ShadowBanThis01over 1 year ago

Phishermen rejoice!

评论 #38287145 未加载

nailerover 1 year ago

I got excited about “clean HTML code” in the title and then realised this outputs tailwind. Any chance of a pure CSS version?

评论 #38288812 未加载

bambaxover 1 year ago

Wow this sounds pretty cool, congrats. Great idea.Is there a way to see what the HTML looks like before installing/running it?

评论 #38288767 未加载

anthonylatonaover 1 year ago

Looks awesome. One of the most impressive examples I've seen.

avgDevover 1 year ago

Absolutely insane. Very nice and clever. Does it handle responsive layouts?

评论 #38292572 未加载

pradumnasarafover 1 year ago

It looks promising. Can help lots of content creators to share their code.

sciolistover 1 year ago

How many times does it run inference per screenshot? Looks cool!

评论 #38288774 未加载

seegover 1 year ago

This is a great tool for all your phishing needs!

gosub100over 1 year ago

This could be very useful for de-shittifying the web. Imagine a P2P network where Producers go out to enshittified websites (news sites with obnoxious JS and autoplay videos, malware, "subscribe/GDPR" popups, ads) and render HTML1.0 versions of the sites (that could then have further ad-blocking or filters applied to them, like Reader Mode, but taken further). Consumers would browse the same sites, but the add-on would redirect (and perhaps request) to the de-shittified version.Perhaps people in poorer countries could be motivated to browse the sites, look at ads, and produce the content for a small fee. If a Consumer requests a link that isn't rendered yet (or lately) it could send a signal via P2P saying "someone wants to look at the CNN Sports page" and then a Producer could render it for them. Alternatively, a robot (that manually moves the mouse and clicks links) could do it, from a VM that regularly gets restored from snapshots.From what I understand, with encrypted DNS and Google's "web DRM" (can't think of the name right now), ad-blockers are going to be snuffed out relatively quickly, so it's important to work on countermeasures. A nice byproduct of this would be a P2P "web archive" similar to archive.org, snapshotting major-trafficked sites day-by-day.

gardenhedgeover 1 year ago

Cool demo but would be infuriating to use (in it's current state). It just left out the left hand navigation completely and added a new nav element at the top right.

评论 #38288421 未加载

评论 #38288219 未加载

评论 #38288802 未加载