TechEcho

TextDiffuser-2: Unleashing the power of language models for text rendering

154 points by bx376 over 1 year ago

6 comments

Oras over 1 year ago
The problem, from the paper:

> Several methods alleviated this issue by incorporating explicit text position and content as guidance on where and what text to render. However, these methods still suffer from several drawbacks, such as limited flexibility and automation, constrained capability of layout prediction, and restricted style diversity.

Looking at the diagram provided, they use GPT-4 to suggest the position following the text prompt.

I see it as very useful for getting the text in the right position without manual work trying to find it. I'm not an expert, but doesn't this method add extra cost and overhead to calling text-to-image models?
whywhywhywhy over 1 year ago
Does Midjourney v6 use something similar to this? Both have a weird look to the text, an amateurishly photoshopped look where the text almost has different aliasing from the rest of the image and doesn't look truly integrated.

It's impressive that it's legible, but some work is needed to get it to normal production quality.
marban over 1 year ago
Recent comparison of what's out there: https://www.reddit.com/r/StableDiffusion/comments/18o1ole/apparently_not_even_midjourney_v6_launched_today/
blixt over 1 year ago
It’s very smart, though using bounding boxes will most likely limit it to 2D contexts (and some head-on 3D contexts) since the text won’t follow the bounding box when perspective is involved. I’m sure it can be improved to support bounds that have 3D transforms though.
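To make the perspective point concrete, here is a minimal sketch in plain Python. It is not from the paper: the matrix `H`, the box coordinates, and the helper names are all made up for illustration. It projects the four corners of an axis-aligned box through a 3x3 homography and checks that the result is no longer an axis-aligned rectangle, which is why a two-corner box cannot describe text on a receding surface.

```python
# Illustrative only: H, the box, and all helper names are assumptions,
# not anything taken from TextDiffuser-2.

def apply_homography(h, point):
    """Project a 2D point through a 3x3 homography (row-major nested lists)."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (
        (h[0][0] * x + h[0][1] * y + h[0][2]) / w,
        (h[1][0] * x + h[1][1] * y + h[1][2]) / w,
    )

def box_corners(x0, y0, x1, y1):
    """Corners of an axis-aligned box: top-left, top-right, bottom-right, bottom-left."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

def is_axis_aligned_rect(corners, tol=1e-9):
    """True if the four ordered corners still form an axis-aligned rectangle."""
    (ax, ay), (bx, by), (cx, cy), (dx, dy) = corners
    return (
        abs(ay - by) < tol      # top edge is horizontal
        and abs(cy - dy) < tol  # bottom edge is horizontal
        and abs(ax - dx) < tol  # left edge is vertical
        and abs(bx - cx) < tol  # right edge is vertical
    )

# A mild perspective: points further right are scaled down more.
H = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.002, 0.0, 1.0],
]

box = box_corners(10, 20, 110, 50)
warped = [apply_homography(H, p) for p in box]

left_height = warped[3][1] - warped[0][1]
right_height = warped[2][1] - warped[1][1]

print("box is axis-aligned:   ", is_axis_aligned_rect(box))
print("warped is axis-aligned:", is_axis_aligned_rect(warped))
print("left edge taller than right edge:", left_height > right_height)
```

Under this assumed transform the right edge comes out shorter than the left, so the region is a general quadrilateral. Supporting it would mean conditioning on four corner points (or a transform) rather than two.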
grork over 1 year ago
I’m assuming the type foundry legal departments are getting ready to come for the image generators when they find out their typefaces have been vacuumed up and are now generating new content without licensing the typeface for use?
opjjf over 1 year ago
Kind of crazy that they use Breath of the Wild even in the examples. How could this be generated other than by obviously stealing Nintendo IP?