The problem, from the paper:

> Several methods alleviated this issue by incorporating explicit text position and content as guidance on where and what text to render. However, these methods still suffer from several drawbacks, such as limited flexibility and automation, constrained capability of layout prediction, and restricted style diversity.

Looking at the diagram provided, they use GPT-4 to suggest text positions from the prompt.

I can see this being very useful for getting text into the right spot without manually hunting for a good position. I'm not an expert, but doesn't this method add extra cost and latency to every Text-to-Image call?
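Roughly, the flow seems to be "LLM proposes a layout, then the image model renders text into it." A minimal sketch of that two-stage idea is below; the function names (query_layout_llm, render_with_layout) and the box format are my own placeholders, not the paper's actual API, and the LLM call is mocked with a canned reply.

```python
# Hypothetical sketch of the "LLM plans layout, then render" pipeline.
# query_layout_llm / render_with_layout are stand-ins, not the paper's API.
import json

def query_layout_llm(prompt: str) -> list[dict]:
    """Stand-in for a GPT-4 call that proposes where each text span should go.
    A real implementation would send `prompt` to an LLM and parse its reply."""
    # Canned reply showing the assumed shape: normalized [x0, y0, x1, y1] boxes.
    canned_reply = json.dumps([
        {"text": "GRAND OPENING", "box": [0.15, 0.10, 0.85, 0.25]},
        {"text": "Saturday 10am", "box": [0.30, 0.80, 0.70, 0.90]},
    ])
    return json.loads(canned_reply)

def render_with_layout(prompt: str, layout: list[dict]) -> None:
    """Stand-in for the text-to-image model that takes the layout as extra conditioning."""
    for item in layout:
        print(f"render '{item['text']}' inside box {item['box']} for prompt: {prompt!r}")

prompt = "A storefront poster announcing a grand opening"
layout = query_layout_llm(prompt)   # the extra LLM round trip is the added cost/latency
render_with_layout(prompt, layout)
```

The overhead question comes from that first step: every generation now pays for an LLM round trip before the diffusion model even starts.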
Does Midjourney v6 use something similar to this? Both have a weird look to the text, an amateurishly photoshopped look where the text almost has different aliasing from the rest of the image and doesn't feel truly integrated.

Impressive that it's legible, but some work is needed to get it to normal production quality.
Recent comparison of what's out there:
https://www.reddit.com/r/StableDiffusion/comments/18o1ole/apparently_not_even_midjourney_v6_launched_today/
It’s very smart, though using bounding boxes will most likely limit it to 2D contexts (and some head-on 3D contexts), since the text won’t follow the bounding box when perspective is involved. I’m sure it could be extended to support bounds that have 3D transforms, though.
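To make the perspective point concrete, here is a small sketch (my own illustration, not anything from the paper) of what happens when you push an axis-aligned box through a 3x3 homography: it becomes a general quadrilateral, which is roughly what a "3D-transformed bound" would have to describe. The matrix values are arbitrary.

```python
# Push an axis-aligned box through a homography to show it stops being a rectangle.
import numpy as np

def warp_box(box, H):
    """Map an axis-aligned box (x0, y0, x1, y1) through homography H,
    returning the four warped corners as a (4, 2) array."""
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=float)
    homog = np.hstack([corners, np.ones((4, 1))])   # to homogeneous coordinates
    warped = (H @ homog.T).T
    return warped[:, :2] / warped[:, 2:3]           # perspective divide

# A homography with a small perspective term: parallel box edges stop being parallel.
H = np.array([[1.0,   0.0,    0.0],
              [0.0,   1.0,    0.0],
              [0.001, 0.0005, 1.0]])

print(warp_box((100, 100, 300, 200), H))
```

An axis-aligned box only needs two corners; once perspective is involved you need all four, plus some way for the model to warp the glyphs to match.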
I’m assuming the type foundries’ legal departments are getting ready to come for the image generators once they find out their typefaces have been vacuumed up and are now being used to generate new content without a license for the typeface?