
TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Expressive text-to-image generation with rich text

89 points by plurby over 1 year ago

10 comments

simbolit over 1 year ago

I looked at this, thought about it, waited an hour, and then looked at it again, and I can't help but think this is useless.

We can already weight parts of prompts, and we can already specify colors or styles for parts of the image. And even if we could not, none of this needs rich text.

I even think their comparisons are dishonest. They compare "plaintext" prompts with "rich text" prompts, but the rich text prompts contain more information. What? Seriously, who is surprised that the following two prompts give different images?

(1) "A girl with long hair sitting in a cafe, by a table with coffee on it, best quality, ultra detailed, dynamic pose."

(2) "A girl with long [Richtext:orange] hair sitting in a cafe, by a table with coffee on it, best quality, ultra detailed, dynamic pose. [Footnote: The ceramic coffee cup with intricate design, a dance of earthy browns and delicate gold accents. The dark, velvety latte is in it.]"

The worst part is "Font style indicates the styles of local regions". In the comparison-with-other-methods section they actually have to specify in parentheses what each font means style-wise, because nobody knows and (let's be frank) nobody wants to learn. So why not just put those plaintext parentheses in the prompt?

I stopped myself from immediately posting my (rather negative) opinion, but after over an hour it hasn't changed. As far as I can see, this isn't useful; rich text prompts are a gimmick.
[Comment #37773780 not loaded]
[Comment #37772557 not loaded]
[Comment #37772377 not loaded]
[Comment #37775603 not loaded]
[Comment #37773632 not loaded]
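The flattening simbolit describes (turning rich-text attributes into equivalent plain-text prompt fragments) can be sketched in a few lines. This is an illustrative sketch only, not code from the paper; the span format and attribute names are invented for demonstration:

```python
# Hypothetical sketch: flatten "rich text" spans into a plain-text prompt.
# Each span is (text, attrs); attrs may carry 'color', 'style', or 'footnote'.

def flatten_rich_prompt(spans):
    """Return a single plain-text prompt equivalent to the rich-text spans."""
    parts = []
    footnotes = []
    for text, attrs in spans:
        if "color" in attrs:
            text = f"{attrs['color']} {text}"       # color becomes an adjective
        if "style" in attrs:
            text = f"{text} (in {attrs['style']} style)"  # font style -> parenthetical
        if "footnote" in attrs:
            footnotes.append(attrs["footnote"])     # footnotes appended at the end
        parts.append(text)
    prompt = " ".join(parts)
    if footnotes:
        prompt += " " + " ".join(footnotes)
    return prompt

spans = [
    ("A girl with long", {}),
    ("hair", {"color": "orange"}),
    ("sitting in a cafe, by a table with coffee on it,", {}),
    ("best quality, ultra detailed, dynamic pose.",
     {"footnote": "The ceramic coffee cup with intricate design."}),
]
print(flatten_rich_prompt(spans))
```

Run on the example spans above, this produces the same "plaintext parentheses" prompt the comment argues for.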
Der_Einzige over 1 year ago

I LOVE this.

All of the techniques they are showing have existed for a while in places like Automatic1111/ComfyUI or their extensions (i.e. regional prompting, attention weights). Having it connect so seamlessly with rich text is awesome, and it's a cool UI trick that might make normies notice it.

Also, related: NLP is extremely undertooled on the prompt engineering side. Most of the techniques here would work just fine on any LLM. If you don't believe me, read this: https://gist.github.com/Hellisotherpeople/45c619ee22aac6865ca4bb328eb58faf
[Comment #37774000 not loaded]
littlestymaar over 1 year ago

While I don't think the rich text thing is particularly useful, I'm very impressed by the approach, especially how it changes the resulting image in a way you can control (that is, without regenerating the whole thing and ending up with random undesirable changes).

The stability of the overall image during local changes makes me think this could be a key to video generation, because the biggest problem with existing diffusion-based approaches to video is their instability from frame to frame.
minimaxir over 1 year ago

A functionally similar approach is prompt term weighting with libraries such as compel: https://github.com/damian0815/compel

Prompt weighting alone can fix undesired aspects of an output, especially with SDXL and its dual text encoders.
[Comment #37773981 not loaded]
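As background on minimaxir's point: compel-style weighting attaches a multiplier to parts of a prompt (e.g. `word++` or `(blue hair)1.4`). Below is a minimal, self-contained sketch of parsing such syntax into (fragment, weight) pairs; it illustrates the idea and is not compel's actual implementation, whose syntax and internals differ.

```python
import re

def parse_weighted_prompt(prompt, step=1.1):
    """Split a prompt into (fragment, weight) pairs.
    Toy syntax inspired by compel / A1111 extensions:
      word++ / word--   -> weight multiplied by `step` per +/-
      (some words)1.4   -> explicit numeric weight
    Everything else gets weight 1.0."""
    pattern = re.compile(
        r"\(([^)]+)\)(\d+(?:\.\d+)?)"   # (fragment)1.4
        r"|(\S+?)(\++|-+)(?=\s|$)"      # word++ / word--
    )
    result, pos = [], 0
    for m in pattern.finditer(prompt):
        plain = prompt[pos:m.start()].strip()
        if plain:
            result.append((plain, 1.0))
        if m.group(1) is not None:                  # explicit weight form
            result.append((m.group(1), float(m.group(2))))
        else:                                       # +/- suffix form
            signs = m.group(4)
            w = step ** len(signs) if signs[0] == "+" else step ** -len(signs)
            result.append((m.group(3), round(w, 4)))
        pos = m.end()
    tail = prompt[pos:].strip()
    if tail:
        result.append((tail, 1.0))
    return result

print(parse_weighted_prompt("a girl with (orange hair)1.4 in a cafe++"))
# → [('a girl with', 1.0), ('orange hair', 1.4), ('in a', 1.0), ('cafe', 1.21)]
```

In a real pipeline these weights would scale the corresponding token embeddings before they reach the diffusion model's cross-attention, which is roughly what compel does for you.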
pugworthy over 1 year ago

I would love to experiment with the idea of font interpretation. People can and do anthropomorphize fonts, but fonts also have names with meanings which may or may not be useful.

For example, I'm wondering whether a prompt written in Comic Sans should be turned into a comic-style illustration, or whether it comes out as a simplistic, childish drawing. Is a gothic font meant to imply a style of architecture, old Germanic peoples, or goth music and style?

See also https://design.tutsplus.com/articles/the-psychology-of-fonts--cms-34943
[Comment #37772945 not loaded]
atleastoptimal over 1 year ago

This is very cool, but it's gimmicky. All of the rich text could simply be a modifier before or after the word (such as an adjective or phrase). Given that most LLM work is plain text, this benefit doesn't transfer as neatly as plain prompt engineering does.
[Comment #37776161 not loaded]
[Comment #37774682 not loaded]
90-00-09 over 1 year ago

I like this idea. It could be handy to be able to focus on individual descriptions in complex prompts. Is this then mostly a "UI" feature that is translated to a traditional prompt?

(As a side note: using decorative typefaces was an unconvincing example.)
[Comment #37799051 not loaded]
LASR over 1 year ago

How well does this work with LLMs? Has anyone tried it? I'm most curious about the references-and-footnotes approach.
[Comment #37772954 not loaded]
PixelForg over 1 year ago

I'm impressed by the pixel art generation; I will definitely try it.
[Comment #37799052 not loaded]
gorenb over 1 year ago

My god, I think Midjourney and DALL·E should do this now.