Finetuning GPT-2 to Generate Beatles Lyrics

55 points by eugenhotaj, over 5 years ago

4 comments

gwern, over 5 years ago
His data formatting could be improved here. Title + authors would be better off denoted somehow, like using quotes, and the separate songs should be explicitly delimited using '<|endoftext|>'. Looking at the samples in https://github.com/EugenHotaj/beatles/blob/master/gpt_2_generated.txt, GPT-2 does manage to mostly figure out that the songs are separate, but omitting '<|endoftext|>' makes it harder on GPT-2, more prone to run-ons (already a problem with GPT-2), and also makes prompting less effective (since you can't prompt it like '<|endoftext|>"On The Run" by John Lennon\n' to make it generate lyrics for a specific title & author). Also wouldn't be bad if he had included the specific commands + hyperparameters for the nshepperd repo he's apparently using, even if only the defaults along the lines of the examples in my own writeup (https://www.gwern.net/GPT-2).

I'm not surprised that GPT-2-117M has memorized songs by the end of training; it's not a very large corpus of songs. Hard to learn and generalize well from it. If one were working more on this, it'd probably make sense to train on a much larger and more varied corpus of songs (with inline metadata properly formatted to allow controllable generation); something like RapGenius, maybe?
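A minimal sketch of the formatting suggested above, assuming each song is available as a dict with `title`, `author` and `lyrics` keys (the function names and the dict layout are hypothetical, not taken from the EugenHotaj repo or the nshepperd repo). Each song gets a quoted title + author line and is preceded by `<|endoftext|>`, so a prompt in the same shape can request a specific title and author. Whether a given GPT-2 encoder turns the literal string `<|endoftext|>` into the single special token or into ordinary BPE pieces depends on the tokenizer in use, so treat that part as an assumption.

```python
EOT = "<|endoftext|>"  # GPT-2's end-of-text marker, written out as plain text


def format_song(title: str, author: str, lyrics: str) -> str:
    """One song: a quoted-title + author header line, then the lyrics."""
    return f'"{title}" by {author}\n{lyrics.strip()}\n'


def build_corpus(songs: list) -> str:
    """Concatenate songs, explicitly delimiting each one with the EOT marker."""
    return "".join(
        EOT + format_song(s["title"], s["author"], s["lyrics"]) for s in songs
    )


def make_prompt(title: str, author: str) -> str:
    """Conditioning prompt in the same format as the training corpus."""
    return f'{EOT}"{title}" by {author}\n'
```

With this layout, `make_prompt("On The Run", "John Lennon")` yields exactly the `'<|endoftext|>"On The Run" by John Lennon\n'` prompt from the comment, because training text and prompts share one header format.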
lostmsu, over 5 years ago
Or any lyrics: http://billion.dev.losttech.software:2095/

And the blog article: https://habr.com/post/453232/ (also there's no paywall here)
kastnerkyle, over 5 years ago
Tricks in beam search to force rhyme schemes, or techniques like constrained Markov chains (cf. https://redylan.neocities.org/#/how-it-works/ and https://github.com/gabrielebarbieri/markovchain), can give really strong results in lyric / structured text generation.

Might be worth investigating if you are interested in this application.
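A simplified sketch of the constrained-Markov-chain idea, assuming a word-level first-order chain and a single constraint: the last word of a fixed-length line must come from a given set (e.g. words that rhyme with a target). A backward pass prunes every word that cannot reach the constraint in the remaining steps, then forward sampling picks only among surviving successors, so generation never hits a dead end. The helper names and the hand-supplied rhyme set are hypothetical; unlike the published method behind the links above, this picks uniformly among viable successors instead of renormalizing the chain's transition probabilities.

```python
import random
from collections import defaultdict


def build_bigrams(lines):
    """Word-level transitions: trans[a] is the set of words seen right after a."""
    trans = defaultdict(set)
    for line in lines:
        words = line.lower().split()
        for a, b in zip(words, words[1:]):
            trans[a].add(b)
    return trans


def constrained_line(trans, start, length, allowed_last, rng):
    """Sample `length` words beginning with `start` and ending in `allowed_last`.

    Returns None when no such line exists in the chain.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    # viable[i]: words that may sit at position i and still satisfy the end constraint.
    viable = [set() for _ in range(length)]
    viable[length - 1] = set(allowed_last)
    for i in range(length - 2, -1, -1):
        viable[i] = {w for w, nxt in trans.items() if nxt & viable[i + 1]}
    if start not in viable[0]:
        return None
    line = [start]
    for i in range(1, length):
        # Non-empty by construction: line[-1] was viable, so it has a viable successor.
        choices = sorted(trans[line[-1]] & viable[i])
        line.append(rng.choice(choices))
    return " ".join(line)
```

For example, with a chain built from a few lyric lines, `constrained_line(trans, "i", 4, {"say", "today"}, random.Random(0))` can only return four-word lines ending in "say" or "today". Sorting the choices keeps results reproducible for a seeded `random.Random`.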
HeWhoLurksLate, over 5 years ago
Tomorrow, anybody?