Show HN: Train a language model to talk like you

209 points by MasterScrat, over 5 years ago

17 comments

MasterScrat, over 5 years ago
You may have seen my recent post about [Chatistics: a Python tool to parse your Messenger/Hangouts/WhatsApp/Telegram chat logs into DataFrames](https://news.ycombinator.com/item?id=22069699).

This notebook uses the exported chat logs to train a simple GPT/GPT2 conversational model! It uses Google Colab, a notebook platform that allows you to train complex models online for free.

The approach is super simple: it takes all your chat logs, turns them into this format:

> <speaker1> Hi

> <speaker2> Hey - how are you?

> <speaker1> Great, thanks!

> ...

...then simply trains a GPT model on this corpus. In practice, I found that the default parameters (including using GPT and not GPT2) give the best results for this setup.

This notebook will be part of our workshop "Meet your Artificial Self" happening this Saturday at AMLD 2020 in Lausanne, Switzerland: https://appliedmldays.org/workshops/meet-your-artificial-self-generate-text-that-sounds-like-you

Feedback is welcome! :D
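
A minimal sketch of the formatting step described above, assuming the chat log is already available as (sender, text) pairs. The function name `to_speaker_corpus`, the `me` parameter, and the choice of which tag marks which person are illustrative assumptions; the post does not say how the notebook implements this.

```python
from typing import Iterable, List, Tuple


def to_speaker_corpus(messages: Iterable[Tuple[str, str]], me: str) -> List[str]:
    """Render (sender, text) pairs as '<speaker1> ...' / '<speaker2> ...' lines.

    Assumption: '<speaker2>' marks `me` and '<speaker1>' marks everyone else.
    """
    lines = []
    for sender, text in messages:
        # Collapse newlines and repeated spaces so each message stays on one line.
        text = " ".join(text.split())
        if not text:
            continue  # skip empty messages
        tag = "<speaker2>" if sender == me else "<speaker1>"
        lines.append(f"{tag} {text}")
    return lines


# Hypothetical sample data, mirroring the example in the post.
chat = [("Alice", "Hi"), ("Me", "Hey - how are you?"), ("Alice", "Great,\nthanks!")]
corpus = to_speaker_corpus(chat, me="Me")
print("\n".join(corpus))
```

The resulting lines would then be written to a text file and used as the fine-tuning corpus.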

capableweb, over 5 years ago
I got a bit tricked by the title here on HN. Maybe we can replace `talk` with `write`? I thought this was something that could learn how I speak and generate sound from that, but it seems to be just about written language, which is not nearly as interesting (for me).

arethuza, over 5 years ago
I'm disappointed that this is about typed text rather than actual talking - I had hoped that training something that talked like me might assist technology vendors in actually creating voice recognition technology that works for me.

And yes, my problems with voice recognition are probably due to my Scottish accent... ;-)

Tenoke, over 5 years ago
I've been playing with training different sizes[0] of GPT on my own chat data precisely for this reason.

Coincidentally, today I was even planning to publish my last post and notebook for training gpt2-1.5b and then chatting to oneself with the model. I left it for tomorrow though. Maybe a mistake.

There is quite a lot you can do, and talking to my trained model, which responds to me as me, can be really weird at times. It's definitely the most engaged I've been with GPT while talking to myself.

Having said that, you seem to train here on very little. Still - cool demo.

[0] https://svilentodorov.xyz/blog/gpt-345M-finetune/

perturbation, over 5 years ago
This is cool - might be worth training a simple discriminator model to identify *your* utterances, and then you can use the plug-and-play language model (PPLM - https://github.com/huggingface/transformers/blob/master/examples/pplm/run_pplm.py) to generate utterances modeling a specific speaker without special tokens. Could also take less time to fine-tune.
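
As a rough illustration of the "simple discriminator" idea only: the sketch below is a bag-of-words naive Bayes classifier with add-one smoothing that scores how likely an utterance is to be yours. It is not the PPLM attribute model (which is trained on language-model hidden states) and does not plug into `run_pplm.py`; the function names and sample data are hypothetical, and it assumes both classes have at least one training utterance.

```python
import math
from collections import Counter


def train_nb(utterances):
    """utterances: list of (text, is_me) pairs. Returns per-class word counts, doc counts, vocab."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_me in utterances:
        counts[is_me].update(text.lower().split())
        docs[is_me] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def log_odds_me(model, text):
    """Log-odds that `text` was written by 'me'; positive means more likely me."""
    counts, docs, vocab = model
    totals = {c: sum(counts[c].values()) for c in (True, False)}
    score = math.log(docs[True] / docs[False])  # class prior
    for word in text.lower().split():
        if word not in vocab:
            continue  # ignore words never seen in training
        p_me = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_other = (counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_me / p_other)
    return score


# Hypothetical training data: (utterance, was it written by me?)
data = [
    ("lol yeah sounds good", True),
    ("yeah lol", True),
    ("good morning how are you", False),
    ("see you tomorrow", False),
]
model = train_nb(data)
print(log_odds_me(model, "yeah lol") > 0)          # leans towards "me"
print(log_odds_me(model, "see you tomorrow") < 0)  # leans towards "not me"
```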

the-dude, over 5 years ago
I totally missed that Lyrebird was acquired: https://news.ycombinator.com/item?id=21006405

data_ders, over 5 years ago
My curiosity is tempered by the fact that I've seen this episode of Black Mirror before... :)

https://en.wikipedia.org/wiki/Be_Right_Back

bryanrasmussen, over 5 years ago
A computer trained to talk like me would spend a lot of time swearing and whining about how it can't take it anymore, which I admit would be pretty funny.

raidicy, over 5 years ago
This is part of a workshop series[0]. Does anyone know if the talks/shops will be recorded?

[0] https://appliedmldays.org/workshops

thisisastopsign, over 5 years ago
I’ve never used PyTorch before... is this running on my local machine, or is there some API in here that’s also sending data to Google to train their models? Asking from a privacy point of view.

woefulregret, over 5 years ago
throwaway, duh.

When I was a teenager I wrote a very graphic and very disturbing work of fiction that was archived on a popular erotica text website. I have had anxiety for many years now that eventually someone will tie the authorship of that story to my identity. If people in my real life discover my fantasies from years back because of my writing signature, I do not want to guess where that will leave me. I am not looking forward to the future!!

fudged71, over 5 years ago
Could you train this on a Q&A/FAQ corpus and get somewhat relevant results? (And is there any better tool for doing this?)

MadWombat, over 5 years ago
Oh, oobee doo
I wanna be like you
I wanna walk like you, talk like you, too
You'll see it's true someone like me
Can learn to be like someone like you

alfonsodev, over 5 years ago
This is going to be useful for when we fully turn into cyborgs.

nickster, over 5 years ago
I wonder if they are using this in Android Messenger or Gmail for the suggested responses.

heybrandons, over 5 years ago
Thanks for sharing, MasterScrat! This looks fun!

brainzap, over 5 years ago
Train it on Fred Rogers