Show HN: Train a language model to talk like you

209 points by MasterScrat over 5 years ago

17 comments

MasterScrat over 5 years ago
You may have seen my recent post about [Chatistics: a Python tool to parse your Messenger/Hangouts/WhatsApp/Telegram chat logs into DataFrames](https://news.ycombinator.com/item?id=22069699).

This notebook uses the exported chat logs to train a simple GPT/GPT2 conversational model! It uses Google Colab, a notebook platform that allows you to train complex models online for free.

The approach is super simple: it takes all your chat logs and turns them into this format:

> <speaker1> Hi
> <speaker2> Hey - how are you?
> <speaker1> Great, thanks!
> ...

...then simply trains a GPT model on this corpus. In practice, I found that the default parameters (including using GPT and not GPT2) give the best results for this setup.

This notebook will be part of our workshop "Meet your Artificial Self" happening this Saturday at AMLD 2020 in Lausanne, Switzerland: https://appliedmldays.org/workshops/meet-your-artificial-self-generate-text-that-sounds-like-you

Feedback is welcome! :D
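
For readers who want to see the corpus format concretely, here is a minimal sketch of the conversion step described above. It is not the notebook's actual code: the function name `to_speaker_corpus`, the `(sender, text)` input shape, and the choice to map the imitated person to `<speaker1>` and everyone else to `<speaker2>` are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def to_speaker_corpus(messages: Iterable[Tuple[str, str]], me: str) -> List[str]:
    """Turn (sender, text) pairs into '<speaker1> ...' / '<speaker2> ...' lines.

    Assumption (not from the original post): '<speaker1>' is the person
    being imitated and every other sender is folded into '<speaker2>'.
    """
    lines = []
    for sender, text in messages:
        # Collapse newlines and repeated whitespace so each message is one line.
        text = " ".join(text.split())
        if not text:
            continue  # skip empty messages
        tag = "<speaker1>" if sender == me else "<speaker2>"
        lines.append(f"{tag} {text}")
    return lines


# Hypothetical chat log, already parsed into (sender, text) pairs.
chat = [
    ("alice", "Hi"),
    ("bob", "Hey - how are you?"),
    ("alice", "Great,\nthanks!"),
]

corpus = "\n".join(to_speaker_corpus(chat, me="alice"))
print(corpus)
# <speaker1> Hi
# <speaker2> Hey - how are you?
# <speaker1> Great, thanks!
```

The resulting text is what a language model would then be fine-tuned on; the training step itself is left out here because it depends on the notebook's setup.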
capableweb over 5 years ago
I got a bit tricked by the title here on HN. Maybe we can replace `talk` with `write`? I thought this was something that could learn how I speak and generate sound from that, but it seems to be just about written language, which is not nearly as interesting (for me).
arethuza over 5 years ago
I'm disappointed that this is about typed text rather than actual talking - I had hoped that training something that talked like me might assist technology vendors in actually creating voice recognition technology that works for me.

And yes, my problems with voice recognition are probably due to my Scottish accent... ;-)
Tenoke over 5 years ago
I've been playing with training different sizes[0] of GPT on my own chat data precisely for this reason.

Coincidentally, today I was even planning to publish my last post and notebook for training gpt2-1.5b and then chatting to oneself with the model. I left it for tomorrow though... Maybe a mistake.

There is quite a lot you can do, and talking to my trained model, which is responding to me as me, can be really weird at times. It's definitely the most engaged I've been with GPT while talking to myself.

Having said that, you seem to train here on very little. Still - cool demo.

[0] https://svilentodorov.xyz/blog/gpt-345M-finetune/
perturbation over 5 years ago
This is cool - might be worth training a simple discriminator model to identify *your* utterances, and then you can use the plug-and-play language model (PPLM - https://github.com/huggingface/transformers/blob/master/examples/pplm/run_pplm.py) to generate utterances modeling a specific speaker without special tokens. Could also take less time to fine-tune.
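
To make the "simple discriminator" idea a bit more concrete, here is a rough sketch of a classifier that guesses whether an utterance is yours. This is not what PPLM itself uses (as far as I understand, its attribute models are trained on top of the language model's internal representations); it is a plain bag-of-words naive Bayes stand-in, and the class name `NaiveBayesSpeaker`, the labels `"mine"`/`"other"`, and the toy training data are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Very crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayesSpeaker:
    """Tiny bag-of-words naive Bayes: is an utterance 'mine' or 'other'?"""

    def __init__(self):
        self.word_counts = {"mine": Counter(), "other": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, utterances):
        # utterances: iterable of (label, text) with label in {"mine", "other"}.
        for label, text in utterances:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts["mine"]) | set(self.word_counts["other"])
        return self

    def log_score(self, label, text):
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for word in tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def predict(self, text):
        return max(("mine", "other"), key=lambda label: self.log_score(label, text))


# Hypothetical training data: a few of "my" messages and a few from others.
train = [
    ("mine", "lol that is hilarious"),
    ("mine", "lol ok see you later"),
    ("other", "Good morning, how are you today?"),
    ("other", "Please call me when you are free"),
]

model = NaiveBayesSpeaker().fit(train)
print(model.predict("lol ok"))          # mine
print(model.predict("please call me"))  # other
```

A real setup would need far more data and a stronger model, but the shape is the same: a scorer for "sounds like me" that a steering method could then use to guide generation.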
the-dude over 5 years ago
I totally missed that Lyrebird was acquired: https://news.ycombinator.com/item?id=21006405
data_ders over 5 years ago
My curiosity is tempered by the fact that I've seen this episode of Black Mirror before... :)

https://en.wikipedia.org/wiki/Be_Right_Back
bryanrasmussen over 5 years ago
A computer trained to talk like me would spend a lot of time swearing and whining about how it can't take it anymore, which I admit would be pretty funny.
raidicy over 5 years ago
This is part of a workshop series[0]. Does anyone know if the talks/workshops will be recorded?

[0] https://appliedmldays.org/workshops
thisisastopsign over 5 years ago
I've never used PyTorch before... Is this running on my local machine, or is there some API in here that's also sending data to Google to train their models? Asking from a privacy point of view.
woefulregret over 5 years ago
throwaway, duh.

When I was a teenager I wrote a very graphic and very disturbing work of fiction that was archived on a popular erotica text website. I have had anxiety for many years now that eventually someone will tie the authorship of that story to my identity. If people in my real life discover my fantasies from years back because of my writing signature, I do not want to guess where that will leave me. I am not looking forward to the future!!
fudged71 over 5 years ago
Could you train this on a Q&A/FAQ corpus and get somewhat relevant results? (And is there any better tool for doing this?)
MadWombat over 5 years ago
Oh, oobee doo
I wanna be like you
I wanna walk like you, talk like you, too
You'll see it's true someone like me
Can learn to be like someone like you
alfonsodev over 5 years ago
This is going to be useful for when we fully turn into cyborgs.
nickster over 5 years ago
I wonder if they are using this in Android Messenger or Gmail for the suggested responses.
heybrandons over 5 years ago
Thanks for sharing, MasterScrat! This looks fun!
brainzap over 5 years ago
Train it on Fred Rogers