"What is the penalty for living" -> <a href="http://reddit.com/r/Poland" rel="nofollow">http://reddit.com/r/Poland</a>, 28%<p>"When should I kill my chicken" -> <a href="http://reddit.com/r/csgo" rel="nofollow">http://reddit.com/r/csgo</a>, 19%<p>"Am I conscious" -> <a href="http://reddit.com/r/INTP" rel="nofollow">http://reddit.com/r/INTP</a>, 25%<p>"How to not think" -> <a href="http://www.reddit.com/r/howtonotgiveafuck/" rel="nofollow">http://www.reddit.com/r/howtonotgiveafuck/</a>, 49%<p>"Is the government evil" -> <a href="http://www.reddit.com/r/ENLIGHTENEDCENTRISM/" rel="nofollow">http://www.reddit.com/r/ENLIGHTENEDCENTRISM/</a>, 19%<p>"Is the government good" -> <a href="http://www.reddit.com/r/CoronavirusUK" rel="nofollow">http://www.reddit.com/r/CoronavirusUK</a>, 10%<p>"Is the government useful" -> <a href="http://www.reddit.com/r/iran" rel="nofollow">http://www.reddit.com/r/iran</a>, 31%
A very cool demo, and I congratulate the author, but I am always a little sad to see data-science demos that try to answer the question (one that is proving toxic) "given what I know about you, how can I find a community of people just like you?"

I would love to see a subreddit finder that answers questions like "what community would complement your interests?", "what community needs to hear what you have to say?", or "what community would be made better by your presence?". Similarity is at best a proxy for those.

Those are harder questions but, I think, more useful ones.
I tried it with "best time tracking app for iOS?" and "I'm looking for a time tracking app. Any recommendations?"<p>I expected the iPhone or iOS subreddit to be suggested, but it suggested GearVR | 13.0%, ringdoorbell | 9.0%, canadacordcutters | 5.0%, TTVreborn | 5.0%, AusSkincare | 4.0%, sideloaded | 4.0%, FlutterDev | 2.0%, shopify | 2.0%, weightwatchers | 2.0%, crossfit | 2.0%.<p>Congrats on the attempt but it does still need some work.
The Intercom chat widget makes the tab title switch back and forth between "Subreddit Finder" and "Valohai says". There does not appear to be a way to dismiss the chat widget, so it just keeps flipping back and forth, which is visually annoying.

I keep many tabs open, but I am going to close this one immediately because I don't want something flashing at me out of the corner of my eye all day.
One place to improve this would be to use a better set of word embeddings. FastText is, well, fast, but it's no longer close to SOTA.

You're most likely using simple average pooling, which is why many users are getting results that don't look right to them. Try a chunking approach, where you get a vector for each chunk of the document and horizontally concatenate those together (if your vectors are 50d and you do 5 chunks per doc, then you get a fixed 250d vector for each document regardless of length). This partially solves the problem of highly diluted vectors, which is responsible for the poor results some users are reporting. You can also do "attentive pooling", where you pool the way a transformer head would, though that's an O(N^2) operation, so YMMV.

If you have the GPU compute, try something like BERT, or GPT-2, which was trained on pages linked from reddit. Better yet, vertically concatenate all of the word-embedding models you can (just stack the embeddings from each model) if you have the compute.

To respond to your comment (since HN isn't letting me post because I'm "posting too fast"):

You can use cheaper and more effective approaches to get the subword functionality you want.

Look up "Byte Pair Embeddings". That will also handle the OOV problem, but with far less CPU/RAM overhead. BERT also does this for you with its unique form of tokenization.

A home CPU can fine-tune FastText in a day on 4 million documents if you're able to walk away from your computer for a while. It shouldn't cost you anything except electricity. If you set the number of epochs higher, you'll get better performance but correspondingly longer training times.

For BERT/GPT-2, you'll maybe want to fine-tune a small version of the model (say, the 117M-parameter version of GPT-2) and then vertically concatenate that with the regular un-fine-tuned GPT-2 model. That should be very fast and hopefully not expensive (and also possible on your home GPU).
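To make the chunking idea concrete, here is a rough sketch of chunk pooling, assuming numpy and pre-computed per-token vectors (the function and parameter names are mine, not from the demo):

    import numpy as np

    def chunk_pool(token_vectors, n_chunks=5):
        """Split a document's token vectors into n_chunks contiguous chunks,
        mean-pool each chunk, and concatenate the pooled vectors, so a doc
        of any length maps to a fixed (n_chunks * dim) vector."""
        token_vectors = np.asarray(token_vectors)
        dim = token_vectors.shape[1]
        chunks = np.array_split(token_vectors, n_chunks)  # tolerates uneven splits
        pooled = [c.mean(axis=0) if len(c) else np.zeros(dim)  # guard very short docs
                  for c in chunks]
        return np.concatenate(pooled)

    doc = np.random.randn(37, 50)           # a 37-token doc with 50d embeddings
    assert chunk_pool(doc).shape == (250,)  # 5 chunks x 50d = fixed 250d

And a quick look at Byte Pair Embeddings via the bpemb package (assuming pip install bpemb); the subword segmentation is what handles OOV words:

    from bpemb import BPEmb

    bpemb_en = BPEmb(lang="en", dim=50, vs=25000)
    print(bpemb_en.encode("hearthstone"))       # subword pieces
    print(bpemb_en.embed("hearthstone").shape)  # one 50d vector per piece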
Tried it with Hearthstone-related content.
Title: turn 2 lethal
Content: I managed to cheat out 4 Prophet Velens on turn 2, followed up by Mind Blast.

Results: shadowverse, elderscrollslegends, teamfighttactics, teemotalk, fioramains, ekkomains, ezrealmains, bobstavern, kaisamains, xcom2

Should include: hearthstone
It did pick up BobsTavern, which is something. I thought you would want some feedback.
Cool. Last year I created something like this as a Chrome extension: you could type in your post and it would show you where on reddit to post, then select a suggestion by clicking a link. Project is here: https://github.com/wesbarnett/insight
Tried stocks, stock options, investing - all kept giving Robinhoodpennystocks as the top option. Not sure if the model is fully trained?

What are some examples where the model does recommend meaningful things?
Suggested subreddits for this post:

lostredditors 45%

Well yes, that is likely, but maybe not a good suggestion, as that is a place where folks point out people who posted the wrong thing in the wrong sub or conversation ;)
This is awesome!

I often find that when I'm buying something new, I want to find subs related to that product category.

While this doesn't find me direct results, it shows me communities I should focus my research on.
Tried to find /r/DevelEire using the search terms "Irish Software Developers".

No luck, but Google will bring it up as the first result if the query is "Irish Software Developers Reddit".
I wish reddit would allow me to download all my comments.

Apparently it's not possible, since they're all archived and reddit constantly regenerates its webpages.
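For what it's worth, you can get at least your most recent comments through the API. A rough sketch, assuming the third-party PRAW library and placeholder credentials (the API only pages back through roughly the last ~1000 items, so this is a partial export, not a full archive):

    import praw  # third-party Reddit API wrapper: pip install praw

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholders: register an app
        client_secret="YOUR_CLIENT_SECRET",  # at reddit.com/prefs/apps
        user_agent="comment-exporter by u/yourname",
    )

    # Walk your comment history, newest first, up to the API's listing cap
    for comment in reddit.redditor("yourname").comments.new(limit=None):
        print(comment.subreddit, comment.created_utc)
        print(comment.body)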
Hey, I tried:

Title: My siberian cat

Message: My floof

I was hoping to find r/SiberianCats, where I usually post, but it wasn't in the list.

I googled "siberian cat subreddit" and r/SiberianCats was the first link.
Or you could just ask here:

https://old.reddit.com/r/findareddit/