As is often true of GPT responses, there's some nonsense interspersed here, e.g. the claim that R has "a more mature package universe" than Python.<p>I think this is false, but if you're reading quickly, it sounds cogent enough. As Sarah Constantin observed about GPT2 [0]:<p>> if you skim text, you miss obvious absurdities. The point is OpenAI HAS achieved the ability to pass the Turing test against humans on autopilot...The mental motion of “I didn’t really parse that paragraph, but sure, whatever, I’ll take the author’s word for it” is, in my introspective experience, absolutely identical to “I didn’t really parse that paragraph because it was bot-generated and didn’t make any sense so I couldn’t possibly have parsed it”, except that in the first case, I assume that the error lies with me rather than the text. This is not a safe assumption in a post-GPT2 world. Instead of “default to humility” (assume that when you don’t understand a passage, the passage is true and you’re just missing something) the ideal mental action in a world full of bots is “default to null” (if you don’t understand a passage, assume you’re in the same epistemic state as if you’d never read it at all.)<p>[0] <a href="https://www.skynettoday.com/editorials/humans-not-concentrating" rel="nofollow">https://www.skynettoday.com/editorials/humans-not-concentrat...</a>
It seems to write in the generic "style" of GPT, instead of in the style I would recognise as a HN poster. Is that because of something baked into how the training process works? It lacks a sort of casualness or air of superiority ;)
Am I correct in understanding that this doesn't actually <i>generate</i> answers based on HN, but instead finds semantically-near comments and sends them verbatim to GPT to summarize? Seems like a good enough hack, though I'd love a detailed writeup of how to actually specialize an existing LLM with additional training data (like HN).
I'm quite pleasantly surprised by this response:<p>Question: what are the biggest fallacies of people that post on hackernews?<p>Answer: One of the biggest fallacies of people that post on Hacker News is the nirvana fallacy, which is when someone jumps from "thing x is imperfect" to "thing x is worthless" without considering any of the benefits. People also tend to be familiar with certain technologies and will discount the flaws of existing technology vs. the flaws of new technology. Another fallacy is the assumption that there is one best tool for doing things - the one that supports the largest amount of use cases. Additionally, people may use appeals to authority, invent their own solutions rather than use existing ones, and assume that the approaches they first learned are most correct. Finally, people may assume open source software is a gift and that dependencies don't need to be tested.
I'm a little surprised that Hacker News comments weren't already in the GPT-3 training set. I just assumed that OpenAI had vacuumed up most of the web already.
Nice! We built something very similar recently, it is more like "Ask your documentation" but very similar implementations otherwise.<p>See a demo on the huggingface transformers documentation: <a href="https://huggingface.co/spaces/jerpint/buster" rel="nofollow">https://huggingface.co/spaces/jerpint/buster</a><p>code: <a href="https://github.com/jerpint/buster">https://github.com/jerpint/buster</a>
For those who are wondering,<p>HN data is indexed with embeddings for semantic search. When queried, it finds the closest articles and top comments, and summarizes them with GPT-3.<p>GPT-3 serves as a rendering tool for the compressed comments.
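That retrieve-then-summarize loop can be sketched in a few lines. This is a guess at the shape of it, not the author's actual code: the retrieval is plain cosine similarity over precomputed embedding vectors, and the prompt format is invented for illustration.

```python
import numpy as np

def top_k_comments(query_vec, comment_vecs, k=3):
    """Return indices of the k comments whose embeddings are most
    cosine-similar to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    m = comment_vecs / np.linalg.norm(comment_vecs, axis=1, keepdims=True)
    sims = m @ q  # cosine similarity of every comment to the query
    return np.argsort(sims)[::-1][:k]

def build_prompt(question, comments):
    """Stuff the retrieved comments into a summarization prompt; the
    completion endpoint then acts as the 'rendering tool' for them."""
    context = "\n---\n".join(comments)
    return (
        "Summarize what these Hacker News comments say in answer to the "
        f"question.\n\nQuestion: {question}\n\nComments:\n{context}"
    )
```

The prompt string is then sent to the completions API; the model never sees the whole corpus, only the handful of nearest comments.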
My own experiments made me think that the impact of finetuning is comparable to that of a molecule in a drop in a bucket.<p>> “AskHN” is a GPT-3 bot I trained on a corpus of over 6.5 million Hacker News comments to represent the collective wisdom of the HN community in a single bot.<p>I'm assuming you used the openai fine-tuning pathway to make a custom model?<p>Have you tested the responses on vanilla GPT3 vs your custom model?<p>I'd be curious to see the comparison.
This might be a dumb question, but is this based on the <i>collective</i> wisdom of HN? Because I would say that the collective wisdom is just as much in the interaction of the comments and the ranking of those comments as it is in the comments themselves. If you just ingest all the comments wholesale, aren't you rather getting the <i>average</i> wisdom of HN?
Let's admit that HN's culture is that many of us are confidently wrong, which we cover up with impressive technical jargon. As such, any wrong answer in this AI is in fact correct.
I love this! I used to append "reddit" to my Google search queries to get the best results, but the quality of dialog over there has really dropped in recent years. These days I've switched to appending "hackernews", but this is even better.
Nice work! Been playing with Langchain and was not aware of patterns.app.<p>This whole space is moving so fast it's hard to keep up for someone whose immediate day job doesn't revolve around it. Congrats.
Anyone here know how to generate subtitles automatically using AI when a video is playing on the web?<p>Was planning to see how I can build something like these sites but without the need to regularly update the subtitles catalog:
<a href="https://subscene.be" rel="nofollow">https://subscene.be</a>
<a href="https://subtitlecat.com" rel="nofollow">https://subtitlecat.com</a>
<a href="https://subtitletrans.com" rel="nofollow">https://subtitletrans.com</a>
<a href="https://my-subs.co" rel="nofollow">https://my-subs.co</a>
Is there any LLM that can be self-hosted and fed a corpus of data to ingest for question answering? The part I find difficult is how to feed (not train) the open LLM models with a dataset that isn't available to the public.
ChatGPT and friends always talk like those Microsoft and Apple forum responders with 100k reputation.<p>I see that you are asking about "How to get a job at Google". I will help you with "How to get a job at Google". In order to solve the problem of "How to get a job at Google" please follow the following steps first:<p>- rewrite your resume in Google Docs<p>- reinstall Chrome<p>- apply to the job<p>Let me know if I can help further with "How to get a job at Google". I like using it, but I have to tune my prompts to make sure that they don't bullshit me before getting to the point.
I like the project. Had been wanting to do this myself for a long time, because HN has become the first place I go to nowadays for answers, and I value the intelligence and experience distilled in the comments here.<p>I do not like that it seems to be effectively an ad.<p>> Embedding every single one of the 6.5 eligible comments was prohibitively time-consuming and expensive (12 hours and ~$2,000).<p>Does anybody understand what he’s talking about here? Assuming 6.5 million comments and an average token length of 70, we'd be looking at ~$180 ($0.0004 / 1K tokens).
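The back-of-the-envelope math behind that ~$180 figure is straightforward; note that the 70-token average comment length and the $0.0004/1K-token price are the commenter's assumptions, not published figures from the project:

```python
# Rough cost estimate for embedding the corpus at the quoted
# embedding price of $0.0004 per 1K tokens.
comments = 6_500_000        # size of the corpus
avg_tokens = 70             # assumed average comment length
price_per_1k_tokens = 0.0004

total_tokens = comments * avg_tokens          # 455,000,000 tokens
cost = total_tokens / 1000 * price_per_1k_tokens
print(round(cost))  # 182 -- roughly the ~$180 the commenter cites
```

An order of magnitude short of $2,000, which is what the commenter is questioning.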
Nice. I just sort of assumed early on my comments were training some future AI, and I hope that in some small way I have been able to moderate some of its stupider urges.<p>A version where you can turn knobs of flavored contributors would be pretty funny. I know my comment style is easily identifiable and reproducible, and it encodes a certain type of logical conjugation, albeit biased with some principles and trigger topics, and I think there is enough material on HN that there may be such a thing as a distinct, motohagiographic lens. :)
Some day I will sue people like OP (if they're monetizing it) and OpenAI for monetizing my public posts. You can use, reuse and alter public speech but when you earn ad dollars...yeah part of that is mine if your model used my public content. I probably won't actually sue but someone will.<p>I am not a lawyer but there has to be a jurisdiction where I can establish standing at least.
> 4. Index the embeddings in a database<p>If OP is reading: I'm curious about the database you are using to store the embeddings. Pinecone, Weaviate...?
Related question: I've written probably a million words over my lifetime.<p>Is there an easy way to load up GPT with my thoughts to have it be a fake me?
> The methodology I used here is a generic, scalable solution for distilling a knowledge corpus into an embodied intelligence<p>The methodology used here is a generic solution for distilling a non-generic corpus of utterances of HN commenters into a generic platitude machine.
I have an experiment that uses the embeddings to visualize clusterings of HN comments (using tsne). Not super useful but interesting to view the comments in 3D and seeing how similar ones cluster together into mostly relevant themes.
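For anyone curious, that projection step is only a few lines with scikit-learn. The perplexity, init, and seed here are illustrative choices, not the parent's actual settings:

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_to_3d(embeddings, perplexity=5, seed=0):
    """Project high-dimensional comment embeddings down to 3 dimensions
    with t-SNE so nearby comments can be inspected as visual clusters."""
    tsne = TSNE(n_components=3, perplexity=perplexity,
                init="random", random_state=seed)
    return tsne.fit_transform(np.asarray(embeddings))
```

The 3D points can then be handed to any scatter-plot tool; t-SNE preserves local neighborhoods, so comments with similar embeddings land near each other, which is exactly the clustering effect described above.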
Hmm. I thought perhaps he was going to take the questions from Ask HN threads and the top upvoted comments and fine-tune a model with those as prompt/reply pairs.<p>Curious how that would differ; but it would be an expensive endeavour.
Seeing a ton of projects utilizing ChatGPT nowadays. Are the project owners basically paying the API costs out of pocket? I think it would add up pretty quickly, especially if you hit the front page of HN.
Just to be sure: this is NOT a fine-tuned GPT model, but rather the standard GPT-3 API, used to summarize search results from an HN comments DB based on user input. Right?
Did you also ingest dead comments to the corpus?<p>I would very much like to see the ghost of Terry pop up from time to time, to offer his wisdom and unique style of response.
> I trained on a corpus of over 6.5 million Hacker News comments<p>How long did it take to scrape them and train the "corpus" on this content?