Show HN: AskHN

612 points by kvh over 2 years ago

43 comments

setgree over 2 years ago
As is often true of GPT responses, there's some nonsense interspersed here, e.g. the claim that R has "a more mature package universe" than Python.

I think this is false, but if you're reading quickly, it sounds cogent enough. As Sarah Constantin observed about GPT2 [0]:

> if you skim text, you miss obvious absurdities. The point is OpenAI HAS achieved the ability to pass the Turing test against humans on autopilot...The mental motion of “I didn’t really parse that paragraph, but sure, whatever, I’ll take the author’s word for it” is, in my introspective experience, absolutely identical to “I didn’t really parse that paragraph because it was bot-generated and didn’t make any sense so I couldn’t possibly have parsed it”, except that in the first case, I assume that the error lies with me rather than the text. This is not a safe assumption in a post-GPT2 world. Instead of “default to humility” (assume that when you don’t understand a passage, the passage is true and you’re just missing something) the ideal mental action in a world full of bots is “default to null” (if you don’t understand a passage, assume you’re in the same epistemic state as if you’d never read it at all.)

[0] https://www.skynettoday.com/editorials/humans-not-concentrating
have_faith over 2 years ago
It seems to write in the generic "style" of GPT, instead of in the style I would recognise as a HN poster. Is that because of something baked into how the training process works? It lacks a sort of casualness or air of superiority ;)
georgelyon over 2 years ago
Am I correct in understanding that this doesn't actually *generate* answers based on HN, but instead finds semantically-near comments and sends them verbatim to GPT to summarize? Seems like a good enough hack, though I'd love a detailed writeup of how to actually specialize an existing LLM with additional training data (like HN).
flappysolutions over 2 years ago
I'm quite pleasantly surprised by this response:

Question: what are the biggest fallacies of people that post on hackernews?

Answer: One of the biggest fallacies of people that post on Hacker News is the nirvana fallacy, which is when someone jumps from "thing x is imperfect" to "thing x is worthless" without considering any of the benefits. People also tend to be familiar with certain technologies and will discount the flaws of existing technology vs. the flaws of new technology. Another fallacy is the assumption that there is one best tool for doing things - the one that supports the largest amount of use cases. Additionally, people may use appeals to authority, invent their own solutions rather than use existing ones, and assume that the approaches they first learned are most correct. Finally, people may assume open source software is a gift and that dependencies don't need to be tested.
LeoPanthera over 2 years ago
I'm a little surprised that Hacker News comments weren't already in the GPT-3 training set. I just assumed that OpenAI had vacuumed up most of the web already.
jerpint over 2 years ago
Nice! We built something very similar recently, it is more like "Ask your documentation" but very similar implementations otherwise.

See a demo on the huggingface transformers documentation: https://huggingface.co/spaces/jerpint/buster

code: https://github.com/jerpint/buster
boywitharupee over 2 years ago
For those who are wondering:

HN data is indexed with embeddings for semantic search. When queried, it finds the closest article and top comments, then summarizes them with GPT-3.

GPT-3 serves as a rendering tool for compressed comments.
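In code, that loop is roughly the following. This is a minimal sketch under stated assumptions, not the author's actual implementation: it assumes the pre-1.0 openai Python client, the text-embedding-ada-002 model, and a plain in-memory list of precomputed comment embeddings standing in for a real vector index.

```python
# Hypothetical sketch of the embed -> retrieve -> summarize pipeline.
import numpy as np
import openai  # pre-1.0 openai client API assumed

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
    return np.array(resp["data"][0]["embedding"])

def ask_hn(question: str, comments: list[str], vectors: list[np.ndarray], k: int = 5) -> str:
    q = embed(question)
    # Cosine similarity of the question against every stored comment embedding.
    sims = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))) for v in vectors]
    top = [comments[i] for i in np.argsort(sims)[-k:]]
    prompt = (
        "Answer the question using only these Hacker News comments.\n\n"
        f"Question: {question}\n\nComments:\n" + "\n---\n".join(top) + "\n\nAnswer:"
    )
    out = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=300)
    return out["choices"][0]["text"].strip()
```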
btbuildem over 2 years ago
My own experiments made me think that the impact of finetuning is comparable to that of a molecule in a drop in a bucket.

> "AskHN" is a GPT-3 bot I trained on a corpus of over 6.5 million Hacker News comments to represent the collective wisdom of the HN community in a single bot.

I'm assuming you used the openai fine-tuning pathway to make a custom model?

Have you tested the responses on vanilla GPT3 vs your custom model?

I'd be curious to see the comparison.
SilverBirch over 2 years ago
This might be a dumb question, but is this based on the *collective* wisdom of HN? Because I would say that the collective wisdom is just as much in the interaction of the comments and the ranking of those comments as it is in the comments themselves. If you just ingest all the comments wholesale, aren't you rather getting the *average* wisdom of HN?
fleddr over 2 years ago
Let's admit that HN's culture is that many of us are confidently wrong, which we cover up with impressive technical jargon. As such, any wrong answer in this AI is in fact correct.
MikeTheRocker over 2 years ago
I love this! I used to append "reddit" to my Google search queries to get the best results, but the quality of dialog over there has really dropped in recent years. These days I've switched to appending "hackernews", but this is even better.
monkeydust over 2 years ago
Nice work! Been playing with Langchain and was not aware of patterns.app.

This whole space is moving so fast it's hard to keep up for someone whose immediate day job doesn't revolve around it. Congrats.
egypturnash over 2 years ago
Is there a way to opt out of one's comments being used for this?
PetrBrzyBrzek over 2 years ago
Hi, thanks for the interesting article. I have a question about Pinecone. What is the cost of storing all these vectors?
kerogerokero over 2 years ago
Anyone here know how to generate subtitles automatically using AI when a video is playing on the web?

Was planning to see how I can build something like these sites but without the need to regularly update the subtitles catalog: https://subscene.be https://subtitlecat.com https://subtitletrans.com https://my-subs.co
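One plausible starting point, sketched under assumptions: the open-source openai-whisper package for transcription (ffmpeg must be on PATH), writing a WebVTT file that a web player can attach via a track element. The filenames are illustrative.

```python
# Hypothetical sketch: transcribe a video and emit WebVTT subtitles
# with openai-whisper (pip install openai-whisper; requires ffmpeg).
import whisper

model = whisper.load_model("base")      # "small"/"medium" trade speed for accuracy
result = model.transcribe("video.mp4")  # whisper uses ffmpeg to pull the audio track

def ts(seconds: float) -> str:
    """Format seconds as an HH:MM:SS.mmm WebVTT timestamp."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02}:{int(m):02}:{s:06.3f}"

with open("video.vtt", "w") as f:
    f.write("WEBVTT\n\n")
    for seg in result["segments"]:
        f.write(f"{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}\n\n")
```

The resulting .vtt file can be served alongside the video and wired up with a `<track kind="subtitles">` element, which sidesteps maintaining a subtitle catalog entirely.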
la64710 over 2 years ago
Is there any LLM that can be self-hosted and fed a corpus of data for question answering? The part I find difficult is how to feed (not train) an open LLM a dataset that isn't available to the public.
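One common self-hosted path is the same retrieve-then-prompt pattern discussed upthread, with local embeddings. A minimal sketch, assuming the sentence-transformers library and leaving the local generation model abstract:

```python
# Hypothetical sketch: local semantic retrieval over a private corpus.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model, runs on CPU

corpus = ["First private document...", "Second private document..."]  # your data
corpus_emb = model.encode(corpus, convert_to_tensor=True)

def retrieve(question: str, k: int = 3) -> list[str]:
    """Return the k corpus passages most similar to the question."""
    q = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q, corpus_emb, top_k=k)[0]
    return [corpus[hit["corpus_id"]] for hit in hits]

# The retrieved passages then go into the prompt of whatever local LLM you host.
context = "\n".join(retrieve("What does the corpus say about X?"))
```

Nothing private ever leaves the machine this way, since neither the embeddings nor the corpus are sent to a third-party API.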
renewiltord over 2 years ago
ChatGPT and friends always talk like those Microsoft and Apple forum responders with 100k reputation.

I see that you are asking about "How to get a job at Google". I will help you with "How to get a job at Google". In order to solve the problem of "How to get a job at Google" please follow the following steps first:

- rewrite your resume in Google Docs

- reinstall Chrome

- apply to the job

Let me know if I can help further with "How to get a job at Google". I like using it, but I have to tune my prompts to make sure that they don't bullshit me before getting to the point.
leobg over 2 years ago
I like the project. Had been wanting to do this myself for a long time, because HN has become the first place I go to nowadays for answers, and I value the intelligence and experience distilled in the comments here.

I do not like that it seems to be effectively an ad.

> Embedding every single one of the 6.5 million eligible comments was prohibitively time-consuming and expensive (12 hours and ~$2,000).

Does anybody understand what he's talking about here? Assuming 6.5 million comments and an average token length of 70, we'd be looking at $180 ($0.0004 / 1K tokens).
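The arithmetic behind that objection checks out, assuming ada-002-era pricing and the commenter's 70-token average:

```python
# Back-of-the-envelope check of the embedding cost estimate above.
comments = 6_500_000
avg_tokens = 70          # assumed average comment length, per the comment
price_per_1k = 0.0004    # $/1K tokens, text-embedding-ada-002 pricing at the time
cost = comments * avg_tokens / 1000 * price_per_1k
print(f"${cost:,.0f}")   # -> $182, nowhere near $2,000
```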
motohagiography over 2 years ago
Nice. I just sort of assumed early on that my comments were training some future AI, and I hope that in some small way I have been able to moderate some of its stupider urges.

A version where you can turn the knobs of flavored contributors would be pretty funny. I know my comment style is easily identifiable and reproducible, and it encodes a certain type of logical conjugation, albeit biased with some principles and trigger topics, and I think there is enough material on HN that there may be such a thing as a distinct, motohagiographic lens. :)
badrabbit about 2 years ago
Some day I will sue people like OP (if they're monetizing it) and OpenAI for monetizing my public posts. You can use, reuse and alter public speech, but when you earn ad dollars... yeah, part of that is mine if your model used my public content. I probably won't actually sue, but someone will.

I am not a lawyer, but there has to be a jurisdiction where I can establish standing at least.
gnicholas over 2 years ago
Love that it includes sources — this makes it much more valuable because you can tell if it's giving useful information or just blowing smoke.
clark-kent over 2 years ago
> 4. Index the embeddings in a database

If OP is reading: I'm curious about the database you are using to store the embeddings. Pinecone, Weaviate...?
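For reference, the Pinecone option mentioned here looks roughly like the sketch below with the 2023-era pinecone-client. This is an assumption-laden illustration, not the author's setup; the index name, environment, IDs, and metadata fields are all made up, and the placeholder vectors stand in for real embeddings.

```python
# Hypothetical sketch: storing and querying comment embeddings in Pinecone.
import pinecone  # 2023-era pinecone-client API assumed

pinecone.init(api_key="YOUR_KEY", environment="us-east1-gcp")
index = pinecone.Index("hn-comments")  # assumes a 1536-dim index already created

embedding = [0.0] * 1536           # stand-in for a real ada-002 comment embedding
question_embedding = [0.0] * 1536  # stand-in for a real query embedding

# Upsert (id, vector, metadata) tuples.
index.upsert([("comment-1", embedding, {"author": "example"})])

# Nearest-neighbor query, returning metadata alongside similarity scores.
res = index.query(vector=question_embedding, top_k=5, include_metadata=True)
for match in res.matches:
    print(match.id, match.score, match.metadata)
```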
DanielBMarkham over 2 years ago
Related question: I've written probably a million words over my lifetime.

Is there an easy way to load up GPT with my thoughts to have it be a fake me?
einpoklum over 2 years ago
> The methodology I used here is a generic, scalable solution for distilling a knowledge corpus into an embodied intelligence

The methodology used here is a generic solution for distilling a non-generic corpus of utterances into a generic platitude machine.
osigurdson over 2 years ago
I have an experiment that uses the embeddings to visualize clusterings of HN comments (using t-SNE). Not super useful, but it's interesting to view the comments in 3D and see how similar ones cluster together into mostly relevant themes.
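That kind of view takes very little code. A minimal sketch, assuming scikit-learn and matplotlib, with random vectors standing in for real comment embeddings:

```python
# Hypothetical sketch: project high-dimensional embeddings to 3D with t-SNE.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

vectors = np.random.rand(500, 1536)  # replace with real embeddings (n_comments x dims)

coords = TSNE(n_components=3, perplexity=30, init="pca").fit_transform(vectors)

ax = plt.figure().add_subplot(projection="3d")
ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], s=2)
plt.show()
```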
ada1981 over 2 years ago
Hmm. I thought perhaps he was going to take the questions from Ask HN and the top upvoted comments, and fine-tune a model with those as prompt/reply pairs.

Curious how that would differ; but it would be an expensive endeavour.
jaequery over 2 years ago
Seeing a ton of projects utilizing ChatGPT nowadays. Are the project owners basically paying the API costs out of pocket? Think it would add up pretty quick, especially if you hit the front page of HN.
wastu about 2 years ago
More than a summarizing tool, this bot is really useful if you want to search for related HN posts based on abstract queries, imo.
underlines about 2 years ago
Just to be sure: this is NOT a fine-tuned GPT model, but rather the standard GPT-3 API used to summarize search results from a HN comments DB, based on user input. Right?
tosemlep over 2 years ago
Did you also ingest dead comments into the corpus?

I would very much like to see the ghost of Terry pop up from time to time, to offer his wisdom and unique style of response.
MuffinFlavored over 2 years ago
> I trained on a corpus of over 6.5 million Hacker News comments

How long did it take to scrape them and train the "corpus" on this content?
adversaryIdiot over 2 years ago
I didn't know the API supported downloading all of its database. Are you the reason HN has had sporadic downtime lately? ;)
Existenceblinks about 2 years ago
Ask HN: here is my idea, can I build this in a weekend

AI: of course.. here is your bash script (220 lines long)
bilekas over 2 years ago
"He only went and did it..."!
dalmo3 over 2 years ago
This is nice! The official Algolia search is useless.

OTOH, did I miss something, or is it only on Discord?
smsm42 over 2 years ago
How to get a job at Google? Oh, that's easy, just get a PhD.

Thanks bottie, very use, much helpful.
helsontaveras18 over 2 years ago
Now that we have this bot to answer questions for us, I think we can all go home!
AndrewKemendo over 2 years ago
First thing I saw is my answers to someone's question.

Can you cut me a distro of myself?
hbn over 2 years ago
Amazing, an AI that is incapable of picking up on jokes or sarcasm!
m3kw9 over 2 years ago
I thought ChatGPT may have already used Hacker News (and Reddit) for training?
sorokod about 2 years ago
*the collective wisdom of the HN community*

Made me smile
tomr75 over 2 years ago
Could you do this for medical journal articles?
pknerd over 2 years ago
Can anyone point me to some tutorials on using a GPT-3 model with a custom dataset? I am a Python programmer.