TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.


Ask HN: Why are these LLM companies training on dangerous content at all?

1 point by jsdeveloper about 1 year ago
Hi all,

Within minutes of conversation and an imaginary scenario, anyone can jailbreak an AI model and get recipes for making dangerous things. The question is: why would these companies train LLMs on such dangerous data at all? And to whom are they providing this extra capability, given that they clearly trained on it and are already serving it?

Selecting the dataset was clearly their job, and all of these terrible outcomes could simply be avoided at training time.

1 comment

joegibbs about 1 year ago
The datasets are so huge that I don't think there's any way to make sure everything in the training data is safe. For example, GPT-3 was trained on 45TB of data; even using LLMs to classify all of it would be too expensive. GPT-3.5 Turbo is priced at $1/million tokens, and each token is about 4 bytes, so running it over GPT-3's own training data would cost on the order of tens of millions of dollars. You could use cheaper methods, but they're less effective and unsafe material would still slip through, let alone trying to have people review it by hand.
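The comment's cost estimate can be checked with quick arithmetic. This sketch uses the figures stated above (45TB of data, ~4 bytes per token, $1 per million tokens); the real cost would be somewhat higher once prompt and output tokens for the classifier itself are counted.

```python
# Back-of-envelope cost of classifying GPT-3's training data with an LLM,
# using the comment's assumptions: 45 TB of text, ~4 bytes per token,
# and a classifier priced at $1 per million tokens.
data_bytes = 45 * 10**12           # 45 TB
bytes_per_token = 4                # rough average for English text
price_per_million_tokens = 1.0     # USD

tokens = data_bytes / bytes_per_token
cost_usd = tokens / 1_000_000 * price_per_million_tokens
print(f"{tokens:.3e} tokens, ~${cost_usd:,.0f}")
# ~1.125e13 tokens, ~$11,250,000 — on the order of tens of millions
# once classifier prompt/output overhead is included
```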