A tech news platform built with Next.js, providing global tech news and discussions.


Training for one trillion parameter model backed by Intel and US govt has begun

227 points, by goplayoutside, over 1 year ago

19 comments

py4, over 1 year ago
It's not clear from the article whether it's a dense model or an MoE. This matters when comparing parameter counts with GPT-4, which is reported to be an MoE.
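To make the dense-vs-MoE distinction above concrete, here is a minimal sketch of the arithmetic. All figures are illustrative assumptions, not confirmed specs for GPT-4 or any real model: in a top-k mixture-of-experts, only the shared parameters plus k experts' parameters are active per token, so the headline total overstates per-token compute.

```python
# Why dense vs. MoE matters when comparing headline parameter counts.
# Every number below is a made-up illustration, not a real model spec.

def moe_active_params(shared: float, expert: float, top_k: int) -> float:
    """Parameters actually used per token when routing to top_k experts."""
    return shared + top_k * expert

# Hypothetical ~1.6T-total MoE: 100B shared params, 16 experts of 94B each,
# each token routed through 2 experts.
shared, expert, n_experts, top_k = 100e9, 94e9, 16, 2

total = shared + n_experts * expert          # headline count
active = moe_active_params(shared, expert, top_k)

print(f"total params:      {total / 1e12:.2f}T")   # 1.60T
print(f"active per token:  {active / 1e9:.0f}B")   # 288B
```

A dense 1T model uses all 1T parameters on every token; under these assumptions the 1.6T MoE uses only ~288B, so comparing the two by total parameter count alone is misleading.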
washadjeffmad, over 1 year ago
A lot of this was new to me, but it looks like Intel hopes to use this to demonstrate the linear scaling capacity of their Aurora nodes.

Argonne installs final components of Aurora supercomputer (22 June 2023): https://www.anl.gov/article/argonne-installs-final-components-of-aurora-supercomputer

Aurora Supercomputer Blade Installation Complete (22 June 2023): https://www.intel.com/content/www/us/en/newsroom/news/aurora-supercomputer-blade-installation-complete.html

Intel Data Center GPU Max Series, previously codenamed Ponte Vecchio (31 May 2023): https://www.intel.com/content/www/us/en/developer/articles/technical/intel-data-center-gpu-max-series-overview.html
kirubakaran, over 1 year ago
Could the weights be FOIA'd?
pulse7, over 1 year ago
Isn't GPT-4 already over 1T parameters? And GPT-5 should be even "an order of magnitude" bigger than GPT-4...
bradley13, over 1 year ago
The solution won't be just "bigger". A model with a trillion parameters will be more expensive to train and to run, but is unlikely to be better. Think of the early days of flight: you had biplanes, then triplanes. You could have followed that further and added more wings, but it wouldn't have improved things.

Improving AI will involve architectural changes. No human requires the amount of training data we are already giving these models. Improvements will make more efficient use of that data and (no idea how; innovation required) allow models to generalize and reason from it.
WhitneyLand, over 1 year ago
Is anything known about the extent, if any, to which non-public-domain books are used for LLMs?

One example: the Google Books project digitized quite a few texts, but I've never heard whether Google considers these fair game to train Bard on.

Most of the copyright discussions I've seen have been about images and code, not much about books. This seems to become more relevant as things scale up, as this article indicates.
Footnote7341, over 1 year ago
It will be interesting to see what the government can do here. Can they use their powers to get their hands on the most data?

I'm still skeptical, because new techniques are going to give an order-of-magnitude efficiency boost to transformer models, so "just waiting" seems like the best approach for now. I don't think they will be able to skip to the finish line just by having the most money.
upsidesinclude, over 1 year ago
Haha, this is funny, because everyone is talking about this as if it were designed to be like the LLMs we have access to.

The training data will be the databases of info scooped up and integrated into profiles of every person and their entire digital footprint, queryable and responsive to direct questioning.
darklycan51, over 1 year ago
Ah yeah, this sounds like such a great thing: state-of-the-art unreleased tech plus 1 trillion parameters, based on data accessed under the Patriot Act.

Such a wholesome thing. I don't want to hear two years from now how China is evil for using "AI" when our government is attempting to weaponize AI; of course other governments will start doing it as well.
yieldcrv, over 1 year ago
Mistral 7B-parameter models are quite good, already fine-tuned and conversational.

It's as if education is more important than needing a trillion-parameter brainiac.
charcircuit, over 1 year ago
Sadly, I expect this to be a waste of money compared to just using GPT-4. It's hard to get to SotA performance.
lucubratory, over 1 year ago
Had to happen eventually.
lostmsu, over 1 year ago
Are they gonna release the weights?
kaffeeringe, over 1 year ago
What does it cost?
mark_l_watson, over 1 year ago
Cool! Purpose-built, trained on science-related content.

As a US taxpayer and as a Libertarian, I approve of this project!
_heimdall, over 1 year ago
I'm sure the government's mission is also to develop an AGI that benefits us all.
ajdegol, over 1 year ago
Wasn't the answer 42?

Also, first question to the new model: "So... any way we could do this with fewer parameters?"
Toolbox1337, over 1 year ago
Not bad for a start by the US government. A bit more than half as powerful as the reportedly 1.7-trillion-parameter GPT-4 model.
vouaobrasil, over 1 year ago
Not surprising. Despite the enormous energy costs and the threat to humanity of creating technology we can't control, governments and corporations will build bigger and more sophisticated models simply because they have to in order to compete.

It's the prisoner's dilemma: we end up with a super-advanced AI that will disrupt society and make it worse, because entities are competing where the metric of success is short-term monetary gain.

It's ridiculous. Humanity should give up this useless development of AI.