DeepThought-8B: A small, capable reasoning model

134 points by AnhTho_FR 6 months ago

18 comments

tkgally 5 months ago

There's been a rush of reasoning-model releases in the past couple of weeks. This one looks interesting, too. I found the following video from Sam Witteveen to be a useful introduction to a few of those models: https://youtu.be/vN8jBxEKkVo

CGamesPlay 5 months ago

In what way did they "release" this? I can't find it on Hugging Face or Ollama, and they only seem to have a "try online" link in the article. "Self-sovereign intelligence", indeed.

tanakai24 5 months ago

Legally, you cannot name Llama 3-based models like that; you have to include "Llama" in the name: https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct/blob/main/LICENSE

jb_briant 5 months ago

Am I wrong to think that "reasoning model" is a misleading marketing term? Isn't it an LLM with an algo wrapper?

codetrotter 5 months ago

Given the name they gave it, someone with access should ask it for the “Answer to the Ultimate Question of Life, The Universe, and Everything”. If the answer is anything other than a simple “42”, I will be thoroughly disappointed. (The answer has to be just “42”, not a bunch of text about The Hitchhiker's Guide to the Galaxy and all that.)

asah 5 months ago

"what is the population of manhattan below central park"

ChatGPT-o1-preview: 647,000 (based on 2023 data, broken down by community board area): https://chatgpt.com/share/674b3f5b-29c4-8007-b1b6-5e0a4aeaf0e9 (this appears to be the most correct, judging from census data)
DeepThought-8B: 200,000 (based on 2020 census data)
Claude: 300,000-350,000
Gemini: 2.7M during peak times (strange definition of population!)

I followed up with DeepThought-8B: "what is the population of all of manhattan, and how does that square with only having 200,000 below CP". It cut off its answer, but in the reasoning box it updated its guess to 400,000 by estimating from the fraction of land area.

igleria 5 months ago

I asked it "Describe how a device for transportation of living beings would be able to fly while looking like a sphere" and it just never returned an output.

nyoomboom 5 months ago

The reasoning steps look reasonable and the interface is simple and beautiful, though DeepThought-8B fails to distinguish "the ruliad", the technical concept from Wolfram physics, from the company's name, Ruliad. Maybe the concept isn't in the training data: when asked "what is the simplest rule of the ruliad?" it misunderstood the problem and went on to reason about the company's core principles. Cool release, waiting for the next update.

euroderf 5 months ago

I am very impressed. I asked chat.ruliad.co:

    Beginning from the values for fundamental physical constants, is it possible to derive the laws of entropy?

and then, based on its response to that, I asked it:

    Based on this analysis, can you identify and describe where the dividing line is between (a) the region where (microscopic/atomic) processes are reversible, and (b) the region where macroscopic processes are irreversible?

chvid 5 months ago

Is the source code available for this? And who is behind the company?

lowyek 5 months ago

I asked it 'find two primes whose sum is 123'. It has been in deep thought for 5 minutes, just looping over what look like repeated hallucinations of the right path. (BTW, ChatGPT immediately answers 61 and 62 lol... so much for intelligence.)
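
The prompt itself can be checked by brute force. The sketch below is illustrative only (the helper names `is_prime` and `prime_pairs` are made up here, not something either model produced). It finds no valid pair: 123 is odd, so one of the two primes would have to be 2, and 121 = 11 × 11 is not prime.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def prime_pairs(total: int) -> list[tuple[int, int]]:
    """Return every pair (p, q) with p <= q, both prime, and p + q == total."""
    return [
        (p, total - p)
        for p in range(2, total // 2 + 1)
        if is_prime(p) and is_prime(total - p)
    ]


print(prime_pairs(123))            # no pair of primes sums to 123
print(is_prime(61), is_prime(62))  # 61 is prime, 62 is not
```

So the expected answer is "no such pair exists"; 61 + 62 does equal 123, but 62 is even and therefore not prime.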

sans_souse 5 months ago

It looks nice, but Chrome on Android has all sorts of trouble rendering the animated bits, so it ends up skipping frames as I navigate and click. On top of that, the model didn't respond at all in multiple attempts. It's a waste of time until that's remedied.

rkagerer 5 months ago

Is it possible to try it without logging in? Can you log in with anything other than a Google account? I was excited by the tagline "Self-Sovereign", but it appears this is not.

reissbaker 5 months ago

"Model A 13B", "Model B 20B", etc. are pretty vapid claims. Which actual models? There are plenty of terrible high-param-count models from a year or two ago. The benchmark seems meaningless without saying what models are actually being compared against... And "13B" in particular is pretty sketchy: are they comparing it against Llama 2 13B? Even an untuned Llama 3.1 8B would destroy that in any benchmark. Smells a little grifty to me...

sushidev 5 months ago

It’s just a web page. How do I try the model?

wongarsu 5 months ago

A bit off-topic, but that comparison graph is a great example of why you should buy your designer a cheap secondary screen. I was viewing it on my second monitor and had to lean in to make out the off-white bar for Model D on the light-gray background. Moved the window over to my main screen and it's clear as day: five nice shades of coffee on a light-gray background.

andai 5 months ago

[flagged]

kgeist 5 months ago

Not bad, asked it to count Rs in "strawberry" and Ns in "international", it answered correctly, and it was fast.
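
For reference, the two counts are easy to confirm with a minimal sketch (the helper name `count_letter` is hypothetical, chosen here for illustration); both words contain the letter three times.

```python
def count_letter(word: str, letter: str) -> int:
    """Case-insensitive count of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))
print(count_letter("international", "n"))
```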
