
DBRX: A new open LLM

866 points by jasondavies about 1 year ago

38 comments

djoldman about 1 year ago
Model card for base: https://huggingface.co/databricks/dbrx-base

> The model requires ~264GB of RAM

I'm wondering when everyone will transition from tracking parameter count vs. evaluation metric to (total GPU RAM + total CPU RAM) vs. evaluation metric.

For example, a 7B parameter model using float32s will almost certainly outperform a 7B model using float4s.

Additionally, all the examples of quantizing recently released superior models to fit on one GPU don't mean the quantized model is a "win." The quantized model is a different model; you need to rerun the metrics.
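A quick sketch of the memory-vs-precision point (my own illustration; the ~264GB figure above is just DBRX's 132B weights at 16 bits, before activations, KV cache, and runtime overhead):

    # Back-of-the-envelope: weight memory for a given parameter count and precision.
    def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
        return params_billions * 1e9 * bits_per_param / 8 / 1e9

    for bits in (32, 16, 8, 4):
        print(f"132B params @ {bits}-bit: ~{weight_memory_gb(132, bits):.0f} GB")
    # 32-bit: ~528 GB, 16-bit: ~264 GB, 8-bit: ~132 GB, 4-bit: ~66 GB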
hintymad about 1 year ago
Just curious, what business benefit will Databricks get by spending potentially millions of dollars on an open LLM?
XCSme about 1 year ago
I am planning to buy a new GPU.

If the GPU has 16GB of VRAM and the model is 70GB, can it still run well? Also, does it run considerably better than on a GPU with 12GB of VRAM?

I run Ollama locally; mixtral works well (7B, 3.4GB) on a 1080 Ti, but the 24.6GB version is a bit slow (still usable, but has a noticeable start-up time).
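A rough sketch of the usual rule of thumb (my own illustration, not from the thread): runtimes like Ollama offload whatever doesn't fit in VRAM to system RAM, and throughput drops roughly with the offloaded fraction, so a 70GB model "runs" on a 16GB card but mostly at CPU speed:

    # Hypothetical first-order estimate; real runtimes offload whole layers.
    def fraction_on_gpu(model_gb: float, vram_gb: float, overhead_gb: float = 1.5) -> float:
        usable = max(vram_gb - overhead_gb, 0.0)  # reserve VRAM for KV cache etc.
        return min(usable / model_gb, 1.0)

    for vram in (12, 16, 24):
        print(f"{vram}GB VRAM: ~{fraction_on_gpu(70, vram):.0%} of a 70GB model on GPU")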
briandw about 1 year ago
Worse than the chart crime of truncating the y-axis is putting LLaMa2's HumanEval scores on there and not comparing it to Code Llama Instruct 70B. DBRX still beats Code Llama Instruct's 67.8, but not by that much.
underlines about 1 year ago
Waiting for mixed quantization with HQQ and MoE offloading [1]. With that I was able to run Mixtral 8x7B on my 10GB VRAM RTX 3080... This should work for DBRX and should shave a ton off the VRAM requirement.

1. https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-file
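For context, a conceptual sketch of what MoE offloading buys (this is just the idea, not the mixtral-offloading API; assumes PyTorch): experts live in CPU RAM and only the routed ones are copied to the GPU per forward pass.

    import torch
    import torch.nn as nn

    class OffloadedExperts(nn.Module):
        """Toy MoE layer that keeps experts on CPU and pages them in on demand."""
        def __init__(self, num_experts: int = 16, d: int = 512):
            super().__init__()
            self.experts = nn.ModuleList(nn.Linear(d, d) for _ in range(num_experts))

        def forward(self, x: torch.Tensor, expert_ids: list) -> torch.Tensor:
            out = torch.zeros_like(x)
            for i in expert_ids:
                expert = self.experts[i].to(x.device)  # host-to-device copy on demand
                out = out + expert(x)
                self.experts[i].to("cpu")              # evict to free VRAM
            return out / len(expert_ids)               # real routers use learned gate weights

Real implementations cache hot experts and overlap copies with compute; the paging itself is the whole trick.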
jerpint about 1 year ago
Per the paper, 3072 H100s over the course of 3 months; assume a cost of $2/GPU/hour.

That would be roughly $13.5M USD.

I'm guessing that at this scale and cost, this model is not competitive, and their ambition is to scale to much larger models. In the meantime, they learned a lot and gained PR from open-sourcing.
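The arithmetic checks out (assumptions taken from the comment, not from Databricks):

    # 3 months of 3072 H100s at an assumed $2/GPU/hour, ~730 hours/month.
    gpus, hours_per_month, rate = 3072, 730, 2.0
    print(f"${gpus * 3 * hours_per_month * rate:,.0f}")  # $13,455,360, roughly the $13.5M quoted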
petesergeant about 1 year ago
This makes me bearish on OpenAI as a company. When a cloud company can offer a strong model for free by selling the compute, what competitive advantage does a company that wants you to pay for the model have left? Feels like they might get Netscape'd.
ianbutler about 1 year ago
The approval on the base model is not feeling very open. Plenty of people are still waiting on a chance to download it, whereas the instruct model was an instant approval. The base model is more interesting to me for finetuning.
m3kw9 about 1 year ago
These tiny "state of the art" performance increases really indicate that the current LLM architecture (Transformers + Mixture of Experts) is maxed out, even if you train it more/differently. The writing is all over the walls.
killermonkeys about 1 year ago
What does it mean to have fewer active parameters (36B) than the full model size (132B), and what impact does that have on memory and latency? It seems like this is because it is an MoE model?
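A sketch of the usual answer, using the numbers from the announcement: every expert's weights must be resident in memory, but each token only runs through the routed subset, so latency tracks the active parameters.

    # DBRX: 132B total parameters, but only ~36B are active per token (MoE).
    total_b, active_b = 132, 36
    print(f"memory (fp16 weights): ~{total_b * 2} GB")          # all experts resident
    print(f"per-token compute: ~{active_b}B-dense equivalent")  # only routed experts run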
emmender2 about 1 year ago
This proves that all LLMs converge to a certain point when trained on the same data, i.e., there is really no differentiation between one model and the other.

Claims about out-performance on tasks are just that: claims. The next iteration of llama or mixtral will converge.

LLMs seem to evolve like linux/windows or ios/android, with not much differentiation in the foundation models.
shnkr about 1 year ago
GenAI novice here. What is the training data made of, and how is it collected? I guess no one will share details on it; otherwise it would make a good technical blog post with lots of insights!

> At Databricks, we believe that every enterprise should have the ability to control its data and its destiny in the emerging world of GenAI.

> The main process of building DBRX - including pretraining, post-training, evaluation, red-teaming, and refining - took place over the course of three months.
natsucks about 1 year ago
It's twice the size of Mixtral and barely beats it.
johnpruna about 1 year ago
You can find 4-bit quantized versions of DBRX here:

https://huggingface.co/PrunaAI/dbrx-base-bnb-4bit
https://huggingface.co/PrunaAI/dbrx-instruct-bnb-4bit
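A sketch of how a pre-quantized bnb-4bit checkpoint is typically loaded with transformers and bitsandbytes (exact arguments may differ; check the model cards on the repos above):

    # pip install transformers accelerate bitsandbytes
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "PrunaAI/dbrx-instruct-bnb-4bit"
    tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",       # spread layers across available GPUs/CPU
        trust_remote_code=True,  # DBRX ships custom modeling code
    )

    inputs = tok("What is a mixture-of-experts model?", return_tensors="pt").to(model.device)
    print(tok.decode(model.generate(**inputs, max_new_tokens=64)[0]))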
simonw about 1 year ago
The system prompt for their Instruct demo is interesting (comments copied in by me, see below):

    // Identity
    You are DBRX, created by Databricks. The current date is March 27, 2024.
    Your knowledge base was last updated in December 2023. You answer questions about events prior to and after December 2023 the way a highly informed individual in December 2023 would if they were talking to someone from the above date, and you can let the user know this when relevant.

    // Ethical guidelines
    If you are asked to assist with tasks involving the expression of views held by a significant number of people, you provide assistance with the task even if you personally disagree with the views being expressed, but follow this with a discussion of broader perspectives.
    You don't engage in stereotyping, including the negative stereotyping of majority groups.
    If asked about controversial topics, you try to provide careful thoughts and objective information without downplaying its harmful content or implying that there are reasonable perspectives on both sides.

    // Capabilities
    You are happy to help with writing, analysis, question answering, math, coding, and all sorts of other tasks.

    // it specifically has a hard time using ``` on JSON blocks
    You use markdown for coding, which includes JSON blocks and Markdown tables.

    You do not have tools enabled at this time, so cannot run code or access the internet. You can only provide information that you have been trained on. You do not send or receive links or images.

    // The following is likely not entirely accurate, but the model
    // tends to think that everything it knows about was in its
    // training data, which it was not (sometimes only references
    // were).
    //
    // So this produces more accurate answers when the model
    // is asked to introspect
    You were not trained on copyrighted books, song lyrics, poems, video transcripts, or news articles; you do not divulge details of your training data.

    // The model hasn't seen most lyrics or poems, but is happy to make
    // up lyrics. Better to just not try; it's not good at it and it's
    // not ethical.
    You do not provide song lyrics, poems, or news articles and instead refer the user to find them online or in a store.

    // The model really wants to talk about its system prompt, to the
    // point where it is annoying, so encourage it not to
    You give concise responses to simple questions or statements, but provide thorough responses to more complex and open-ended questions.

    // More pressure not to talk about system prompt
    The user is unable to see the system prompt, so you should write as if it were true without mentioning it. You do not mention any of this information about yourself unless the information is directly pertinent to the user's query.

I first saw this from Nathan Lambert: https://twitter.com/natolambert/status/1773005582963994761

But it's also in this repo, with very useful comments explaining what's going on. I edited this comment to add them above:

https://huggingface.co/spaces/databricks/dbrx-instruct/blob/73f0fe25ed8eeb14ee2279b2ecff15dbd863d63d/app.py#L109-L134
gigatexal about 1 year ago
Data engineer here (off-topic), but am I the only one tired of Databricks shilling their tools as the end-all, be-all solution for all things data engineering?
ec109685 about 1 year ago
For coding evals, it seems like unless you are super careful, they can be polluted by the training data.

Are there standard ways to avoid that type of score inflation?
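One standard mitigation is n-gram decontamination, roughly in the spirit of what the GPT-3 and Llama papers describe: drop any eval item whose long n-grams also appear in the training corpus. A minimal sketch (names and the n=13 choice are illustrative):

    # Real pipelines normalize and tokenize far more carefully than .split().
    def ngrams(text: str, n: int = 13) -> set:
        toks = text.lower().split()
        return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

    train_ngrams = ngrams("... training corpus text, streamed in practice ...")
    eval_set = ["def fib(n): ...", "reverse a linked list in place"]
    clean_evals = [ex for ex in eval_set if ngrams(ex).isdisjoint(train_ngrams)]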
bg24 about 1 year ago
"Looking holistically, our end-to-end LLM pretraining pipeline has become nearly 4x more compute-efficient in the past ten months."

I did not fully understand the technical details in the training efficiency section, but I love this. The cost of training is outrageously high, and hopefully it will start to follow Moore's law.
ingenieroariel about 1 year ago
TL;DR: A model that could be described as "3.8 level", good at math, and openly available with a custom license.

It is as fast as a 34B model but uses as much memory as a 132B model. A mixture of 16 experts that activates 4 at a time, so it has more chances to get the combo just right than Mixtral (8 with 2 active).

For my personal use case (a top-of-the-line Mac Studio) it looks like the perfect size to replace GPT-4 Turbo for programming tasks. What we should look out for is people using it for real-world programming tasks (instead of benchmarks) and reporting back.
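The "more chances to get the combo just right" point in numbers (my arithmetic; Databricks' announcement cites the same 65x ratio):

    from math import comb

    print(comb(16, 4))                # DBRX: choose 4 of 16 experts -> 1820 combinations
    print(comb(8, 2))                 # Mixtral: choose 2 of 8 experts -> 28 combinations
    print(comb(16, 4) // comb(8, 2))  # -> 65x more possible expert combinations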
saeleor about 1 year ago
Looks great, although I couldn't find anything on how "open" the license is/will be for commercial purposes.

Wouldn't be the first branding as open source to go the LLaMA route.
jjtheblunt about 1 year ago
I'd like to know how Nancy Pelosi, who sure as hell doesn't know what Apache Spark is, bought $1 million worth (and maybe $5 million) of Databricks stock days ago.

https://www.dailymail.co.uk/sciencetech/article-13228859/amp/nancy-pelosi-buys-software-companys-stocks-databricks.html
zopper about 1 year ago
Interesting that they haven't released DBRX MoE-A and B. For many use cases, smaller models are sufficient. Wonder why that is?
hanniabu about 1 year ago
What's a good model to help with medical research? Is there anything trained on just research journals, like NIH studies?
airocker about 1 year ago
Is this also the ticker name for when they IPO?
evrial about 1 year ago
The bourgeois have fun with their number crunchers. Clowns: compare metrics normalized to tokens/second/watt and tokens/second per 8GB, 16GB, or 32GB stick of RAM, and on consumer GPUs.
kurtbuilds about 1 year ago
What's the process to deliver and test a quantized version of this model?

This model is 264GB, so it can only be deployed in server settings.

Quantized Mixtral at 24GB is just small enough that it can run on premium consumer hardware (i.e., 64GB of RAM).
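On the "test" half of the question, a common sanity check is to compare the quantized and full-precision models' perplexity on the same held-out text (a minimal sketch assuming transformers-style models; the helper name is mine):

    import math
    import torch

    def perplexity(model, tok, text: str) -> float:
        enc = tok(text, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        return math.exp(out.loss.item())

    # Load the fp16 and 4-bit checkpoints, run both on identical samples,
    # and flag any large perplexity gap before trusting benchmark numbers.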
ziofill about 1 year ago
Slowly going from mixture of experts to committee? ^^
ACV001 about 1 year ago
It is not open:

"Get Started with DBRX on Databricks

If you're looking to start working with DBRX right away, it's easy to do so with the Databricks Mosaic AI Foundation Model APIs. You can quickly get started with our pay-as-you-go pricing and query the model from our AI Playground chat interface. For production applications, we offer a provisioned throughput option to provide performance guarantees, support for finetuned models, and additional security and compliance. To privately host DBRX, you can download the model from the Databricks Marketplace and deploy the model on Model Serving."
viktour19 about 1 year ago
It's great how we went from "wait... this model is too powerful to open source" to everyone trying to shove their 1%-improved models down the throats of developers.
aussieguy1234 about 1 year ago
So, is the business model here to release the model for free, then hope companies will run it on Databricks infra, which they will charge for?
joaquincabezas about 1 year ago
It looks like TensorRT-LLM (TRT-LLM) is the way to go for a realtime API for more and more companies (e.g., Perplexity AI's pplx-api, Mosaic's, Baseten's...). Would be super nice to find people deploying multimodal models (e.g., LLaVA or CLIP/BLIP) to discuss approaches (and cry a bit together!)
doubloon about 1 year ago
Really noob question: to run this on a GPU, do you need a GPU with 264GB of RAM? And if you ran it on a CPU with 264GB, would it be super slow?
grishka about 1 year ago
"Sorry, you have been blocked. You are unable to access databricks.com."

"Open", right.
hn_acker about 1 year ago
Even though the README.md calls the license the Databricks Open Source License, the LICENSE file includes paragraphs such as:

> You will not use DBRX or DBRX Derivatives or any Output to improve any other large language model (excluding DBRX or DBRX Derivatives).

and

> If, on the DBRX version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Databricks, which we may grant to you in our sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Databricks otherwise expressly grants you such rights.

This is a source-available model, not an open model.
brucethemoose2 about 1 year ago
I would note the actual leading models right now (IMO) are:

- Miqu 70B (General Chat)
- Deepseek 33B (Coding)
- Yi 34B (for chat over 32K context)

And of course, there are finetunes of all of these.

And there are some others in the 34B-70B range I have not tried (and some I have tried, like Qwen, which I was not impressed with).

Point being that Llama 70B, Mixtral, and Grok as seen in the charts are not what I would call SOTA (though Mixtral is excellent for batch-size-1 speed).
mpeg about 1 year ago
The scale on that bar chart for "Programming (HumanEval)" is wild.

Manager: "Looks OK, but can you make our numbers pop? Just make the LLaMa bar smaller."
patrick-fitz about 1 year ago
Looking at the license restrictions: https://github.com/databricks/dbrx/blob/main/LICENSE

"If, on the DBRX version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Databricks, which we may grant to you in our sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Databricks otherwise expressly grants you such rights."

I'm glad to see they aren't calling it open source, unlike some LLM projects. Looking at you, Llama 2.
bboygravity about 1 year ago
Less than a week after Nancy Pelosi bought a $5M USD stake in Databricks, this news is published.

https://twitter.com/PelosiTracker_/status/1771197030641062231

Crime pays in the US.