How to make MongoDB not suck for analytics

94 points by ayw, almost 7 years ago

17 comments

stickfigure, almost 7 years ago
I tried using MongoDB for the customer-facing analytics of a large e-commerce marketplace. It didn't work very well. The problem is that at some point you end up wanting joins.

MongoDB was actually the third try. My first two attempts were BigQuery and Keen, neither of which worked out because they support only one index - time. Users want to slice and dice by various axes! And there's an obvious additional index you need - "merchant" - which column stores usually propose setting up isolated partitions for. If you do that, you can't ask questions across the whole system!

We ended up with Postgres. It was actually faster than MongoDB for simple aggregations, and joins made it much better/faster for complicated queries. Of course it only works quickly if your dataset fits in RAM, but terabyte-size instances are pretty affordable and give you a lot of headroom.

That was a couple of years ago. I don't know what they're using now, probably the same. It was a frantic few weeks figuring out what was going to work - each of those systems made it to production and was quickly discovered to be inadequate in vivo. If you're in a startup, even if you're using exotic NoSQL systems like Google Cloud Datastore or DynamoDB - just use Postgres or MySQL for analytics. It will work long enough for you to figure out something else when you need it.
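Editor's sketch of the "slice by merchant, still ask questions across the whole system" point above: a plain relational join plus secondary indexes. The comment used Postgres; this uses Python's built-in `sqlite3` only so the example runs without a server, and the `merchants`/`orders` tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE merchants (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        merchant_id INTEGER REFERENCES merchants(id),
        day TEXT,
        amount_cents INTEGER
    );
    -- More than one index: by merchant and by time.
    CREATE INDEX orders_by_merchant ON orders (merchant_id);
    CREATE INDEX orders_by_day ON orders (day);
""")
conn.executemany("INSERT INTO merchants VALUES (?, ?)", [(1, "EU"), (2, "US")])
conn.executemany(
    "INSERT INTO orders (merchant_id, day, amount_cents) VALUES (?, ?, ?)",
    [(1, "2018-07-01", 1200), (1, "2018-07-02", 800), (2, "2018-07-01", 500)],
)


def revenue_by_region(conn):
    """Aggregate across all merchants by joining orders to merchant attributes."""
    rows = conn.execute("""
        SELECT m.region, SUM(o.amount_cents)
        FROM orders AS o
        JOIN merchants AS m ON m.id = o.merchant_id
        GROUP BY m.region
        ORDER BY m.region
    """)
    return rows.fetchall()


def revenue_for_merchant(conn, merchant_id):
    """Per-merchant slice over the same table, no isolated partition needed."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM orders WHERE merchant_id = ?",
        (merchant_id,),
    ).fetchone()
    return row[0]
```

The same two queries would be written almost identically against Postgres; only the connection setup differs.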
larrydag, almost 7 years ago
This is a huge concern for me at my current organization. Dev has decided to put all data into MongoDB. Yet all decisions are based on that data, and the tools we have do not allow for seamless flow (ETL) from MongoDB. That data is important for deriving decisions that affect revenue and costs. Where are the solutions for the data analysts and scientists? Frankly I'm pretty sick of hearing it can just be automated.

In my mind there has to be a decent "business intelligence stack". I'm not sure I'm coining that because I didn't get good search results from that phrase. Believe me I've been trying to find solutions. I believe there is a big opportunity in building out this sort of stack that bridges data management and data analysis. Sure you can call IBM, Microsoft, Dell, HP but be prepared for big costs and huge software bloat. I would like simplified solutions and options that can fit with most industry standard tools.

I'm also willing to work with anyone on this as well.
codingdave, almost 7 years ago
I'm not aware of any analytics platform that runs directly from the source data. There is just about always some kind of ETL process, or at the very least, a data transformation process to shape the data as needed, to provide data that works well for the reporting. So while information on making MongoDB performant for such things is mildly interesting... it just isn't how analytics are generally architected.
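Editor's sketch of the kind of "shape the data as needed" step described above, assuming nested order documents. It flattens sub-documents into dotted column names and emits one flat row per array element, which is roughly the shape a reporting table wants. The `items` field and the sample document layout are hypothetical.

```python
from typing import Any, Dict, Iterator


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested sub-documents into flat keys such as 'customer.country'."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def order_rows(doc: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat row per element of doc['items'] (a hypothetical field)."""
    header = flatten({k: v for k, v in doc.items() if k != "items"})
    for item in doc.get("items", []):
        yield {**header, **flatten(item, prefix="item.")}
```

Rows in this shape can be bulk-loaded into whatever reporting store is in use; the loading step is deliberately left out because it depends on the target system.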
jrochkind1, almost 7 years ago
What is the benefit of having it in mongo in the first place, in this scenario?
eddd, almost 7 years ago
I kind of hoped it would end up as a joke saying "Don't use mongo". The last time I used it was 2.4, and it was the worst db experience ever. Back then it was more sane to craft a solution with PG and HSTORE. Now, I think Redshift does the job. Why would anyone use mongo in production for anything today?
sztanko, almost 7 years ago
Just try this out: https://github.com/EXASOL/docker-db and you will be impressed. This is an embryo of a real analytical database.

Pros:
- An 8 CPU installation with 64 GB of memory will probably be a hundred times faster than Postgres.
- It supports full SQL.
- It is super stable, even as a Docker container.

Cons:
- It does not support nested data.
- Once you reach volumes of around 2 TB, you will probably have to switch to a paid version (I mean, you can still continue running on a 200 GB RAM box, but it will be suboptimal).

P.S. I am not affiliated with Exasol.
georgewfraser, almost 7 years ago
There are several companies, including mine (Fivetran), that will replicate MongoDB into a columnar data warehouse for analytics. For most people, a commercial replication tool + a commercial columnar data warehouse is the best trade-off of cost/ease of use. Commercial DWHs deal with all the details of patching columnar formats under the hood, and commercial replication tools like us will deal with all the complexity of things like the mongo oplog. For not that much $ you can have a working system in like a day.
jrs95, almost 7 years ago
Okay, we get it, Mongo sucks. Or at least that seems to be the consensus. From what I can tell it seems they've improved their tech *a lot* though, and I have to wonder if a lot of the "mongo sucks" sentiment comes from either 1. people using early versions of Mongo that really did suck or 2. people having used Mongo at companies where nobody really knew how to use Mongo that well.
riboflavin, almost 7 years ago
Dremio helps with a lot of this, particularly the speed aspect: it uses Parquet as well as Apache Arrow. (I work at Dremio.) Speeding things up: https://docs.dremio.com/acceleration/reflections.html
squirrelicus, almost 7 years ago
Okay so... To make MongoDB not suck for analytics, ETL it into a different format. For engineers trained in backend systems, this is pretty obvious. After reading this, I also don't know why I'd choose Parquet things over any other thing.

Baby's first ETL -- just scan the db with a cursor and analyze the data in a script -- tends to cover 90% of the use cases for BI db analytics with almost zero resource consumption anyway. Point being, don't write a query to do analytics if your db can't answer your questions performantly, and don't build [latent, stale, slow] Enterprise ETL unless you really need it.
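Editor's sketch of the "scan the db with a cursor and analyze the data in a script" idea above. The function takes any iterable of documents, so it streams one document at a time and keeps only the running totals in memory. The `merchant` and `amount_cents` field names are hypothetical; with pymongo, the iterable would typically be the cursor returned by `collection.find(...)` with a projection limited to those two fields.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def revenue_per_merchant(docs: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Single pass over a cursor-like iterable, accumulating totals per merchant."""
    totals: Dict[str, int] = defaultdict(int)
    for doc in docs:
        # Documents missing the amount field count as zero rather than failing the scan.
        totals[doc["merchant"]] += doc.get("amount_cents", 0)
    return dict(totals)


# Stand-in for a database cursor so the sketch runs without a server.
sample_cursor = iter([
    {"merchant": "acme", "amount_cents": 1200},
    {"merchant": "acme", "amount_cents": 800},
    {"merchant": "globex"},
    {"merchant": "globex", "amount_cents": 500},
])
```

Because the scan only reads, it can be pointed at a secondary or a snapshot to keep load off the primary; whether that matters depends on the deployment.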
kockic, almost 7 years ago
I see that most of the `don't use mongodb for analytics` comments are being down-voted; however, I tend to agree with them. For all the people out there looking for a database for analytics, please check out ClickHouse from Yandex. It's easy to get started with, amazingly fast, and open source.

Disclaimer: I am not affiliated with Yandex in any way, just a happy customer.
minitoar, almost 7 years ago
We use a similar technique at Interana. Our DB is a column store, but we break things up over the time dimension to keep file sizes of individual columns reasonable. One of these time buckets is essentially analogous to a single Parquet file. In addition we split/sort these buckets into smaller buckets as more events are added.
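Editor's sketch of the general idea of a column store broken up over the time dimension. It is not Interana's implementation: the one-hour bucket width, the in-memory dict layout, and the `ts` field name are all assumptions, and the secondary splitting of buckets mentioned above is not shown.

```python
from typing import Any, Dict, Iterable, List

BUCKET_SECONDS = 3600  # assumed bucket width: one hour

Buckets = Dict[int, Dict[str, List[Any]]]


def bucket_columns(events: Iterable[Dict[str, Any]], columns: List[str]) -> Buckets:
    """Group events by time bucket, storing each column as its own list."""
    buckets: Buckets = {}
    for event in sorted(events, key=lambda e: e["ts"]):
        start = event["ts"] - event["ts"] % BUCKET_SECONDS
        cols = buckets.setdefault(start, {c: [] for c in columns})
        for c in columns:
            cols[c].append(event.get(c))
    return buckets


def sum_column(buckets: Buckets, column: str, t0: int, t1: int) -> int:
    """Sum one column over buckets whose start time lies in [t0, t1)."""
    total = 0
    for start, cols in buckets.items():
        if t0 <= start < t1:
            # Only the requested column of the matching buckets is touched.
            total += sum(v for v in cols[column] if v is not None)
    return total
```

A time-range query skips whole buckets outside the range and reads a single column inside it, which is the property that keeps per-column file sizes and scan costs bounded.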
manigandham, almost 7 years ago
This is called ETL, to a data warehouse.

Regardless of the choice of primary database, this is nothing new and just shows how a lot of startup technical talent seems to be discovering the same things all the time, usually with needlessly convoluted approaches, and writing blog posts about it.
notoriousp, almost 7 years ago
A little bit off-topic, but what product did you use to create those visualizations?
drej, almost 7 years ago
For those seeking tl;dr: The answer is not to use MongoDB.
dmitriid, almost 7 years ago
> How to make MongoDB not suck for analytics

Easy: you don't use Mongo
endymi0n, almost 7 years ago
Protip: MongoDB works absolutely best for analytics when it is replaced with a sane and scalable column-oriented database like Redshift or BigQuery right before serving that report.