Storing 50M events per second in Elasticsearch

142 points by Benfromparis over 5 years ago

7 comments

jungletime over 5 years ago
3 years ago, I made a simple calendar app in Django, and I wanted to use Elasticsearch so users could search for and find an event, and to use it to populate an upcoming events list. There are only about 10,000 events in the database.

I quickly realized what a pain Elasticsearch is to use for a simple app like mine.

Pain points:

1) You have to set up and recreate part of your database in Elasticsearch. So you essentially end up with two databases, which you now have to keep in sync.

2) I was getting unpredictable query results from Elasticsearch, which after a few days and much head scratching turned out to be because I was running out of memory.

3) When a user added a new event, it was not added to the Elasticsearch index automatically, and I could not figure out how to do this reliably. I could make it work reliably only after a re-sync of the entire Elasticsearch index (what I wanted was a per-save sync, as sketched below). But since I only wanted to sync the index once a day, it was next to useless for the upcoming events list, and users were confused about why their event was not showing up. I gave up and just implemented the upcoming events list directly from my database in Python.

4) Elasticsearch shipped with some security settings not set by default, and after a few months it was hacked. I had to download a new version and wasted more time.

I still use Elasticsearch, but only for search, not the upcoming events list. And I don't think it was worth the complexity it added to my project.
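For pain point 3, the usual Django approach is to index at save time rather than in a daily batch. A minimal sketch, assuming elasticsearch-py 8.x and a hypothetical `Event` model with `title` and `starts_at` fields; the index name and module path are made up, and the retry/queueing that makes this genuinely reliable is omitted:

```python
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver
from elasticsearch import Elasticsearch, NotFoundError

from myapp.models import Event  # hypothetical calendar model

es = Elasticsearch("http://localhost:9200")

@receiver(post_save, sender=Event)
def index_event(sender, instance, **kwargs):
    # Push the event into the index the moment it is saved, so search
    # (and an upcoming-events query) sees it immediately.
    es.index(
        index="events",
        id=instance.pk,
        document={
            "title": instance.title,
            "starts_at": instance.starts_at.isoformat(),
        },
    )

@receiver(post_delete, sender=Event)
def remove_event(sender, instance, **kwargs):
    # Keep the index in sync when an event is deleted.
    try:
        es.delete(index="events", id=instance.pk)
    except NotFoundError:
        pass  # never indexed or already removed
```

The catch, and likely the reliability problem described above: if the Elasticsearch call fails, the database save still succeeds and the index silently drifts. In practice you would enqueue the document id onto a task queue (e.g. Celery) with retries instead of calling Elasticsearch inline.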
mistrial9 over 5 years ago
It appears in this document:

* DataDome is a security company, and gets web traffic in near real-time for clients; a lot of traffic in some cases, with very specific numbers given, like daily peak loads.

* DataDome only retains records for 30 days, and the most attention is given to the most recent traffic, to detect attacks.

* An Elasticsearch deployment records all of the traffic records downstream from Apache Flink; a new feature added to ES this year improves the management of ES indexing, and that solved problems DataDome was having. Things are better, so write an engineering blog post!

* Re-indexing is done nightly, and implemented in a cloud environment that can handle the (heavy) work of rebuilding the index.

These numbers are impressive. Earlier criticisms of ES are being addressed, and ES is stable and a cornerstone of the architecture. A company called DataDome is providing real services in near real-time. Congratulations to the team, and an interesting read.
FBISurveillance over 5 years ago
I don't think writing a clickbaity title like this is fair. You just write 200k large documents per second, period. Good for you, but to be blatantly honest it's actually not a lot.

I'm not saying you shouldn't have written this post, but rather suggest you be fair to your readers (and yourself). Otherwise you could just make up random titles like "Writing 1 trillion log lines per second" (by storing 1,000,000 1-byte, newline-separated log lines per document).
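The gap between the headline and this figure comes down to batching. Taking both numbers at face value, a quick check of the implied events-per-document ratio:

```python
# Figures from the title and the article, taken at face value.
events_per_second = 50_000_000   # headline: 50M events per second
docs_per_second = 200_000        # article: 200k documents per second into ES

# If both hold, each indexed document must aggregate this many events:
print(events_per_second / docs_per_second)  # 250.0
```

So the headline only works if each document rolls up roughly 250 events, which is presumably the aggregation this comment (and altmind's question below) is objecting to.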
outworlder over 5 years ago
This part left me scratching my head:

> We have set "replica 0" in our indexes settings

> Now let's assume that node 3 goes down:

> As expected, all shards from node 3 are moved to node 1 and node 2

No: there are no shards that can be moved, since the number of replicas was set to zero and one node went down. I'm not sure what they are trying to explain here.

> In order to resolve this issue, we introduced a job which runs each day in order to update the mapping template and create the index for the day of tomorrow, with the right number of shards according to the number of hits our customer received the previous day.

This is a very common use case (e.g. logging), but it's surprising that Elastic has nothing to automate this.
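A minimal sketch of that kind of nightly job, assuming elasticsearch-py 8.x; `get_yesterday_hits`, the index naming scheme, and the shard-sizing heuristic are all made-up stand-ins for whatever the article's job actually does:

```python
from datetime import date, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def get_yesterday_hits(customer: str) -> int:
    """Hypothetical: look up yesterday's traffic volume for a customer."""
    ...

def create_tomorrows_index(customer: str) -> None:
    tomorrow = date.today() + timedelta(days=1)
    hits = get_yesterday_hits(customer)

    # Illustrative heuristic only: roughly one shard per 100M daily hits,
    # clamped to a sane range.
    shards = max(1, min(10, hits // 100_000_000))

    es.indices.create(
        index=f"traffic-{customer}-{tomorrow:%Y.%m.%d}",
        settings={
            "number_of_shards": shards,
            # The article's "replica 0": no redundancy, so losing a node
            # really does lose that node's shards.
            "number_of_replicas": 0,
        },
    )
```

Index Lifecycle Management (plausibly the "new feature" mentioned upthread) automates time-based rollover, but sizing tomorrow's shards from yesterday's per-customer traffic still seems to need a custom job like this.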
altmind over 5 years ago
> Each day, during peak charge, our Elasticsearch cluster writes more than 200 000 documents per second

What is this 50M in the title?
dazoot over 5 years ago
Reading this reminds me of the pains of running an ElasticSearch cluster. We just moved to Elassandra. No more red status.
philip1209 over 5 years ago
I've seen Elasticsearch clusters like this have consistency problems. Turns out an off-by-one error is a real problem in a security setting.