Ask HN: How to cheaply use a vector DB to detect anomalies in logs at 1TB / day

3 points | by arconis987 | over 1 year ago
I’m interested in playing with vector databases to detect interesting anomalies in a large volume of logs, like 1TB/day.

Is it reasonable to attempt to generate embeddings for every log event that hits the system? At 1TB/day, it’s like 1B log events per day, over 10k per second.

Would I just have to sample some tiny percentage of log events to generate embeddings for?

The volume feels too high, but I’m curious if others do this successfully. I want this to be reasonably cheap, like less than 1 cent per million log events.

Twitter seems to be doing something like this for all tweets at much higher volume. But I don’t want to spend too much money :)
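For concreteness: 1TB/day at roughly 1KB per event is ~1B events/day (~11.6k/s), and a budget of 1 cent per million events works out to about $10/day at that volume. Here's the kind of sampling I have in mind; `model` and `index` are placeholders for whatever embedding model and vector store you'd pick, not specific products:

    # Sketch: deterministic hash-based sampling ahead of the embedding step.
    # At 1B events/day, SAMPLE_RATE = 0.01 means ~10M embeddings/day (~115/s),
    # which a small local embedding model can keep up with on modest hardware.
    import hashlib

    SAMPLE_RATE = 0.01

    def should_embed(event: str) -> bool:
        # Hashing makes the decision deterministic per event text, so the
        # sample is reproducible and identical lines are treated alike.
        h = int.from_bytes(hashlib.sha1(event.encode("utf-8")).digest()[:8], "big")
        return h < SAMPLE_RATE * 2**64

    def process(event: str, model, index) -> None:
        if should_embed(event):
            vec = model.encode(event)  # placeholder: any text-embedding call
            index.add(vec)             # placeholder: any vector-store insert

One caveat with hash-based sampling: identical lines are sampled all-or-nothing, which is usually what you want for dedup-heavy logs but skews frequency statistics.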

2 comments

SushiHippie | over 1 year ago
Maybe have a look at what netdata does. It may not be 1-to-1 applicable to your use case, but I've used netdata for monitoring my own servers, which ingest thousands of datapoints per second, and the anomaly detection seems to work.

https://learn.netdata.cloud/docs/ml-and-troubleshooting/machine-learning-ml-powered-anomaly-detection
gwnywg | over 1 year ago
Out of curiosity, are all logs coming through a single pipe, or is this an aggregate of multiple sources where you could apply something before aggregation?
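If it's the latter, one cheap thing to apply per source before the pipes merge is normalization plus dedup, so only lines that haven't been seen recently ever reach the embedding step. A rough sketch (the regexes and the unbounded cache are illustrative only):

    # Sketch: normalize variable parts of each line, flag only novel templates.
    # A real deployment would use a bounded LRU / TTL cache per source.
    import re

    SEEN: dict[str, int] = {}

    def normalize(line: str) -> str:
        line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", line)  # hex addresses first
        line = re.sub(r"\d+", "<num>", line)             # then bare numbers
        return line

    def novel(line: str) -> bool:
        key = normalize(line)
        SEEN[key] = SEEN.get(key, 0) + 1
        return SEEN[key] == 1  # only the first occurrence of a template passes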