Ask HN: What does your on-premise data analytics/processing stack look like?

3 points by hellwd over 4 years ago

I know that many companies run their data analytics and data processing in the cloud, and many such setups are described in online articles. However, I believe there are many companies that can't use the cloud and have to do everything on their own private, isolated infrastructure.

Also, there are companies that don't have to deal with "big data" but still need some data analytics solution, either batch or stream. A lightweight setup is probably enough for them.

I'm interested to hear how these companies/teams have built their data analytics/processing solutions, and with what technologies.

The parts of the stack that interest me most are:

1. What do you use for ETL (how do you import data into your warehouse)?
2. Do you have any kind of data quality reporting?
3. Which DB(s) do you use?
4. How do you present your data?
5. What are the pros and cons of your setup?

Thanks in advance!

1 comment

Jugurtha over 4 years ago

> However I believe that there are many companies that can't use cloud and they have to do everything on their private, isolated infrastructure.

Correct. Regulations in some countries force organizations to keep data in their local infrastructure. Some organizations, by their nature, would rather keep their data on their private clouds.

The stack differs from organization to organization, depending on the activity.

You'll find everything. Sorry for being generic, but it's kind of wild out there. One thing we do to avoid disrupting operations in these organizations is to work with them to set up our own systems that we control, and build connectors to acquire data in some form to work with it. We use Kafka and Spark, but have set up Hadoop clusters for some clients, too. We used Python for most projects, Scala for others. We try to adapt to their data sources, for example when the data source is a Windows process on some machine that controls physical, moving items, and the client has hard constraints on the bandwidth of transmitted data.

We also helped some organizations acquire data from physical phenomena by building hardware to do so, and then repurposing software from our other projects to transmit the data from that hardware and process these signals (protobufs, actors).

We work closely with their teams to decide how raw we want the data to be. Sometimes we help them anonymize the data, too, working with their legal and security teams.

This also requires deep dives into their domain when the data sources are domain-specific. Examples: telecommunication billing systems, banking transaction data, reservoir characterization data, rail transportation control systems. We then source that data and route it to something the client wants to use.

We also build ML models, which is why they really hire us; the rest is something we have to do to get there. Then we develop applications that allow their people to either use these models or train new models with new data.

Sorry for being generic, but it really depends. We have conversations with clients and adapt to their constraints, especially when these are *very* peculiar.

Now, doing that for many organizations, sometimes at the same time, can be really taxing. This can lead people to leave the company, as there is just too much to handle. Working on different projects is not for everyone. It is also slow and expensive: only extremely large organizations could afford our services. This is the primary reason we set out to build our machine learning platform [0]. We want to accelerate this by at least 10x.

[0]: https://iko.ai
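To make the anonymization step mentioned above a bit more concrete, here is a minimal sketch, assuming keyed hashing (HMAC-SHA256) of sensitive fields is acceptable to the legal and security teams involved. The comment does not say which technique was actually used; the function names, the field names, and the key below are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of `value`.

    The same value and key always give the same digest, so records can
    still be joined on the field without exposing the raw value.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_record(record: dict, sensitive_fields: set, secret_key: bytes) -> dict:
    """Return a copy of `record` with the sensitive fields replaced by digests."""
    return {
        field: pseudonymize(str(value), secret_key) if field in sensitive_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    # Hypothetical billing record and key, for illustration only.
    key = b"replace-with-a-secret-kept-on-premise"
    record = {"customer_id": "C-1001", "msisdn": "555-0100", "amount": 12.5}
    print(anonymize_record(record, {"customer_id", "msisdn"}, key))
```

Strictly speaking this is pseudonymization rather than full anonymization: anyone holding the key can recompute the mapping, so the key would have to stay inside the client's infrastructure.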