TechEcho

2 comments

perrygeo, over 2 years ago

One approach is: don't. Front-end visualization of data is scale-dependent, and you can often build systems that are orders of magnitude faster by aggregating or filtering the data server-side to a resolution that's appropriate for the viewing window.

Two examples:

A 400px-wide line chart with time on the X axis. At the absolute maximum you can fit 400 values, likely fewer. Your dataset could be GBs of time-series data, but you can bin it into time buckets and return the mean of each bucket.

A map of the world based on OpenStreetMap data. If you're showing things at the continent scale, you can filter out 99+% of the map features server-side and return only the relevant ones (major highways and country borders, for example). OSM is a 100GB dataset, but maps that use this technique can be fetched and rendered in milliseconds.

Bonus points for aggregating to a fixed grid, which allows you to aggressively cache the results.

If you're committed to handling the raw data on the client side, there's not much you can do other than write optimized web workers in wasm (effectively reinventing the aggregates and indexing provided by a database) and set realistic limits.
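To make the line-chart example concrete, here is a minimal sketch of server-side binning in Python. It is one possible reading of the idea above, not a reference implementation: the function names `pick_bucket_size` and `bin_mean`, the `GRID_SIZES` tuple, and the use of integer Unix-second timestamps are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical fixed grid of bucket sizes, in seconds. Restricting requests
# to a small set of sizes is what makes the results cacheable.
GRID_SIZES = (1, 10, 60, 600, 3600, 86400)


def pick_bucket_size(start, end, width_px, sizes=GRID_SIZES):
    """Return the smallest grid size giving at most one bucket per pixel."""
    span = end - start
    for size in sizes:
        if span <= size * width_px:
            return size
    return sizes[-1]  # coarsest size if the span is larger than the grid covers


def bin_mean(points, start, end, bucket_size):
    """Reduce (timestamp, value) pairs to (bucket_start, mean) pairs.

    The requested window is widened to bucket boundaries, so two slightly
    different viewports produce identical buckets.
    """
    lo = (start // bucket_size) * bucket_size       # round start down
    hi = -(-end // bucket_size) * bucket_size       # round end up
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in points:
        if lo <= t < hi:
            key = (t // bucket_size) * bucket_size  # aligned bucket start
            sums[key] += v
            counts[key] += 1
    return [(k, sums[k] / counts[k]) for k in sorted(counts)]


if __name__ == "__main__":
    # 4000 one-second samples shown in a 400px-wide chart.
    data = [(t, float(t)) for t in range(4000)]
    size = pick_bucket_size(0, 4000, 400)
    print(size, len(bin_mean(data, 0, 4000, size)))
```

Because bucket boundaries depend only on the bucket size and not on the exact window, a cache keyed on (bucket size, bucket start) can serve overlapping requests. In practice the loop would normally be a database GROUP BY rather than Python.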


loa_observer, over 2 years ago

I have been working on an open-source project called RATH, an automated visualization tool for data exploration. However, I have run into performance issues when working with large datasets (> 1 million rows):

1. High memory usage: Loading a large dataset can use a lot of memory (a 50MB CSV uses 700MB of memory in RATH).
2. Slow computation tasks: Group-by, filter, bin, and even cube operations can be slow and sometimes block the main thread.
3. Slow chart rendering: Chart rendering can also be slow and sometimes blocks the main thread (currently using Vega-Lite).

I have implemented some solutions to address these issues:

1. For high memory usage, I store large raw data in IndexedDB and read it as needed. This reduces memory usage, but a lot of memory is still consumed once the data is loaded into the main thread.
2. To speed up computation tasks, I use web workers for some data computations (such as group-by, bin, transform, and filter). I am also testing DuckDB-Wasm, but I lack knowledge of its best practices.
3. For slow chart rendering, I have tried using an offscreen canvas to render the chart in a web worker. However, this approach produces a static canvas without any interactive features (such as tooltips, zooming, or callbacks for data selections). I am looking for ways to make a chart rendered in a web worker interactive.

Any suggestions or shared experiences would be greatly appreciated.

RATH GitHub repo: https://github.com/Kanaries/Rath

RATH basic background: RATH is more than an open-source alternative to data analysis and visualization tools such as Tableau. It automates your exploratory data analysis workflow with an augmented analytics engine that discovers patterns, insights, and causal relationships, and presents them with auto-generated multi-dimensional data visualizations.


Ask HN: How to handle large datasets in front end of data apps?
