TechEcho

Show HN: A Python container for dataclasses with multi-indexing and vector opps

122 points by joshlk over 4 years ago

9 comments

joshlk over 4 years ago
I came to the idea because I wanted the ergonomics and efficiency of Pandas DataFrames, but realised they make for messy production code that's hard to maintain. This package provides the best of both worlds.

Description: A Python dataclass container with multi-indexing and bulk operations. It provides the typed benefits and ergonomics of dataclasses while having the efficiency of Pandas DataFrames.

The container is based on data-oriented design: it optimises the memory layout of the stored data, providing fast bulk operations and a smaller memory footprint for large collections. Bulk operations are enabled using Pandas, which has a rich set of vectorised methods for both numerical and string data types.
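The data-oriented layout described above can be sketched as a struct-of-arrays container: records are stored field by field, one column per dataclass field, with a dictionary index on one field for lookups. This is a minimal standard-library sketch, not the package's actual API. The `ColumnStore` and `Trade` names are hypothetical, and the columns are plain Python lists rather than the Pandas Series the package uses, so it shows the layout and the indexing but not the vectorised speed-ups.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Generic, Iterable, Iterator, List, Type, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Trade:
    # Hypothetical record type, used only for this illustration.
    symbol: str
    price: float
    qty: int


class ColumnStore(Generic[T]):
    """Hold dataclass records column-wise (struct of arrays) with a hash index on one field."""

    def __init__(self, cls: Type[T], records: Iterable[T], index_on: str) -> None:
        self._cls = cls
        self._names = [f.name for f in fields(cls)]
        # One list per field, so all values of a field are stored together.
        self._columns: Dict[str, List[Any]] = {name: [] for name in self._names}
        for rec in records:
            for name in self._names:
                self._columns[name].append(getattr(rec, name))
        # Map each key to the row positions that hold it (keys may repeat).
        self._index: Dict[Any, List[int]] = {}
        for row, key in enumerate(self._columns[index_on]):
            self._index.setdefault(key, []).append(row)

    def __len__(self) -> int:
        return len(self._columns[self._names[0]])

    def _row(self, i: int) -> T:
        # Rebuild a typed record from the columns only when one is asked for.
        return self._cls(**{name: self._columns[name][i] for name in self._names})

    def at(self, key: Any) -> List[T]:
        """All records whose indexed field equals key."""
        return [self._row(i) for i in self._index.get(key, [])]

    def column(self, name: str) -> List[Any]:
        """Direct access to one field's values, for bulk operations."""
        return self._columns[name]

    def __iter__(self) -> Iterator[T]:
        return (self._row(i) for i in range(len(self)))


store = ColumnStore(
    Trade,
    [Trade("AAPL", 10.0, 3), Trade("MSFT", 20.0, 1), Trade("AAPL", 11.0, 2)],
    index_on="symbol",
)
aapl = store.at("AAPL")  # both AAPL trades, rebuilt as Trade objects
# A bulk operation over two columns without materialising any Trade objects.
notional = sum(p * q for p, q in zip(store.column("price"), store.column("qty")))
```

Records are only rebuilt as `Trade` objects on demand, while bulk operations such as `notional` read whole columns directly; replacing the lists with Pandas Series is what would make those operations vectorised.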
unwind over 4 years ago
Meta: that's a very random backslash in the title, very confusing in my opinion.

It's just an abbreviated "with", so it should be a forward slash ("w/") or, better imo, written out in full.

EDIT: And also "opps" isn't a very good way to abbreviate "operations", is it?
minimaxir over 4 years ago
This appears to be encroaching on Apache Arrow's territory (which notably is headed by the creator of Pandas): https://arrow.apache.org/
Jugurtha over 4 years ago
One of the reasons this is interesting to me is that we quickly hit one of MLflow's limitations[0] in our machine learning platform[1]. Users collaborate in near real-time on notebooks and schedule long-running notebooks, and we automatically detect users' models and save them so they don't have to. They can then deploy them in one click. However, MLflow has trouble with models requiring high-dimensional inputs, which is most non-toy models I've seen.

The usual "solution" is to write custom wrapping code for this because MLflow only supports 2D DataFrames. That is unacceptable for us because it would mean users would have to do it, so we'll take care of this too.

- [0]: https://github.com/mlflow/mlflow/issues/3570
- [1]: https://iko.ai
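The custom wrapping code mentioned above typically flattens each high-dimensional sample into one row of a 2D table and reshapes it back before it reaches the model. A minimal standard-library sketch, assuming every sample is a fixed-shape 2D grid; `flatten_samples` and `unflatten_row` are hypothetical helpers, not part of MLflow's API.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def flatten_samples(samples: Sequence[Matrix]) -> Tuple[List[List[float]], Tuple[int, int]]:
    """Turn N samples of shape (h, w) into N flat rows of length h*w, plus the shape to undo it."""
    if not samples:
        raise ValueError("need at least one sample")
    h, w = len(samples[0]), len(samples[0][0])
    rows = []
    for s in samples:
        # Every sample must share one shape, or the 2D table would be ragged.
        if len(s) != h or any(len(r) != w for r in s):
            raise ValueError("all samples must share the same (h, w) shape")
        rows.append([v for r in s for v in r])  # row-major flattening
    return rows, (h, w)


def unflatten_row(row: Sequence[float], shape: Tuple[int, int]) -> Matrix:
    """Reverse flatten_samples for a single row."""
    h, w = shape
    if len(row) != h * w:
        raise ValueError("row length does not match shape")
    return [list(row[i * w:(i + 1) * w]) for i in range(h)]


samples = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # two 2x2 samples
rows, shape = flatten_samples(samples)
restored = unflatten_row(rows[1], shape)
```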
mplewis over 4 years ago
I don't generally think that dataframes as a data interchange format are a good idea if you're looking to build long-term maintainable code. Their structure is too mutable: in my experience, they lead to kitchen-sink APIs where the easiest way to add a feature is to shove more columns into the dataframe.
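The alternative this argument points toward is a record type whose schema cannot grow ad hoc. A minimal sketch using a frozen dataclass from the standard library; the `SleepRecord` type is hypothetical, and freezing is only one option (a static checker such as pyright or mypy would also flag assignments to unknown attributes).

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class SleepRecord:
    # Hypothetical interchange record: its schema is fixed by the class definition.
    date: str
    hours: float


rec = SleepRecord(date="2020-11-01", hours=7.5)

rejected = False
try:
    rec.steps = 10_000  # an ad-hoc "extra column"
except FrozenInstanceError:
    rejected = True  # frozen dataclasses refuse attribute assignment at runtime

# Changes go through replace(), which builds a new instance via __init__,
# so the original is untouched and an unknown field name raises TypeError.
later = replace(rec, hours=8.0)

try:
    replace(rec, steps=10_000)
    unknown_field_accepted = True
except TypeError:
    unknown_field_accepted = False
```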
jackric over 4 years ago
Excellent, I've been adding lots of dataclasses to a project that has some pandas parts. I love the additional productivity from pyright type checking, and found the type ambiguity of DataFrames annoying.
hankdoupe over 4 years ago
This is nice!

I just implemented a pandas-like API for one of my projects, but I wound up building it on top of sortedcontainers: https://paramtools.dev/api/viewing-data.html

I almost swapped over to using Pandas Series like you did, but I went with something that (I think) is lighter weight.

It's cool to see what an alternative implementation might have looked like!
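The sortedcontainers approach amounts to keeping an index column in sorted order so that exact and range lookups can binary-search it. As a hedged illustration, this sketch swaps in the standard-library `bisect` module instead of sortedcontainers; `SortedIndex` is a hypothetical name and does not reflect paramtools' implementation.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List


class SortedIndex:
    """Keep (key, row) pairs sorted by key so lookups can use binary search."""

    def __init__(self) -> None:
        self._keys: List[Any] = []
        self._rows: List[int] = []

    def add(self, key: Any, row: int) -> None:
        # Insert after any equal keys so rows sharing a key keep insertion order.
        pos = bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, row)

    def between(self, lo: Any, hi: Any) -> List[int]:
        """Rows whose key k satisfies lo <= k <= hi."""
        i = bisect_left(self._keys, lo)
        j = bisect_right(self._keys, hi)
        return self._rows[i:j]

    def get(self, key: Any) -> List[int]:
        """Rows whose key equals key exactly."""
        return self.between(key, key)


idx = SortedIndex()
for row, year in enumerate([2021, 2019, 2020, 2019]):
    idx.add(year, row)
```

Here `list.insert` makes each `add` linear in the number of keys; sortedcontainers' `SortedList` stores its data as a list of sorted sublists to keep inserts fast on large collections.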
karlicoss over 4 years ago
Very cool, this seems like something I wanted to implement for my life dashboard (health/sleep/exercise/etc): https://github.com/karlicoss/dashboard

I'm heavily relying on pandas frames for manipulating the data, and have been thinking of ways to make it type-safe[r]. I will give your library a try!
adornedCupcake over 4 years ago
I don't suppose there's type checking on indexing via fields? I.e., df.at(field_value) type-checks no matter what type the first field of the record class is.
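One way to get the type checking asked about here is to tie an index's key type to the indexed field through a key function, so a static checker can reject a mismatched key. A minimal sketch, not the package's API; `TypedIndex` and `Reading` are hypothetical, and the check is static only (mypy or pyright), with no runtime enforcement.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Iterable, List, TypeVar

K = TypeVar("K")
R = TypeVar("R")


class TypedIndex(Generic[K, R]):
    """Index records by a key function so the key type K is tied to the indexed field."""

    def __init__(self, records: Iterable[R], key: Callable[[R], K]) -> None:
        self._by_key: Dict[K, List[R]] = {}
        for rec in records:
            self._by_key.setdefault(key(rec), []).append(rec)

    def at(self, key: K) -> List[R]:
        return self._by_key.get(key, [])


@dataclass(frozen=True)
class Reading:
    # Hypothetical record type for illustration.
    sensor_id: int
    value: float


readings = [Reading(1, 0.5), Reading(2, 0.7), Reading(1, 0.9)]
# The annotation fixes K = int, and the checker verifies the lambda returns an int.
idx: TypedIndex[int, Reading] = TypedIndex(readings, key=lambda r: r.sensor_id)
ones = idx.at(1)
# idx.at("1") would be rejected by mypy/pyright, though at runtime it just returns [].
```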