테크에코

Loading Pydantic models from JSON without running out of memory

132 points by itamarst 3 days ago

10 comments

scolvin 2 days ago
Pydantic author here. We have plans for an improvement to pydantic where JSON is parsed iteratively, which will make way for reading a file as we parse it. Details in https://github.com/pydantic/pydantic/issues/10032.

Our JSON parser, jiter (https://github.com/pydantic/jiter), already supports iterative parsing, so it's "just" a matter of solving the lifetimes in pydantic-core to validate as we parse.

This should make pydantic around 3x faster at parsing JSON and significantly reduce the memory overhead.
fidotron 3 days ago
Having only recently encountered this, does anyone have any insight as to why it takes 2GB to handle a 100MB file?

This looks highly reminiscent (though not exactly the same, pedants) of why people used to get excited about using SAX instead of DOM for XML parsing.
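
One likely contributor to that kind of blow-up is per-object overhead: every parsed dict, string, and number becomes its own heap object in CPython. The following is a minimal sketch, not the article's benchmark, that uses the standard library's `tracemalloc` to compare peak allocation during `json.loads` with the size of the input. The payload shape and the helper name `peak_bytes_for_loads` are made up for illustration.

```python
import json
import tracemalloc


def peak_bytes_for_loads(payload: str) -> int:
    """Return the peak traced allocation (in bytes) while json.loads runs."""
    tracemalloc.start()
    try:
        data = json.loads(payload)  # keep a reference so nothing is freed early
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Hypothetical payload: many small objects, where per-object overhead dominates.
payload = json.dumps([{"id": i, "name": f"user{i}"} for i in range(10_000)])
peak = peak_bytes_for_loads(payload)
ratio = peak / len(payload)
print(f"input: {len(payload)} chars, peak: {peak} bytes, ratio: {ratio:.1f}x")
```

The exact ratio depends on the Python version and the shape of the data, so treat the printed number as indicative only.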
jmugan 3 days ago
My problem isn't running out of memory; it's loading in a complex model where the fields are BaseModels and unions of BaseModels multiple levels deep. It doesn't load it all the way and leaves some of the deeper parts as dictionaries. I almost need a parser to search the space of different loads. Anyone have any ideas for software that does that?
deepsquirrelnet 3 days ago
Alternatively, if you had to go with json, you could consider using jsonl. I think I’d start by evaluating whether this is a good application for json. I tend to only want to use it for small files. Binary formats are usually much better in this scenario.
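
As a rough illustration of the jsonl idea above: with one JSON document per line, each record can be parsed and discarded before the next one is read, so memory tracks the largest record rather than the whole file. This is a minimal sketch using only the standard library; the helper name `iter_jsonl` and the sample records are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical data standing in for a large file opened with open(path).
sample = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n')
ids = [record["id"] for record in iter_jsonl(sample)]
print(ids)
```

Each yielded dict could then be handed to a model constructor one at a time instead of validating the full list at once.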
dgan 3 days ago
I gave up on Python dataclasses & JSON. I'm using protobuf objects within the application itself. I also have a "...Mixin" class for almost every wire model, with extra methods.

Automatic, statically typed deserialization is worth the trouble in my opinion.
fjasdfas 3 days ago
So are there downsides to just always setting slots=True on all of my python data types?
评论 #44065613 未加载
zxilly 3 days ago
Maybe using mmap would also save some memory. I'm not quite sure if this can be implemented in Python.
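
Memory-mapping a file is possible from Python through the standard library's `mmap` module. The sketch below is only an illustration under assumptions: it uses a small newline-delimited temp file (hypothetical data) and reads one line at a time from the mapping. Whether this saves memory for a single large JSON document is a separate question, since `json.loads` accepts only `str`, `bytes`, or `bytearray`, so the mapped bytes have to be copied out before parsing.

```python
import json
import mmap
import os
import tempfile

# Hypothetical file: newline-delimited JSON written to a temp file for the demo.
with tempfile.NamedTemporaryFile("wb", suffix=".jsonl", delete=False) as tmp:
    tmp.write(b'{"id": 1}\n{"id": 2}\n{"id": 3}\n')
    path = tmp.name

total = 0
with open(path, "rb") as f:
    # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # readline() copies out one line at a time; it returns b"" at end of file.
        for line in iter(mm.readline, b""):
            total += json.loads(line)["id"]

os.remove(path)
print(total)
```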
thisguy47 3 days ago
I'd like to see a comparison of ijson vs just `json.load(f)`. `ujson` would also be interesting to see.
kayson 3 days ago
How does the speed of the dataclass version compare?
m_ke 3 days ago
Or just dump pydantic and use msgspec instead: https://jcristharif.com/msgspec/