I am using a third-party service that transcribes an audio file and returns the entire transcription as JSON. I read this, transform it into a new JSON file, and send that to S3. I am using Python/Django along with the json, requests and boto3 modules. However, I noticed that memory consumption kept increasing as the application kept running.<p>To fix this, I had to stream the response to a JSON file on disk and read it with ijson (https://pypi.org/project/ijson/), which reduced memory utilisation a lot. While looking around, I found that many people had the same issue, which got me thinking: how would I have sent this content if I had built the third-party API myself?<p>Some previous questions asked on Stack Overflow:<p>https://stackoverflow.com/questions/2400643/is-there-a-memory-efficient-and-fast-way-to-load-big-json-files-in-python<p>https://stackoverflow.com/questions/11057712/huge-memory-usage-of-pythons-json-module<p>How do you send very large content in an API response, and what are the advantages and disadvantages of each approach?
Sounds like this is a personal problem and not a them problem. They’re sending you a JSON document and you’re complaining that <i>your</i> memory usage is exploding. That’s on you. If you’re not actively cleaning up after yourself, I don’t know how you can blame the <i>external</i> API for your <i>internal</i> memory issues. That’s my two cents at least. You can’t possibly design an API that fixes a user’s poor design.
If you can chunk the response, json-seq or XML (with a SAX parser) may be worth looking at.<p>The server can incrementally stream one chunk at a time and the client can consume each chunk as it arrives, keeping memory usage flat.<p>I think gRPC also supports streaming, but I don’t know much about it.<p>JSON is a bad format for large files because, as you observed, most parsers need to read the entire document into memory before you can use any of it.
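As a rough illustration of the json-seq idea above, here is a minimal sketch using only the Python standard library. It assumes the transcript can be split into independent records; the function names and the `start`/`text` fields are hypothetical, not part of any real transcription API. Each record is framed the way RFC 7464 describes: a record separator byte (0x1E), the JSON text, then a newline.

```python
import io
import json
from typing import Any, Iterable, Iterator

RS = b"\x1e"  # record separator that starts each json-seq record (RFC 7464)


def encode_json_seq(records: Iterable[Any]) -> Iterator[bytes]:
    """Server side: yield one framed record at a time instead of one big body."""
    for record in records:
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        yield RS + json.dumps(record).encode("utf-8") + b"\n"


def decode_json_seq(lines: Iterable[bytes]) -> Iterator[Any]:
    """Client side: parse one record per line, so memory stays roughly flat."""
    for line in lines:
        payload = line.lstrip(RS).strip()
        if payload:  # skip blank lines and keep-alive newlines
            yield json.loads(payload)


# Hypothetical transcript segments, used only for illustration.
segments = [
    {"start": 0.0, "text": "hello"},
    {"start": 1.5, "text": "line\nbreak"},
]
wire = b"".join(encode_json_seq(segments))  # joined here only to simulate the network
for segment in decode_json_seq(io.BytesIO(wire)):
    print(segment["start"], repr(segment["text"]))
```

In Django, a generator like `encode_json_seq` could be handed to `StreamingHttpResponse`, and on the client side the lines from a `requests` response opened with `stream=True` could be fed to `decode_json_seq` via `iter_lines()`. The trade-off is that the body is no longer a single valid JSON document, so clients that expect plain `application/json` need to know about the framing.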
JSON is the default go-to standard for web services nowadays, but that doesn't mean it is the best format for your requirements. I'm not sure what an audio transcription looks like exactly, but if you can "stream" it, you don't need JSON at all. Just use some basic serializer and stream that instead.
If you know the structure of the file well enough that you are comfortable reading it a bit at a time, and you’re confident the structure won’t change, C has several ways to control how much of the file you read in. You could just write a Python module to handle this situation.
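The same "read it a bit at a time" idea can also be sketched in plain Python before reaching for C. The example below assumes the file is a single top-level JSON array (for instance, an array of transcript segments); `iter_array_items` is a hypothetical helper, not a library function. It reads fixed-size chunks and uses `json.JSONDecoder.raw_decode` to pull out one element at a time, accepting an element only once the `,` or `]` that follows it is visible in the buffer.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_array_items(stream: TextIO, chunk_size: int = 64 * 1024) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array without loading all of it."""
    decoder = json.JSONDecoder()
    buffer = stream.read(chunk_size).lstrip()
    if not buffer.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buffer = buffer[1:]
    eof = False

    while True:
        # Lenient: skips separators without checking there is exactly one comma.
        buffer = buffer.lstrip(" \t\r\n,")
        if buffer.startswith("]"):
            return
        if buffer:
            try:
                item, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                if eof:
                    raise  # no more data is coming, so this is a real error
            else:
                rest = buffer[end:].lstrip()
                # Only trust the value once its delimiter has arrived; otherwise
                # a number such as 12 could really be the start of 123.
                if rest[:1] in (",", "]"):
                    yield item
                    buffer = rest
                    continue
                if eof:
                    raise ValueError("truncated or malformed JSON array")
        elif eof:
            raise ValueError("truncated or malformed JSON array")
        chunk = stream.read(chunk_size)
        if chunk:
            buffer += chunk
        else:
            eof = True


# Tiny chunk size to exercise the chunk-boundary handling.
data = '[{"start": 0.0, "text": "hello"}, 12, 1.5, "a,b]", true, []]'
for item in iter_array_items(io.StringIO(data), chunk_size=3):
    print(item)
```

This keeps only the current element plus one chunk in memory, but it re-parses the buffer whenever an element spans several chunks, so a single very large element is still expensive. A dedicated streaming parser, or a C extension as suggested above, would avoid that cost.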