A big part of my job is ingesting data from hundreds of providers for processing.

A provider that pushes data to my S3 bucket as Parquet files is the best arrangement I can think of.

I don't want a provider-managed database to host, stupidly huge zipped CSV dumps, or whatever other weird technology the provider dreams up.

Seriously, reading a file from S3 and stream-processing the Parquet is not the peak of engineering (see the sketch at the end). What is there to complain about?

Some of the insanity I have had to deal with:

- 350 TB of zipped CSVs whose columns and delimiters vary over time, shipped on HDDs by mail

- Self-hosting an MSSQL server for the provider to update

- Good old FTP polling at more or less predictable times

- Fancy you-name-it-shiny-new-tech remote database accounts

- A crappy client HTTP website with a custom SQL-inspired query language for downloading zipped extracts

- The Web 2.0 REST API, billed per request

- Snowflake data warehouse data lake big data

Please, don't be that guy, just push Parquet flat files to my bucket.
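
To show how little is being asked for, here is a minimal sketch of that ingestion path in Python, assuming pyarrow and its built-in S3 filesystem; the bucket name, prefix, and region are hypothetical:

    # Minimal sketch; bucket, prefix, and region are made up, and
    # credentials come from the environment as usual.
    import pyarrow.dataset as ds
    from pyarrow import fs

    s3 = fs.S3FileSystem(region="us-east-1")

    # Treat every Parquet file the provider dropped under the prefix
    # as one logical dataset.
    dataset = ds.dataset("my-ingest-bucket/provider-x/",
                         format="parquet", filesystem=s3)

    # Stream record batches instead of materializing everything in memory.
    for batch in dataset.to_batches(batch_size=64_000):
        print(batch.num_rows)  # stand-in for the real processing step

That is the whole integration: no polling schedule, no custom query language, no per-request bill.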