I created this tool for myself because I often work with xlsx and csv files. This is usually done with Pandas. But if you just need to read these files and work with the values in each row, you don't need to import the whole Pandas library. You don't even need to install Pandas, which makes deployment easier. This can save you up to 2 GB of space if you build Docker images with Pandas.<p>One thing I still don't understand is services like snyk.io, which are supposed to do security analysis. They penalize a tool like this for not having a code of conduct (CoC) or a Contributing file in the GitHub repository, and what is most shocking to me is that they measure popularity. I understand that if more people are involved in a piece of software, it is probably safer. But penalizing someone for having few stars on GitHub seems weird to me, especially when the tool is used by several people and companies and has over 5,000 downloads.
You won't be able to use this if your file doesn't fit in RAM. It unnecessarily copies the whole file into a list instead of returning a generator and leaving the list conversion up to the user.
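To illustrate the generator point above, here is a minimal sketch using only the standard library `csv` module. It is not the tool's actual code; `iter_rows` is a hypothetical helper name, and the in-memory `StringIO` stands in for a large file on disk.

```python
import csv
import io
from typing import Iterator, List, TextIO


def iter_rows(handle: TextIO) -> Iterator[List[str]]:
    """Yield one parsed CSV row at a time instead of building a full list."""
    for row in csv.reader(handle):
        yield row


# Small in-memory stand-in for a large file on disk (assumption for the demo).
sample = io.StringIO("id,name\n1,Ada\n2,Linus\n")

rows = iter_rows(sample)
header = next(rows)      # only the first row has been parsed at this point
remaining = list(rows)   # the caller opts in to materializing the rest
```

With this shape, a caller who wants everything in memory can still wrap the call in `list(...)`, while a caller with a file larger than RAM can loop over the rows one at a time.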
I maintain a similar project: load any CSV, manipulate it, get stats, detect sensitive data, etc.<p><a href="https://github.com/capitalone/DataProfiler" rel="nofollow">https://github.com/capitalone/DataProfiler</a><p>My question: how do you do header detection? That's a _very_ difficult problem.
There are many data engineers at companies who have to write custom little scripts to take data from spreadsheets into an analytics DB.<p>Thanks for removing some boilerplate from that process for people!
For the opposite direction, I have had good luck with the XlsxWriter library<p><a href="https://github.com/jmcnamara/XlsxWriter" rel="nofollow">https://github.com/jmcnamara/XlsxWriter</a>
Speaking of Excel files, does anyone know of a good way to port sheets with equations/functions to Python? Sometimes I need the calculations from a sheet and I have to copy them over manually.
I usually import CSVs into a Pandas DataFrame, then iterate over the DataFrame in a loop or make manual line-by-line changes, and then send the data out somewhere else...<p>Is this a better approach?
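For the read, tweak each row, write out workflow described above, a rough sketch with only the standard library might look like the following. It does not use the tool from this thread; `transform` and the `name`/`amount` columns are made up for illustration, and `StringIO` objects stand in for the real input file and destination.

```python
import csv
import io


def transform(row: dict) -> dict:
    """Hypothetical per-row intervention: tidy the name, cast the amount."""
    return {"name": row["name"].strip().title(), "amount": float(row["amount"])}


# In-memory stand-ins for the input file and the output destination.
source = io.StringIO("name,amount\n  ada lovelace ,10.5\nlinus torvalds,3\n")
sink = io.StringIO()

reader = csv.DictReader(source)  # each row arrives as a dict keyed by header
writer = csv.DictWriter(sink, fieldnames=["name", "amount"], lineterminator="\n")
writer.writeheader()
for row in reader:
    writer.writerow(transform(row))  # rows are processed one at a time

output = sink.getvalue()
```

Whether this beats a DataFrame depends on the job: for row-by-row processing it avoids the Pandas dependency, while column-wise statistics or joins are still easier in Pandas.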
Nice. I did something like this for XLS and posted it as a gist ages ago; it's good that you're putting up something that's better maintained and works with multiple formats.