TechEcho

10 comments

pytlicek about 4 years ago

I created this tool for myself because I often work with xlsx and csv files. This is usually done through Python Pandas. But if you just need to read these files and work with the values in the rows, you don't need to import the whole Pandas library. You don't even need to install Pandas, which makes deployment easier. This can save you up to 2GB of space if you build Docker images with Pandas.

One of the things I still don't understand is services like snyk.io, which are supposed to do security analysis. They penalize a tool like this for not having a CoC or a Contributing file in the GitHub repository, and what is most shocking to me is that they measure Popularity. I understand that if more people are involved in the software, it is probably safer. But penalizing someone for having few stars on GitHub seems weird to me, especially when the tool is used by several people and companies and has over 5,000 downloads.

BugsJustFindMe about 4 years ago

You won't be able to use this if your file doesn't fit in RAM. This unnecessarily clones the file into a list instead of returning a generator and leaving the list conversion up to the user.
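To make the generator point concrete, here is a minimal sketch using only the standard library. It is not sheet2dict's code; `iter_rows` is a hypothetical helper name, and the in-memory sample stands in for a large file on disk.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def iter_rows(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row instead of building a list of all rows."""
    # csv.DictReader is lazy: it parses a row only when the next one is requested.
    for row in csv.DictReader(handle):
        yield dict(row)


# Small in-memory stand-in for a file that might not fit in RAM.
sample = io.StringIO("name,age\nAlice,30\nBob,25\n")

rows = iter_rows(sample)
first = next(rows)      # only the first data row has been parsed so far
remaining = list(rows)  # the caller decides whether to materialize the rest
```

With this shape, a caller who wants a list still gets one with `list(iter_rows(f))`, while a caller with a huge file can loop over rows one at a time.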

lettergram about 4 years ago

I maintain a similar project: load any CSV, manipulate it and get stats, detect sensitive data, etc. https://github.com/capitalone/DataProfiler

My question: how do you do header detection? That's a _very_ difficult problem.

gpapilion about 4 years ago

Isn’t this already in the csv module with DictReader? XLSX I know nothing about.
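For reference, a short sketch of what the standard library's `csv.DictReader` gives you for the CSV half of the problem. The sample data is made up, and the `io.StringIO` object stands in for a real file opened with `newline=""`.

```python
import csv
import io

# In-memory stand-in for a real CSV file; the contents are illustrative only.
data = io.StringIO("id,city\n1,Prague\n2,Brno\n")

reader = csv.DictReader(data)
records = list(reader)          # one dict per data row, keyed by the header row
fieldnames = reader.fieldnames  # column names taken from the first line
```

All values come back as strings, and the csv module only covers CSV. Reading XLSX needs a third-party library.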

psing about 4 years ago

There are many data engineers at companies who have to write custom little scripts to take data from spreadsheets into an analytics DB.

Thanks for removing some boilerplate from that process for people!

athorax about 4 years ago

For the opposite direction, I have had good luck with the XlsxWriter library: https://github.com/jmcnamara/XlsxWriter

jquaint about 4 years ago

Speaking of Excel files, does anyone know of a good way to port sheets with equations/functions to Python? Sometimes I need the calculations from a sheet and I have to manually copy them over.

unixhero about 4 years ago

I usually import CSVs into a Python Pandas DataFrame and then iterate over the DataFrame in a loop, or do manual line-by-line interventions, and then beam the data out somewhere else... Is this a better approach?

stuaxo about 4 years ago

Nice, I did something like this and made it a gist ages ago for XLS. It's good you're putting up something that's more maintained and works with multiple formats.

impoppy about 4 years ago

It’d be better to use namedtuple to avoid repeating the same dictionary keys, imo
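A rough sketch of the namedtuple idea, assuming the header cells are valid Python identifiers. The `Row` type name and the sample data are made up for illustration, and this is not how sheet2dict itself works.

```python
import csv
import io
from collections import namedtuple

# In-memory stand-in for a real CSV file; the contents are illustrative only.
data = io.StringIO("name,age\nAlice,30\nBob,25\n")
reader = csv.reader(data)

# Build the row type once from the header line, so the field names are
# stored on the class rather than repeated as keys in every row.
Row = namedtuple("Row", next(reader))
rows = [Row(*fields) for fields in reader]
```

Fields are then read as `rows[0].name` rather than `rows[0]["name"]`. If a header contains spaces or other invalid identifiers, passing `rename=True` to `namedtuple` replaces those names with positional ones such as `_1`.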

Show HN: sheet2dict – simple Python XLSX/CSV reader/to dictionary converter
