My background is as a software engineer at SMBs and startups, which typically lack a data scientist. Because I love numbers, charts, etc., I do a lot of amateur data science. I enjoy the analysis, but I despise the data preprocessing: it's common for an SMB to store data across 10+ cloud apps (Salesforce, Google Ads, Hubspot, Mixpanel, etc.), but it's uncommon for them to store that data in a data warehouse. This means I need to learn each app's API and write scripts to fetch data, transpose it, set proper data types, handle schema changes, network errors, etc.

I'd love an API like `sklearn.datasets`, but for cloud apps, so I'm thinking of creating one. For example, if I wanted to fetch Salesforce contacts, I'd simply call `petaldata.Salesforce(API_KEY).contacts().to_frame()` to load contacts into a pandas DataFrame with proper types.

Good/bad idea? This would be for folks that aren't ready for a full ETL pipeline + data warehouse but want to dig through their data and experiment with ML algorithms quickly.
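For concreteness, here's a rough sketch of what the Salesforce piece might look like under the hood. The `petaldata`-style class is hypothetical; it assumes Salesforce's standard REST query endpoint, an OAuth access token plus instance URL (rather than the single API key in the example call), and a handful of standard Contact fields:

```python
import pandas as pd
import requests


class Salesforce:
    """Hypothetical petaldata-style wrapper around the Salesforce REST API."""

    def __init__(self, instance_url, access_token, api_version="v52.0"):
        # e.g. instance_url = "https://mycompany.my.salesforce.com"
        self.instance_url = instance_url
        self.headers = {"Authorization": f"Bearer {access_token}"}
        self.api_version = api_version

    def contacts(self):
        # Fetch Contact records via a SOQL query, following pagination.
        soql = "SELECT Id, FirstName, LastName, Email, CreatedDate FROM Contact"
        url = f"{self.instance_url}/services/data/{self.api_version}/query/"
        records, params = [], {"q": soql}
        while url:
            resp = requests.get(url, headers=self.headers, params=params)
            resp.raise_for_status()
            payload = resp.json()
            records.extend(payload["records"])
            # Salesforce returns nextRecordsUrl until the result set is exhausted.
            next_path = payload.get("nextRecordsUrl")
            url = f"{self.instance_url}{next_path}" if next_path else None
            params = None  # the next URL already encodes the query
        self._records = records
        return self

    def to_frame(self):
        # Drop per-record "attributes" metadata and coerce types.
        df = pd.DataFrame(self._records).drop(columns=["attributes"], errors="ignore")
        df["CreatedDate"] = pd.to_datetime(df["CreatedDate"])
        return df


# Usage, roughly mirroring the proposed one-liner:
# df = Salesforce(INSTANCE_URL, ACCESS_TOKEN).contacts().to_frame()
```

The idea is that the library, not the analyst, owns the pagination, retry, and type-coercion boilerplate for each cloud app.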