Hi everyone,
I'd like to share the project we have been working on for a couple of years: Arc, an opinionated framework for defining predictable, repeatable and manageable data transformation pipelines:

- predictable in that transformations are defined as data (declarative configuration), not code

- repeatable in that executing a job multiple times produces the same result

- manageable in that execution considerations and logging have been baked in from the start

- MIT-licensed, open source and cloud agnostic

We have seen that it is hard to scale data engineering teams in a code-first environment. Arc solves a lot of the problems we have seen data engineering/science teams struggle with. It:

- makes data engineering accessible to people beyond dedicated data engineers - you don't need to be proficient in Scala/Spark to introduce data engineering to your team

- has a Jupyter Notebook-based development environment for quickly building logic

- provides a clear path to production for machine learning (via MLTransform, TensorflowServingTransform or HTTPTransform for models as a service)

- has a plugin system allowing federated development of any features not in the base framework

Currently it uses the Apache Spark execution engine, but because jobs are declarative the same definitions could be executed against future engines (there's a sketch of a job definition at the end of this post).

Please let us know if you have any feedback/suggestions.
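To make the declarative part concrete, here is a rough sketch of what a job definition might look like. This is an illustration rather than a snippet from the docs: Arc jobs are defined in HOCON (a superset of JSON, which is why // comments work below), and the stage names, field values and URIs here are assumptions chosen to show the shape of a pipeline - extract a source into a named view, transform it with SQL, load the result:

    {
      "stages": [
        {
          // read a CSV file into a view that later stages can reference
          "type": "DelimitedExtract",
          "name": "extract raw customers",
          "environments": ["test", "production"],
          "inputURI": "s3a://datalake/raw/customers.csv",
          "outputView": "customers_raw",
          "header": true
        },
        {
          // apply a versioned SQL statement to produce a cleaned view
          "type": "SQLTransform",
          "name": "clean customers",
          "environments": ["test", "production"],
          "inputURI": "s3a://datalake/sql/clean_customers.sql",
          "outputView": "customers_clean"
        },
        {
          // write the cleaned view out as Parquet
          "type": "ParquetLoad",
          "name": "load customers",
          "environments": ["test", "production"],
          "inputView": "customers_clean",
          "outputURI": "s3a://datalake/curated/customers.parquet"
        }
      ]
    }

Because the whole job is data, the same definition can be diffed, linted, generated by tooling and, in principle, replayed against a different execution engine.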