Hi Everyone! Super excited for our first release!!<p>I was the product manager for Apache Hive at Hortonworks and, before that, an early engineer on CUDA at NVIDIA, focusing on compiler optimizations.<p>As a product manager, I saw so many customers struggle with Hive/Hadoop. So I decided to build a product-first company - one that just works and solves the real (and sometimes unsexy) pain points of customers.<p>We want to support the entire enterprise journey: first to open source (Apache Spark), and then to the cloud (Kubernetes).<p>What I'm personally excited about is that, with some compiler magic, I can make code and visual interfaces work together - making all developers happy!<p>I'd love to hear what you think and what you wish we'd build, and I'll be here to answer any questions!
Very cool that you can go between drag-and-drop and code for development. How is Prophecy different from datacoral.com or Astronomer.io? Will it be open-sourced (e.g., dbt, Airflow, Dagster)?
Out of personal (and painful) experience, using Spark for general-purpose ETL is a bad idea. Spark is meant for highly distributed workloads over tables with trillions of rows. RDDs use an optimized distributed programming model that takes a lot of practice to get used to. Some operations are virtually impossible because executors run in separate contexts. Caveat emptor.