This looks like an amalgamation of 8+ open source projects or industries with products put forth by companies that have dozens of employees and have worked on their products for years.<p>It also doesn't even categorize the products they compete with correctly[0].<p>Why not contribute some of your resources to one of the many active open source libraries already trying to solve some of these problems, and focus your engineering efforts on your core product?<p>[0] Fivetran is only listed under "Orchestrate" but actually competes directly with Alooma in Extract and Load. Also, there are DOZENS of companies in that space.
<a href="https://gitlab.com/meltano/meltano/blob/master/README.md#data-science-lifecycle" rel="nofollow">https://gitlab.com/meltano/meltano/blob/master/README.md#dat...</a>
I can't understand why GitLab thinks they have to embark on a new project every so often instead of focusing on their current product and features. There is a lot left to work on, and many of the current features/products are half-assed. At my place we moved to GitLab 2.5 years ago, and updates were smoother back then. In the past few months, though, we had to hire a new sysadmin for our build machines and GitLab server to follow new issues created on GitLab.com and decide whether each release is safe to install, and even then he still reports 4-5 issues to GitLab support after every update. We were expecting it to be an easy `yum update` like a normal package, but it's getting worse update after update. It's so bad that my manager asked me to look into GitHub + another CI/CD solution.
Data pipelines are not a great subject for an open-source project. We've been building these for the last 3+ years at Fivetran, and I can tell you that the challenge is:<p><pre><code> - Studying each source to figure out the right data model
- Chasing down a million weird corner cases
- Working around dumb bugs in the data sources
</code></pre>
This is the kind of problem where paying for software really works better. When people build data pipelines in-house, they tend to hack at it until it works for their use case and then stop. When we build data pipelines, we map out every feature of the data source, implement the whole thing at once, and then put it through a beta period with <i>multiple</i> real users. This is easy to do when you have a tight-knit dev team; much harder for a group of part-time open-source contributors.
GitLab's usage of team members in marketing material is creeping me out (as does the whole team page[0]).<p>[0] <a href="https://about.gitlab.com/team/" rel="nofollow">https://about.gitlab.com/team/</a>
Reading this I was concerned that it would be written in Ruby. While Ruby is a reasonable language for server development, it has almost no data science community when compared with some other ecosystems.<p>I was very glad to see this is Python! Python has some of the best data tools out there, and a mature ecosystem for solving all the engineering problems that go along with a great data stack.
The page mentions MVC, and the issue page[0] keeps mentioning MVC as well. Was this supposed to be MVP, or something else? Model-view-controller doesn't make sense in this context.<p>[0] <a href="https://gitlab.com/meltano/meltano/issues/10" rel="nofollow">https://gitlab.com/meltano/meltano/issues/10</a>
I'd be interested to know all the competitors in this space. <a href="https://data.world/" rel="nofollow">https://data.world/</a> is the one I am most familiar with.