Tech Echo

Show HN: Open-Source Data Replication and Anonymization

24 points by edrenova over 1 year ago
Hey HN, we're Evis and Nick, and we're excited to be launching Neosync on HN!

Neosync is an open-source data replication and anonymization project that helps developers create safe, anonymized test data and sync it across all environments for high-quality local, staging, and CI testing.

This is how it works:

1. You select a job type. Today we support data sync jobs (these sync data between two databases and run on a schedule you define) and data generation jobs (these generate synthetic data from scratch and send it to a destination).

2. Next, you define your source database and one or more destination databases (you can connect multiple destinations).

3. Next, we pull in the schema from the source DB, and you decide how you want to transform your data. We ship with 40+ transformers (email, first name, address, random int64, random string, random float64, etc.), and you can create your own custom transformations as well. We've designed our transformers to be as flexible as possible so you can use them across almost any data type. You can also run Neosync in passthrough mode, which means none of the data is transformed and you can use it purely for data replication.

4. Lastly, you can define subsets. A subset is a way to filter the data that gets sent to the destination, using either a custom SQL query or filters. For example, you can filter the data by an ID, customerType, column, date, etc. This is very flexible.

And that's it! The job will run on the schedule you define. We handle things like retries and backoffs, as well as referential integrity between tables.

We also ship with APIs, a CLI, and a GitHub Action so that you can use Neosync to hydrate a CI database in your CI pipeline. We're working on releasing a Terraform provider shortly.

Deployment is pretty straightforward. You can deploy Neosync using Docker Compose (we provide a script) or on Kubernetes using our Helm chart.

So what's next?
Here's a brief overview: real-time mode (hook Neosync up to Kafka/SQS to anonymize data and send it to destinations in real time) and more connections (MongoDB, Snowflake, CSV). On the ML side, we plan to support use cases like consistent data generation (providing a seed value), statistically consistent data, and more. You can check out our roadmap in our GitHub project.

Here's a brief Loom demo: https://www.loom.com/share/66224a81074a464b92808ecdc82b7ddb?sid=fd02808d-fcfe-4485-a723-97e37b5680eb

We'd love your feedback and contributions. We strongly believe that your data should be yours, that it should stay on your infrastructure, and that open source is the best way to bring that vision to life.
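The transformer step (step 3) can be illustrated with a minimal sketch, assuming a deterministic, seeded approach. `anonymize_email` is a hypothetical helper, not Neosync's API or its actual email transformer: it keys an HMAC-SHA256 with a secret seed so the same input email always maps to the same fake address, which keeps joins on that column consistent across tables and runs. The `user_` prefix and the `example.com` domain are arbitrary choices.

```python
import hashlib
import hmac


def anonymize_email(email: str, seed: bytes) -> str:
    """Hypothetical transformer: map a real email to a stable fake one.

    The same (email, seed) pair always produces the same output, so rows that
    share an email still match after anonymization. Changing the seed changes
    every output.
    """
    # Normalize case and whitespace so "Alice@Corp.com" and "alice@corp.com" map identically.
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(seed, normalized, hashlib.sha256).hexdigest()
    # Keep 12 hex characters for the local part; example.com is a reserved domain.
    return f"user_{digest[:12]}@example.com"


if __name__ == "__main__":
    seed = b"test-env-seed"  # assumption: one secret seed per environment
    print(anonymize_email("alice@corp.com", seed))
    print(anonymize_email("ALICE@corp.com", seed))  # same output as the line above
```

Unlike a purely random transformer, this trades some privacy for consistency: anyone holding the seed can recompute the mapping for a known email, so the seed should be treated as a secret.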
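Subsetting (step 4) combined with passthrough mode can be sketched with Python's built-in `sqlite3` module standing in for real source and destination databases. `sync_subset` is a hypothetical function, not part of Neosync: it copies only the rows matching a SQL `WHERE` filter and applies an optional per-row transform, so passing no transform amounts to passthrough replication. It assumes table and column names come from trusted configuration, since they are interpolated into the SQL; filter values go through `?` placeholders.

```python
import sqlite3
from typing import Callable, Optional, Sequence

Row = dict  # column name -> value


def sync_subset(
    src: sqlite3.Connection,
    dst: sqlite3.Connection,
    table: str,
    columns: Sequence[str],
    where: str = "1=1",
    params: tuple = (),
    transform: Optional[Callable[[Row], Row]] = None,
) -> int:
    """Copy rows matching `where` from src to dst; return the number copied.

    With transform=None the rows are copied unchanged (passthrough).
    Table/column names are interpolated and must come from trusted config.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    select_sql = f"SELECT {cols} FROM {table} WHERE {where}"
    insert_sql = f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"

    copied = 0
    for values in src.execute(select_sql, params):
        row = dict(zip(columns, values))
        if transform is not None:
            row = transform(row)
        dst.execute(insert_sql, tuple(row[c] for c in columns))
        copied += 1
    dst.commit()
    return copied


if __name__ == "__main__":
    schema = "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, customer_type TEXT)"
    src, dst = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
    src.execute(schema)
    dst.execute(schema)
    src.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "a@corp.com", "enterprise"), (2, "b@corp.com", "free"), (3, "c@corp.com", "enterprise")],
    )
    # Only enterprise customers reach the destination.
    n = sync_subset(src, dst, "customers", ["id", "email", "customer_type"],
                    where="customer_type = ?", params=("enterprise",))
    print(n)  # 2
```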
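One common way to preserve referential integrity when loading several tables is to insert parent tables before the tables that reference them, i.e. a topological sort of the foreign-key graph. The sketch below assumes that approach and uses Python's standard `graphlib` module; it is an illustration, not a description of how Neosync orders its writes, and the table names are hypothetical.

```python
from graphlib import TopologicalSorter


def load_order(references: dict[str, set[str]]) -> list[str]:
    """Order tables so every referenced (parent) table comes before its children.

    `references` maps each table to the set of tables its foreign keys point to.
    A foreign-key cycle, including a self-reference, raises graphlib.CycleError;
    such tables need a different strategy (for example, a deferred update pass).
    """
    return list(TopologicalSorter(references).static_order())


if __name__ == "__main__":
    fks = {
        "customers": set(),
        "products": set(),
        "orders": {"customers"},
        "order_items": {"orders", "products"},
    }
    # Parents first: customers/products, then orders, then order_items.
    print(load_order(fks))
```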
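The post mentions that jobs handle retries and backoffs. A common pattern for that is exponential backoff with jitter; the sketch below assumes that pattern and is not Neosync's implementation. `retry_with_backoff` and its default delays are hypothetical, and the `sleep` parameter exists only so tests can skip real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, sleeping with exponential backoff between tries.

    Re-raises the last exception once `attempts` calls have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Cap grows 0.5s, 1s, 2s, ... up to max_delay; "full jitter" picks
            # a random delay in [0, cap] so concurrent workers spread out.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query() -> str:
        # Hypothetical stand-in for a DB call that fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    print(retry_with_backoff(flaky_query, base_delay=0.01))  # ok
```

Full jitter keeps many failing workers from retrying against the same database at the same moment, at the cost of occasionally retrying sooner than the plain exponential schedule would.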

1 comment

matijash over 1 year ago
This sounds cool! How would we go about integrating this with https://github.com/wasp-lang/wasp?