Infinite Git repos on Cloudflare workers

144 points by plesiv 7 months ago

23 comments

koolba 7 months ago

> We’re building Gitlip - the collaborative devtool for the AI era. An all-in-one combination of Git-powered version control, collaborative coding and 1-click deployments.

Did they get a waiver from the git team to name it as such? Per the trademark policy, new “git${SUFFIX}” names aren’t allowed: https://git-scm.com/about/trademark

> In addition, you may not use any of the Marks as a syllable in a new word or as part of a portmanteau (e.g., "Gitalicious", "Gitpedia") used as a mark for a third-party product or service without Conservancy's written permission. For the avoidance of doubt, this provision applies even to third-party marks that use the Marks as a syllable or as part of a portmanteau to refer to a product or service's use of Git code.

ecshafer 7 months ago

GitHub doesn't stop me from making an infinite number of git repos. Or maybe they do, but I have never hit the limit. And if I do hit that limit and become a large enterprise customer, I am sure they would work with me to get around it.

Where does this fit into a product? Maybe I am blind, but while this is cool, I don't really see where I would want this.

yjftsjthsd-h 7 months ago

> It allows us to easily host an infinite number of repositories

I like this system in general, but I don't understand why scaling the number of repos is treated as a pinch point. Are there git hosts that struggle with the number of repos hosted in particular? (I don't think the "Motivation" section answers this, either.)

jauntywundrkind 7 months ago

> After extensive research, we rewrote significant parts of Emscripten to support asynchronous file system calls.

> We ended up creating our own Emscripten filesystem on top of Durable Objects, which we call DOFS.

> We abandoned the porting efforts and ended up implementing the missing Git server functionality ourselves by leveraging libgit2’s core functionality, studying all available documentation, and painstakingly investigating Git’s behavior.

Using a ton of great open source and taking it all further. It would sure be great if y'all could contribute some of this forward!

Libgit2 is GPL with a linking exception, and Emscripten is MIT, so I think legally everything is in the clear. But it sure would be such a boon to share.

sluongng 7 months ago

@plesiv could you please elaborate on how repack/gc is handled with a libgit2 backend? I know that Alibaba has done something similar in the past based on libgit2, but I have yet to see another implementation like this in the wild.

Very cool project. I hope Cloudflare Workers can support more protocols like SSH and gRPC. It's one of the reasons why I prefer Fly.io over Cloudflare Workers for special servers like this.

betaby 7 months ago

Somewhat related question. Assume I have ~1k XML files of ~200MB each, and ~20% of their content changes. What are my best options for storing them? Vanilla git on an SSD RAID10 works, but it is quite slow at retrieving historical data from ~3-6 months back. Are there other options for a quick back-end? I'm fine with it not being especially storage-efficient, to a degree.

skybrian 7 months ago

Not having a technical limit is nice, because then it’s a matter of spending money. But whenever I see “infinite,” I ask what it will cost. How expensive is it to host git repos this way?

As a hobbyist, “free” is pretty appealing. I’m pretty sure my repos on GitHub won’t cost me anything, and that’s unlikely to change anytime soon. Not sure about the new stuff.

VoidWhisperer 7 months ago

Not the main purpose of the article, but they mention they were working on a note-taking app oriented towards developers. Did anything ever come of that? If not, does anyone know of products that might fit this niche? (I currently use Obsidian.)

tln 7 months ago

Congrats, you've done a lot of interesting work to get here.

This could be a fantastic building block for headless CMS and the like.

seanvelasco 7 months ago

this leverages Durable Objects, but as i remember from two years ago, DO's way of guaranteeing uniqueness is that there can only be one instance of that DO in the world.

what if there are two users who want to access the same DO repo at the same time, one in the US and the other in Singapore? the DO must live either in US servers or SG servers, but not both at the same time. so one of the two users must have high latency then?

then after some time, a user in Australia accesses this DO repo - the DO bounces to AU servers - so the US and SG users will have high latency?

but please correct me if i'm wrong
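
A toy model of the single-instance guarantee described in this comment, written in plain Python asyncio rather than against the Durable Objects API (the names `RepoObject`, `Namespace` and `update_ref` are hypothetical): every request for the same name resolves to one object, and that object applies updates one at a time, so two concurrent pushes to the same branch cannot both win.

```python
import asyncio
from typing import Dict, Optional


class RepoObject:
    """Hypothetical per-repository object: the only instance for its name.

    All ref updates go through one asyncio.Lock, so they are applied
    one at a time, in the order they acquire the lock.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.refs: Dict[str, str] = {}
        self._lock = asyncio.Lock()

    async def update_ref(self, ref: str, old: Optional[str], new: str) -> bool:
        """Compare-and-swap a ref: succeeds only if it still points at `old`."""
        async with self._lock:
            await asyncio.sleep(0)  # stand-in for storage I/O while holding the lock
            if self.refs.get(ref) != old:
                return False  # someone else pushed first; caller must fetch and retry
            self.refs[ref] = new
            return True


class Namespace:
    """Hypothetical registry that hands out exactly one RepoObject per name."""

    def __init__(self) -> None:
        self._instances: Dict[str, RepoObject] = {}

    def get(self, name: str) -> RepoObject:
        if name not in self._instances:
            self._instances[name] = RepoObject(name)
        return self._instances[name]


async def demo() -> None:
    ns = Namespace()
    us_client = ns.get("org/repo")  # e.g. a user in the US
    sg_client = ns.get("org/repo")  # e.g. a user in Singapore
    results = await asyncio.gather(
        us_client.update_ref("refs/heads/main", None, "c1"),
        sg_client.update_ref("refs/heads/main", None, "c2"),
    )
    print(us_client is sg_client, results, us_client.refs)


asyncio.run(demo())
```

The price of this guarantee is the one the comment raises: because there is exactly one instance per name, every client, wherever it is, has to reach that one instance.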

tredre3 7 months ago

> Wanting to avoid managing the servers ourselves, we experimented with a serverless approach.

I must be getting old, but building a gigantic house of cards of interlinked components only to arrive at a more limited solution is truly bizarre to me.

The maintenance burden for a VPS: periodically run apt update and apt upgrade. Use filesystem snapshots to create periodic backups. If something happens to your provider, spin up a new VM elsewhere from your last snapshot.

The maintenance burden for your solution: periodically merge upstream libgit2 into your custom fork, maintain your custom git server code and audit it for vulnerabilities, make sure everything still compiles with Emscripten, and deploy it. Rotate API keys to make sure your database service can talk to your storage service and your worker service. Then I don't even know how you'd back up all this to get it back online quickly if something happened to Cloudflare. And all that only to end up with worse latency than a VPS, and more size constraints on the repo and objects.

But hey, at least it scales infinitely!

Spunkie 7 months ago

I've been wondering what to do to back up our GitHub repos other than keeping a local copy and/or dumping them on something like S3.

I would love to use this as a live/working automatic backup for my GitHub repos on CF infrastructure.

yellow_lead 7 months ago

The latency on the examples seems quite high: around 7 seconds to a full load for me.

https://gitlip.com/@nataliemarleny/test-repo

ericyd 7 months ago

Engaging read! For me, just the right balance of technical detail and narrative content. It's a hard balance to strike, and I'm sure preferences vary widely, which makes it impossible to hit for every audience.

csomar 7 months ago

This piqued my interest, as I am working on a Git product and using Cloudflare Workers for most of my back-end. I looked through the options, but the hard limit for Cloudflare Workers, combined with the fact that most interesting repos (that is, those of companies you want to sell to) are in the GBs, means the platform is not fit for this.

I am ending up with AWS Lambda. Not only does that solve the Wasm issue, but you can have up to 10GB of memory on a single instance. That is close to enough for most use cases. 100MB? Not really.

gkoberger 7 months ago

This is really cool! I've been building something on libgit2 + EFS, and this approach is really interesting.

Between libgit2 on Emscripten, the number of file writes to DO, etc., how is performance?

markphip 7 months ago

I wonder if they considered or looked at using JGit? https://github.com/eclipse-jgit/jgit

It provides client and server APIs. The latter is used by Gerrit for its server: https://www.gerritcodereview.com

Not sure what the Java-to-WASM story is, if that is a requirement for what they need.

stavros 7 months ago

This is a very impressive technical achievement, and it's clear that a lot of work went into it.

Unfortunately, the entrepreneur in me continues that thought with "work that could have gone into finding customers instead". Now you have a system that could store "infinite" git repos, but how many customers?

scosman 7 months ago

Serverless git repos: super cool.

But I can't figure out what makes this an AI company. It seems like a collaboration tool?

iampims 7 months ago

Some serious engineering here. Kudos!

nathants 7 months ago

this is very cool!

i prototyped a similar serverless git product recently using a different technique.

i used aws lambda holding leases in dynamo backed by s3. i zipped git binaries into the lambda and invoked them directly. i used shallow-clone-style repos stored in chunks in s3, which could be merged as needed in lambda /tmp.

lambda was nice because cpu-heavy ops like merging many shallow clones could run in a larger cpu lambda, and i could cache the result.

other constraints were similar to what is described here, mainly that an individual push/pull cannot exceed the api gateway max payload size, a few MB.

i looked at isomorphic-git, but did not try emscripten libgit2. using cloudflare is nice because of free egress, which opens up many new use cases that don’t make sense at $0.10/GB egress.

i ended up shelving this while i build a different product. glad to see others pursuing the same thing, serverless git is an obvious win! do you back your repos with r2?

for my own git usage, what i ended up building was a trustless git system backed by dynamo and s3 directly. this removes the push/pull size limit and makes storage trustless. it uses git functionality i had no idea about before, git bundle and unbundle[1]. they are used to transfer git objects without a server: serverless git! this one i published[2].

good luck with your raise and your project. looking forward to the next blog post. awesome stuff.

1. https://git-scm.com/docs/git-bundle
2. https://github.com/nathants/git-remote-aws
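
A minimal sketch of the lease idea mentioned in this comment, assuming a key-value table that supports conditional writes (an in-memory dict stands in for DynamoDB here; `LeaseTable`, `acquire` and `release` are hypothetical names, not the AWS API): a worker may operate on a repo only while it holds an unexpired lease, and a crashed worker's lease simply times out.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    owner: str         # worker currently allowed to operate on the repo
    expires_at: float  # absolute time (seconds) after which the lease is void


class LeaseTable:
    """In-memory stand-in for a table with conditional writes (hypothetical).

    In the setup described above this role would be played by DynamoDB;
    here a plain dict plays it, so nothing is atomic across processes.
    """

    def __init__(self) -> None:
        self._leases: Dict[str, Lease] = {}

    def acquire(self, repo: str, owner: str, ttl: float,
                now: Optional[float] = None) -> bool:
        """Take or renew the lease on `repo` for `ttl` seconds."""
        now = time.time() if now is None else now
        current = self._leases.get(repo)
        # The "condition" of the conditional write: no lease yet, the old
        # lease has expired, or the caller already holds it (renewal).
        if current is None or current.expires_at <= now or current.owner == owner:
            self._leases[repo] = Lease(owner, now + ttl)
            return True
        return False

    def release(self, repo: str, owner: str) -> bool:
        """Drop the lease early, but only if `owner` still holds it."""
        current = self._leases.get(repo)
        if current is not None and current.owner == owner:
            del self._leases[repo]
            return True
        return False


if __name__ == "__main__":
    table = LeaseTable()
    print(table.acquire("org/repo", "lambda-a", ttl=30, now=0))   # a gets the lease
    print(table.acquire("org/repo", "lambda-b", ttl=30, now=10))  # b is refused
    print(table.acquire("org/repo", "lambda-b", ttl=30, now=31))  # a's lease has expired
```

Passing `now` explicitly keeps the logic deterministic to test. A real deployment would also need the check-and-write to be atomic in the store itself, which is what a conditional put provides and what this in-memory version does not.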

gavindean90 7 months ago

I really like the idea of a file system over Durable Objects.

bagels 7 months ago

"Infinite" sounds like a bug happened. It's obviously not infinite; some resource will eventually be exhausted, in this case memory.