Their architecture description starts with a strawman:<p>> Usually distributed file systems split each file into chunks; a central master keeps a mapping from filenames and chunk indices to chunk handles, and also tracks which chunks each chunk server has.<p>> The main drawback is that the central master can't handle many small files efficiently, and since all read requests need to go through the chunk master, it might not scale well for many concurrent users.<p>The chunk server architecture was first put into production with the Google File System, AFAIK. And it was designed specifically for large files (what search needed at the time). So no surprise.<p>But that's only one architecture for a DFS. There are also block-based DFS (like GPFS), object-based DFS (Lustre), cluster file systems (OCFS), and other architectures. They exhibit different characteristics.
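For concreteness, the quoted chunk-master design boils down to two lookup tables on a single node. A minimal Go sketch; all type and field names here are hypothetical, not GFS's or SeaweedFS's actual types:

    // Sketch of the metadata a GFS-style central master keeps.
    // All names are hypothetical, for illustration only.
    package sketch

    type ChunkHandle uint64

    type Master struct {
        // filename -> ordered chunk handles (the slice index is
        // the chunk index from the quoted description)
        chunks map[string][]ChunkHandle
        // chunk handle -> chunk servers holding a replica
        locations map[ChunkHandle][]string
    }

    // Lookup resolves one chunk of a file. Every read pays this
    // round-trip to the single master, which is exactly the
    // scaling concern the quote raises.
    func (m *Master) Lookup(file string, idx int) (ChunkHandle, []string) {
        h := m.chunks[file][idx]
        return h, m.locations[h]
    }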
Judging from the architecture and the wiki, it does not seem to be a file system at its core, but an object store with a file translation layer. One of the core problems of this approach is that in-place updates usually mean read-modify-write, if the object store has immutable objects, as most do, with Ceph being a notable exception (see the sketch at the end of this comment).<p>From the replication page:<p>> If one replica is missing, there is no automatic repair right away. This is to prevent over-replication due to transient volume server failures or disconnections. Instead, the volume will just become read-only. For any new writes, just assign a different file id to a different volume.<p>This sounds like the architecture and implementation are still pretty basic. Distributed storage without redundancy (working redundancy!) is not that interesting.<p>Sorry to be so critical (it's great that someone is writing a distributed file system!), but I think it is important to add some context. And the Seaweed author doesn't seem to have a problem with bold statements either...<p>Disclaimer: I also work on a distributed file system (with unified access via S3 ;)
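To illustrate the read-modify-write problem mentioned above: with immutable objects, even a one-byte in-place write forces fetching and rewriting the whole object. A minimal Go sketch; the store interface and names are assumptions for illustration, not SeaweedFS's API:

    // Hypothetical immutable object store: objects are written
    // whole and never patched in place.
    package sketch

    type ObjectStore interface {
        Get(id string) ([]byte, error)
        Put(data []byte) (id string, err error) // new object, new id
    }

    // WriteAt on top of immutable objects degenerates into
    // read-modify-write: fetch the whole object, patch it in
    // memory, write it back as a brand-new object, and repoint
    // the metadata.
    func WriteAt(s ObjectStore, id string, off int64, p []byte) (string, error) {
        old, err := s.Get(id)
        if err != nil {
            return "", err
        }
        n := off + int64(len(p))
        if int64(len(old)) > n {
            n = int64(len(old))
        }
        buf := make([]byte, n)
        copy(buf, old)
        copy(buf[off:], p)
        return s.Put(buf) // caller must update the file->object mapping
    }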
Evercam has used Seaweed for a few years. We have 1344TB of mostly JPEGs and use the filer for folder structure. It's worked well for us, especially with low-cost Hetzner SX boxes. I'd echo other people's positive comments about the maintainer's responsiveness & support. Happy to (try and) answer questions.
If Oracle wins the Supreme Court case against Google, aren't all these "like S3" or S3 API-compatible solutions (whether block storage competitors or file systems) at risk?
We've been running SeaweedFS in production serving images and other small files. We're not using Filer functionality just the underlying volume storage. We wrote our own asynchronous replication on top of the volume servers since we couldn't rely on synchronous replication across datacenters. The maintainer is super responsive and is quick to review our PRs. Happy to answer any questions.
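(A rough sketch of the general shape such a volume-level async replication loop can take, assuming a local write log and a remote put endpoint; this is illustrative only, not the actual implementation described above:)

    package sketch

    import "time"

    // WriteEvent is one locally-acknowledged write waiting to be
    // shipped to the other datacenter.
    type WriteEvent struct {
        FileID string
        Data   []byte
    }

    // Replicate drains the local write log in the background and
    // retries until the remote DC accepts each write; local reads
    // keep being served from the local volume servers meanwhile.
    func Replicate(events <-chan WriteEvent, remotePut func(WriteEvent) error) {
        for ev := range events {
            for remotePut(ev) != nil {
                time.Sleep(time.Second) // naive fixed backoff
            }
        }
    }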
Whenever you introduce a new solution into a problem space that already has plenty of options, you are obligated to state why your (new) solution is needed in the first place, IMO.<p>They did it well:<p>> Most other distributed file systems seem more complicated than necessary.<p>> SeaweedFS is meant to be fast and simple, in both setup and operation. If you do not understand how it works when you reach here, we've failed! Please raise an issue with any questions or update this file with clarifications.<p><a href="https://github.com/chrislusf/seaweedfs#compared-to-other-file-systems" rel="nofollow">https://github.com/chrislusf/seaweedfs#compared-to-other-fil...</a><p>However, since I never had to touch HDFS after installing it in the first place, I wonder what the difficulties in operation are that they tried to overcome here.
This looks almost exactly like the kind of data store I need for an application. I have previously considered using MinIO (too inflexible wrt adding more shards/replicas), a homebrew system based on something like ScyllaDB (needs code on top to act like a blob store), or S3/B2 (too slow and/or expensive wrt transfer costs). Is anyone using this in production and can share a story of how stable and hard to run it is?
The architecture reminds me of 'mogilefs', which has a similar filename-to-file-storage mapping mechanism.<p><a href="https://github.com/mogilefs/mogilefs-docs/blob/master/HighLevelOverview.md" rel="nofollow">https://github.com/mogilefs/mogilefs-docs/blob/master/HighLe...</a><p>It's an old system from the folks @ Danga, but the mailing list still sees random activity now and then...
I really wish this project, or other object storage systems modelled after Haystack, would get more traction. I think it is reasonable to expect that your object storage system should support both small objects (< 10k) and large objects (> 1MB) transparently, but in my experience none of the heavily used open-source object stores (Ceph, Swift) can actually support small objects adequately.
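The Haystack idea that makes small objects cheap, in a nutshell: append blobs into one large volume file and keep only a tiny in-memory index per object, so a read is one seek with no per-file inode or dentry. A rough Go sketch of the idea, not SeaweedFS's actual on-disk format:

    package sketch

    import "os"

    type needle struct {
        off  int64
        size int32
    }

    // Volume packs many small blobs into one big append-only
    // file; per-object metadata is just a few bytes in memory.
    type Volume struct {
        f     *os.File
        index map[uint64]needle
        end   int64
    }

    func (v *Volume) Put(id uint64, data []byte) error {
        if _, err := v.f.WriteAt(data, v.end); err != nil {
            return err
        }
        v.index[id] = needle{off: v.end, size: int32(len(data))}
        v.end += int64(len(data))
        return nil
    }

    func (v *Volume) Get(id uint64) ([]byte, error) {
        n := v.index[id]
        buf := make([]byte, n.size)
        _, err := v.f.ReadAt(buf, n.off) // one seek, no metadata walk
        return buf, err
    }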
Some differentiators that aren't immediately obvious in the comparison:<p>> SeaweedFS Filer metadata store can be any well-known and proven data store, e.g., Cassandra, MongoDB, Redis, Elasticsearch, MySQL, Postgres, MemSQL, TiDB, CockroachDB, etcd, etc., and is easy to customize.<p>I'm not very familiar with other DFSs, but at the very least GlusterFS stores metadata as xattrs on an underlying filesystem and so has no need of an external data store.<p>Also, SeaweedFS has a "master" server (single centralized, with failover to a secondary) and "volume servers" (responsible for data).
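Regarding the pluggable Filer metadata store quoted above: the reason so many backends work is that the filer only needs a small directory-entry CRUD contract from them. A Go sketch of roughly what that boils down to; the method names are illustrative, not SeaweedFS's actual interface:

    package sketch

    import "context"

    // Entry is one filer directory entry: a path plus pointers
    // to the volume-server blobs holding the file's data.
    type Entry struct {
        FullPath string
        Chunks   []string // file ids on the volume servers
    }

    // FilerStore is the kind of contract each backing database
    // (Cassandra, Redis, MySQL, ...) has to satisfy.
    type FilerStore interface {
        InsertEntry(ctx context.Context, e *Entry) error
        FindEntry(ctx context.Context, fullPath string) (*Entry, error)
        DeleteEntry(ctx context.Context, fullPath string) error
        ListEntries(ctx context.Context, dirPath string, limit int) ([]*Entry, error)
    }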
This is interesting. I've been looking for a file system for a non-RAID disk array I want to set up at home, and this seems to have <i>some</i> of the characteristics I'm looking for. The primary drawback for my particular use case seems to be that I want to use parity-based error correction rather than (or, in addition to) replication, because I want the array to be able to survive a failure of any N disks in the array.<p>Is there anything like that out there (other than Unraid, which I kinda don't like)?
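Surviving any N disk failures with less overhead than full replication is what erasure coding gives you. A minimal sketch using the klauspost/reedsolomon Go library: 4 data + 2 parity shards spread over 6 disks tolerate any 2 failures (the shard counts are just example parameters):

    package main

    import (
        "log"

        "github.com/klauspost/reedsolomon"
    )

    func main() {
        // 4 data + 2 parity shards: any 2 of the 6 "disks" can fail.
        enc, err := reedsolomon.New(4, 2)
        if err != nil {
            log.Fatal(err)
        }

        data := make([]byte, 1<<20) // stand-in for a file's bytes
        shards, err := enc.Split(data)
        if err != nil {
            log.Fatal(err)
        }
        if err := enc.Encode(shards); err != nil { // computes parity shards
            log.Fatal(err)
        }

        // Simulate losing two disks, then repair from the survivors.
        shards[0], shards[5] = nil, nil
        if err := enc.Reconstruct(shards); err != nil {
            log.Fatal(err)
        }
        log.Println("reconstructed after 2 shard failures")
    }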
I am not a large user whatsoever, but I've been using SeaweedFS for a few years now.<p>It is archiving and serving more than 40,000 images on a webapp I built for the small team I work with.<p>I run SeaweedFS on two machines and it serves all images I host.<p>I wanted to kick the tires because I was always fascinated by Facebook's Haystack.<p>It has been simple, reliable, and robust. I really like it and hope that if one of my side projects ever takes off, I get to test it with a much bigger load.
This is really cool. The killer feature I see is being able to have a cloud storage tier for warm data that goes off to S3, while keeping the hot storage local. Does anyone know of another option that allows this kind of hybrid local/S3 storage and also has a filesystem interface?
We have been running SeaweedFS successfully in production for a few years. We are serving and storing mostly user-uploaded images (around 100TB). It has been surprisingly stable, and the maintainer is usually responsive when we encounter issues.
If you want something similar that also supports NFS, there's LeoFS: <a href="https://github.com/leo-project/leofs" rel="nofollow">https://github.com/leo-project/leofs</a>
I have been following SeaweedFS since forever. Played with it on my own homelab.<p>But I don't know if there's a major shop that uses it. Does anyone know?
Geez, what's with those weird project names? For a second I expected/hoped this would be some cool hack storing data in actual seaweed. (You know, like pingfs...) No, it's not! It's some S3 k8s ... thing. Nothing wrong with that, but come on, choose a better name!<p>And no, I'm not particularly fond of the name CockroachDB either.
Looking good!<p>Take a look at Gasper (<a href="https://talhof8.github.com/gasper" rel="nofollow">https://talhof8.github.com/gasper</a>).