This bit is revealing:<p>> There are clear non-royalty based incentives for <i>large companies</i> to develop new compression algorithms and drive the industry forward. Both Google and Facebook have active data compression teams, lead by some of the world's top experts in the field.<p>Google and Facebook can afford to spend money on R&D because they throw off gobs of money from near-monopolies in important economic sectors. This is one of the archetypal models for R&D, and it has a lot of precedent: AT&T Bell Labs (bankrolled by AT&T's telephone monopoly) and Xerox PARC (bankrolled by the copier monopoly built on Xerox's patents). Many of the really fundamental technologies underlying computing were developed this way.<p>But MPEG is thirty years old now, and the MPEG-1 standard is 25 years old. Until recently, the MPEG standard has been pushed forward not by a single giant corporation that can afford to bankroll everything, but by a consortium of companies using patents and licensing to recover their investment in the R&D. This is one of the other archetypal models for R&D. Many of the other fundamental technologies underlying computing were developed this way.<p>(The third archetypal model is the government-funded project, <i>e.g.</i> TCP/IP, which is also an example of a monopoly bankrolling R&D.)<p>The "benevolent monopoly" model obviously has advantages for open source--because the company bankrolls R&D by monetizing <i>something else</i>, it can afford to release the results of the research for everyone to use. But it's not sustainable without the sponsor (and we know this, because open source has been around for a long time, and there is little precedent for a high-performance video codec designed by an independent group of open source developers).[1]<p>I see people demonizing MPEG and espousing reliance on Google and FB as the way forward, but it's not clear to me that everyone fully understands the implications of that approach.<p>[1] Query whether Theora counts--it was based on an originally proprietary, patented codec.
> "barrage of 12 patents from GenomSys"<p>Based on the patent titles (I'll see if I can read some of them in detail tomorrow), most of these sounds almost exactly like the code I wrote while working at the JGI[1][2][3] in the early 2000s that managed moving large amounts of reads from the ABI (Sanger) sequencers, running it through phred/phrap, and storing it all so the biologists could access it easily. This included a custom Huffman tree based encoder/decoder to efficiently store FASTA files at (iirc) about ~2.5 bit/base (quality scores were just stored as packed array of bytes), a <i>very</i> large MySQL backend, and a large set of Perl libraries that provided easy access to reads/libraries/assemblies/etc. It was certainly a "method and apparatus" for "storing and accessing" + "indexing" bioinformatics data using a "compact representation" that provided many different types of "selective access".<p>I even had code that did a LD_PRELOAD hack on (circa 2002) Consed that intercepted calls to open(2) to load reads automagically from the DB. Reading Huffman encoded data in bulk from the DB (instead of one file per read) reduced the network bandwidth required to open an assembly with all it's aligned reads by ~90%. That sounds a lot like "transmission of bioinformatics data" over a network and "access ... structured in access units". It defiantly involved "reconstruction of genomic reference sequences from compressed genomic sequence reads".<p>They may have a more efficient compression method, and we didn't do anything re: "multiple genomic descriptors" (was that even a thing <2004?), but... no... they didn't invent what is basically a bioinformatics-specific variations of the same methods used everywhere in the computer industry for as long as "text file formats" have existed.<p>[1] <a href="https://jgi.doe.gov/" rel="nofollow">https://jgi.doe.gov/</a><p>[2] These are my personal comments and opinions only, which are not endorsed by or currently affiliated with the Joint Genome Institute, Lawrence Berkeley National Laboratory, or the U.S. Department Of Energy.<p>[3] While I have no idea if any of that code even exists today (I left the JGI in 2004), I <i>did</i> mark the source files with the BSD license, since there was historical precedent.
If there are really patents protecting this format, it makes it a complete non-starter for a great deal of work (commercial and academic). Posts like this scare me. I don't want to devote effort to supporting a format that I might not be able to use in the future. The only thing that I could think of that <i>might</i> work is putting the patents in some sort of defensive portfolio, in much the same way that the Open Invention Network protects Linux.<p>I understand the desire to develop bioinformatics file formats in a more disciplined way than we have done in the past, but this process seems like it may be more of a pain than a benefit. Unfortunately, I couldn't see some of the MPEG-G talks at ISMB this year (other talks were concurrent).<p>Could anyone explain what the benefits of the MPEG-G format are over something like CRAM? I mean, we were already starting to get close to the theoretical minimum in terms of file size. I personally would like to see more support for encryption and robustness (against bitrot) in formats, but this could be done in a very similar way to current formats.
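On the robustness point: bitrot detection does not require a new format, just a checksum stored alongside each compressed block, and block-based containers already have a natural place for that. The sketch below is a rough illustration only; the block layout, field sizes, and function names are invented here, not taken from CRAM, MPEG-G, or any other spec:<p><pre><code>import hashlib
import struct
import zlib

# Rough illustration of per-block integrity checks for a
# block-compressed record format. The layout is invented for
# this sketch:
#   [4-byte little-endian payload length][32-byte SHA-256][payload]

def write_block(fh, records):
    payload = zlib.compress(records)
    fh.write(struct.pack("<I", len(payload)))
    fh.write(hashlib.sha256(payload).digest())
    fh.write(payload)

def read_block(fh):
    header = fh.read(4)
    if not header:
        raise EOFError("no more blocks")
    (length,) = struct.unpack("<I", header)
    expected = fh.read(32)
    payload = fh.read(length)
    if hashlib.sha256(payload).digest() != expected:
        raise IOError("block checksum mismatch: possible bitrot")
    return zlib.decompress(payload)
</code></pre><p>Encryption can be layered in the same way, by encrypting each block's payload before it is written, which preserves per-block random access.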
So, they got a lot of patents for the DNA compression standard MPEG-G, but there is no way to get a license for it? That makes it an unusable standard for 20 years!<p>What are they thinking?
All MPEG standards have patents, and this one is no exception. If companies are interested, they can license its use (assuming fair terms). This is far better than proprietary formats that are locked down, or formats made by a single company where the patent situation is unclear. Also, the companies involved invested in the development of this standard and expect some return.<p>What I don't like in this post is the call for non-adoption when the author has a competing format (CRAM) for which the patent situation and the performance are not clear. It seems like a biased opinion.
It is a mistake to take for granted that "more technological advance"
is worth the price society would pay for it. That price, imposed
through patents, is unacceptable in this case.<p>We are better off if other people encode in older, less efficient
codecs that we can support in free/libre software, than if they
encode the files a little smaller and we are forbidden by the MPEG
patent portfolio to handle it with free software.<p>See <a href="https://www.gnu.org/philosophy/software-literary-patents.html" rel="nofollow">https://www.gnu.org/philosophy/software-literary-patents.htm...</a>
and <a href="https://www.gnu.org/philosophy/limit-patent-effect.html" rel="nofollow">https://www.gnu.org/philosophy/limit-patent-effect.html</a>.<p>You'll note that I do not use the term "open source". Since 1983, I
have led the free software movement, which campaigns to win freedom
in our computing by insisting on software that respects users'
freedom. "Open source" was coined in 1998 to discard the ethical
foundation and present the software as a mere matter of convenience.<p>See <a href="https://gnu.org/philosophy/open-source-misses-the-point.html" rel="nofollow">https://gnu.org/philosophy/open-source-misses-the-point.html</a>
for more explanation of the difference between free software and open
source. See also <a href="https://thebaffler.com/salvos/the-meme-hustler" rel="nofollow">https://thebaffler.com/salvos/the-meme-hustler</a> for
Evgeny Morozov's article on the same point.<p>Which one you advocate is up to you. If you stand for freedom, please
show it -- by saying "free" and "libre", rather than "open".