Moderna mRNA sequence released to GitHub [pdf]

1248 points by aty268 about 4 years ago

56 comments

drtz about 4 years ago

I'm sure it's more complex than I grasp as a layperson, but I'm utterly amazed at how simple this _appears_. I get the feeling that this is something I have a better chance of understanding than the average SaaS Terms and Conditions.

I expected to have to scroll through pages upon pages of indecipherable text. Instead it's no bigger than a large paragraph of text, and I can easily fit it on my screen.

andrewcl about 4 years ago

Cool, but it's the lipid delivery system that is the secret sauce. This is equivalent to giving the source code without a compiler to build it.

joeyh about 4 years ago

My first thought was `wdiff pfizer moderna`. It's short enough to post here in its entirety, but I guess I had better not; anyway, it's easy enough to extract from the PDF. Add a space after every letter and wdiff can find the common sequences nicely.

Short excerpt for flavor; this is from near the beginning:

A[-G-]AGA{+A+}GAA{+ATATAAGAC+}CCCG{+GCGCCG+}CCACCATGTTCGTGTTCCTGGTGCTGCTGCC[-T-]{+C+}
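
The wdiff trick described above can be approximated with Python's standard library. This is a minimal sketch, assuming both sequences have already been extracted from the PDF as plain strings; `wdiff_style` is a hypothetical helper name, the inputs are toy strings rather than the real vaccine sequences, and `difflib` will not necessarily choose the same alignment that wdiff does.

```python
from difflib import SequenceMatcher


def wdiff_style(old: str, new: str) -> str:
    """Render a character-level diff with wdiff's [-deleted-] / {+inserted+} markers."""
    out = []
    # autojunk=False so that very frequent characters (only four bases!) are not ignored.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(old[i1:i2])
            continue
        if i2 > i1:  # 'delete' or 'replace': text only present in old
            out.append("[-" + old[i1:i2] + "-]")
        if j2 > j1:  # 'insert' or 'replace': text only present in new
            out.append("{+" + new[j1:j2] + "+}")
    return "".join(out)


# Toy inputs for illustration only.
print(wdiff_style("ACGTTGCA", "ACGATGCCA"))
```

Working per character avoids the "add a space after every letter" step, at the cost of quadratic worst-case time, which should be tolerable for sequences of a few thousand bases.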

koeng about 4 years ago

The thinking behind attaching a PDF with colors and not a Genbank file is why we can't have nice things in biotechnology.

flemhans about 4 years ago

Despite how complex this really is, and how many "gotchas" there might be when using this repository, it's nice that it gets a shitload of attention. As a united humanity we should strive to solve our common problems.

sp1rit about 4 years ago

If my little knowledge from biology class serves me correctly, RNA uses uracil instead of thymine. But this document uses T.

Can somebody explain to me why?

zappo2938 about 4 years ago

Wow. Looks like it is analogous to having a header on a TCP packet. [0] Here is an animation of mRNA encoding translated to proteins inside a ribosome. [1]

"The ribosome is composed of one large and one small subunit that assemble around the messenger RNA, which then passes through the ribosome like a computer tape. The amino acid building blocks, that's the small glowing red molecules, are carried into the ribosome attached to specific transfer RNAs; that's the larger green molecules also referred to as tRNA. The small subunit of the ribosome positions the mRNA so that it can be read in groups of three letters known as a codon."

Very analogous indeed.

[0] https://xerocrypt.wordpress.com/2014/07/22/how-to-read-almost-raw-tcpip-packet-headers-without-the-tools/

[1] https://www.youtube.com/watch?v=TfYf_rPWUdY
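
The "read in groups of three letters" step from the quote can be sketched in a few lines of Python. This is a deliberately simplified model, not how any real tool or the ribosome itself works: it assumes the standard genetic code, starts at the first `ATG` it finds, stops at the first stop codon, and ignores everything else about translation. The names `CODON_TABLE` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA spelling
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]  # KeyError on non-ACGT letters
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Fragment taken from the wdiff excerpt posted earlier in the thread.
print(translate("ATGTTCGTGTTCCTGGTGCTGCTGCC"))  # MFVFLVLL
```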

jonplackett about 4 years ago

Rather disappointingly, neither sequence includes the string 'GATTACA'

csense about 4 years ago

The Human Genome Project was completed almost two decades ago, and somebody solved the protein folding problem recently.

Why are we still doing genetics at the machine code level? Shouldn't we have some compilers, assemblers and linkers by now?

elliekelly about 4 years ago

I’m a little confused by the title. Looking at the document, it seems to me (knowing next to nothing about this field) that it includes both Pfizer and Moderna’s spike protein sequences, in figures 1 and 2 respectively. Is that correct?

It’s also interesting the way it’s worded: that the sequence was “assembled from $vaccine”. Does that mean whoever published this has backed into these sequences rather than having gathered this information directly from the source(s)?

wonderwonder about 4 years ago

We are simply programmable machines. It's pretty interesting that all of human life can be reduced down to 30k editable microservices.

bionhoward about 4 years ago

We wrote some code last year to build a big trie of the whole transcriptome. You could use it to fuzzy-search to see if this mRNA is within some edit distance of any piece of normal human RNA, because then it could theoretically cause side effects via RNA interference. I stopped the project because I can't afford to develop a gene therapy right now, but the fuzzy search worked:

https://github.com/bionicles/coronavirus

To make the trie, use the function here. The variable K is the length of the k-mers (runs of RNA). Larger values are going to take a lot longer. (Warning: big job, uses multiprocessing; PyPy recommended for speed.)

https://github.com/bionicles/coronavirus/blob/b6f0db9dd8aaf7475aebd75dfcafe77194a65e8d/bio_firewall.py#L100

Then you could use this recursive function to generate potential matches within some cutoff:

https://github.com/bionicles/coronavirus/blob/b6f0db9dd8aaf7475aebd75dfcafe77194a65e8d/bio_firewall.py#L174

The function right below it converts the generator to a list. Then you could save that.

Enjoy.
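
For readers who want a feel for the idea without opening the repository, here is an independent toy sketch of a k-mer trie with a recursive fuzzy lookup. It is not the code from the linked project: the function names are invented here, the trie is a plain nested dict, and the search counts substitutions only (Hamming distance) rather than full edit distance with insertions and deletions.

```python
from typing import Iterator


def build_kmer_trie(sequence: str, k: int) -> dict:
    """Insert every length-k window of `sequence` into a nested-dict trie."""
    root: dict = {}
    for i in range(len(sequence) - k + 1):
        node = root
        for base in sequence[i:i + k]:
            node = node.setdefault(base, {})
    return root


def fuzzy_matches(trie: dict, query: str, max_mismatches: int,
                  prefix: str = "") -> Iterator[str]:
    """Yield stored k-mers that differ from `query` in at most `max_mismatches`
    positions. `query` is assumed to have the same length k as the stored k-mers."""
    if not query:
        yield prefix
        return
    for base, child in trie.items():
        cost = 0 if base == query[0] else 1
        if cost <= max_mismatches:
            # Spend the mismatch budget as we descend; prune branches that exceed it.
            yield from fuzzy_matches(child, query[1:], max_mismatches - cost,
                                     prefix + base)


# Toy reference sequence, not real transcriptome data.
trie = build_kmer_trie("ACGTACGA", k=4)
print(sorted(fuzzy_matches(trie, "ACGT", max_mismatches=1)))  # ['ACGA', 'ACGT']
```

The pruning is what makes the trie worthwhile: a branch is abandoned as soon as its mismatch count exceeds the cutoff, instead of comparing the query against every stored k-mer.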

kart23 about 4 years ago

What are the purple and blue sections after the stop codon for? I read a little about the 3' region, but for the vaccine, are these sections taken from a particular natural human sequence, or specially engineered for something else?

yrral about 4 years ago

Related: Here's an article from late last year describing and explaining the source code of the Pfizer vaccine:

https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/

It's a very interesting read and I hope the author makes another post explaining the differences between the two mRNA vaccines.

spullara about 4 years ago

I highly recommend reading about ribosomes. They are made up of two pieces that were likely independent at some time. It becomes quite clear that "life" began as a machine whose only ability was to replicate itself:

https://en.wikipedia.org/wiki/Ribosome

You can think of RNA as a copy of a section of DNA. They look very much like computer programs, except that rather than producing code, the ribosome reads them and translates each codon into its corresponding amino acid, which it then binds together into a protein. The execution engine is the environment of the cell. All highly probabilistic rather than deterministic. I can't imagine any programmer not finding them completely fascinating.

karolkozub about 4 years ago

It looks like a machine code snippet. I wonder if we'll develop high level languages and compilers for genetic code in the future.

jturolla about 4 years ago

Please someone... create some abstraction language for this bio-assembly code. Can we make LLVM compile this? :joy:

ineedasername about 4 years ago

It's also short enough to post the whole thing to Wikipedia, so that's probably inevitable along with some very entertaining edit wars.

ur-whale about 4 years ago

What this does, as a non-biotech person, I believe I understand at a high level: plonk this code into a ribosome and out comes the desired protein.

What I don't understand is:

a) how the mRNA code relates to the produced protein (i.e. I can read C code and get an idea of what it does fairly quickly, but can the same be said of mRNA and the resulting protein)?

b) how did they get their hands on that code in the first place? Do the coronaviruses use mRNA as well? Was a coronavirus then somehow "dissected" to get at the spike protein "source code"?

dooopy about 4 years ago

I compared the spike encoding regions, and it looks like they're quite different... I wonder if the codons wind up coding for different amino acids. And who got it right?

mrfusion about 4 years ago

The lipid container is weird to me. Is that all it takes to send instructions inside a cell? Seems like a security hole. Why haven’t viruses evolved to have a lipid container?

VectorLock about 4 years ago

People joked a lot about "injectable source code / machine code", but it is kind of interesting injecting yourself with something that has the source on GitHub.

flobosg about 4 years ago

> So how different is the mRNA in the Moderna, BioNTech/Pfizer & CureVac vaccines? There are 1274 codon positions. 808 are identical across all 3 vaccines. 103 are unique to Moderna, 249 unique to BioNTech, 230 to CureVac

https://twitter.com/PowerDNS_Bert/status/1375091898797453326
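
A comparison like the one quoted can be sketched as a per-position tally over aligned coding sequences. This is only a guess at the method, not the tweet author's code: it assumes the three coding regions are already aligned codon for codon, and it reads "unique to X" as "X's codon at that position matches neither of the other two". The function names and the toy sequences are invented for illustration.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into consecutive 3-letter codons (drops any remainder)."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def compare_codons(seqs: dict[str, str]) -> dict[str, int]:
    """Count positions identical across all sequences, plus per-sequence unique codons."""
    names = list(seqs)
    counts = {"identical": 0, **{name: 0 for name in names}}
    for column in zip(*(codons(seqs[name]) for name in names)):
        if len(set(column)) == 1:
            counts["identical"] += 1
            continue
        for name, codon in zip(names, column):
            if column.count(codon) == 1:  # no other sequence uses this codon here
                counts[name] += 1
    return counts


# Toy three-codon sequences, not the real vaccine sequences.
toy = {
    "A": "ATGTTTGCC",
    "B": "ATGTTCGCA",
    "C": "ATGTTCGCG",
}
print(compare_codons(toy))  # {'identical': 1, 'A': 2, 'B': 1, 'C': 1}
```

Under this reading a position where all three differ is counted once for each vaccine, so the per-vaccine numbers can add up to more than the number of non-identical positions.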

mushroomzulu about 4 years ago

https://www.instagram.com/tv/CIYYq_rCV9F/?utm_source=ig_web_copy_link

mrfusion about 4 years ago

So what moves the new protein out of your cells once the RNA is processed? Don’t most proteins stay inside the cell?

em3rgent0rdr about 4 years ago

Are there any visual compilers that simulate the process of using these sequences to assemble a protein?

tibbydudeza about 4 years ago

So you have a header/footer sequence that we sort of know is required (remember the MZ and checksum for .EXE files), but we have no idea what the bits in between do, except that we can read the letters and copied them in part from the actual virus.

aty268 about 4 years ago

'A group of Stanford researchers has hacked Moderna’s messenger RNA (mRNA) vaccine for the novel coronavirus, Motherboard first reported on Monday, and published its entire genetic sequence on the open-source code repository Github.'

https://gizmodo.com/stanford-scientists-post-entire-mrna-sequence-for-moder-1846576268

verytrivial about 4 years ago

There are people who could memorize this. And it would weirdly be more useful than digits of π!

narrator about 4 years ago

So I guess Josiah Zayner has to pick up on this now and do a DIY Moderna COVID vaccine video. He already did a DIY vaccine video with full open source documentation on how to do it yourself.

http://www.josiahzayner.com/2020/12/i-made-covid-19-vaccine-in-my-kitchen.html

The_rationalist about 4 years ago

I would love to see the output structure from AlphaFold for this RNA source code.

nsxwolf about 4 years ago

ELI5: why are the sequences different if they result in the same spike protein?

pknerd about 4 years ago

Can someone give me a link to FASTA files of these sequences?

stevefrench93 about 4 years ago

I wouldn't install beta software on a production system though.

husamia about 4 years ago

If you have an understanding of how the sequence mutates, then you can predict what the next strain is going to be and design a spike protein that matches it.

stjohnswarts about 4 years ago

ELI5: could this be used by "evil governments" to make designer pathogens to release during doomsday situations (say, by North Korean leaders in their suicide bunkers if things went badly)?

obilgic about 4 years ago

So how are the first and the second dose different?

sktrdie about 4 years ago

No package.json found, won't install.

StaticRice about 4 years ago

Archive.org mirror: https://web.archive.org/web/20210326214140/https://raw.githubusercontent.com/NAalytics/Assemblies-of-putative-SARS-CoV2-spike-encoding-mRNA-sequences-for-vaccines-BNT-162b2-and-mRNA-1273/main/Assemblies%20of%20putative%20SARS-CoV2-spike-encoding%20mRNA%20sequences%20for%20vaccines%20BNT-162b2%20and%20mRNA-1273.docx.pdf

singularity2001 about 4 years ago

Tangential: do biologists sometimes use some form of base64 encoding for their triplets? So instead of AAG.TCA.GGA just g5F or something?

Other than the obvious advantage of being shorter, it would also be easier to read: the boundaries would be unambiguous and each char would correspond directly to an amino acid (if applicable/coding).
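
The proposal works out neatly on paper because there are exactly 4³ = 64 codons, one per base64 character. A sketch of such an encoding follows; the bit assignment (A=0, C=1, G=2, T=3) and the function names are arbitrary choices made for this example, not an existing convention, so the characters it produces will not match the commenter's made-up "g5F".

```python
import string

# The 64-character base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
BASES = "ACGT"  # arbitrary 2-bit assignment: A=0, C=1, G=2, T=3


def encode_codons(seq: str) -> str:
    """Pack each codon (3 bases x 2 bits = 6 bits) into one base64 character."""
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(seq), 3):
        first, second, third = (BASES.index(base) for base in seq[i:i + 3])
        out.append(B64_ALPHABET[first << 4 | second << 2 | third])
    return "".join(out)


def decode_codons(text: str) -> str:
    """Inverse of encode_codons: expand each character back into three bases."""
    out = []
    for char in text:
        value = B64_ALPHABET.index(char)
        out.append(BASES[value >> 4] + BASES[(value >> 2) & 3] + BASES[value & 3])
    return "".join(out)


encoded = encode_codons("AAGTCAGGA")
print(encoded, decode_codons(encoded))
```

One character per codon is not quite one character per amino acid, though: several codons map to the same amino acid, so two different characters can stand for the same residue.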

anonu about 4 years ago

This is amazing. It appears quite "simple" - of course I know nothing about this part of the sciences.

I do think back to the early days of Covid when there were all these predictions around when a vaccine would show up. It seemed like there was knowledge that the mRNA platform would be the likely solution and probably by April we knew a vaccine would be possible - it just took 6+ months to test.

Thinking about that timeline amazes me.

peter303 about 4 years ago

One of Moderna's cofounders, MIT Prof Robert Langer, was profiled on 60 Minutes a few years back as MIT's most prolific patent holder. He specialized in nanoparticle delivery systems to any desired internal tissue. One can deliver medicine, nutrients, diagnostics, etc. where and when they want. Vaccines are just a small subset of these applications.

djmips about 4 years ago

Where are the JSON versions?

squarefoot about 4 years ago

As a software/hardware guy who knows less than zero about the subject: is this something that (given the right resources) makes it possible to replicate the vaccines? I mean in countries that can't afford enough vaccines but already have, or could invest in, the ability to replicate them without caring about patents.

aden1ne about 4 years ago

Why not in FASTA format?
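
For anyone who has extracted the letters from the PDF and wants them in FASTA, the format is small enough to write by hand: a `>` header line followed by the sequence wrapped over several lines. A minimal sketch; the function name, the record name and the 60-column width are choices made for this example, and the sequence is a placeholder rather than anything from the document.

```python
def to_fasta(record_id: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record: '>' header, then fixed-width lines."""
    lines = [">" + record_id]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


# Placeholder sequence for illustration only.
print(to_fasta("toy_record", "ACGT" * 40), end="")
```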

brian_herman about 4 years ago

https://github.com/brianherman/Assemblies-of-putative-SARS-CoV2-spike-encoding-mRNA-sequences-for-vaccines-BNT-162b2-and-mRNA-1273

I posted some txt files with the lines removed and stuff.

ibraheemdev about 4 years ago

Is this all another medical company needs to start manufacturing and selling the vaccine themselves? Or is this sequence licensed/proprietary in some way?

omlet about 4 years ago

Where is the 5G stack?

a-dub about 4 years ago

I'm a DNA noob: is it possible to do the growing and sampling thing to get the sequence from a sample of the vaccine, or does the bubble of fat get in the way?

p0rkbelly about 4 years ago

Obligatory: "I could have done this in a weekend"

person_of_color about 4 years ago

How long before we can 3D print an mRNA vaccine?

bvanderveen about 4 years ago

> .docx.pdf

Cargo-cult much?

savrajsingh about 4 years ago

My question is: does the Johnson & Johnson DNA-based vaccine encode the exact same spike protein, or a different one they chose to target? From this PDF I conclude both the Moderna and Pfizer vaccines target the same protein.

rjvir about 4 years ago

This should be an NFT, I'd love to own an NFT of the RNA sequence of the Moderna vaccine.

omlet about 4 years ago

Where is the code for the 5G modem?

plattyp about 4 years ago

Who would have thought it'd be this simple:

    if covid?(dna)
      block_virus(dna)
    end
