I know it's old hat on HN, but I just wanted to point out how close to science fiction this article is. This technique to edit genomes is only a decade old. This startup is able to run sub-second searches without running any infrastructure of its own.<p>It costs them less than <i>$100</i> a month.<p>It was written by an <i>intern</i>.
While reading, I kept wondering why this search needs to be in the cloud at all. Finding 20-byte strings in 3 GB of sequence can be done very quickly on a laptop.
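To make the laptop point concrete, here is a minimal exact-match sketch. It assumes the genome has already been loaded as one uppercase bytes object (FASTA headers and newlines stripped) and searches both strands by also looking for the guide's reverse complement. It deliberately ignores mismatches, which a real off-target search would need to tolerate, and makes no claim about timing.

```python
# Minimal sketch: brute-force scan of a genome for a guide sequence on both
# strands. Assumes the genome is one uppercase bytes object with no FASTA
# headers or newlines; loading it from disk is omitted. Exact matches only.

_COMPLEMENT = bytes.maketrans(b"ACGT", b"TGCA")


def reverse_complement(seq: bytes) -> bytes:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(genome: bytes, pattern: bytes) -> list[int]:
    """Return every start offset of pattern in genome, overlaps included."""
    hits = []
    start = genome.find(pattern)
    while start != -1:
        hits.append(start)
        start = genome.find(pattern, start + 1)
    return hits


def search_guide(genome: bytes, guide: bytes) -> dict[str, list[int]]:
    """Exact-match search for a guide on the forward (+) and reverse (-) strands."""
    return {
        "+": find_all(genome, guide),
        "-": find_all(genome, reverse_complement(guide)),
    }


if __name__ == "__main__":
    print(search_guide(b"GGGAAACCC", b"AAAC"))
```

`bytes.find` runs in C, so even this naive loop does the heavy lifting outside the Python interpreter; a faster version would memory-map the file rather than read it all at once.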
AWS Lambda is great for sporadic, self-contained workloads. However, I had a fairly disappointing experience with Lambda when I tested it just last week.<p>For example, you cannot send dynamic response headers through AWS API Gateway (the companion service for exposing Lambda functions as HTTP endpoints). In my case I wanted to change the MIME type depending on whether the response was JSON or JSONP.<p>It's also not possible to connect Lambda directly to ElastiCache; you are mostly expected to work with S3 or DynamoDB (Amazon's proprietary JSON store, and the service mostly responsible for the recent data outage in US East). ElastiCache would allow easy persistence, so it's surprising that Lambda can't connect to it, given that it's an AWS service. (You can reach it by running a proxy on EC2, but that defeats the purpose of a serverless architecture.)<p>Other oddities: API Gateway sniffs the response body to set HTTP headers instead of just letting your Lambda function set them directly, and it matches the response against a regex instead of parsing the JSON.
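For reference, here is a sketch of the behavior the comment asks for, assuming API Gateway's Lambda proxy integration, in which the function itself returns a dict with `statusCode`, `headers`, and `body` and the gateway passes those headers through. The `callback` query-parameter name and the callback-name check are assumptions added for illustration, not something the comment specifies.

```python
import json
import re

# Sketch assuming API Gateway's Lambda proxy integration: the returned
# "headers" dict becomes the HTTP response headers. The "callback" query
# parameter is a hypothetical JSONP convention.

# Only allow plain JavaScript identifiers (optionally dotted) as callbacks,
# so the JSONP wrapper cannot be used to inject arbitrary script.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$.]*")


def lambda_handler(event, context):
    payload = {"status": "ok"}  # placeholder response data
    params = event.get("queryStringParameters") or {}
    callback = params.get("callback")

    if callback is None:
        # Plain JSON response.
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(payload),
        }

    if not _CALLBACK_RE.fullmatch(callback):
        return {
            "statusCode": 400,
            "headers": {"Content-Type": "text/plain"},
            "body": "invalid callback name",
        }

    # JSONP: wrap the JSON in a function call and serve it as JavaScript.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/javascript"},
        "body": f"{callback}({json.dumps(payload)});",
    }
```

The MIME type is chosen per request inside the function, which is exactly the decision the comment wanted to make without API Gateway sniffing the body.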
Hi Vineet,<p>I'm a huge fan of CRISPR. I've been following it closely since I heard Radiolab's podcast about it.<p>I'm also the founder of the JAWS framework, which is an open-source application framework built entirely on AWS Lambda and AWS API Gateway: <a href="https://github.com/jaws-framework/JAWS" rel="nofollow">https://github.com/jaws-framework/JAWS</a><p>I would LOVE to grab a coffee with you or anyone on your team some time, and chat about Lambda or CRISPR, or anything really :) I live in Oakland and my email address is austen[at]servant.co<p>Also, will you be at re:Invent? I'm doing a breakout session on JAWS and I'll be there all week.<p>Good luck to you!<p>Austen
"Our old server infrastructure cost thousands of dollars each month just for server costs.<p>Using the new Lambda infrastructure, we pay for the number of Lambda invocations, the total duration of the requests, and the number of S3 requests. This comes out to $60/month for hundreds of thousands of CRISPR searches!"<p>Well, how much of that money did you spend on EBS storage for your copies of the genome data?<p>EC2 instances could read from S3 directly, as Lambda does; that might cut the cost a lot.<p>Using S3-backed (instance-store) AMIs could save a lot too.<p>But great work, nonetheless!
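The three billed items in the quote (invocations, duration, S3 requests) combine into a simple formula: requests / 10^6 × price per million, plus invocations × seconds × GB of memory × price per GB-second, plus S3 GETs / 1000 × price per thousand. The sketch below encodes that. The default unit prices are assumptions taken from AWS's published us-east-1 list prices and ignore the free tier; they are not Benchling's actual bill.

```python
# Back-of-the-envelope model for the three billed items named in the quote.
# Default unit prices are ASSUMPTIONS (AWS us-east-1 list prices, no free
# tier); substitute current pricing before relying on the numbers.


def monthly_cost(
    invocations: int,
    avg_duration_s: float,
    memory_gb: float,
    s3_gets: int,
    price_per_million_requests: float = 0.20,
    price_per_gb_second: float = 0.0000166667,
    price_per_1000_gets: float = 0.0004,
) -> float:
    """Estimated monthly USD cost of a Lambda + S3 workload."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    # Lambda bills compute as (duration in seconds) x (allocated memory in GB).
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    s3_cost = s3_gets / 1000 * price_per_1000_gets
    return request_cost + compute_cost + s3_cost


if __name__ == "__main__":
    # Hypothetical workload: 1M invocations of 1 s each at 1 GB, no S3 reads.
    print(round(monthly_cost(1_000_000, 1.0, 1.0, 0), 4))
```

Note that if one search fans out to many invocations (for example one per chromosome), `invocations` is searches multiplied by the fan-out, not the number of searches.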
Of note, the latest thing in reference genomes is representing them as a graph data structure, which importantly allows variation to be incorporated. Some of the newest methods for mapping short DNA fragments (the reads produced by the most common type of sequencer) take this approach. They rely on a genome index, though, which takes a lot of computation to build beforehand.<p>Anyway, from the sound of it, Benchling wants to avoid genome indexes in case users upload their own genomes. That said, if someone is doing multiple searches, it would quickly become more efficient to just index the genome. I would have thought most people seriously concerned about off-target CRISPR hits would be using high-quality reference genomes, though.
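The index-versus-scan trade-off can be illustrated with the simplest possible index: a hash map from every k-mer to its positions. Building it costs one pass over the genome and a lot of memory; after that, each exact lookup is a single dictionary access. This is a plain k-mer hash index chosen for clarity, not the graph-genome indexes mentioned above; production aligners use far more compact structures such as FM-indexes, and a Python dict over a 3 GB genome would not fit in a laptop's memory.

```python
from collections import defaultdict

# Sketch of a k-mer hash index: one O(n) build pass, then O(1) exact lookups.
# Memory grows with genome length, which is why real tools use compact indexes.


def build_kmer_index(genome: str, k: int = 20) -> dict[str, list[int]]:
    """Map every k-mer in genome to the list of offsets where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i : i + k]].append(i)
    return dict(index)


def lookup(index: dict[str, list[int]], kmer: str) -> list[int]:
    """Return all start offsets of kmer, or an empty list if absent."""
    return index.get(kmer, [])
```

The break-even point is roughly when the number of searches times the cost of one full scan exceeds the one-off cost of building the index, which is why repeated searches favor indexing.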
I recently started looking harder at Lambda after realizing that you get 1M requests/month for free indefinitely. I just worry about vendor lock-in with services like this: if for whatever reason you want to move away, it's a rewrite at best. If Amazon were to open-source the Lambda implementation, allowing me to run my services somewhere else with a config change, I'd probably buy into it completely and never move away...
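One common way to limit that lock-in is to keep the business logic in a plain function and make the Lambda handler a thin adapter around it, so another adapter can reuse the same core. The sketch below pairs a Lambda handler with a standard WSGI app over the same hypothetical `search_core` function; the names and the `q` query parameter are illustrative assumptions.

```python
import json
from urllib.parse import parse_qs

# Sketch of the "thin adapter" pattern: search_core knows nothing about AWS;
# lambda_handler and wsgi_app only translate requests and responses.
# All names here are hypothetical.


def search_core(query: str) -> dict:
    """Provider-agnostic business logic."""
    return {"query": query, "length": len(query)}


def lambda_handler(event, context):
    """AWS Lambda adapter (API Gateway proxy-style event)."""
    params = event.get("queryStringParameters") or {}
    result = search_core(params.get("q", ""))
    return {"statusCode": 200, "body": json.dumps(result)}


def wsgi_app(environ, start_response):
    """Standard WSGI adapter, runnable on any WSGI server."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    query = params.get("q", [""])[0]
    body = json.dumps(search_core(query)).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Locally, `wsgiref.simple_server.make_server("", 8000, wsgi_app).serve_forever()` would serve the same logic without AWS. Moving providers then means writing a new adapter rather than rewriting the core, although anything tied to AWS-only services (DynamoDB, S3 event triggers) would still need porting.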
I wonder if it would be possible to go the other way. How close is CRISPR to being a primitive for Turing-complete computation?<p>Take it from s/xxx/yyy/ to /bin/sed, and then run the search in wetware.
They might be able to save a bit on costs by caching locally. Lambda containers can be reused between invocations if the request rate (TPS) is high enough. I think the limit is 500MB in the /tmp directory.
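A minimal sketch of that caching idea: keep each downloaded object in `/tmp`, so a warm container skips the S3 download on later invocations. `fetch_from_s3` is a hypothetical callable standing in for something like boto3's `S3.Client.download_file(Bucket, Key, Filename)`; only objects that fit within the `/tmp` limit should be cached this way.

```python
import os

# Sketch: cache S3 objects on local disk so warm (reused) Lambda containers
# can skip the download. fetch_from_s3(key, path) is a HYPOTHETICAL callable
# that writes the object to path (e.g. a wrapper around boto3 download_file).


def get_cached(key: str, fetch_from_s3, cache_dir: str = "/tmp") -> str:
    """Return a local path for key, fetching it only on a cache miss."""
    local_path = os.path.join(cache_dir, key.replace("/", "_"))
    if not os.path.exists(local_path):
        tmp_path = local_path + ".part"
        fetch_from_s3(key, tmp_path)
        # Rename into place only after the download finishes, so an
        # interrupted fetch never leaves a half-written file that looks cached.
        os.replace(tmp_path, local_path)
    return local_path
```

Whether this saves money depends on how often containers are actually reused; a cold container always pays for the first download.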
How are you getting such quick responses from S3? In our own testing using Java, it was taking over 500ms just to initiate the connection to S3 from Lambda.