Serverless File Uploads – Netlify

51 points by peterdemin over 8 years ago

12 comments

mnutt over 8 years ago
This article makes it sound like upload requests are restricted to your domain via CORS, but that is one of the pitfalls of thinking about everything through a serverless lens: while another site could not use JavaScript to directly upload files to your S3 bucket, a malicious user could absolutely make a backend request to receive signed tokens for uploading to your S3 bucket. Additionally, it doesn't look like the policy is locked down, so they could overwrite your existing files with malicious ones.
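For illustration, a rough sketch of a more locked-down signing step using the aws-sdk v2 createPresignedPost call; the bucket name, key prefix, size limit, and content-type restriction below are assumptions, not something the article specifies.

    // Sketch: issue a presigned POST restricted to a per-user prefix, a size
    // limit, and a content type, so a leaked policy cannot be replayed to
    // overwrite arbitrary existing objects. Names are placeholders.
    const AWS = require('aws-sdk');
    const crypto = require('crypto');
    const s3 = new AWS.S3();

    function createUploadPolicy(userId, callback) {
      // A random component keeps keys unguessable and avoids overwrites.
      const key = `uploads/${userId}/${crypto.randomBytes(16).toString('hex')}`;
      s3.createPresignedPost({
        Bucket: 'example-upload-bucket',                 // assumed bucket
        Expires: 300,                                    // policy valid for 5 minutes
        Fields: { key },
        Conditions: [
          ['content-length-range', 0, 10 * 1024 * 1024], // max 10 MB
          ['starts-with', '$Content-Type', 'image/'],    // images only
        ],
      }, callback);                                      // callback receives { url, fields }
    }

The exact key and the content-length-range condition are what keep a stolen policy from being reused against other objects in the bucket.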
mamisp over 8 years ago
How is using a node.js server to generate signed requests to upload files onto S3 servers, "serverless"?
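For context, the server-side piece being questioned here is usually nothing more than a tiny endpoint like the sketch below (aws-sdk v2 and Express; the route and bucket name are assumptions for illustration):

    // Minimal sketch of the "server" part of this setup: an endpoint that
    // returns a short-lived signed PUT URL the browser then uploads to directly.
    const express = require('express');
    const AWS = require('aws-sdk');

    const app = express();
    const s3 = new AWS.S3({ signatureVersion: 'v4' });

    app.get('/sign-upload', (req, res) => {
      const url = s3.getSignedUrl('putObject', {
        Bucket: 'example-upload-bucket',                    // assumed bucket
        Key: `uploads/${Date.now()}-${req.query.filename}`,
        ContentType: req.query.type,
        Expires: 60,                                        // seconds until the URL expires
      });
      res.json({ url });
    });

    app.listen(3000);

The upload bytes never touch this process; it only signs the request, which is where the "serverless" framing comes from.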
chrisballinger over 8 years ago
We built something similar to this for our Kickflip.io HLS/S3 live video streaming service. The original version is closed source but we rewrote a generic open source Python library called storage_provisioner [1] for a client. It's minimal and super simple.

We also made a Django-Rest-Framework module that wraps storage_provisioner called django_broadcast [2]. Working with AWS/S3 can be a pain, hopefully these tools can help.

1. https://github.com/PerchLive/storage_provisioner

2. https://github.com/PerchLive/django-broadcast
amelius over 8 years ago
Silly nomenclature. This isn't "serverless" at all.

Suggestion: "File uploads using 3rd party server".

As an added benefit, this title shows that the data might not be safe from the curiosity of other entities.
shakeel_mohamed over 8 years ago
What's wrong with doing all of this on the front-end? I recently did just that after generating the signature and policy locally.

See this guide: https://aws.amazon.com/articles/1434
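For reference, "generating the signature and policy locally" in the sense of that AWS guide boils down to base64-encoding a policy document and HMAC-signing it with the secret key; a minimal sketch follows (the bucket, expiration, and limits are placeholder assumptions, and the secret key must never be shipped to the browser):

    // Sketch of a signature-v2-style browser-POST policy, as in the linked AWS
    // article. The policy/signature pair goes into hidden fields of the HTML
    // form that posts the file to S3.
    const crypto = require('crypto');

    const policy = {
      expiration: '2017-02-15T12:00:00Z',                      // placeholder expiry
      conditions: [
        { bucket: 'example-upload-bucket' },                   // assumed bucket
        ['starts-with', '$key', 'uploads/'],
        ['content-length-range', 0, 10485760],                 // max 10 MB
      ],
    };

    const policyBase64 = Buffer.from(JSON.stringify(policy)).toString('base64');
    const signature = crypto
      .createHmac('sha1', process.env.AWS_SECRET_ACCESS_KEY)   // secret stays off the client
      .update(policyBase64)
      .digest('base64');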
mrmondo over 8 years ago
I really wish people would stop using the term serverless; it's not useful and it's highly misleading.
danielrhodes over 8 years ago
You do need a server to create a token from your access key and secret. However, this doesn't really go very far in protecting your bucket, as somebody could just grab that token and upload whatever they want.

So an additional layer of security is creating an upload bucket with a policy where all objects over 24h old are deleted. When somebody finishes uploading a file, you ping your server and move the file from the upload bucket to the real bucket.

Another trick is putting CloudFront in front of that bucket. You can then upload to any CloudFront server, which will then put the file in your bucket -- the reduced latency to CloudFront (vs S3) will increase the speed at which you upload by quite a bit.
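The two pieces described above might look roughly like this with the aws-sdk v2; the staging/final bucket names and the one-day window are assumptions:

    // (1) One-time setup: expire anything left in the upload bucket after a day.
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    s3.putBucketLifecycleConfiguration({
      Bucket: 'example-uploads-staging',                       // assumed staging bucket
      LifecycleConfiguration: {
        Rules: [{
          ID: 'expire-stale-uploads',
          Status: 'Enabled',
          Prefix: '',                                          // apply to every object
          Expiration: { Days: 1 },
        }],
      },
    }).promise();

    // (2) When the client reports the upload finished, promote the object
    //     from the staging bucket to the real one and delete the original.
    function promoteUpload(key) {
      return s3.copyObject({
        Bucket: 'example-uploads-final',                       // assumed destination bucket
        CopySource: `example-uploads-staging/${encodeURIComponent(key)}`,
        Key: key,
      }).promise()
        .then(() => s3.deleteObject({ Bucket: 'example-uploads-staging', Key: key }).promise());
    }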
cyberferret over 8 years ago
Nice work. I have several web apps that run on EC2 instances, where my users upload via Browser -> EC2 -> S3. This can cause high latency on some of my smaller EC2 boxes, which is annoying, or it forces Elastic Beanstalk to spool up more instances unnecessarily because it thinks traffic is spiking when large files are being uploaded.

I've always wondered about the best strategy to go 'serverless' with the file uploads and have the user's browser essentially upload direct to S3, and this tutorial gives a great insight into that - thanks.
curiousAl over 8 years ago
"Each returned URL is unique and valid for a single usage, under the specified conditions."

Where and how is the URL actually invalidated after it is used? (Or are you relying on expiration as invalidation?)
jest3r1 over 8 years ago
"Serverless"

So is this 100% client-side, or is there a (server) dependency?
asciimike over 8 years ago
What I've found people really want when they say "serverless" in this context is "direct file upload without a proxy server", which is basically what BaaS platforms like Firebase and Parse do...

<pitch>

Firebase Storage (https://firebase.google.com/docs/storage/) provides clients that perform secure, serverless uploads and downloads. Instead of doing the dance with a server minting signed URLs, it uses a rules engine that lets developers specify declarative rules to authorize client operations. You can write rules to match object prefixes, restrict access to a particular user or set of users, check the contents of the file metadata (size, content type, other headers), or check for an existing file at the prefix to avoid overwriting it.

If HackerNews formatting were more forgiving of code snippets, I'd post one here, but instead have to link to the docs (https://firebase.google.com/docs/storage/security/secure-fil...).

We've found that this model is more performant and less expensive (no need for a proxy server), as well as lower cognitive load on developers, as they think about what they want the end result to be, rather than how they need to build up the end result.

And since I know people will bring it up: there are definitely limitations in flexibility (you're using a DSL), and a steeper learning curve for the very complicated use cases. The goal here is to make it trivial for 90% of use cases and possible for 9%, rather than making it possible for 100% and equally difficult for everyone. Tradeoffs...

</pitch>

And if you want other examples, Parse did a similar thing with role-based access control to a Parse File, allowing direct client upload and access by only a set of users. S3 and GCS can do this as well, assuming their (relatively coarse) IAM models are granular enough for you (and you're an authorized principal in their systems, which is often the harder thing).

Bringing this full circle, "serverless" typically involves a switch from writing code (imperative) to writing config (declarative). You're not validating JWTs, signing URLs, or writing middleware; you're letting services know how to configure those primitives for you. In some ways the Serverless framework does abstract this for you (hey look, I didn't provision a VM), and in some ways it doesn't (you still wrote code to generate a signed URL).

Disclosure: I built Firebase Storage
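On the client side, the direct upload being described here is a short call against the namespaced Firebase Web SDK of that era; the sketch below is illustrative, with the storage path and file-input wiring assumed rather than taken from the comment:

    // Rough sketch: the browser uploads straight to Firebase Storage, and the
    // declarative security rules decide whether this write is allowed.
    const file = document.querySelector('#file-input').files[0];
    const user = firebase.auth().currentUser;

    firebase.storage()
      .ref(`uploads/${user.uid}/${file.name}`)   // path is just an example
      .put(file)
      .then((snapshot) => console.log('Uploaded', snapshot.totalBytes, 'bytes'));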
madmod over 8 years ago
I use this technique extensively in several production systems.

As others have mentioned, having an expiration policy is a good idea. Also, you can mitigate charges from malicious activity by using rate limits on the signing endpoint (API Gateway supports this). Using infrequent access or reduced redundancy storage might also be a good idea if you expect a lot of traffic. It's also good to limit the CORS policy on the bucket to the needed domains and headers.

Signed metadata headers are very useful when combined with S3 event handlers (SQS or straight to Lambda) using a HEAD request on the uploaded objects. This is a great technique for post-processing an upload without requiring client trust or an external data store. (With a separate fallible request, which could lead to consistency issues.)

Edit: It is also critically important to have some randomness in each key path so it is unguessable. Otherwise user files would be overwritable by an attacker. (Many file names are easily guessable, and an attacker with many tries could eventually stuff malware in, for example.) I used GUIDs for this because they are both URL and S3 key safe. If keeping the original file name is needed, I put it in a metadata value and rename the file on download using a Content-Disposition header. Making the S3 headers work with symbols in file names can be tricky, but encoding it as a JSON string works around most issues.

In order to overcome the 30-second request limit in API Gateway for longer post-processing while still offering realtime client feedback, you can set up an S3 event handler to trigger the post-processing Lambda, which then updates a DynamoDB record with the S3 key as its id. A status endpoint Lambda is then polled by the client with the S3 key for status events.

For more complex post-processing and client-side workflows I have used key prefixes (folders), each with separate event handlers or CORS configurations. IAM policies with conditions including S3 key prefixes are used to restrict access. Using the S3 API copy command can move large objects quickly between workflow steps.

Also, enabling server-side encryption is a must imo. Be sure to specify AWS signature version 4 in the S3 constructor so that all parts of the request are signed. (Otherwise some older regions may not sign metadata headers.)

Also, the S3 API copy command has an interesting append feature which can be used to build objects iteratively. I once toyed with the idea of using it to create large zip files of many S3 objects efficiently but ended up not needing it. Someday I would like to try that, because it could be great for a lot of web apps where users can select a random list of files to download.

Also, I (re)implemented most of the above this week using CloudFormation and the newer AWS Serverless template (not the serverless.com project but the actual AWS feature), which allows for really easy deployment.
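Two of the points above (unguessable keys and restoring the original file name via Content-Disposition) could be sketched like this with the aws-sdk v2; the bucket name, metadata field, and expiry are illustrative assumptions:

    // Sketch: sign an upload under a random, unguessable key while stashing the
    // original name in metadata, then restore that name at download time.
    const AWS = require('aws-sdk');
    const crypto = require('crypto');
    const s3 = new AWS.S3({ signatureVersion: 'v4' });    // v4 so metadata headers are signed

    function signUpload(originalName, contentType) {
      const key = `uploads/${crypto.randomBytes(16).toString('hex')}`;  // GUID-like, not guessable
      const url = s3.getSignedUrl('putObject', {
        Bucket: 'example-upload-bucket',                                // assumed bucket
        Key: key,
        ContentType: contentType,
        Metadata: { filename: JSON.stringify(originalName) },           // JSON-encode odd symbols
        Expires: 60,
      });
      return { key, url };
    }

    function signDownload(key, originalName) {
      return s3.getSignedUrl('getObject', {
        Bucket: 'example-upload-bucket',
        Key: key,
        ResponseContentDisposition: `attachment; filename="${originalName}"`,
        Expires: 60,
      });
    }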