Build a Scalable Remote Code Execution Engine in Under 500 Lines of Code
Exploring the architecture and code for a cloud-native code execution system with request-level resource limits and isolation.
Motivation
A few years ago, inspired by Crafting Interpreters by Robert Nystrom, I developed a compiler for a toy programming language called Tok. This project, spanning several months, helped me peek under the hood and understand how programming languages are designed, and how compilers are built.
Despite the effort, the compiler has remained mostly unused, tucked away in a GitHub repository. The only way for anyone to use it has been to clone the repository and build the compiler from source, an effort I recognize is too cumbersome for just trying out a new tool (and I've admittedly been too lazy to provide a binary release).
Recently, I decided to do something about this and tried to build an online platform that allows users to write and execute Tok code directly in their web browser. The code is executed remotely in the cloud, and results are returned to the user. In addition to making Tok more accessible, this effort allowed me to learn a few new things that I plan to share in this issue.
Challenges
Setting up a server on an EC2 instance to compile and execute incoming code is straightforward. However, there are several critical things we need to think about:
Scalability: How do we handle potentially thousands of concurrent requests? Is it easy to scale up resources to meet user demand?
Resource Management: How can we ensure that a single heavy request does not monopolize system resources and degrade performance for other users?
Security and Isolation: Given that we're executing user-provided code, it is crucial to execute each piece of code in a fully isolated environment. This isolation helps prevent unauthorized access to the host machine and secures the system against malicious attacks.
Latency Optimization: How do we minimize response times for each request?
Architecture
Our system is designed as a cloud-native application, which can naturally scale with increasing load.
We'll use Kubernetes to orchestrate the various services, all of which will be implemented in Go.
Our system will consist of a Coordinator service which performs authentication, authorization, rate limiting, and other auxiliary tasks, and a Code Execution Service which runs the code.
Code Execution Service
At the core of our system lies the code execution service, which runs incoming code and returns the results. This service consists of a ClusterIP service, which acts as a load balancer for the numerous code-executor replicas behind it. The ClusterIP distributes requests across these replicas and is only accessible internally within the cluster; it is not exposed to the internet.
You may be wondering why we have a separate coordinator service, when k8s already provides us a load balancer. Is it not easier to put all the code executors directly behind a public load balancer, and let them handle authentication? Adding a few additional lines of code to the executor is surely better than having to manage an entirely separate coordinator service, right?
To answer this, consider that rate limiting, authentication, and similar features often need to talk to databases and other external services. The code executors are designed to scale up with increasing load, and it is possible for us to have a few thousand executors running at peak hours. Each of these executors would then have to maintain its own database connection, and a few thousand open connections are enough to bog down the database and potentially take the entire service offline.
It is much safer to keep this logic inside a coordinator service, which consists of fewer replicas and therefore creates fewer database connections.
Each code executor is a simple Go server set up to process the code it receives. As previously highlighted, executing this code in a secure and resource-constrained manner is important. To achieve this, we compile and execute each request's code within its own Docker container. This container should have everything required to compile and execute the code.
Note: we use code-executor and compiler-server interchangeably in this article.
Compiler Container
Each request spins up a container to compile and execute code, so to minimize container startup time, the compiler Docker image must be as lightweight as possible. Our approach involves using a multi-stage Dockerfile to build the image efficiently.
Consider the following Dockerfile for the compiler:
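A minimal sketch of such a Dockerfile follows; the repository URL, exact build commands, and the static-link flag are illustrative assumptions, not the exact file.

```dockerfile
# Sketch only: repository URL, build commands, and static linking are assumptions.

# Stage 1: build the ctok binary with CMake on Ubuntu.
FROM --platform=linux/amd64 ubuntu:20.04 AS builder
RUN apt-get update && DEBIAN_FRONTEND=noninteractive \
    apt-get install -y cmake build-essential git
RUN git clone https://github.com/JyotinderSingh/ctok.git /ctok
WORKDIR /ctok/build
# Statically link so the glibc-built binary can run on musl-based alpine.
RUN cmake -DCMAKE_EXE_LINKER_FLAGS=-static .. && make

# Stage 2: copy only the binary into a ~8MB alpine image.
FROM --platform=linux/amd64 alpine
COPY --from=builder /ctok/build/ctok /usr/local/bin/ctok
ENTRYPOINT ["ctok"]
```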
This is a multi-stage Dockerfile. The first stage uses `ubuntu:20.04` as the base; here, we install CMake along with essential build tools, clone the `ctok` repository (our Tok compiler, written in C), and use CMake to compile the source into a binary. In the second stage, we shift to a minimal `alpine` base image, approximately 8MB in size, and copy over the binary from the first stage. This method ensures that the final image contains only the necessary executable and its dependencies, omitting all the build-specific tools and libraries and significantly reducing the image size.
Since I am building these images on a Mac, I use the `--platform=linux/amd64` flag with each base image to ensure compatibility with the cloud environment.
I use the following commands to build and push this image to Docker Hub.
$ docker build -t jyotindersingh/ctok .
$ docker push jyotindersingh/ctok
Compiler Server (code executor)
The compiler-server represents our code executor. It is a lightweight server listening on port `:8080`. Upon receiving a request to execute code, it performs initial validations and safety checks, then hands the code off as a string to the `compileAndRunCode` utility.
This method does a few things (a code sketch follows the list):
- Creates a temporary file to store the code (by default this file lives inside the `/tmp/` directory of the container).
- Utilizing the lightweight compiler Docker image we created previously, it spins up a Docker container with specific resource limits set (`-m 65m --cpus 0.1`).
- It attaches a context to the container launch command with a 5-second timeout, ensuring that no piece of code can end up running indefinitely.
- The container uses a bind mount to access the temporary file (`-v /tmp/tmp-code.tok:/tmp/tmp-code.tok`).
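A minimal sketch of this server and the `compileAndRunCode` helper, assuming the handler wiring and `ctok` image invocation shown in the comments (both are assumptions):

```go
// Sketch of the compiler-server. The handler path and the ctok image's
// command-line interface are assumptions based on the description above.
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"os/exec"
	"time"
)

func compileAndRunCode(code string) (string, error) {
	// Fixed path matching the bind-mount example above.
	tmpFile := "/tmp/tmp-code.tok"
	if err := os.WriteFile(tmpFile, []byte(code), 0o644); err != nil {
		return "", fmt.Errorf("writing temp file: %w", err)
	}
	defer os.Remove(tmpFile)

	// Attach a 5-second timeout so no piece of code runs indefinitely.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Launch the compiler container with memory/CPU limits and a bind mount
	// exposing the temp file at the same absolute path inside the container.
	cmd := exec.CommandContext(ctx, "docker", "run", "--rm",
		"-m", "65m", "--cpus", "0.1",
		"-v", tmpFile+":"+tmpFile,
		"jyotindersingh/ctok", tmpFile)

	out, err := cmd.CombinedOutput()
	if ctx.Err() == context.DeadlineExceeded {
		return string(out), fmt.Errorf("execution timed out")
	}
	return string(out), err
}

func main() {
	http.HandleFunc("/execute", func(w http.ResponseWriter, r *http.Request) {
		code, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "could not read request body", http.StatusBadRequest)
			return
		}
		out, err := compileAndRunCode(string(code))
		if err != nil {
			http.Error(w, out+"\n"+err.Error(), http.StatusUnprocessableEntity)
			return
		}
		fmt.Fprint(w, out)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```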
Understanding bind-mounts
A bind mount links a file or directory from the host machine into a container, using its absolute path. This is crucial because our setup involves a Docker-in-Docker configuration: the compiler container runs within another container hosted by a Kubernetes pod. Due to this nesting, the inner compiler container might fail to locate the temporary file, since Docker resolves mount paths against the host machine's filesystem and is effectively looking in the wrong place (the compiler-server container creates the temporary file, but that file doesn't exist at the same location on the pod's host machine).
To address this, we must ensure that the outer container (compiler-server) and the inner compiler container share a bind-mounted volume on the host machine. This shared volume means both containers can access the file seamlessly, despite the nested Docker setup.
The bind mount created in this step is only half of the story; we will need to create a bind mount to the same shared directory when we launch the code-executor pod in the upcoming sections.
Finally, we containerize this server using the following Docker image:
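A minimal sketch of this Dockerfile (the Go version, paths, and stage layout are assumptions):

```dockerfile
# Sketch only: Go version, paths, and stage layout are assumptions.

# Stage 1: compile the Go server into a static binary.
FROM --platform=linux/amd64 golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o compiler-server .

# Stage 2: a Docker-in-Docker base capable of running nested containers.
FROM --platform=linux/amd64 docker:dind
COPY --from=builder /app/compiler-server /usr/local/bin/compiler-server
EXPOSE 8080
ENTRYPOINT ["compiler-server"]
```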
As before, this is a multi-stage Docker image. The first stage compiles the Go server into an executable binary; the second stage places it into a Docker-in-Docker Linux image capable of spinning up nested Docker containers.
I use the following commands to build and push this image to Docker Hub.
$ docker build -t jyotindersingh/compiler-server .
$ docker push jyotindersingh/compiler-server
Kubernetes Setup
We will define two pieces of infrastructure:
- `code-execution-deployment`: This deployment manages the replicas of our compiler-server containers, ensuring high availability and fault tolerance.
- `code-execution-service`: A ClusterIP service that performs load balancing across the compiler-server replicas.
code-execution-deployment
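A minimal sketch of this manifest, assuming hostPath volumes for the shared `/tmp/` directory and the host's Docker socket (the replica count and labels are assumptions):

```yaml
# Sketch of code-execution-depl.yaml; field values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: code-execution-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: compiler-server
  template:
    metadata:
      labels:
        app: compiler-server
    spec:
      containers:
        - name: compiler-server
          image: jyotindersingh/compiler-server
          ports:
            - containerPort: 8080
          volumeMounts:
            # Shared /tmp/ so the nested compiler container can find the temp code file.
            - name: tmp-volume
              mountPath: /tmp
            # Host Docker socket, used to spawn the nested containers.
            - name: docker-socket
              mountPath: /var/run/docker.sock
      volumes:
        - name: tmp-volume
          hostPath:
            path: /tmp
        - name: docker-socket
          hostPath:
            path: /var/run/docker.sock
```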
This deployment configuration has a few important details:
- `replicas`: This setting determines the number of compiler-server instances we want running, with each instance operating in its own Kubernetes pod.
- `volumeMounts`: We use shared volumes to facilitate the interaction between the nested containers. As we've implemented previously, the inner compiler container bind-mounts the `/tmp/` directory; we replicate this configuration for the outer compiler-server container. Additionally, the host machine's Docker socket is mounted into the compiler-server container. This setup is crucial for spawning new Docker containers within the compiler-server.
Security and Resource Access: Typically, mounting the host’s Docker socket within a container is not advisable as it can grant the nested containers access to the host’s resources, circumventing the resource constraints imposed by the outer container.
Alternative Approaches: The preferred method would be to run the Docker-in-Docker container in `privileged` mode rather than mounting the Docker socket. This approach can potentially offer better isolation and security. I've encountered challenges with this method; however, I encourage experimenting with it to possibly refine and secure the setup.
code-execution-service
This configuration defines a `ClusterIP` service that exposes port `:8080` inside the cluster.
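A minimal sketch of this manifest (the label selector is an assumption):

```yaml
# Sketch of code-execution-srv.yaml; selector and names are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: code-execution-service
spec:
  type: ClusterIP
  selector:
    app: compiler-server
  ports:
    - port: 8080
      targetPort: 8080
```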
Coordinator Service
The coordinator is a simple passthrough service, which accepts requests from users and forwards them to the `code-execution-service`.
If you want to add auth, rate limiting, and other auxiliary features, this is where they should be added.
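A minimal sketch of such a passthrough, using Go's standard-library reverse proxy (the service hostname and route are assumptions):

```go
// Sketch of the coordinator: a thin reverse proxy in front of the
// code-execution-service. Hostname and port are assumptions.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Kubernetes DNS resolves the ClusterIP service by name within the cluster.
	target, err := url.Parse("http://code-execution-service:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(target)

	// Auth, rate limiting, etc. would wrap this handler.
	http.Handle("/execute", proxy)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```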
Dockerfile
The Dockerfile builds the server and then copies it into a lightweight alpine image.
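A minimal sketch (the Go version and paths are assumptions):

```dockerfile
# Sketch only: Go version and paths are assumptions.
FROM --platform=linux/amd64 golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o coordinator .

# Copy the binary into a lightweight alpine image.
FROM --platform=linux/amd64 alpine
COPY --from=builder /app/coordinator /usr/local/bin/coordinator
EXPOSE 8080
ENTRYPOINT ["coordinator"]
```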
I use the following commands to build and push this image to Docker Hub.
$ docker build -t jyotindersingh/compiler-coordinator .
$ docker push jyotindersingh/compiler-coordinator
Kubernetes Setup
We define two infrastructure components:
- `coordinator-deployment`: This deployment manages the replicas of our coordinator containers, ensuring high availability and fault tolerance.
- `coordinator-service`: A LoadBalancer service that distributes incoming user requests across the coordinator replicas.
coordinator-deployment
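A minimal sketch of this manifest (the replica count and labels are assumptions):

```yaml
# Sketch of coordinator-depl.yaml; field values are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coordinator-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: coordinator
  template:
    metadata:
      labels:
        app: coordinator
    spec:
      containers:
        - name: coordinator
          image: jyotindersingh/compiler-coordinator
          ports:
            - containerPort: 8080
```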
coordinator-service
The `coordinator-service` defines a simple LoadBalancer, which users can make requests to.
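A minimal sketch of this manifest (the external port mapping is an assumption):

```yaml
# Sketch of coordinator-srv.yaml; port mapping is an assumption.
apiVersion: v1
kind: Service
metadata:
  name: coordinator-service
spec:
  type: LoadBalancer
  selector:
    app: coordinator
  ports:
    - port: 80
      targetPort: 8080
```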
Deployment
You can use `minikube` to test this setup locally.
minikube start
Apply the configurations
kubectl apply -f code-execution-depl.yaml
kubectl apply -f code-execution-srv.yaml
kubectl apply -f coordinator-depl.yaml
kubectl apply -f coordinator-srv.yaml
Get the externally available IP address for the coordinator service
minikube service coordinator-service --url
Create a request to the server
curl --location '<ip address>/execute' \
--header 'Content-Type: text/plain' \
--data 'print "Hello, World!";'
This should print the following output to your screen:
Hello, World!
Concluding thoughts
In this issue, we discussed how one can design and implement a horizontally scalable remote code execution service. We also went into the nitty-gritty of sharing files with nested containers, some security considerations, and how to deploy the overall setup to the cloud.
The complete code for this project is available on my GitHub.