Fly GPUs quickstart

You can use any base image for your Dockerfile, but it is convenient to start from ubuntu:22.04 and install libraries from NVIDIA’s official apt repository. In most cases, RUN apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8 is enough.
Notes:
- Do not install meta packages such as cuda-runtime-*.
- cuda-libraries-12-2 is a good but bulky starting point. Once you know which libraries are needed at build time and at runtime, install only those to reduce the final image size.
- Use multi-stage Docker builds as much as possible.
From flyctl, create an app using either fly launch or fly apps create.
Note: GPUs are not available in all regions. There are three GPU types available: NVIDIA L40S, A100-PCIe-40GB, and A100-SXM4-80GB.

Currently GPUs are available in the following regions:
- l40s: ord
- a100-40gb: ord
- a100-80gb: ams, iad, mia, sjc, syd
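If you script your deployments, the region list above can be encoded as a small lookup. This is only a sketch: gpu_regions is a hypothetical helper, not a flyctl command, and the values are copied from the list above, so they will go stale if availability changes.

```shell
# Print the regions that offer a given GPU kind, per the list above.
# gpu_regions is a hypothetical helper, not part of flyctl.
gpu_regions() {
  case "$1" in
    l40s)      echo "ord" ;;
    a100-40gb) echo "ord" ;;
    a100-80gb) echo "ams iad mia sjc syd" ;;
    *)         echo "unknown GPU kind: $1" >&2; return 1 ;;
  esac
}
```

For example, gpu_regions a100-80gb prints ams iad mia sjc syd, and an unrecognized GPU kind returns a non-zero status.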
Create or modify the fly.toml config file in the project source directory, replacing values with your own:
```
app = "my-gpu-app"
primary_region = "ord"
vm.size = "a100-40gb"

# Use a volume to store LLMs or any big file that doesn't fit in a Docker image
[[mounts]]
source = "data"
destination = "/data"

[http_service]
internal_port = 8080
auto_stop_machines = false
```
Notes:
- Make sure vm.size is set in fly.toml. Valid values are a100-40gb, a100-80gb, and l40s.
- Make sure to include a [[mounts]] section in fly.toml.
- The volume gets created automatically by fly deploy.
- Use the volume to store models and other large files that can’t be shipped in a Docker image.
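The first two notes can be checked before deploying with a rough script like the one below. It is a sketch under assumptions: check_gpu_config is a hypothetical helper, it expects the dotted vm.size key form shown in the example config, and it only greps the file rather than parsing TOML.

```shell
# Rough pre-deploy check for the GPU-related settings in fly.toml.
# check_gpu_config is a hypothetical helper; it greps the file and does
# not parse TOML, so other valid ways of writing the config are missed.
check_gpu_config() {
  config="${1:-fly.toml}"
  # vm.size must be one of the GPU sizes listed in the notes above.
  if ! grep -Eq '^[[:space:]]*vm\.size[[:space:]]*=[[:space:]]*"(a100-40gb|a100-80gb|l40s)"' "$config"; then
    echo "$config: vm.size must be a100-40gb, a100-80gb, or l40s" >&2
    return 1
  fi
  # A [[mounts]] section must be present.
  if ! grep -Fq '[[mounts]]' "$config"; then
    echo "$config: missing [[mounts]] section" >&2
    return 1
  fi
  echo "$config: GPU settings look OK"
}
```

Run check_gpu_config in the project directory, or pass a path to another config file as the first argument.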
Deploy your app:
```
fly deploy
```

That’s all it takes to get an app running on a Machine with a GPU.

Volumes and GPU Machines

Important: If you create any additional volumes, they need to be created with the same constraints as your Machine, such as the region and GPU kind.

Here’s an example of creating a new 100 GB volume for storing ML models in the ord region, on a Machine with a GPU:

```
fly volumes create models \
  --size 100 \
  --vm-gpu-kind a100-40gb \
  --region ord
```

Example Dockerfile:

```
# Base stage: add NVIDIA's apt repository so later stages can install CUDA packages
FROM ubuntu:22.04 AS base
RUN apt update -q && apt install -y ca-certificates wget && \
    wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i /cuda-keyring.deb && apt update -q

# Builder stage: compile the deviceQuery sample with nvcc
FROM base AS builder
RUN apt install -y --no-install-recommends git cuda-nvcc-12-2
RUN git clone --depth=1 https://github.com/nvidia/cuda-samples.git /cuda-samples
RUN cd /cuda-samples/Samples/1_Utilities/deviceQuery && \
    make && install -m 755 deviceQuery /usr/local/bin

# Runtime stage: copy only the compiled binary, without the build toolchain
FROM base AS runtime
# Uncomment and adjust if your app needs these libraries at runtime
#RUN apt install -y --no-install-recommends libcudnn8 libcublas-12-2
COPY --from=builder /usr/local/bin/deviceQuery /usr/local/bin/deviceQuery
CMD ["sleep", "inf"]
```
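Because the example image only sleeps, one way to confirm that the GPU is visible is to run the deviceQuery binary on the deployed Machine over SSH. The wrapper below is a sketch: check_gpu is a hypothetical name, and it assumes the app was built from the example Dockerfile above so that deviceQuery is on the PATH.

```shell
# Run deviceQuery on a running Machine of the given app.
# check_gpu is a hypothetical wrapper; pass your own app name.
check_gpu() {
  fly ssh console --app "$1" -C deviceQuery
}
```

Call it as check_gpu my-gpu-app, replacing my-gpu-app with the app name from your fly.toml.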

Examples using Fly GPUs

Elixir Llama2-13b on Fly GPUs: https://gist.github.com/chrismccord/59a5e81f144a4dfb4bf0a8c3f2673131
GitHub fly-apps repos with the gpu topic: https://github.com/orgs/fly-apps/repositories?q=topic%3Agpu
Fly.io CUDA example: https://gist.github.com/dangra/f8123001fe0f2453a8cd638b89738465
Deploying CLIP on Fly.io: https://gist.github.com/simonw/52c7734e34cac2b26ea1378845674edc
