Fly GPUs quickstart

  1. You can use any base image for your Dockerfile, but it is convenient to base it on ubuntu:22.04 and install libraries from NVIDIA’s official apt repository. In most cases, RUN apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8 is enough.

    Notes:

    • Do not install meta packages such as cuda-runtime-*.
    • cuda-libraries-12-2 is a good but bulky starting point. Once you know which libraries are needed at build time and at runtime, install only those to reduce the final image size.
    • Use multi-stage Docker builds wherever possible.
  2. From flyctl, create an app using either fly launch or fly apps create.

    Note: GPUs are not available in all regions.

    There are three GPU types available: Nvidia L40S, A100-PCIe-40GB and A100-SXM4-80GB (datasheet). The matching Machine size presets are l40s, a100-40gb and a100-80gb.

    Currently GPUs are available in the following regions:

    • l40s: ord
    • a100-40gb: ord
    • a100-80gb: ams, iad, mia, sjc, syd


    Follow this thread for updates.

  3. Create or modify the fly.toml config file in the project source directory, replacing values with your own:

    app = "my-gpu-app"
    primary_region = "ord"
    vm.size = "a100-40gb"
    
    # Use a volume to store LLMs or any big file that doesn't fit in a Docker image
    [[mounts]]
    source = "data"
    destination = "/data"
    
    [http_service]
    internal_port = 8080
    auto_stop_machines = false
    

    Notes:

    • Make sure vm.size is set in fly.toml. Valid values are a100-40gb, a100-80gb and l40s.
    • Make sure to include a [[mounts]] section in fly.toml.
    • The volume gets created automatically by fly deploy.
    • Use the volume to store the models and large files that can’t be shipped as a docker image.
  4. Deploy your app:

    fly deploy
    

That’s pretty much it to get an app running with a Machine on a GPU.
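
Because GPUs are not available in every region, it can help to check the preset and region pair before creating the app in step 2. The following is a minimal shell sketch that hard-codes the region list from this page. The function names gpu_regions and gpu_available are hypothetical helpers, not part of flyctl, and the list will go stale as regions change.

```shell
#!/bin/sh
# Regions per GPU preset, copied from the list in step 2.
# This is a snapshot and may be out of date.
gpu_regions() {
  case "$1" in
    l40s)      echo "ord" ;;
    a100-40gb) echo "ord" ;;
    a100-80gb) echo "ams iad mia sjc syd" ;;
    *)         return 1 ;;   # unknown preset
  esac
}

# Return 0 if preset $1 is listed for region $2, non-zero otherwise.
gpu_available() {
  regions=$(gpu_regions "$1") || return 1
  for r in $regions; do
    if [ "$r" = "$2" ]; then
      return 0
    fi
  done
  return 1
}
```

For example, gpu_available a100-80gb ams && fly apps create my-gpu-app would only create the app when the pair appears in the list.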

Important: If you create any additional volumes, they need to be created with the same constraints as your Machine. Here’s an example of creating a 100 GB volume for storing ML models in the ord region, on a Machine with a GPU:

fly volumes create models \
  --size 100 \
  --vm-gpu-kind a100-pcie-40gb \
  --region ord
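
The same command can be adapted for the other presets. The sketch below only prints the command for review rather than running it. Here gpu_kind and volume_cmd are hypothetical helpers, and the a100-sxm4-80gb and l40s values for --vm-gpu-kind are assumptions inferred from the GPU names listed earlier, so confirm them with fly volumes create --help before use.

```shell
#!/bin/sh
# Map a Machine size preset to a --vm-gpu-kind value.
# Only a100-pcie-40gb appears on this page; the other two are assumed.
gpu_kind() {
  case "$1" in
    a100-40gb) echo "a100-pcie-40gb" ;;
    a100-80gb) echo "a100-sxm4-80gb" ;;  # assumption
    l40s)      echo "l40s" ;;            # assumption
    *)         return 1 ;;               # unknown preset
  esac
}

# Print (do not run) a volume-create command.
# Arguments: volume name, size in GB, preset, region.
volume_cmd() {
  kind=$(gpu_kind "$3") || return 1
  echo "fly volumes create $1 --size $2 --vm-gpu-kind $kind --region $4"
}

volume_cmd models 100 a100-40gb ord
```

Printing first makes it easy to check that the region and GPU kind match the Machine before the volume is actually created.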

Example Dockerfile:

# Shared base: Ubuntu plus NVIDIA's apt repository (added via the cuda-keyring package)
FROM ubuntu:22.04 AS base
RUN apt update -q && apt install -y ca-certificates wget && \
    wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i /cuda-keyring.deb && apt update -q

# Build stage: compile the deviceQuery sample with nvcc
FROM base AS builder
RUN apt install -y --no-install-recommends git cuda-nvcc-12-2
RUN git clone --depth=1 https://github.com/nvidia/cuda-samples.git /cuda-samples
RUN cd /cuda-samples/Samples/1_Utilities/deviceQuery && \
    make && install -m 755 deviceQuery /usr/local/bin

# Runtime stage: only the compiled binary is copied in, which keeps the image small
FROM base AS runtime
# Uncomment to add the runtime libraries your app needs:
#RUN apt install -y --no-install-recommends libcudnn8 libcublas-12-2
COPY --from=builder /usr/local/bin/deviceQuery /usr/local/bin/deviceQuery
# Keep the container alive so you can connect and run deviceQuery manually
CMD ["sleep", "inf"]
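
Once this image is deployed, one way to confirm that the GPU is visible is to run the deviceQuery binary inside the Machine. The sketch below assumes a running Machine built from the example Dockerfile and uses the -C option of fly ssh console to run a single command. The wrapper name check_gpu is hypothetical.

```shell
#!/bin/sh
# Run deviceQuery inside a running Machine of the given app.
# Assumes the image was built from the example Dockerfile, which installs
# the binary at /usr/local/bin/deviceQuery.
check_gpu() {
  fly ssh console --app "$1" -C "/usr/local/bin/deviceQuery"
}

# Usage (requires flyctl and a deployed app):
#   check_gpu my-gpu-app
```

If the GPU is attached correctly, deviceQuery should list the device. If it reports that no CUDA-capable device is detected, check the vm.size setting in fly.toml.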

Examples