Metrics on Fly.io
The Fly.io platform includes a fully-managed metrics solution to help you easily monitor your Fly apps. It includes the following components:
- Prometheus on Fly.io: Managed Prometheus-compatible time series storage
- Dashboards: Managed Grafana with detailed visualizations of all built-in metrics
- Built-in Metrics: Metrics automatically sent from every Fly app you deploy
- Custom Metrics: Expose additional metrics from Fly apps for further customization
Prometheus on Fly.io
Prometheus is a popular open source monitoring system used to store and query metrics efficiently, with a stable HTTP querying API compatible with a range of systems.
Prometheus on Fly.io is a fully-managed service based on VictoriaMetrics. It supports most common Prometheus querying API endpoints:
/api/v1/query
/api/v1/query_range
/api/v1/series
/api/v1/labels
/api/v1/label/<label_name>/values
/api/v1/status/tsdb
/api/v1/targets
/federate
Note that the remote read (/api/v1/read) remote storage integration is not supported.
MetricsQL
Prometheus queries are typically based on the PromQL query language. Prometheus on Fly.io queries use VictoriaMetrics MetricsQL, a backwards-compatible query language that fixes user experience issues and adds useful features and functions on top of PromQL.
Key features:
- Better rate() and increase() functions that just work. No need for irate workarounds or appending Grafana’s magical $__rate_interval selector to every query. In fact, you can even omit the square brackets entirely and MetricsQL will do the right thing.
- Many more label manipulation functions such as drop_common_labels, label_set, etc.
- topk_avg, which returns the top k time series averaged across the entire series range (not just individual points), plus the sum of all remaining series in an “other” label. Useful for giving a small, filtered view across a potentially large number of series.
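As a rough illustration of the features above, the sketch below assembles two MetricsQL expressions as plain strings: a rate() call with no square-bracket window, and a topk_avg() call with an optional "other" label. The helper name topk_avg_query and the app=other label are made up for this example; the metric name comes from the built-in series.

```python
def topk_avg_query(k: int, inner: str, other_label: str = "") -> str:
    """Build a MetricsQL topk_avg() expression as a string.

    other_label is an optional "label=value" pair; when given, MetricsQL
    returns the sum of the remaining series under that label.
    """
    args = [str(k), inner]
    if other_label:
        args.append(f'"{other_label}"')
    return f"topk_avg({', '.join(args)})"


# rate() without a [window] in square brackets, as MetricsQL allows.
rate_by_app = "sum(rate(fly_edge_http_responses_count)) by (app)"

# Top 3 apps by average request rate, everything else summed as app="other".
top_apps = topk_avg_query(3, rate_by_app, "app=other")
print(top_apps)
```

The resulting string can be pasted into the Grafana Explore panel or sent to the query endpoint.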
Querying
Queries can be sent to the following endpoint:
https://api.fly.io/prometheus/<org-slug>/
You’ll need to authenticate with a Fly Access Token sent in the standard Bearer Token format (e.g., an HTTP request header Authorization: Bearer <TOKEN>), and you may only query series scoped to your organizations.
Manually
Find your Organization slug
List your organizations, find the org slug and set it as a local variable.
flyctl orgs list
ORG_SLUG=[org-slug]
Get an access token
TOKEN=$(flyctl auth token)
Test it out!
curl https://api.fly.io/prometheus/$ORG_SLUG/api/v1/query \
--data-urlencode 'query=sum(increase(fly_edge_http_responses_count)) by (app, status)' \
-H "Authorization: Bearer $TOKEN"
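The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the function names build_query_request and run_query are illustrative, and the response is assumed to follow the usual Prometheus JSON shape.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.fly.io/prometheus"


def build_query_request(org_slug: str, token: str, query: str) -> urllib.request.Request:
    """Build a POST request for the instant-query endpoint."""
    url = f"{API_BASE}/{org_slug}/api/v1/query"
    # Form-encode the query, like curl's --data-urlencode.
    body = urllib.parse.urlencode({"query": query}).encode()
    return urllib.request.Request(
        url, data=body, headers={"Authorization": f"Bearer {token}"}
    )


def run_query(org_slug: str, token: str, query: str) -> dict:
    """Send the query and decode the JSON response."""
    request = build_query_request(org_slug, token, query)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Example usage (needs a real org slug and token, so it is left commented out):
# result = run_query("my-org", "my-token",
#                    "sum(increase(fly_edge_http_responses_count)) by (app, status)")
# print(result["status"])
```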
Dashboards
For more advanced metrics monitoring, you can use dashboards to organize and visualize complex Prometheus queries.
The Metrics tab on the Fly.io Dashboard provides an overview of your Fly apps using the built-in metrics stored in Prometheus.
Managed Grafana
Grafana is a popular open source data visualization web application that allows you to compose queries against data sources into dynamic, reusable dashboards.
We provide a managed Grafana instance at fly-metrics.net, preconfigured with your Prometheus data source and detailed dashboards covering the full set of built-in metrics.
You can also use the Explore panel to run ad-hoc queries against the preconfigured Prometheus datasource, or create/import additional dashboards for further customization or to visualize custom metrics.
Switch between your Fly.io Organizations by clicking the “Switch organization” link beneath the user icon in the lower-left of the screen.
External or self-hosted Grafana
You can also configure your Prometheus endpoint with an existing Grafana installation, or host one on Fly.io. Either way, you can set it up like this:
- Add a Prometheus data source (Settings -> Data Sources -> Add data source -> Prometheus)
- Fill the form with the following:
  - HTTP -> URL: https://api.fly.io/prometheus/<org-slug>/
  - Custom HTTP Headers -> + Add Header:
    - Header: Authorization, Value: Bearer <token>
You’re all set.
We publish our Fly.io Dashboards to Grafana.com for use with external Grafana instances. To install, just import the dashboard using the listed IDs. If you’d like to contribute changes to the dashboards, we have created a repository for them.
Built-in metrics
Fly apps automatically publish a number of built-in metrics.
Metric types are all Gauges unless otherwise marked.
Metrics with names ending in _count are all Counters.
Histogram metrics with a base name of <name> expose multiple series:
<name>_bucket{le}
<name>_sum
<name>_count
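To show how the three histogram series fit together, here is a small sketch that derives a mean from <name>_sum and <name>_count, and estimates a quantile from cumulative <name>_bucket{le} values by linear interpolation. It is similar in spirit to the histogram_quantile query function, but it is not a description of how the managed service computes quantiles. The function names and sample numbers are invented for illustration.

```python
import math


def mean_from_histogram(sum_value: float, count_value: float) -> float:
    """Average observation: <name>_sum divided by <name>_count."""
    return sum_value / count_value if count_value else math.nan


def estimate_quantile(q: float, buckets: list) -> float:
    """Estimate quantile q from (le, cumulative_count) pairs.

    buckets must include the +Inf bucket, whose count is the total.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Cannot interpolate inside the open-ended bucket.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket counts for a response-time histogram (seconds).
sample = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(estimate_quantile(0.95, sample))
print(mean_from_histogram(42.0, 100))
```

In practice the bucket values would usually be rates or increases over a time window, not raw counter values.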
Standard Labels
All published series include the following labels:
- app: App name
- region: Fly.io Region
- host: 4-character host ID (lowercase hexadecimal)
- instance: App instance ID (for all series except fly_edge_ and fly_volume_)
If your app exposes custom metrics with the same labels, they will be overwritten.
Proxy series
Any app using a TCP-based handler (HTTP, TLS or straight TCP) publishes edge and app proxy metrics:
Labels:
- proxy_id: “blue” or “green” (flips when the proxy is restarted/updated)
Edge - fly_edge_
fly_edge_http_responses_count{status}
fly_edge_http_response_time_seconds{status} (Histogram)
fly_edge_tcp_connects_count
fly_edge_tcp_disconnects_count
fly_edge_data_out (Counter, bytes)
fly_edge_data_in (Counter, bytes)
fly_edge_tls_handshake_errors{servername} (Counter)
fly_edge_tls_handshake_time_seconds{version} (Histogram)
App - fly_app_
fly_app_concurrency
fly_app_http_responses_count{status}
fly_app_http_response_time_seconds{status} (Histogram)
fly_app_connect_time_seconds (Histogram)
fly_app_tcp_connects_count
fly_app_tcp_disconnects_count
Instance series - fly_instance_
Derived from the /proc file system of your app VMs.
fly_instance_up = 1 shows the VM is reporting correctly.
Instance memory - fly_instance_memory_
Derived from /proc/meminfo. All units are in bytes.
fly_instance_memory_mem_total
fly_instance_memory_mem_free
fly_instance_memory_mem_available
fly_instance_memory_buffers
fly_instance_memory_cached
fly_instance_memory_swap_cached
fly_instance_memory_active
fly_instance_memory_inactive
fly_instance_memory_swap_total
fly_instance_memory_swap_free
fly_instance_memory_dirty
fly_instance_memory_writeback
fly_instance_memory_slab
fly_instance_memory_shmem
fly_instance_memory_vmalloc_total
fly_instance_memory_vmalloc_used
fly_instance_memory_vmalloc_chunk
Instance Load and CPU
- load_average is derived from /proc/loadavg (getloadavg). It’s a “system load average” measuring the number of processes in the system run queue, with samples representing averages over 1, 5, and 15 minutes (the minutes label).
- cpu is derived from /proc/stat, and counts the amount of time each CPU (cpu_id) has spent performing different kinds of work (mode, which may be one of user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice). The time unit is ‘clock ticks’ of centiseconds (0.01 seconds).
fly_instance_load_average{minutes}
fly_instance_cpu{cpu_id, mode} (Counter, centiseconds)
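Because fly_instance_cpu is a counter of centiseconds per mode, utilization comes from the difference between two samples, not from a single value. The sketch below shows one way to turn two samples for a single cpu_id into a busy fraction. The function names and sample numbers are illustrative, and the choice to skip guest and guest_nice assumes the usual Linux /proc/stat behavior, where guest time is already included in user and nice.

```python
IDLE_MODES = ("idle", "iowait")
# Assumption: as in Linux /proc/stat, guest time is already part of user/nice,
# so it is skipped to avoid counting it twice.
ALREADY_COUNTED = ("guest", "guest_nice")


def centiseconds_to_seconds(value: float) -> float:
    """Convert a counter value from centiseconds to seconds."""
    return value / 100


def cpu_busy_fraction(before: dict, after: dict) -> float:
    """Fraction of time spent in non-idle modes between two samples.

    before/after map mode name -> counter value (centiseconds) for one cpu_id.
    Counter resets (negative deltas) are not handled in this sketch.
    """
    deltas = {
        mode: after[mode] - before.get(mode, 0)
        for mode in after
        if mode not in ALREADY_COUNTED
    }
    total = sum(deltas.values())
    if total <= 0:
        return 0.0
    idle = sum(deltas.get(mode, 0) for mode in IDLE_MODES)
    return (total - idle) / total


# Two hypothetical samples taken a few seconds apart.
t0 = {"user": 100, "system": 50, "idle": 850}
t1 = {"user": 150, "system": 60, "idle": 1090}
print(cpu_busy_fraction(t0, t1))
```

The same result can be obtained server-side with a rate() query grouped by mode; the client-side version just makes the arithmetic explicit.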
Instance Disks - fly_instance_disk_
Counters derived from fields 1-11 of /proc/diskstats. The unit for time_ series is milliseconds, and the unit for sectors_ is 512-byte sectors.
Labels:
- device: Published for the ephemeral VM root disk (vdb) and any mounted Volume (vdc).
fly_instance_disk_reads_completed
fly_instance_disk_reads_merged
fly_instance_disk_sectors_read
fly_instance_disk_time_reading
fly_instance_disk_writes_completed
fly_instance_disk_writes_merged
fly_instance_disk_sectors_written
fly_instance_disk_time_writing
fly_instance_disk_io_in_progress
fly_instance_disk_time_io
fly_instance_disk_time_io_weighted
Instance Networking - fly_instance_net_
Counters derived from /proc/net/dev.
Labels:
- device: interface name, either eth0 or dummy0 (ignore).
fly_instance_net_recv_bytes
fly_instance_net_recv_packets
fly_instance_net_recv_errs
fly_instance_net_recv_drop
fly_instance_net_recv_fifo
fly_instance_net_recv_frame
fly_instance_net_recv_compressed
fly_instance_net_recv_multicast
fly_instance_net_sent_bytes
fly_instance_net_sent_packets
fly_instance_net_sent_errs
fly_instance_net_sent_drop
fly_instance_net_sent_fifo
fly_instance_net_sent_colls
fly_instance_net_sent_carrier
fly_instance_net_sent_compressed
Instance File Descriptors - fly_instance_filefd_
Information about allocated and maximum allowed file descriptors, derived from /proc/sys/fs/file-nr.
fly_instance_filefd_allocated
fly_instance_filefd_maximum
Instance Filesystem - fly_instance_filesystem_
Filesystem metrics derived from VFS File System Information.
Labels:
- mount: mount point name(s): / and, if using Volumes, the destination name in fly.toml.
fly_instance_filesystem_blocks
fly_instance_filesystem_block_size
fly_instance_filesystem_blocks_free
fly_instance_filesystem_blocks_avail
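The filesystem series are reported in blocks, so byte figures have to be derived. The sketch below assumes the four series correspond to the usual statvfs-style fields (total blocks, block size, free blocks, and blocks available to unprivileged users). That mapping is an assumption, and the function name and sample values are illustrative.

```python
def filesystem_usage(blocks: int, block_size: int, blocks_free: int, blocks_avail: int) -> dict:
    """Convert block counts into byte totals for one mount."""
    return {
        "total_bytes": blocks * block_size,
        # Used space is everything that is not free.
        "used_bytes": (blocks - blocks_free) * block_size,
        # Space available to unprivileged users (may be less than free space).
        "avail_bytes": blocks_avail * block_size,
    }


# Hypothetical values for the "/" mount.
usage = filesystem_usage(blocks=1000, block_size=4096, blocks_free=400, blocks_avail=350)
print(usage)
```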
Volumes - fly_volume_
Labels:
id
: Volume ID
If you’re using Volumes for any of your organization’s apps, you’ll be able to query these series, derived from the LSize and Data% of the volume’s thin LV.
fly_volume_size_bytes
fly_volume_used_pct (0-100)
Postgres - pg_
If you have a Postgres database hosted on Fly.io, you’ll automatically get the following series, published via postgres_exporter:
pg_stat_activity_count
pg_stat_activity_max_tx_duration
pg_stat_archiver_archived_count
pg_stat_archiver_failed_count
pg_stat_bgwriter_buffers_alloc
pg_stat_bgwriter_buffers_backend_fsync
pg_stat_bgwriter_buffers_backend
pg_stat_bgwriter_buffers_checkpoint
pg_stat_bgwriter_buffers_clean
pg_stat_bgwriter_checkpoint_sync_time
pg_stat_bgwriter_checkpoint_write_time
pg_stat_bgwriter_checkpoints_req
pg_stat_bgwriter_checkpoints_timed
pg_stat_bgwriter_maxwritten_clean
pg_stat_bgwriter_stats_reset
pg_stat_database_blk_read_time
pg_stat_database_blk_write_time
pg_stat_database_blks_hit
pg_stat_database_blks_read
pg_stat_database_conflicts_confl_bufferpin
pg_stat_database_conflicts_confl_deadlock
pg_stat_database_conflicts_confl_lock
pg_stat_database_conflicts_confl_snapshot
pg_stat_database_conflicts_confl_tablespace
pg_stat_database_conflicts
pg_stat_database_deadlocks
pg_stat_database_numbackends
pg_stat_database_stats_reset
pg_stat_database_tup_deleted
pg_stat_database_tup_fetched
pg_stat_database_tup_inserted
pg_stat_database_tup_returned
pg_stat_database_tup_updated
pg_stat_database_xact_commit
pg_stat_database_xact_rollback
pg_stat_replication_pg_current_wal_lsn_bytes
pg_stat_replication_pg_wal_lsn_diff
pg_stat_replication_reply_time
pg_replication_lag
pg_database_size_bytes
Custom Metrics
For further customization beyond built-in metrics, Fly apps can expose a metrics endpoint, which we’ll automatically scrape every 15 seconds, sending the results to Prometheus.
Configuration
Add a [metrics] section to your application’s fly.toml:
[metrics]
port = 9091
path = "/metrics" # default for most prometheus exporters
If your app uses multiple processes, you can add multiple [[metrics]] sections, each with its own set of processes:
[[metrics]]
port = 9394
path = "/metrics"
processes = ["web"]
[[metrics]]
port = 9113
path = "/metrics"
processes = ["proxy"]
Instrumentation
Instrument your app and expose your metrics on 0.0.0.0.
There are many supported client libraries as well as off-the-shelf exporters able to return Prometheus-formatted metrics.
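For a sense of what a scrape target looks like, here is a minimal sketch that serves one counter in the Prometheus text format using only the Python standard library. It binds to 0.0.0.0 and uses port 9091 and path /metrics to match the [metrics] example above. The metric name myapp_jobs_processed_total is made up, and a real app would more likely use one of the client libraries than hand-render the format.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical application counter; the name is illustrative only.
COUNTERS = {"myapp_jobs_processed_total": 0}


def render_metrics(counters: dict) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9091) -> None:
    """Listen on all interfaces so the scraper can reach the endpoint."""
    ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()


# Call serve() from your app's startup code, typically in a background thread.
```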
Authentication
Authenticating to the Prometheus API can be achieved in a few different ways, depending on the level of access you want your token to have.
Fly Access Token
As in the earlier example, a full access token can be generated with flyctl auth token and then passed as a bearer token in the Authorization header. The header looks like:
Authorization: Bearer THE_TOKEN
Fly org-restricted or read-only token
This kind of token or “macaroon” can be scoped to a single organization or configured to only allow read operations, which can be safer than using a full-blown read-write token that grants access to all organizations under your account.
Generating tokens
Create an org-restricted token:
fly token create org -o THE_ORGANIZATION
Create a read-only org-restricted token:
fly token create readonly
These tokens look like this once generated: FlyV1 fm2_lJPECAAAAAAAAC7txBAzYI6PRWhHLT...(a lot of base64-encoded text).
Using tokens
To use one of these tokens in an HTTP header, the “FlyV1” identifier replaces the “Bearer” token identifier. So it would look like this:
Authorization: FlyV1 fm2_lJPECAAAAAAAAC7txBAzYI6PRWhHLT...(a lot of base64-encoded text)
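A script that accepts either kind of token can pick the header scheme from the token itself. The sketch below assumes that restricted tokens arrive with their "FlyV1 " prefix intact, as in the generated output shown above, and that anything else is a full access token. The function name is illustrative.

```python
def authorization_header(token: str) -> dict:
    """Return the Authorization header for either token style."""
    token = token.strip()
    if token.startswith("FlyV1 "):
        # Restricted tokens already carry their own scheme identifier.
        return {"Authorization": token}
    # Full access tokens use the standard Bearer scheme.
    return {"Authorization": f"Bearer {token}"}


print(authorization_header("THE_TOKEN"))
```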