Configuration
Grey uses a YAML configuration file to define probes and control how they are executed. This guide walks you through the available configuration options and how to use them to configure your probes.
Top-Level Configuration
State Database
The state option configures a database file where Grey will store probe execution state for persistence across application restarts. When configured, probe history, availability metrics, and state transitions are automatically saved to disk using a high-performance embedded database.
state: ./state.redb
If not specified, probe state will only be kept in memory and will be lost when the application restarts. The database file uses the .redb extension and will be created automatically if it doesn't exist.
Probes
Probes are the core of Grey's configuration. Each probe defines a single target and a set of validators that will be used to assert that the target is healthy. In addition to these properties, a probe has a name and a policy governing how frequently it is executed and how timeouts and retries should be handled.
probes:
  - name: example
    policy:
      interval: 5s
      timeout: 2s
      retries: 3
    target: !Http
      url: https://example.com
    validators:
      http.status: !OneOf [200]
      http.header.content-type: !Contains "text/html"
Name
The name property is a unique identifier for the probe. It is used to identify the probe in the traces that are emitted by Grey and should be a short, descriptive name. By convention we recommend using the format <service>.<environment>[.<subcomponent>], for example: vault.production or nomad.staging.leader. In practice, however, Grey doesn't enforce any constraints on this value and you're welcome to use it as you see fit.
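As an illustration of the naming convention, the sketch below defines two probes using the example names above. The URLs are hypothetical placeholders, and the policy and validator values are simply copied from the earlier example rather than being recommendations.

```yaml
probes:
  # <service>.<environment>
  - name: vault.production
    policy:
      interval: 5s
      timeout: 2s
      retries: 3
    target: !Http
      url: https://vault.example.com        # hypothetical URL
    validators:
      http.status: !OneOf [200]

  # <service>.<environment>.<subcomponent>
  - name: nomad.staging.leader
    policy:
      interval: 5s
      timeout: 2s
      retries: 3
    target: !Http
      url: https://nomad.staging.example.com  # hypothetical URL
    validators:
      http.status: !OneOf [200]
```

Keeping the service first and the environment second means related probes sort together when their names appear in traces.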
Policy
The policy property defines how Grey will execute your probe, including how frequently, how long to wait for a response, and how many times to retry if the probe fails. In the future, additional policy options may be introduced to control exponential back-off, circuit breaking, and other behaviours.
When configuring your policy, keep in mind that both interval and timeout are specified in milliseconds. The retries property is an integer that specifies the maximum number of times the probe will be executed (also known as "attempts") before it is considered failed.
Warning
The timeout property applies to the entire probe's execution, including the time taken to perform any retries, and should be configured to allow time for retries to occur if you expect them to be needed.
The decision to apply the timeout to the entire probe execution is intentional and designed to avoid retry storms in the event that the target service is degrading in the face of increased load. By not retrying on timeouts, Grey avoids introducing non-linear degradation scenarios.
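As a rough worked example, suppose a single attempt against your target can take up to 2 seconds before it fails, and you want room for 3 attempts. Because the timeout covers the whole run, it needs to be about 3 × 2s = 6s rather than 2s. The figures here are illustrative assumptions, the duration syntax follows the example earlier on this page, and any pause Grey may add between attempts is not accounted for.

```yaml
probes:
  - name: example
    policy:
      interval: 30s  # start a new probe run every 30 seconds
      timeout: 6s    # one budget shared by every attempt in a run
      retries: 3     # up to 3 attempts, all inside the same 6s
    target: !Http
      url: https://example.com
    validators:
      http.status: !OneOf [200]
```

With the same retries value but a 2s timeout, a first attempt that fails after 1.5 seconds would leave only 0.5 seconds for the remaining attempts.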
Target
The target property defines the target that will be probed. This is where you specify the type of target (e.g. !Http) and any configuration options that are specific to that target type. For example, the !Http target type accepts a url property that specifies the URL that will be probed.
You can read more about the various target types in the Targets section of the documentation.
Validators
The validators property defines the set of validators that will be used to assert that the target is healthy. Each validator targets a specific field and accepts a distinct set of configuration options which are documented on their respective pages.
You can read more about the various validators in the Validators section of the documentation.
Tips
You can read more about the fields available for each target in the Targets section of the documentation.
Status Dashboard
Grey includes an optional web-based user interface that provides real-time visibility into probe status and execution history. The UI can be enabled on any node and integrates seamlessly with clustering to provide a unified view of your service health. It's a great way to provide a status page for your users.
ui:
  enabled: true
  listen: 0.0.0.0:3000
  title: "Grey Health Monitor"
  logo: "https://example.com/logo.svg"
You can read more about UI configuration options in the User Interface section of the documentation.
Clustering
Grey supports distributed probing through its clustering feature, which enables multiple Grey instances to coordinate probe execution and share results. This is particularly useful for scaling probe execution across multiple nodes, providing redundancy, and enabling probes to be executed from different network locations while maintaining a centralized view through the web UI.
cluster:
  enabled: true
  listen: 0.0.0.0:8888
  peers:
    - 10.0.0.2:8888
    - 10.0.0.3:8888
  secret: /pL7XKDj1UrAGjNMv3t9jmb9leDOZT+64KkYE8k7UH8=
You can read more about clustering in the Clustering section of the documentation.
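Putting the sections of this guide together, a complete configuration file might look like the sketch below. It assumes that state, probes, ui, and cluster are all top-level keys of the same file, as their placement in this guide suggests. The values are the illustrative ones used throughout this page, and the secret shown is the sample value from above, which you should replace with your own.

```yaml
# Persist probe state across restarts
state: ./state.redb

# What to probe and how to judge the result
probes:
  - name: example
    policy:
      interval: 5s
      timeout: 2s
      retries: 3
    target: !Http
      url: https://example.com
    validators:
      http.status: !OneOf [200]
      http.header.content-type: !Contains "text/html"

# Optional status dashboard
ui:
  enabled: true
  listen: 0.0.0.0:3000
  title: "Grey Health Monitor"
  logo: "https://example.com/logo.svg"

# Optional clustering with two peers
cluster:
  enabled: true
  listen: 0.0.0.0:8888
  peers:
    - 10.0.0.2:8888
    - 10.0.0.3:8888
  secret: /pL7XKDj1UrAGjNMv3t9jmb9leDOZT+64KkYE8k7UH8=  # sample value, replace with your own
```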