Flux Monitoring and Reporting
The Flux Operator supervises the Flux controllers and provides a unified view of all the Flux resources that define the GitOps workflows for the target cluster. The operator generates reports, emits events, and exports Prometheus metrics to help with monitoring and troubleshooting Flux.
Flux Status Reporting
The Flux Operator automatically generates a report that reflects the observed state of the Flux installation. The report provides information about the installed components and their readiness, the Flux distribution details, reconcilers statistics, cluster sync status and more.
The report is generated as a custom resource of kind FluxReport
, named flux
,
located in the same namespace where the operator is running.
Flux installation method
The report is available no matter the tool used to install Flux,
be it the flux
CLI, Terraform, Helm or the Flux Operator itself.
For the report to be accurate, the operator must be running
in the same namespace where the Flux controllers are deployed.
To view the report in YAML format run:
kubectl -n flux-system get fluxreport/flux -o yaml
The operator updates the report at regular intervals, by default every five minutes. To manually trigger the reconciliation of the report, run:
kubectl -n flux-system annotate --overwrite fluxreport/flux \
reconcile.fluxcd.io/requestedAt="$(date +%s)"
Find more information about the reporting features in the Flux Report API documentation.
Flux Instance Events
The Flux Operator emits events to the Kubernetes API server to report on the status of the Flux instance. The events are useful to monitor the Flux lifecycle and troubleshoot upgrade issues.
To list the events related to the Flux instance, run:
kubectl -n flux-system events --for fluxinstance/flux
The Flux Operator integrates with notification-controller. To receive notifications with the events issued by the operator, you can configure alerting as follows:
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
name: slack-bot
namespace: flux-system
spec:
type: slack
channel: general
address: https://slack.com/api/chat.postMessage
secretRef:
name: slack-bot-token
---
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
name: flux-operator
namespace: flux-system
spec:
providerRef:
name: slack-bot
eventSeverity: info
eventSources:
- kind: FluxInstance
name: flux
Besides Slack, the notification-controller supports other providers like Microsoft Teams, Datadog, Grafana, etc., for more information see the alert provider documentation.
Prometheus Metrics
The Flux Operator exports metrics in the Prometheus format for monitoring
and alerting purposes. The metrics are exposed inside the cluster by the
flux-operator
Kubernetes Service on the 8080
port.
On clusters where the Prometheus Operator is installed, the metrics can be scraped
by creating a ServiceMonitor
resource as follows:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: flux-operator
namespace: flux-system
labels:
release: kube-prometheus-stack
spec:
namespaceSelector:
matchNames:
- flux-system
selector:
matchLabels:
app.kubernetes.io/name: flux-operator
endpoints:
- targetPort: 8080
path: /metrics
interval: 60s
scrapeTimeout: 30s
Helm Chart
The Flux Operator Helm chart includes a ServiceMonitor
resource that can
be enabled by setting the serviceMonitor.create
value to true
.
On clusters with Prometheus auto-discovery enabled, the metrics are automatically scraped
from the flux-operator
pods that have the prometheus.io/scrape: "true"
annotation.
Flux Instance Metrics
The Flux Operator exports metrics for the FluxInstance resource. These metrics are refreshed every time the operator reconciles the instance.
Metrics:
flux_instance_info{uid, kind, name, exported_namespace, ready, suspended, registry, revision}
Labels:
uid
: The Kubernetes unique identifier of the resource.kind
: The kind of the resource (e.g.FluxInstance
).name
: The name of the resource (e.g.flux
).exported_namespace
: The namespace where the resource is deployed (e.g.flux-system
).ready
: The readiness status of the resource (e.g.True
,False
orUnkown
).reason
: The reason for the readiness status (e.g.Progressing
,BuildFailed
,HealthCheckFailed
, etc.).suspended
: The suspended status of the resource (e.g.True
orFalse
).registry
: The container registry used by the instance (e.g.ghcr.io/fluxcd
).revision
: The Flux revision installed by the instance (e.g.v2.3.0@sha256:75aa209c6a...
).
Flux Resource Metrics
The Flux Operator exports metrics for all Flux resources found in the cluster. These metrics are refreshed at the same time with the update of the FluxReport.
Metrics:
flux_resource_info{uid, kind, name, exported_namespace, ready, suspended, ...}
Common labels:
uid
: The Kubernetes unique identifier of the resource.kind
: The kind of the resource (e.g.GitRepository
,Kustomization
, etc.).name
: The name of the resource (e.g.flux-system
).exported_namespace
: The namespace of the resource (e.g.flux-system
).ready
: The readiness status of the resource (e.g.True
,False
orUnkown
).reason
: The reason for the readiness status (e.g.Progressing
,BuildFailed
,HealthCheckFailed
, etc.).suspended
: The suspended status of the resource (e.g.True
orFalse
).
Specific labels per resource kind:
Resource Kind | Labels |
---|---|
Kustomization | revision , source_name , path |
GitRepository | revision , url , ref |
OCIRepository | revision , url , ref |
Bucket | revision , url , ref |
HelmRelease | revision , source_name |
HelmChart | revision , source_name |
HelmRepository | revision , url |
Receiver | url |
ImageRepository | url |
ImagePolicy | source_name |
ImageUpdateAutomation | source_name |
Controller Runtime Metrics
The Flux Operator exports Kubernetes controller runtime metrics and Go runtime metrics.
Relevant metrics for troubleshooting:
controller_runtime_reconcile_errors_total{controller}
: Total number of reconciliation errors per controller.rest_client_requests_total{code, method}
: Number of Kubernetes API requests, partitioned by status code and method.go_memstats_alloc_bytes
: Number of bytes allocated and still in use.go_goroutines
: Number of goroutines that currently exist.workqueue_longest_running_processor_seconds
: Longest time a workqueue item has been processed.