Fluent Bit on Kubernetes

End-to-end walkthrough for shipping logs from your Kubernetes cluster to Nightlamp via a Fluent Bit DaemonSet. Covers RBAC, the ConfigMap, opt-in by pod label, healthcheck filtering, and verifying that lines are landing in your Issue queue.

Prerequisites

  • A Kubernetes cluster you can apply manifests to (any flavor: GKE, EKS, AKS, DOKS, kind, k3s).
  • A Nightlamp app registered. Open Getting started if you haven't created one — you'll need its app ID and a DSN key.
  • Knowledge of which container runtime your nodes use. Almost every cluster shipped after early 2022 uses containerd; older or legacy clusters may still use Docker. The parser config differs.

Already shipping with Promtail or Vector? Skip to the log subscriptions page: those clients drop into Nightlamp with just a Loki push URL and two headers, no DaemonSet rewrite needed. This page is for teams setting up cluster log shipping from scratch.

How it works

Nightlamp accepts pushes on the standard Loki push API — POST a payload of { streams: [{ stream: {…labels…}, values: [[ts, line], …] }] } to https://api.nightlamp.app/api/loki/api/v1/push with two headers identifying the source app:

Required request headers

X-Nightlamp-App-Id: <your-app-id>
X-Nightlamp-Dsn-Key: <your-dsn-key>

Fluent Bit's loki output plugin handles the request shape natively. We use it as a DaemonSet — one pod per node — so every container's stdout/stderr is tailed without per-app sidecars. An opt-in label on each pod template controls which workloads actually ship; pods without the label are silently ignored, so adding Fluent Bit doesn't accidentally fire-hose every legacy workload's logs into Nightlamp.
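
To make the request shape concrete, here is a minimal shell sketch that builds a one-line payload and pushes it with curl only when credentials are present in the environment. The helper name build_payload, the job=manual-test label, and the NIGHTLAMP_APP_ID variable are illustrative assumptions, not part of Nightlamp's API. The timestamp is a nanosecond Unix epoch string, which is what the Loki push format expects.

```shell
#!/bin/sh
# Sketch only: build a Loki-style push payload for a single log line.
# build_payload and the job=manual-test label are illustrative assumptions.
# Arguments: <app-id> <line> <timestamp-in-nanoseconds>
# The line must not contain double quotes or backslashes (no JSON escaping here).
build_payload() {
  printf '{"streams":[{"stream":{"job":"manual-test","app":"%s"},"values":[["%s","%s"]]}]}' \
    "$1" "$3" "$2"
}

# Seconds since the epoch, padded to nanosecond precision.
now_ns="$(date +%s)000000000"

# Send only when both credentials are set, so the script is safe to run as-is.
if [ -n "${NIGHTLAMP_APP_ID:-}" ] && [ -n "${NIGHTLAMP_DSN:-}" ]; then
  build_payload "$NIGHTLAMP_APP_ID" "hello from curl" "$now_ns" |
    curl -sS -o /dev/null -w '%{http_code}\n' \
      -X POST 'https://api.nightlamp.app/api/loki/api/v1/push' \
      -H 'Content-Type: application/json' \
      -H "X-Nightlamp-App-Id: $NIGHTLAMP_APP_ID" \
      -H "X-Nightlamp-Dsn-Key: $NIGHTLAMP_DSN" \
      --data-binary @-
fi
```

If curl prints a 2xx status, the path and both headers are probably right, which is worth knowing before any Fluent Bit config is involved.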


1. ServiceAccount + RBAC

Fluent Bit's kubernetes filter calls the Kubernetes API to enrich each log line with pod labels + namespace metadata. The pod needs read access to pods and namespaces:

rbac.yaml

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: default

2. Secret with your DSN

Don't hard-code the DSN in the ConfigMap — Fluent Bit reads it from an env var, which is sourced from a Secret. That way the manifest is safe to commit; the Secret is applied separately.

Create the Secret (replace the value, then apply)

kubectl create secret generic fluent-bit-dsn \
  --namespace=default \
  --from-literal=dsn=<your-dsn-key>
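
Rotating the DSN. Re-run the kubectl create secret command above with --dry-run=client -o yaml appended, and pipe the output to kubectl apply -f - to overwrite the Secret in place. Then run kubectl rollout restart daemonset/fluent-bit so the pods pick up the new env var (environment variables sourced from a Secret are only read at container start). Old DSNs stop authenticating at the next push.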
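
Spelled out, a rotation might look like the sketch below. The helper name rotate_dsn is hypothetical; the Secret, key, namespace, and DaemonSet names are the ones used on this page. The restart only makes sense once the DaemonSet from step 4 exists.

```shell
#!/bin/sh
# Sketch only: overwrite the Secret in place, then restart the DaemonSet.
# rotate_dsn is an illustrative helper name, not a Nightlamp or kubectl command.
# Argument: <new-dsn-key>
rotate_dsn() {
  new_dsn="$1"
  # --dry-run=client renders the Secret as YAML without touching the cluster;
  # kubectl apply then creates or updates it.
  kubectl create secret generic fluent-bit-dsn \
    --namespace=default \
    --from-literal=dsn="$new_dsn" \
    --dry-run=client -o yaml |
    kubectl apply -f - &&
    kubectl rollout restart daemonset/fluent-bit --namespace=default
}
```

Call it as rotate_dsn '<your-new-dsn-key>'. Because the function takes the key as an argument, the value may end up in your shell history; reading it from a file or prompt is a reasonable alternative.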

3. ConfigMap

The pipeline: tail every node's container logs → kubernetes filter decorates with pod metadata → grep drops anything without the opt-in label → grep drops 2xx /health probe spam → loki output ships what's left.

configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: default
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Daemon        Off
        Log_Level     info
        Parsers_File  parsers.conf

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        # Use 'cri' for containerd nodes (almost everything modern).
        # Use 'docker' for legacy Docker-runtime nodes.
        Parser            cri
        Tag               kube.*
        Refresh_Interval  5
        Mem_Buf_Limit     32MB
        Skip_Long_Lines   On
        DB                /var/log/flb-state/flb_kube.db

    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On

    # Opt-in gate: only pods labeled with nightlamp.app/id ship.
    [FILTER]
        Name    grep
        Match   kube.*
        Regex   $kubernetes['labels']['nightlamp.app/id'] .+

    # Drop healthcheck spam (any 2xx response on a /health* path).
    [FILTER]
        Name    grep
        Match   kube.*
        Exclude log /health[^"]*" 2\d\d

    [OUTPUT]
        Name                       loki
        Match                      kube.*
        Host                       api.nightlamp.app
        Port                       443
        Tls                        On
        Tls.Verify                 On
        Uri                        /api/loki/api/v1/push
        Header                     X-Nightlamp-App-Id <your-app-id>
        Header                     X-Nightlamp-Dsn-Key ${NIGHTLAMP_DSN}
        Labels                     job=fluent-bit, app=$kubernetes['labels']['nightlamp.app/id'], pod=$kubernetes['pod_name'], namespace=$kubernetes['namespace_name'], container=$kubernetes['container_name']
        Auto_Kubernetes_Labels     Off
        Line_Format                json
        Remove_Keys                kubernetes,stream

  parsers.conf: |
    # CRI (containerd) line format:
    #   <time> <stream> <logtag> <message>
    [PARSER]
        Name        cri
        Format      regex
        Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

Multiple Nightlamp apps on one cluster? Add a rewrite_tag filter that re-tags by $kubernetes['labels']['nightlamp.app/id'], then declare one [OUTPUT] block per app id, each with its own X-Nightlamp-App-Id literal and X-Nightlamp-Dsn-Key env var. Same DaemonSet and same push endpoint, but each workload's logs arrive under its own app.

Healthcheck path isn't /health*? The default exclude regex (log /health[^"]*" 2\d\d) catches every 2xx response on a path containing /health. The regex is unanchored, so that includes /healthz, /health/ready, /health/live, /health-check, and nested mounts like /api/health/ready. Apps using a different liveness/readiness path (/_status, /ping, /up, /ready) need a small swap; see the troubleshooting section below.
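
For orientation, the cri parser in parsers.conf splits each container log line into four fields, and the healthcheck filter then matches against the last one (log). The shell sketch below mimics that split on a made-up sample line; split_cri_line is an illustrative helper and is not how Fluent Bit itself parses the line.

```shell
#!/bin/sh
# Sketch only: split one CRI-format log line into the four fields the
# cri parser names (time, stream, logtag, log). The sample line is made up.
# logtag is F for a full line and P for a partial one.
split_cri_line() {
  # read assigns the first three space-separated words, then the rest of the line.
  IFS=' ' read -r cri_time cri_stream cri_logtag cri_log <<EOF
$1
EOF
  printf 'time=%s\nstream=%s\nlogtag=%s\nlog=%s\n' \
    "$cri_time" "$cri_stream" "$cri_logtag" "$cri_log"
}

split_cri_line '2024-05-01T12:00:00.123456789Z stdout F "GET /healthz HTTP/1.1" 200 2'
```

The last field printed is log="GET /healthz HTTP/1.1" 200 2, which is the text the /health exclude regex sees.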

4. DaemonSet

One Fluent Bit pod per node, mounting /var/log from the host so it can tail container log files. The DSN env var is sourced from the Secret you created in step 2.

daemonset.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: default
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels: { app: fluent-bit }
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      tolerations:
        - operator: Exists
          effect: NoSchedule
        - operator: Exists
          effect: NoExecute
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:3.1
          env:
            - name: NIGHTLAMP_DSN
              valueFrom:
                secretKeyRef:
                  name: fluent-bit-dsn
                  key: dsn
          resources:
            requests: { cpu: 50m, memory: 64Mi }
            limits:   { cpu: 200m, memory: 200Mi }
          volumeMounts:
            - { name: varlog,    mountPath: /var/log }
            - { name: config,    mountPath: /fluent-bit/etc/ }
            - { name: dbpath,    mountPath: /var/log/flb-state }
      volumes:
        - { name: varlog, hostPath: { path: /var/log } }
        - name: config
          configMap: { name: fluent-bit-config }
        - { name: dbpath, hostPath: { path: /var/log/flb-state, type: DirectoryOrCreate } }

5. Opt your workloads in

Add the nightlamp.app/id label to the pod template of every Deployment / StatefulSet / Job whose logs you want to ship:

In your app's Deployment manifest

spec:
  template:
    metadata:
      labels:
        nightlamp.app/id: <your-app-id>

Pods without the label are silently dropped at the opt-in filter — their logs stay on the node. This is deliberate: rolling Fluent Bit out across a cluster shouldn't accidentally start firing every legacy workload's stdout into Nightlamp.
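
If editing the manifest isn't convenient, the same label can be added to a live Deployment with a merge patch. This is a sketch under assumptions: optin_patch, opt_in_deployment, and list_opted_in_pods are hypothetical helper names, and patching the pod template triggers a rollout of that Deployment.

```shell
#!/bin/sh
# Sketch only: add the opt-in label to an existing Deployment's pod template.
# All three function names are illustrative helpers.

# Print the JSON merge patch that sets the label on the pod template.
# Argument: <app-id>
optin_patch() {
  printf '{"spec":{"template":{"metadata":{"labels":{"nightlamp.app/id":"%s"}}}}}' "$1"
}

# Apply the patch. Arguments: <deployment-name> <app-id> [namespace]
opt_in_deployment() {
  kubectl patch deployment "$1" \
    --namespace="${3:-default}" \
    --type=merge \
    -p "$(optin_patch "$2")"
}

# List every pod that currently carries the label (key-only selector).
list_opted_in_pods() {
  kubectl get pods --all-namespaces -l 'nightlamp.app/id'
}
```

A live patch is overwritten the next time the original manifest is applied, so the label still belongs in the manifest for anything long-lived.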


6. Apply + verify

  1. Apply the manifests

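    In this order: rbac → configmap → daemonset. The Secret from step 2 must exist before the DaemonSet starts.

    kubectl apply -f rbac.yaml
    kubectl apply -f configmap.yaml
    kubectl apply -f daemonset.yaml

    There is no secret.yaml in this list because step 2 created the Secret with kubectl create secret. If you manage the Secret as a manifest instead, apply it before daemonset.yaml.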

  2. Watch the rollout

    DaemonSet should report Ready on every node

    kubectl rollout status daemonset/fluent-bit -n default --timeout=120s
    kubectl get pods -n default -l app=fluent-bit -o wide
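
  3. Generate a test log line from a labeled pod

    From any pod with the nightlamp.app/id label. Output from kubectl exec goes back to your terminal, not to the container's log file, so write to the stdout of PID 1 instead (this needs a shell in the container and permission to write to /proc/1/fd/1):

    kubectl exec <your-pod> -- sh -c 'echo "hello-from-nightlamp-fluent-bit" > /proc/1/fd/1'

  4. Confirm arrival in Nightlamp. Open your app's Issue queue or a LogQL query in the dashboard with {app="<your-app-id>"}; the line should appear within ~30 seconds.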

Troubleshooting

DaemonSet pods are CreateContainerConfigError

The Secret your DaemonSet references doesn't exist or has the wrong key. Check kubectl describe pod -l app=fluent-bit for the missing key name, then kubectl get secret fluent-bit-dsn -o yaml to verify.

Pods are Ready but no logs arrive in Nightlamp

Tail Fluent Bit's own stderr to see what's failing:

Fluent Bit's own logs

kubectl logs ds/fluent-bit -n default --tail=200

  • HTTP 401 from the Loki output → the DSN value is wrong, or the app id and DSN don't match. Double-check both headers in the [OUTPUT] block.
  • HTTP 404 → the URL is wrong; verify the path is exactly /api/loki/api/v1/push (note the leading /api/loki/).
  • HTTP 429 → you're rate-limited; flush less often (raise Flush in [SERVICE]) or filter more aggressively at the FILTER stage.
  • No outbound traffic at all → the FILTER pipeline dropped every record. Check that the pod has the nightlamp.app/id label and that its lines aren't all matching the /health exclude regex.

Healthcheck noise is leaking through

The default exclude regex (log /health[^"]*" 2\d\d) catches any 2xx response on a path containing /health. The regex is unanchored, so it already covers /health, /healthz, /health/ready, /health/live, /health-check, and nested mounts like /api/health/ready. 5xx responses on those paths are kept (genuine failure signal). If your app uses a non-/health path, edit the second grep filter in the ConfigMap.

For a single non-default path (e.g. /_status):

exclude /_status 2xx

[FILTER]
    Name    grep
    Match   kube.*
    Exclude log /_status[^"]*" 2\d\d

For multiple paths (e.g. /_status + /ping), stack one grep filter per path. Fluent Bit runs each filter in declaration order, and either one can drop a record. This is also the format-portable choice: it works identically in the classic INI fluent-bit.conf shown here and in the YAML fluent-bit.yaml format (available well before 3.2, which brought it to feature parity with the classic format).

exclude multiple healthcheck paths

[FILTER]
    Name    grep
    Match   kube.*
    Exclude log /_status[^"]*" 2\d\d

[FILTER]
    Name    grep
    Match   kube.*
    Exclude log /ping[^"]*" 2\d\d

A single-regex alternation (/(?:_status|ping)[^"]*" 2\d\d) also works in the classic INI format. If you use Fluent Bit's YAML pipeline config, single-quote the value (exclude: 'log /(?:_status|ping)[^"]*" 2\d\d') so the YAML parser takes the regex literally instead of interpreting characters such as ?, :, [ or " itself.

Validate the edited config with fluent-bit -c <config> --dry-run before re-applying; this confirms the file parses but does not run any log lines through the regex. 4xx/5xx responses on these paths survive the filter, which is intentional: a probe returning 503 is exactly what you want surfaced.
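
To see which lines a pattern would drop, a rough local check with grep can help. This is a sketch, not Fluent Bit's own matcher: Fluent Bit uses the Onigmo regex engine, while grep -E has no \d, so the pattern below spells it as [0-9]. The helper would_drop and the sample access-log lines are made up for illustration.

```shell
#!/bin/sh
# Sketch only: check sample access-log lines against the healthcheck pattern.
# would_drop is an illustrative helper; the sample lines are made up.
# Same expression as the Exclude rule in the ConfigMap, with \d written as [0-9].
HEALTH_RE='/health[^"]*" 2[0-9][0-9]'

# Succeeds (exit 0) when the line matches, i.e. the grep filter would drop it.
would_drop() {
  printf '%s\n' "$1" | grep -Eq "$HEALTH_RE"
}

for line in \
  '"GET /healthz HTTP/1.1" 200 2' \
  '"GET /api/health/ready HTTP/1.1" 204 0' \
  '"GET /healthz HTTP/1.1" 503 19' \
  '"GET /orders HTTP/1.1" 200 512'
do
  if would_drop "$line"; then
    printf 'drop  %s\n' "$line"
  else
    printf 'keep  %s\n' "$line"
  fi
done
```

The first two sample lines are reported as drop and the last two as keep. To try a different path, change HEALTH_RE (for example to '/_status[^"]*" 2[0-9][0-9]') and feed it lines copied from your own logs.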

"unknown parser cri" or malformed timestamps

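Your nodes use Docker, not containerd. Swap the parser: in the [INPUT] block, set Parser docker, and replace the custom CRI parser in parsers.conf with a docker parser definition (Format json, Time_Key time). The ConfigMap is mounted over /fluent-bit/etc/, which hides the parsers.conf bundled with the image, so the stock docker parser is only available if you define it yourself. On Docker nodes the files under /var/log/containers are typically symlinks into /var/lib/docker/containers, so the DaemonSet also needs a hostPath mount for that directory.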


Next steps

  • Log subscriptions — equivalent push setups for Promtail and Grafana Alloy, plus the AWS CloudWatch pull flow if your logs live there instead.
  • Alert rules — turn shipped log lines into actionable issues.
  • API reference — the underlying push endpoint, in case you want to wire your own client.