OpenTelemetry: Sending Traces From Ingress-Nginx to Multi-Tenant Grafana Tempo

Helpful instructions for your Observability journey


OpenTelemetry logo by https://opentelemetry.io/

OpenTelemetry has established itself as the standard for Observability after emerging from the OpenTracing and OpenCensus projects. It aims to standardise the different kinds of signals (logs, traces, metrics). The signal relevant to this article is traces.

To better understand, it’s worth reading the basic OpenTelemetry concepts.

We use Grafana Tempo as the backend for our traces. It allows us to store the trace information in inexpensive object storage instead of hosting an Elasticsearch database, as would be needed with a tool like Jaeger.

Because we are providing shared Kubernetes clusters for application development teams, another very important feature of Grafana Tempo is multi-tenancy.
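For reference, switching on multi-tenancy in Tempo is a single configuration flag; tenants are then separated by the X-Scope-OrgID header of incoming requests. The snippet below is only a minimal sketch of a tempo.yaml, with the bucket name and endpoint as placeholders:

multitenancy_enabled: true
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces       # placeholder bucket name
      endpoint: s3.example.com   # placeholder object storage endpoint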

Basic Flow

Each development team has a set of Kubernetes namespaces with a tenant-id label. The applications are instrumented with an OpenTelemetry library (see the list of supported languages at this link).
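As a minimal sketch, such a namespace could look like this (the namespace name is hypothetical; the label key matches what the collector configuration below expects):

apiVersion: v1
kind: Namespace
metadata:
  name: tenant1-frontend   # hypothetical namespace name
  labels:
    tenant-id: tenant1     # read later by the k8sattributes processor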

Trace flow, source: self-drawn

Traces originate in the applications, then are sent to an OpenTelemetry collector installed in the cluster with the awesome OpenTelemetry Kubernetes Operator. A collector pipeline has three main parts:

  • Receivers are components that accept traces sent from workloads. We use the built-in otlp receiver, which accepts traces sent via gRPC. A collector can accept traces from different sources using different receiver configurations.
  • Processors can transform, filter, or route traces. We use the processor to enrich traces with information about Kubernetes Pod and Namespace names of the sending workloads and extract a tenant id from a Namespace label.
  • Exporters send traces to backends like Grafana Tempo. Here we use the otlp exporter via gRPC.

The collector pipeline for the workloads extracts the tenant id from a Namespace label using the k8sattributes processor. (The processor requires some permissions to query the Kubernetes API server. See the processor's README file.)
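As a rough sketch of those permissions (the ClusterRole name, service account name, and namespace are assumptions; the processor's README remains the authoritative reference), something like this is typically needed:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-k8sattributes   # assumed name
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-k8sattributes
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector-k8sattributes
subjects:
  - kind: ServiceAccount
    name: otel-collector   # assumed collector service account
    namespace: grafana     # assumed collector namespace

With the permissions in place, the collector pipeline configuration looks like this: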

receivers:
  otlp:
    protocols:
      grpc:
processors:
  k8sattributes/default:
    extract:
      labels:
        - tag_name: tenantId
          key: "tenant-id"
          from: namespace
  routing:
    from_attribute: tenantId
    attribute_source: resource
    table:
      - value: tenant1
        exporters: [logging, otlp/tenant1]
      - value: tenant2
        exporters: [logging, otlp/tenant2]
exporters:
  logging: # referenced in the routing table; handy for debugging
  otlp/tenant1:
    endpoint: tempo.grafana.svc.cluster.local:4317
    tls:
      insecure: true
    headers:
      X-Scope-OrgID: tenant1
  otlp/tenant2:
    endpoint: tempo.grafana.svc.cluster.local:4317
    headers:
      X-Scope-OrgID: tenant2
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes/default, routing]
      exporters: [logging, otlp/tenant1, otlp/tenant2]

In the exporters section, we define an exporter for each tenant. They all configure the same endpoint URL for Grafana Tempo but use a different X-Scope-OrgID header value. The routing processor routes traces to the different exporters depending on the tenant id.

This is a bit cumbersome, but there is currently no way to use variables inside the headers block of the otlp exporter. That would allow a single exporter entry with a variable value in the X-Scope-OrgID header.
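For completeness: since we deploy the collector with the OpenTelemetry Kubernetes Operator, the configuration above ends up in the config field of an OpenTelemetryCollector resource. The sketch below assumes the v1alpha1 API and a collector named otel-collector in the grafana namespace:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector   # assumed name
  namespace: grafana     # assumed namespace
spec:
  mode: deployment
  serviceAccount: otel-collector   # service account holding the k8sattributes permissions
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    # ... processors, exporters and service pipeline exactly as shown above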

Ingress-Nginx and OpenTelemetry

Ingress-Nginx has supported OpenTelemetry since version 1.7.0 (released on March 24th, 2023; see pull request #9062). We were looking forward to this new feature because it allows collecting trace information not only at the workload level but one step earlier, when the traffic enters the Kubernetes cluster.

To enable the OpenTelemetry integration using the Ingress-Nginx helm chart, just add these parameters:

controller:
  config:
    enable-opentelemetry: "true"
    opentelemetry-config: "/etc/nginx/opentelemetry.toml"
    opentelemetry-operation-name: "HTTP $request_method $service_name $uri"
    opentelemetry-trust-incoming-span: "true"
    otlp-collector-host: "otel-collector.grafana.svc.cluster.local"
    otlp-collector-port: "4317"
    otel-max-queuesize: "2048"
    otel-schedule-delay-millis: "5000"
    otel-max-export-batch-size: "512"
    otel-service-name: "nginx-proxy" # OpenTelemetry resource name
    otel-sampler: "AlwaysOn" # Also: AlwaysOff, TraceIdRatioBased
    otel-sampler-ratio: "1.0"
    otel-sampler-parent-based: "false"

While the workloads always run inside a Kubernetes namespace that carries the tenant id as a label, the Ingress-Nginx controller is a shared deployment operated by the platform team, so the tenant information cannot be extracted from its namespace.

We had to extract the tenant information from another location to route the traces originating in the ingress controller to the correct Grafana Tempo tenant.

Ingress-Nginx OpenTelemetry directives

While the Ingress-Nginx controller documentation currently falls short here, we found the required information in the module that is used for the OpenTelemetry integration; it is located in the following GitHub project.

There we found the directives provided by the OpenTelemetry module. The opentelemetry_attribute directive was especially interesting for us: it allows us to add custom attributes to spans.

We configured the ingress controller to add these directives to each server block in the nginx config. The configuration example below is for the helm chart:

controller:
  config:
    server-snippet: |
      opentelemetry_attribute "ingress.namespace" "$namespace";
      opentelemetry_attribute "ingress.service_name" "$service_name";
      opentelemetry_attribute "ingress.name" "$ingress_name";
      opentelemetry_attribute "ingress.upstream" "$proxy_upstream_name";

After making this configuration change, traces sent from the ingress controller contain information about the Ingress's namespace, name, and the upstream it routes traffic to.

But where do we get the tenant id from? Luckily for us, all of our namespaces follow a naming schema like tenantId-<namespaceName>, so we can extract the tenant id from the namespace attribute attached to the spans.

The collector pipeline above is changed to make this possible. Here’s what the code looks like:

receivers:
  otlp:
    protocols:
      grpc:
processors:
  k8sattributes/default:
    extract:
      labels:
        - tag_name: tenantId
          key: "tenant-id"
          from: namespace
  # In the case of ingress-nginx traces, the ingress namespace is in the span attributes.
  # We need to extract the tenant id from there and eventually transfer it to the resource.
  # See Resource: https://opentelemetry.io/docs/concepts/glossary/#resource
  # vs Attribute: https://opentelemetry.io/docs/concepts/glossary/#attribute
  attributes/ingress:
    actions:
      - key: "ingress.namespace"
        pattern: "(?P<temp_tenant_id>.{6})"
        action: extract # this overwrites existing keys
  groupbyattrs: # copy the attribute containing the tenant id to a resource attribute, which is used for routing
    keys:
      - temp_tenant_id
  resource/ingress:
    attributes:
      - key: "tenantId"
        from_attribute: temp_tenant_id
        action: insert
      - key: "temp_tenant_id"
        action: delete
  routing:
    from_attribute: tenantId
    attribute_source: resource
    table:
      - value: tenant1
        exporters: [logging, otlp/tenant1]
      - value: tenant2
        exporters: [logging, otlp/tenant2]
exporters:
  logging: # referenced in the routing table; handy for debugging
  otlp/tenant1:
    endpoint: tempo.grafana.svc.cluster.local:4317
    tls:
      insecure: true
    headers:
      X-Scope-OrgID: tenant1
  otlp/tenant2:
    endpoint: tempo.grafana.svc.cluster.local:4317
    headers:
      X-Scope-OrgID: tenant2
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [k8sattributes/default, attributes/ingress, groupbyattrs, resource/ingress, routing]
      exporters: [logging, otlp/tenant1, otlp/tenant2]

Using this pipeline, the namespace information added by Ingress-Nginx is used: the tenant id is extracted from it with a regular expression and stored in a temporary attribute. The attribute is then copied to the resource attributes using the groupbyattrs processor.

This is required because OpenTelemetry distinguishes between (span) attributes and resources. The k8sattributes processor stores its information as resource attributes, and we already use resource attributes for routing (this is why we configured attribute_source: resource).
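To illustrate, here is a simplified, conceptual view of a single ingress span, not an actual collector output format; the namespace name is hypothetical. Note that the .{6} pattern above captures exactly the first six characters of the namespace name, so it has to match the length of your tenant ids (for a seven-character id like tenant1 it would be .{7}):

# Before attributes/ingress, groupbyattrs and resource/ingress:
resource:
  attributes:
    service.name: nginx-proxy
span:
  attributes:
    ingress.namespace: tenant1-frontend   # hypothetical namespace

# Afterwards:
resource:
  attributes:
    service.name: nginx-proxy
    tenantId: tenant1                     # now visible to the routing processor
span:
  attributes:
    ingress.namespace: tenant1-frontend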

This is a bit ugly, but we have not found a better way to do it. If you have ideas, feel free to leave suggestions in the comments!

Next Steps

The collector pipeline above can be further optimised by adding the following (a rough sketch of both processors follows after this list):

  • the batch processor, to send traces in batches
  • the tail-sampling processor, to filter out traces for endpoints like /actuator or /health, which are called very frequently by the Kubernetes API server but add no real value when recorded in Grafana Tempo
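Here is a rough sketch of both processors (the attribute key http.target and the endpoint patterns are assumptions and depend on what your instrumentation actually records; both processors would also need to be added to the pipeline's processors list):

processors:
  batch:
    send_batch_size: 512
    timeout: 5s
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: drop-health-endpoints
        type: string_attribute
        string_attribute:
          key: http.target                     # assumed attribute holding the request path
          values: ["/health.*", "/actuator.*"] # assumed endpoint patterns
          enabled_regex_matching: true
          invert_match: true                   # keep only traces that do NOT match these patterns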

Grafana Tempo and the OpenTelemetry collector are still quite new for us. If any important points are missing above or you see further optimisations, leave a hint in the comments.

Hopefully, the instructions help you as well on your Observability journey!
