Concepts and terminology
Does 'sampling' mean dropping traces or keeping them?
Why is dropping health checks called 'sampling' if I'm dropping 100% of them?
/api/products traffic; both are sampling decisions, just at different keep percentages.
Why does Odigos always keep or drop the entire trace?
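As an illustration only (not Odigos's actual implementation), the sketch below shows what a whole-trace decision at a given keep percentage can look like. It assumes a hash of the trace ID is used to pick a stable bucket, which is one common technique. The names `keep_trace` and `sample_spans` and the span dictionary shape are hypothetical.

```python
import hashlib


def keep_trace(trace_id: str, keep_percent: float) -> bool:
    """Decide once per trace ID; every span of that trace gets the same answer."""
    # Assumption: a stable hash of the trace ID maps it to a bucket in [0, 10000).
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    # keep_percent=0 drops every trace, keep_percent=100 keeps every trace.
    return bucket < keep_percent * 100


def sample_spans(spans: list[dict], keep_percent: float) -> list[dict]:
    """Filter spans so that each trace is kept or dropped as a whole."""
    return [s for s in spans if keep_trace(s["trace_id"], keep_percent)]
```

In this sketch, dropping health checks is simply `keep_percent=0` and keeping a quarter of some other traffic is `keep_percent=25`: the same decision function at different percentages. Because the decision depends only on the trace ID, a trace is never split between kept and dropped spans.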
How rules are evaluated
If a trace matches rules in more than one category, which category wins?
Within a category, do rules have a priority order?
Why does 'Keep at most 25%' not always mean exactly 25%?
If I only create Highly Relevant rules and no Cost Reduction rules, what gets dropped?
Can I combine multiple scope types (e.g. namespace + language) in a single rule?
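The combination logic (OR within one scope type, AND across scope types) can be pictured with a small sketch. This is a toy model, not Odigos code: the function `workload_matches`, the dictionary shapes, and the attribute names are all hypothetical.

```python
def workload_matches(scope: dict[str, set[str]], workload: dict[str, str]) -> bool:
    """Return True if the workload satisfies every scope type in the rule.

    scope maps a scope type to its selected values, for example
    {"namespace": {"prod", "staging"}, "language": {"java"}}.
    """
    # OR within a type: the workload's value only has to be one of the selected values.
    # AND across types: every scope type listed in the rule has to match.
    return all(
        workload.get(scope_type) in selected
        for scope_type, selected in scope.items()
    )


scope = {"namespace": {"prod", "staging"}, "language": {"java"}}

workload_matches(scope, {"namespace": "prod", "language": "java"})     # True
workload_matches(scope, {"namespace": "staging", "language": "java"})  # True
workload_matches(scope, {"namespace": "prod", "language": "go"})       # False
workload_matches(scope, {"namespace": "dev", "language": "java"})      # False
```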
prod OR staging). Across different scope types, selections are combined with AND (e.g. namespace prod AND language Java matches only Java workloads in prod, not every workload in prod and not every Java workload across the cluster). See Source scope.
Operational impact
Does adding any sampling rule increase load on the gateway?
Why do gateways sometimes crash instead of just dropping data?
Should I scale up (bigger pods) or scale out (more replicas) for tail sampling?
What happens to in-flight traces if a gateway crashes?
My sampler looks empty or inactive—why is it still affecting my pipeline?
/ route. Once tail mode is on, the gateway buffers traces for the aggregation window, which raises memory and latency to export. If you don’t need tail behavior, either remove unused or test sampling rules from the cluster, or disable tail sampling explicitly in Odigos configuration (Helm values, or the UI Settings page). See When any sampler enables tail mode.
Tail sampling specifics
Why is the aggregation window 30 seconds, and can I change it?
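As a rough way to reason about the window length, the sketch below estimates how much data a gateway holds while traces wait for a decision. It assumes a steady span rate and that every span is buffered for the full window. The function name, parameters, and example numbers are hypothetical, and real memory use also depends on collector overhead that is not modelled here.

```python
def estimate_buffer(spans_per_second: float, avg_span_bytes: int, window_seconds: float) -> dict:
    """Back-of-the-envelope steady-state estimate for a tail-sampling buffer."""
    # Assumption: each span sits in memory for the whole aggregation window.
    buffered_spans = spans_per_second * window_seconds
    return {
        "buffered_spans": buffered_spans,
        "buffered_bytes": buffered_spans * avg_span_bytes,
        # A trace cannot reach the backend before its window has elapsed.
        "min_export_delay_seconds": window_seconds,
    }


# Compare three window lengths at a hypothetical 1000 spans/s and 500 bytes per span.
for window in (15, 30, 45):
    print(window, estimate_buffer(1000, 500, window))
```

Under these assumptions, both the buffered volume and the minimum delay before export grow linearly with the window.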
sampling.tailSampling.traceAggregationWaitDuration (for example '45s' or '15s'); the same value applies to every trace in the cluster. Larger windows hold more memory and delay traces appearing in your backend; smaller windows risk evaluating partial traces and fragmenting a single trace across multiple sampling decisions. See Aggregation window.
Does the 30-second timer start at the first span or the last span of a trace?
What is trace-based load balancing, and why does tail sampling need it?
Built-in rules
What exactly does the built-in Kubernetes Health Probes rule cover?
exec and tcpSocket probes, or probes routed through ingress rewrites or service-mesh wrappers, are not auto-handled and need an explicit Noisy rule. See Built-in rule: Kubernetes Health Probes.
Can I disable or change the built-in 'Keep All Error Traces' rule?
Why isn't there a built-in rule for high-latency traces?
Are there built-in operations beyond HTTP and Kafka (e.g. SQS, gRPC, database calls)?