Denial of Service (Memory Exhaustion due to Cartesian explosion in binary expression evaluation)

HIGH

grafana/grafana

Commit: 7d62590d0008

Affected: Versions prior to 12.4.0 (i.e., all releases before this fix)

2026-04-07 14:32 UTC

Description

Summary of the observed vulnerability and fix: - Issue type: Denial of Service (Memory Exhaustion) in Grafana's expression binary operations. - Root cause: When evaluating binary math expressions (e.g., A || B), Grafana could need to materialize a combinatorial number of output series (cartesian explosion) if the left and right sides return many series with compatible labels. This can lead to excessive memory usage and potential OOM failures before the expression even completes. - Fix provided by commit 7d62590d00088e691d8b5a8ba9f613cf285da736: - Introduces a pre-execution memory limit for a single math expression binary operation (default 1 GiB). The limit can be configured via [expressions] math_expression_memory_limit. - Adds a memory estimator that replays the union/combination logic without allocating the actual output to compute an estimated memory footprint for the potential binary operation. - Enforces the limit during expression evaluation; if the estimated memory exceeds the limit, the evaluation returns a descriptive error instead of proceeding, thereby preventing an OOM. - Extends the command handling to thread the memory limit into math command execution and passes the limit to the expression engine (WithMemoryLimit). - Includes tests for the memory estimation logic to validate the behavior. - Vulnerability type and impact: Denial of Service via Memory Exhaustion due to cartesian explosion in binary expression evaluation. The fix prevents excessive memory allocation by estimating memory usage and aborting when the limit would be exceeded. - Affected area: Expressions/mathexp execution path and configuration (defaults.ini, sample.ini) to expose a memory limit control. - Severity/confidence: HIGH confidence that this is a meaningful DoS mitigation for users crafting expressions that produce large intermediate results. The fix is preventative (pre-execution estimate) rather than a reactive crash fix. - Versions affected: Prior to this commit (pre-12.4.0). Grafana 12.4.0 includes the fix. - Practical implication: Users who craft expressions that would otherwise generate a large number of intermediate series (via cartesian-like explosions) will now receive a controlled error indicating the mismatch/over-limit situation, rather than risking memory exhaustion or server instability.

Proof of Concept

PoC idea (practical repro in a test environment):
- Environment: A Grafana instance running a vulnerable release (pre-12.4.0) with expressions enabled and a default memory limit of 1 GiB (or a smaller limit if configured differently).
- Data setup: Create two queries A and B that each return a large set of time series with identical label sets. For example, generate N = M = 2000 distinct series where every A-series and every B-series shares the exact same label keys/values (e.g., {service="web", instance="i-<index>"}, with 2000 distinct values of <index> but identical label maps across A and B).
- Expression: A || B (binary operation that would combine A and B series).
- Expected behavior before fix: The engine would attempt to evaluate the binary operation, leading to a combinatorial explosion (up to N*M unions) and excessive memory usage that could culminate in OOM or a crash on large deployments.
- Expected behavior after fix (with PoC tooling configured to the pre-fix state): The expression evaluation will invoke the memory estimator, detect that estimated memory usage exceeds the configured limit (default 1 GiB), and return a descriptive error rather than performing the operation. The error message will indicate the number of pairings and the estimated memory footprint, guiding users to adjust the queries or enable the limit.
- Minimal code-path PoC (conceptual, executable in Grafana’s test harness or unit tests):
  - Build two Results sets A and B, each containing a large number of Series with identical labels (e.g., N = 2000 entries, label map L).
  - Construct a dummy BinaryNode that represents A || B (the exact parse tree shape used by Grafana’s expression engine).
  - Instantiate a State with MemoryLimit = 1<<30 (1 GiB).
  - Call the internal memory estimator (estimateBinaryMemory) with A and B and verify it reports est.unions == N*M (or a large number) and est.bytes > MemoryLimit.
  - Invoke the memory check (checkBinaryMemoryLimit) via the execution path and observe the error returned, confirming the pre-execution protection.
- Notes: The repository provides a unit-test file exp_memory_limit_test.go and a new exp_memory_limit_test.go that exercises memory estimation logic. A full, actionable PoC in a live Grafana deployment would replicate the above in a test schema and confirm that the engine errors out with a descriptive message when the limit is exceeded.
- Concrete error you should see when limit is exceeded: a descriptive message indicating the expression, the number of input series, the number of unions, the estimated memory footprint, and the configured limit, prompting the user to adjust queries or memory settings.

Commit Details

Author: William Wernert

Date: 2026-04-07 13:32 UTC

Message:

Expressions: Add memory limit for math expression binary operations (#121945)

* Expressions: Add memory limit for math expression binary operations

Prevent OOM kills from cartesian explosions when label sets diverge
in binary operations like $A || $B. Estimates output cost before
allocating and fails with an actionable error when the limit is
exceeded. Default: 1 GiB, configurable via [expressions].

* Docs: Add math_expression_memory_limit to configuration reference

* Expressions: Use exact union counts and per-pair costs in memory estimate

* Expressions: Clarify that memory limit is a pre-execution estimate

Triage Assessment

Vulnerability Type: Denial of Service (Memory Exhaustion)

Confidence: HIGH

Reasoning:

Introduces a memory limit for math expression binary operations to prevent out-of-memory due to cartesian explosions when label sets diverge. This is a preventative fix against memory exhaustion and potential DoS via crafted queries. It adds pre-execution memory estimation and raises an error when the limit is exceeded.

Verification Assessment

Vulnerability Type: Denial of Service (Memory Exhaustion due to Cartesian explosion in binary expression evaluation)

Confidence: HIGH

Affected Versions: Versions prior to 12.4.0 (i.e., all releases before this fix)

Code Diff

diff --git a/conf/defaults.ini b/conf/defaults.ini
index 0445f61fcca7f..571221cd04518 100644
--- a/conf/defaults.ini
+++ b/conf/defaults.ini
@@ -2204,6 +2204,12 @@ default_timezone = browser
 # Enable or disable the expressions functionality.
 enabled = true
 
+# Maximum estimated memory (in bytes) for a single math expression binary
+# operation. Memory usage is estimated before the expression runs. Protects
+# against cartesian explosions when label sets on both sides of a binary
+# operation are incompatible. Set to 0 to disable. Default: 1073741824 (1 GiB).
+math_expression_memory_limit = 1073741824
+
 [geomap]
 # Set the JSON configuration for the default basemap
 default_baselayer_config =

diff --git a/conf/sample.ini b/conf/sample.ini
index 1598232378098..f118d12ad7333 100644
--- a/conf/sample.ini
+++ b/conf/sample.ini
@@ -2103,6 +2103,12 @@ default_datasource_uid =
 # Enable or disable the expressions functionality.
 ;enabled = true
 
+# Maximum estimated memory (in bytes) for a single math expression binary
+# operation. Memory usage is estimated before the expression runs. Protects
+# against cartesian explosions when label sets on both sides of a binary
+# operation are incompatible. Set to 0 to disable. Default: 1073741824 (1 GiB).
+;math_expression_memory_limit = 1073741824
+
 [geomap]
 # Set the JSON configuration for the default basemap
 ;default_baselayer_config = `{

diff --git a/pkg/expr/commands.go b/pkg/expr/commands.go
index 8883a8c3fda68..5337326fb2f18 100644
--- a/pkg/expr/commands.go
+++ b/pkg/expr/commands.go
@@ -15,6 +15,7 @@ import (
 	"github.com/grafana/grafana/pkg/expr/metrics"
 
 	"github.com/grafana/grafana/pkg/infra/tracing"
+	"github.com/grafana/grafana/pkg/setting"
 )
 
 // Command is an interface for all expression commands.
@@ -29,11 +30,12 @@ type MathCommand struct {
 	RawExpression string
 	Expression    *mathexp.Expr
 	refID         string
+	memoryLimit   int64 // maximum estimated output bytes for a binary operation; 0 disables
 }
 
 // NewMathCommand creates a new MathCommand. It will return an error
 // if there is an error parsing expr.
-func NewMathCommand(refID, expr string) (*MathCommand, error) {
+func NewMathCommand(refID, expr string, memoryLimit int64) (*MathCommand, error) {
 	parsedExpr, err := mathexp.New(expr)
 	if err != nil {
 		return nil, err
@@ -42,11 +44,12 @@ func NewMathCommand(refID, expr string) (*MathCommand, error) {
 		RawExpression: expr,
 		Expression:    parsedExpr,
 		refID:         refID,
+		memoryLimit:   memoryLimit,
 	}, nil
 }
 
 // UnmarshalMathCommand creates a MathCommand from Grafana's frontend query.
-func UnmarshalMathCommand(rn *rawNode) (*MathCommand, error) {
+func UnmarshalMathCommand(rn *rawNode, cfg *setting.Cfg) (*MathCommand, error) {
 	rawExpr, ok := rn.Query["expression"]
 	if !ok {
 		return nil, errors.New("command is missing an expression")
@@ -56,7 +59,7 @@ func UnmarshalMathCommand(rn *rawNode) (*MathCommand, error) {
 		return nil, fmt.Errorf("math expression is expected to be a string, got %T", rawExpr)
 	}
 
-	gm, err := NewMathCommand(rn.RefID, exprString)
+	gm, err := NewMathCommand(rn.RefID, exprString, cfg.MathExpressionMemoryLimit)
 	if err != nil {
 		return nil, fmt.Errorf("invalid math command: %w", err)
 	}
@@ -75,7 +78,7 @@ func (gm *MathCommand) Execute(ctx context.Context, _ time.Time, vars mathexp.Va
 	_, span := tracer.Start(ctx, "SSE.ExecuteMath")
 	span.SetAttributes(attribute.String("expression", gm.RawExpression))
 	defer span.End()
-	return gm.Expression.Execute(gm.refID, vars, tracer)
+	return gm.Expression.Execute(gm.refID, vars, tracer, mathexp.WithMemoryLimit(gm.memoryLimit))
 }
 
 func (gm *MathCommand) Type() string {

diff --git a/pkg/expr/mathexp/exp.go b/pkg/expr/mathexp/exp.go
index 77ca7f7eddd3b..54a9ef2b592e4 100644
--- a/pkg/expr/mathexp/exp.go
+++ b/pkg/expr/mathexp/exp.go
@@ -30,6 +30,11 @@ type State struct {
 	Drops     map[string]map[string][]data.Labels // binary node text -> LH/RH -> Drop Labels
 	DropCount int64
 
+	// MemoryLimit is the maximum estimated output size (in bytes) for a single
+	// binary operation. If the estimated allocation exceeds this, walkBinary
+	// returns an error instead of proceeding. A value of 0 disables the limit.
+	MemoryLimit int64
+
 	tracer tracing.Tracer
 }
 
@@ -49,8 +54,21 @@ func New(expr string, funcs ...map[string]parse.Func) (*Expr, error) {
 	return e, nil
 }
 
+// ExecuteOption is a functional option for configuring expression execution.
+type ExecuteOption func(*State)
+
+// WithMemoryLimit sets the maximum estimated memory (in bytes) that a single
+// binary operation may allocate for its output. When the estimate exceeds
+// this limit, evaluation returns a descriptive error. A value of 0 disables
+// the limit.
+func WithMemoryLimit(limit int64) ExecuteOption {
+	return func(s *State) {
+		s.MemoryLimit = limit
+	}
+}
+
 // Execute applies a parse expression to the context and executes it
-func (e *Expr) Execute(refID string, vars Vars, tracer tracing.Tracer) (r Results, err error) {
+func (e *Expr) Execute(refID string, vars Vars, tracer tracing.Tracer, opts ...ExecuteOption) (r Results, err error) {
 	s := &State{
 		Expr:  e,
 		Vars:  vars,
@@ -58,6 +76,9 @@ func (e *Expr) Execute(refID string, vars Vars, tracer tracing.Tracer) (r Result
 
 		tracer: tracer,
 	}
+	for _, opt := range opts {
+		opt(s)
+	}
 	return e.executeState(s)
 }
 
@@ -306,6 +327,166 @@ func (e *State) union(aResults, bResults Results, biNode *parse.BinaryNode) []*U
 	return unions
 }
 
+const (
+	// bytesPerDatapoint is the estimated heap cost per datapoint in the output
+	// series: 24 bytes for time.Time (wall, ext, loc) + 16 bytes for *float64
+	// (pointer + value).
+	bytesPerDatapoint = 40
+
+	// bytesPerBPoint is the estimated heap cost per entry in the bPoints map
+	// used by biSeriesSeries: ~30 bytes for the time string key + 8 bytes for
+	// *float64 + ~50 bytes for map bucket overhead.
+	bytesPerBPoint = 88
+
+	// frameOverhead is the estimated fixed cost per output Value for the
+	// data.Frame, data.Field structs, and metadata.
+	frameOverhead = 500
+)
+
+// labelBytes returns the total byte size of all keys and values in a label set.
+func labelBytes(labels data.Labels) int {
+	n := 0
+	for k, v := range labels {
+		n += len(k) + len(v)
+	}
+	return n
+}
+
+// binaryMemoryEstimate holds the result of estimateBinaryMemory.
+type binaryMemoryEstimate struct {
+	unions int   // actual number of unions label matching would produce
+	bytes  int64 // estimated total output allocation in bytes
+	aCount int   // number of Values on the A side
+	bCount int   // number of Values on the B side
+}
+
+// seriesLen returns the datapoint count for a Value if it is a Series, or 0.
+func seriesLen(v Value) int {
+	if s, ok := v.(Series); ok {
+		return s.Len()
+	}
+	return 0
+}
+
+// estimateBinaryMemory replays the label matching logic from union() to count
+// the exact number of unions and accumulate a per-pair memory estimate, without
+// allocating Union structs or output frames.
+//
+// For each matching pair, the estimate accounts for the actual types and sizes
+// of both values (Series length, label bytes). The only remaining
+// over-estimation is using max(aLen, bLen) for the output series length instead
+// of the timestamp intersection, which is not knowable without scanning
+// datapoints.
+func estimateBinaryMemory(aResults, bResults Results) binaryMemoryEstimate {
+	est := binaryMemoryEstimate{
+		aCount: len(aResults.Values),
+		bCount: len(bResults.Values),
+	}
+	if est.aCount == 0 || est.bCount == 0 {
+		return est
+	}
+
+	addPair := func(a, b Value) {
+		est.unions++
+
+		aDP := seriesLen(a)
+		bDP := seriesLen(b)
+		outputDP := max(aDP, bDP)
+
+		aLB := labelBytes(a.GetLabels())
+		bLB := labelBytes(b.GetLabels())
+		outputLB := max(aLB, bLB)
+
+		pairCost := int64(outputDP)*bytesPerDatapoint +
+			int64(outputLB) +
+			frameOverhead
+		// biSeriesSeries builds a bPoints map from one side.
+		if aDP > 0 && bDP > 0 {
+			pairCost += int64(outputDP) * bytesPerBPoint
+		}
+
+		est.bytes += pairCost
+	}
+
+	if est.aCount == 1 || est.bCount == 1 {
+		aNoData := aResults.Values[0].Type() == parse.TypeNoData
+		bNoData := bResults.Values[0].Type() == parse.TypeNoData
+		if aNoData || bNoData {
+			addPair(aResults.Values[0], bResults.Values[0])
+			return est
+		}
+	}
+
+	for _, a := range aResults.Values {
+		for _, b := range bResults.Values {
+			aLabels := a.GetLabels()
+			bLabels := b.GetLabels()
+			switch {
+			case aLabels.Equals(bLabels) || len(aLabels) == 0 || len(bLabels) == 0:
+				addPair(a, b)
+			case len(aLabels) == len(bLabels):
+				// skip: invalid union
+			case aLabels.Contains(bLabels):
+				addPair(a, b)
+			case bLabels.Contains(aLabels):
+				addPair(a, b)
+			}
+		}
+	}
+
+	// Mirror the fallback in union(): if nothing matched and both sides have
+	// exactly one value, they get combined anyway.
+	if est.unions == 0 && est.aCount == 1 && est.bCount == 1 {
+		addPair(aResults.Values[0], bResults.Values[0])
+	}
+
+	return est
+}
+
+// checkBinaryMemoryLimit estimates the output cost of a binary operation by
+// replaying the label matching logic and computing per-pair costs. Returns a
+// descriptive error if the estimate exceeds the configured memory limit.
+func (e *State) checkBinaryMemoryLimit(ar, br Results, node *parse.BinaryNode) error {
+	est := estimateBinaryMemory(ar, br)
+
+	if est.bytes <= e.MemoryLimit {
+		return nil
+	}
+
+	aVar := node.Args[0].String()
+	bVar := node.Args[1].String()
+
+	return fmt.Errorf(
+		"expression %q attempted to combine %d series from %s with %d series from %s, "+
+			"producing %d series pairs (estimated memory: %s, limit: %s). "+
+			"This usually means the label sets returned by your queries are no longer compatible. "+
+			"When labels match, this expression would produce ~%d pairs. "+
+			"Check that the queries for %s and %s return series with the same label names and values",
+		node.String(),
+		est.aCount, aVar,
+		est.bCount, bVar,
+		est.unions,
+		formatBytes(est.bytes),
+		formatBytes(e.MemoryLimit),
+		max(est.aCount, est.bCount),
+		aVar, bVar,
+	)
+}
+
+// formatBytes returns a human-readable byte size string.
+func formatBytes(b int64) string {
+	switch {
+	case b >= 1<<30:
+		return fmt.Sprintf("%.1f GiB", float64(b)/float64(1<<30))
+	case b >= 1<<20:
+		return fmt.Sprintf("%.1f MiB", float64(b)/float64(1<<20))
+	case b >= 1<<10:
+		return fmt.Sprintf("%.1f KiB", float64(b)/float64(1<<10))
+	default:
+		return fmt.Sprintf("%d B", b)
+	}
+}
+
 func (e *State) walkBinary(node *parse.BinaryNode) (Results, error) {
 	res := Results{Values: Values{}}
 	ar, err := e.walk(node.Args[0])
@@ -316,6 +497,13 @@ func (e *State) walkBinary(node *parse.BinaryNode) (Results, error) {
 	if err != nil {
 		return res, err
 	}
+
+	if e.MemoryLimit > 0 {
+		if err := e.checkBinaryMemoryLimit(ar, br, node); err != nil {
+			return res, err
+		}
+	}
+
 	unions := e.union(ar, br, node)
 	for _, uni := range unions {
 		var value Value

diff --git a/pkg/expr/mathexp/exp_memory_limit_test.go b/pkg/expr/mathexp/exp_memory_limit_test.go
new file mode 100644
index 0000000000000..65f1e09d2ed3d
--- /dev/null
+++ b/pkg/expr/mathexp/exp_memory_limit_test.go
@@ -0,0 +1,298 @@
+package mathexp
+
+import (
+	"testing"
+	"time"
+
+	"github.com/grafana/grafana-plugin-sdk-go/data"
+	"github.com/grafana/grafana/pkg/infra/tracing"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func TestEstimateBinaryMemory(t *testing.T) {
+	tests := []struct {
+		name          string
+		a             Results
+		b             Results
+		expectUnions  int
+		expectNonZero bool // true if bytes should be > 0
+	}{
+		{
+			name:         "empty A returns zero",
+			a:            Results{},
+			b:            Results{Values: Values{makeSeries("b", data.Labels{"id": "1"})}},
+			expectUnions: 0,
+		},
+		{
+			name: "matching labels produce 1:1 unions",
+			a: Results{Values: Values{
+				makeSeries("a", data.Labels{"id": "1"}, tp{time.Unix(1, 0), float64Pointer(1)}),
+				makeSeries("a", data.Labels{"id": "2"}, tp{time.Unix(1, 0), float64Pointer(2)}),
+			}},
+			b: Results{Values: Values{
+				makeSeries("b", data.Labels{"id": "1"}, tp{time.Unix(1, 0), float64Pointer(10)}),
+				makeSeries("b", data.Labels{"id": "2"}, tp{time.Unix(1, 0), float64Pointer(20)}),
+			}},
+			expectUnions:  2,
+			expectNonZero: true,
+		},
+		{
+			name: "non-matching labels of same length produce zero unions",
+			a: Results{Values: Values{
+				makeSeries("a", data.Labels{"a_id": "1"}),
+				makeSeries("a", data.Labels{"a_id": "2"}),
+			}},
+			b: Results{Values: Values{
+				makeSeries("b", data.Labels{"b_id": "1"}),
+				makeSeries("b", data.Labels{"b_id": "2"}),
+			}},
+			expectUnions: 0,
+		},
+		{
+			name: "empty labels on one side produce cartesian product",
+			a: Results{Values: Values{
+				makeSeries("a", data.Labels{}, tp{time.Unix(1, 0), float64Pointer(1)}),
+				makeSeries("a", data.Labels{}, tp{time.Unix(1, 0), float64Pointer(2)}),
+			}},
+			b: Results{Values: Values{
+				makeSeries("b", data.Labels{"id": "1"}, tp{time.Unix(1, 0), float64Pointer(10)}),
+				makeSeries("b", data.Labels{"id": "2"}, tp{time.Unix(1, 0), float64Pointer(20)}),
+			}},
+			expectUnions:  4,
+			expectNonZero: true,
+		},
+		{
+			name: "subset labels produce fan-out",
+			a: Results{Values: Values{
+				makeSeries("a", data.Labels{"service": "web", "endpoint": "/api"}, tp{time.Unix(1, 0), float64Pointer(1)}),
+				makeSeries("a", data.Labels{"service": "web", "endpoint": "/health"}, tp{time.Unix(1, 0), float64Pointer(2)}),
+			}},
+			b: Results{Values: Values{
+				makeSeries("b", data.Labels{"service": "web"}, tp{time.Unix(1, 0), float64Pointer(10)}),
+			}},
+			expectUnions:  2,
+			expectNonZero: true,
+		},
+		{
+			name: "single non-matching values fall back to 1 union",
+			a: Results{Values: Values{
+				makeSeries("a", data.Labels{"id": "1"}, tp{time.Unix(1, 0), float64Pointer(1)}),
+			}},
+			b: Results{Values: Values{
+				makeSeries("b", data.Labels{"id": "2"}, tp{time.Unix(1, 0), float64Pointer(10)}),
+			}},
+			expectUnions:  1,
+			expectNonZero: true,
+		},
+		{
+			name: "NoData on one side produces 1 union",
+			a:    Results{Values: Values{NewNoData()}},
+			b: Results{Values: Values{
+				makeSeries("b", data.Labels{"id": "1"}, tp{time.Unix(1, 0), float64Pointer(10)}),
+			}},
+			expectUnions:  1,
+			expectNonZero: true,
+		},
+	}
+
+	for _, tc := range tests {
+		t.Run(tc.name, func(t *testing.T) {
+			est := estimateBinaryMemory(tc.a, tc.b)
+			assert.Equal(t, tc.expectUnions, est.unions)
+			assert.Equal(t, len(tc.a.Values), est.aCount)
+			assert.Equal(t, len(tc.b.Values), est.bCount)
+			if tc.expectNonZero {
+				assert.Greater(t, est.bytes, int64(0))
+			} else {
+				assert.Equal(t, int64(0), est.bytes)
+			}
+		})
+	}
+}
+
+func TestEstimateBinaryMemory_per_pair_cost_ref
... [truncated]

← Back to Alerts View on GitHub →