Information disclosure / cross-tenant data leakage via shared cache

MEDIUM
grafana/grafana
Commit: 60ffda0f72bd
Affected: Grafana 12.4.0 and earlier (Dashvalidator metrics caching keyed by datasource UID only)
2026-05-27 12:36 UTC

Description

Vulnerability: cross-tenant information disclosure via a shared metrics cache. Before this fix, MetricsCache cached results using only the datasource UID as the key. If multiple orgs used the same datasource UID, a cache entry could be reused across orgs, allowing one org to observe another org's metrics data from the cache. The patch scopes the cache per organization by including orgID in the cache key and in the singleflight key, effectively isolating cache entries per organization. This reduces the risk of information leakage between tenants when using the dashvalidator metrics cache.

Proof of Concept

Proof-of-concept (conceptual, demonstrating cross-org leakage prior to the fix): Assumptions: - Two Grafana orgs, OrgA (id=1) and OrgB (id=2), both use the same datasource UID 'ds-uid-1'. - The dashvalidator metrics cache previously keyed only by datasource UID, so a single cache entry could be shared between OrgA and OrgB. - The metrics provider returns different data per org (to illustrate leakage). Steps: 1) OrgA requests metrics for ds-uid-1 -> cache miss -> provider returns data for OrgA (e.g., ["metric_a", "metric_b", "orgA_sensitive"]). 2) OrgB requests metrics for ds-uid-1 -> cache hit on key 'ds-uid-1' -> OrgB receives OrgA's cached metrics (e.g., ["metric_a", "metric_b", "orgA_sensitive"]) instead of OrgB-specific data. Impact: OrgB gains visibility into OrgA's metrics data cached for the same datasource UID. Code (illustrative, not the Grafana codebase): package main import ( "fmt" ) type MetricsCacheOld struct { entries map[string][]string // key: datasourceUID only } type MetricsCacheNew struct { entries map[string][]string // key: "{orgID}/{datasourceUID}" } func (c *MetricsCacheOld) Get(orgID int, dsUID string, provider *MockProvider) []string { key := dsUID // old behavior: key by UID only if v, ok := c.entries[key]; ok { return v } data := provider.GetMetrics(orgID, dsUID) c.entries[key] = data return data } func (c *MetricsCacheNew) Get(orgID int, dsUID string, provider *MockProvider) []string { key := fmt.Sprintf("%d/%s", orgID, dsUID) // new behavior: key by orgID/datasourceUID if v, ok := c.entries[key]; ok { return v } data := provider.GetMetrics(orgID, dsUID) c.entries[key] = data return data } type MockProvider struct{} func (p *MockProvider) GetMetrics(orgID int, dsUID string) []string { // Returns data that differs per org to illustrate leakage in old cache if orgID == 1 { return []string{"metric_a", "metric_b", "orgA_sensitive"} } return []string{"metric_a", "metric_b", "orgB_sensitive"} } func main() { provider := &MockProvider{} oldCache := &MetricsCacheOld{entries: make(map[string][]string)} newCache := &MetricsCacheNew{entries: make(map[string][]string)} // OrgA fetches data dataOrgA := oldCache.Get(1, "ds-uid-1", provider) fmt.Println("OrgA (old):", dataOrgA) // OrgB fetches data with the same UID (old cache would leak OrgA's data) dataOrgBOld := oldCache.Get(2, "ds-uid-1", provider) fmt.Println("OrgB (old, potential leak):", dataOrgBOld) // New cache should isolate by org; OrgB gets OrgB data dataOrgANew := newCache.Get(1, "ds-uid-1", provider) fmt.Println("OrgA (new):", dataOrgANew) dataOrgBNew := newCache.Get(2, "ds-uid-1", provider) fmt.Println("OrgB (new):", dataOrgBNew) }

Commit Details

Author: Alexa Vargas

Date: 2026-05-27 11:35 UTC

Message:

Suggested Dashboards: `dashvalidator` scope metrics cache entries by org (#125514) `dashboardValidatorApp` scope metrics cache entries by org

Triage Assessment

Vulnerability Type: Information disclosure

Confidence: MEDIUM

Reasoning:

The changes scope the metrics cache by organization (orgID) rather than per datasource UID alone. This prevents potential cross-tenant data leakage by ensuring cached metrics are isolated per organization, reducing information disclosure risk between orgs.

Verification Assessment

Vulnerability Type: Information disclosure / cross-tenant data leakage via shared cache

Confidence: MEDIUM

Affected Versions: Grafana 12.4.0 and earlier (Dashvalidator metrics caching keyed by datasource UID only)

Code Diff

diff --git a/apps/dashvalidator/pkg/app/app.go b/apps/dashvalidator/pkg/app/app.go index 9cf34cb234648..79f4e0ffdbb77 100644 --- a/apps/dashvalidator/pkg/app/app.go +++ b/apps/dashvalidator/pkg/app/app.go @@ -333,6 +333,7 @@ func handleCheckRoute( Name: name, URL: ds.URL, HTTPClient: httpClient, // Pass authenticated client + OrgID: orgID, }) dsLogger.Debug("Datasource configured successfully for validation") diff --git a/apps/dashvalidator/pkg/cache/cache.go b/apps/dashvalidator/pkg/cache/cache.go index dc5999ad0d814..f80b1b9364f31 100644 --- a/apps/dashvalidator/pkg/cache/cache.go +++ b/apps/dashvalidator/pkg/cache/cache.go @@ -28,15 +28,15 @@ type cacheEntry struct { } // MetricsCache provides TTL-based caching for metrics fetched from datasources. -// It caches results per datasource UID and runs background cleanup +// It caches results per (orgID, datasource UID) pair and runs background cleanup // to remove expired entries. // Providers are registered via RegisterProvider and looked up by datasource type. // Implements app.Runnable via Run() to manage the cleanup goroutine lifecycle. type MetricsCache struct { mu sync.RWMutex - entries map[string]*cacheEntry // key: datasourceUID + entries map[string]*cacheEntry // key: "{orgID}/{datasourceUID}" providers map[string]MetricsProvider // key: datasource type (e.g., "prometheus") - sf singleflight.Group // coalesces concurrent fetches for the same datasourceUID + sf singleflight.Group // coalesces concurrent fetches for the same (orgID, datasourceUID) pair } // NewMetricsCache creates a new MetricsCache. @@ -81,11 +81,13 @@ func (c *MetricsCache) Run(ctx context.Context) error { // The provider is looked up from the registered providers by datasource type. // On cache hit (non-expired entry), returns cached metrics immediately. // On cache miss or expiration, fetches from provider and caches the result. -func (c *MetricsCache) GetMetrics(ctx context.Context, dsType, datasourceUID, datasourceURL string, +func (c *MetricsCache) GetMetrics(ctx context.Context, orgID int64, dsType, datasourceUID, datasourceURL string, client *http.Client) ([]string, error) { + cacheKey := fmt.Sprintf("%d/%s", orgID, datasourceUID) + // Check cache first (read lock) c.mu.RLock() - cached, exists := c.entries[datasourceUID] + cached, exists := c.entries[cacheKey] provider := c.providers[dsType] c.mu.RUnlock() @@ -99,10 +101,10 @@ func (c *MetricsCache) GetMetrics(ctx context.Context, dsType, datasourceUID, da } // Cache miss or expired — use singleflight to coalesce concurrent fetches - v, err, _ := c.sf.Do(datasourceUID, func() (any, error) { + v, err, _ := c.sf.Do(cacheKey, func() (any, error) { // Re-check cache: another goroutine may have populated it while we waited c.mu.RLock() - cached, exists := c.entries[datasourceUID] + cached, exists := c.entries[cacheKey] c.mu.RUnlock() if exists && time.Now().Before(cached.expiresAt) { return cached.metrics, nil @@ -116,7 +118,7 @@ func (c *MetricsCache) GetMetrics(ctx context.Context, dsType, datasourceUID, da // Don't cache results with zero TTL if result.TTL > 0 { c.mu.Lock() - c.entries[datasourceUID] = &cacheEntry{ + c.entries[cacheKey] = &cacheEntry{ metrics: result.Metrics, metricsSet: toMetricsSet(result.Metrics), expiresAt: time.Now().Add(result.TTL), @@ -139,11 +141,13 @@ func (c *MetricsCache) GetMetrics(ctx context.Context, dsType, datasourceUID, da // // NOTE: This method delegates to GetMetrics for fetching/caching logic. // If GetMetrics' caching behavior changes, this method must be updated in lockstep. -func (c *MetricsCache) GetMetricsSet(ctx context.Context, dsType, datasourceUID, datasourceURL string, +func (c *MetricsCache) GetMetricsSet(ctx context.Context, orgID int64, dsType, datasourceUID, datasourceURL string, client *http.Client) (map[string]bool, error) { + cacheKey := fmt.Sprintf("%d/%s", orgID, datasourceUID) + // Check cache — return the pre-built set on hit c.mu.RLock() - cached, exists := c.entries[datasourceUID] + cached, exists := c.entries[cacheKey] c.mu.RUnlock() if exists && time.Now().Before(cached.expiresAt) { @@ -151,14 +155,14 @@ func (c *MetricsCache) GetMetricsSet(ctx context.Context, dsType, datasourceUID, } // Delegate to GetMetrics to handle provider lookup, fetching, and caching - metrics, err := c.GetMetrics(ctx, dsType, datasourceUID, datasourceURL, client) + metrics, err := c.GetMetrics(ctx, orgID, dsType, datasourceUID, datasourceURL, client) if err != nil { return nil, err } // Re-check cache — GetMetrics will have stored the entry (with metricsSet) if TTL > 0 c.mu.RLock() - cached, exists = c.entries[datasourceUID] + cached, exists = c.entries[cacheKey] c.mu.RUnlock() if exists && time.Now().Before(cached.expiresAt) { diff --git a/apps/dashvalidator/pkg/cache/cache_test.go b/apps/dashvalidator/pkg/cache/cache_test.go index dcb17f72d8fb9..b03740477ce05 100644 --- a/apps/dashvalidator/pkg/cache/cache_test.go +++ b/apps/dashvalidator/pkg/cache/cache_test.go @@ -12,15 +12,28 @@ import ( "github.com/stretchr/testify/require" ) -// mockProvider implements MetricsProvider for testing -type mockProvider struct { +// mockProviderBase holds the synchronized call-counter shared by all test providers. +type mockProviderBase struct { mu sync.Mutex callCount int - metrics []string - ttl time.Duration - err error } +func (m *mockProviderBase) getCallCount() int { + m.mu.Lock() + defer m.mu.Unlock() + return m.callCount +} + +// mockProvider implements MetricsProvider for testing +type mockProvider struct { + mockProviderBase + metrics []string + ttl time.Duration + err error +} + +var _ MetricsProvider = (*mockProvider)(nil) + func (m *mockProvider) GetMetrics(ctx context.Context, datasourceUID, datasourceURL string, client *http.Client) (*MetricsResult, error) { m.mu.Lock() @@ -35,12 +48,6 @@ func (m *mockProvider) GetMetrics(ctx context.Context, datasourceUID, datasource }, nil } -func (m *mockProvider) getCallCount() int { - m.mu.Lock() - defer m.mu.Unlock() - return m.callCount -} - // setupTest creates a new MetricsCache and registers a mock provider for "test" type func setupTest(mockProv *mockProvider) *MetricsCache { cache := NewMetricsCache() @@ -59,7 +66,7 @@ func TestMetricsCache_CacheMiss_FetchesFromProvider(t *testing.T) { } cache := setupTest(mockProv) - metrics, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + metrics, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.ElementsMatch(t, []string{"metric_a", "metric_b"}, metrics) @@ -74,12 +81,12 @@ func TestMetricsCache_CacheHit_DoesNotFetchAgain(t *testing.T) { cache := setupTest(mockProv) // First call - cache miss - metrics1, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + metrics1, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Equal(t, 1, mockProv.getCallCount()) // Second call - cache hit - metrics2, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + metrics2, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.ElementsMatch(t, metrics1, metrics2) require.Equal(t, 1, mockProv.getCallCount()) // Still 1, no new call @@ -93,17 +100,17 @@ func TestMetricsCache_DifferentDatasources_SeparateCacheEntries(t *testing.T) { cache := setupTest(mockProv) // First datasource - _, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom1:9090", nil) + _, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom1:9090", nil) require.NoError(t, err) require.Equal(t, 1, mockProv.getCallCount()) // Second datasource - separate cache entry - _, err = cache.GetMetrics(context.Background(), "test", "ds-uid-2", "http://prom2:9090", nil) + _, err = cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-2", "http://prom2:9090", nil) require.NoError(t, err) require.Equal(t, 2, mockProv.getCallCount()) // First datasource again - cache hit - _, err = cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom1:9090", nil) + _, err = cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom1:9090", nil) require.NoError(t, err) require.Equal(t, 2, mockProv.getCallCount()) // Still 2 } @@ -120,7 +127,7 @@ func TestMetricsCache_ExpiredEntry_FetchesAgain(t *testing.T) { cache := setupTest(mockProv) // First call - _, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + _, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Equal(t, 1, mockProv.getCallCount()) @@ -128,7 +135,7 @@ func TestMetricsCache_ExpiredEntry_FetchesAgain(t *testing.T) { time.Sleep(20 * time.Millisecond) // Second call after expiration - _, err = cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + _, err = cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Equal(t, 2, mockProv.getCallCount()) // New call made } @@ -141,12 +148,12 @@ func TestMetricsCache_ZeroTTL_DoesNotCache(t *testing.T) { cache := setupTest(mockProv) // First call - _, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + _, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Equal(t, 1, mockProv.getCallCount()) // Second call - should fetch again since zero TTL means no caching - _, err = cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + _, err = cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Equal(t, 2, mockProv.getCallCount()) } @@ -162,7 +169,7 @@ func TestMetricsCache_ProviderError_ReturnsError(t *testing.T) { } cache := setupTest(mockProv) - metrics, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + metrics, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.Error(t, err) require.ErrorIs(t, err, providerErr) @@ -178,7 +185,7 @@ func TestMetricsCache_ProviderError_DoesNotCache(t *testing.T) { cache := setupTest(mockProv) // First call - error - _, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + _, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.Error(t, err) require.Equal(t, 1, mockProv.getCallCount()) @@ -190,7 +197,7 @@ func TestMetricsCache_ProviderError_DoesNotCache(t *testing.T) { mockProv.mu.Unlock() // Second call - should fetch again (error not cached) - metrics, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + metrics, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.NotNil(t, metrics) require.Equal(t, 2, mockProv.getCallCount()) @@ -220,7 +227,7 @@ func TestMetricsCache_ConcurrentAccess_ThreadSafe(t *testing.T) { if idx%2 == 0 { uid = "ds-uid-2" } - metrics, err := cache.GetMetrics(context.Background(), "test", uid, "http://prom:9090", nil) + metrics, err := cache.GetMetrics(context.Background(), int64(1), "test", uid, "http://prom:9090", nil) if err != nil { errCount.Add(1) return @@ -254,13 +261,13 @@ func TestMetricsCache_CleanupRemovesExpiredEntries(t *testing.T) { cache := setupTest(mockProv) // Populate cache - _, err := cache.GetMetrics(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + _, err := cache.GetMetrics(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Equal(t, 1, mockProv.getCallCount()) // Verify entry exists cache.mu.RLock() - _, exists := cache.entries["ds-uid-1"] + _, exists := cache.entries["1/ds-uid-1"] cache.mu.RUnlock() require.True(t, exists) @@ -272,7 +279,7 @@ func TestMetricsCache_CleanupRemovesExpiredEntries(t *testing.T) { // Verify entry was removed cache.mu.RLock() - _, exists = cache.entries["ds-uid-1"] + _, exists = cache.entries["1/ds-uid-1"] cache.mu.RUnlock() require.False(t, exists) } @@ -335,7 +342,7 @@ func TestMetricsCache_GetMetricsSet_CacheMiss_ReturnsSet(t *testing.T) { } cache := setupTest(mockProv) - set, err := cache.GetMetricsSet(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + set, err := cache.GetMetricsSet(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Len(t, set, 2) @@ -353,12 +360,12 @@ func TestMetricsCache_GetMetricsSet_CacheHit_ReturnsCachedSet(t *testing.T) { cache := setupTest(mockProv) // First call - cache miss - set1, err := cache.GetMetricsSet(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + set1, err := cache.GetMetricsSet(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Equal(t, 1, mockProv.getCallCount()) // Second call - cache hit - set2, err := cache.GetMetricsSet(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + set2, err := cache.GetMetricsSet(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Equal(t, 1, mockProv.getCallCount()) // Still 1, no new call require.Equal(t, set1, set2) @@ -373,7 +380,7 @@ func TestMetricsCache_GetMetricsSet_ZeroTTL_BuildsSetOnTheFly(t *testing.T) { // First call - provider returns TTL=0, so GetMetrics skips caching. // GetMetricsSet falls through to toMetricsSet() on the returned slice. - set, err := cache.GetMetricsSet(context.Background(), "test", "ds-uid-1", "http://prom:9090", nil) + set, err := cache.GetMetricsSet(context.Background(), int64(1), "test", "ds-uid-1", "http://prom:9090", nil) require.NoError(t, err) require.Len(t, set, 2) require.True(t, set["metric_a"]) @@ -381,7 +388,7 @@ func TestMetricsCache_GetMetricsSet_ZeroTTL_BuildsSetOnTheFly(t *testing.T) { require.Equal(t, 1, mockProv.getCallCount()) // Second call - no cache entry exists, so provider is called again - set2, err := cache.GetMetricsSet(contex ... [truncated]
← Back to Alerts View on GitHub →