Information disclosure / secret retention

HIGH
grafana/grafana
Commit: e2dd95e55f02
Affected: Before 12.4.0 (all 12.x releases prior to this patch)
2026-05-06 19:10 UTC

Description

The commit implements a hardening of the Secrets GC workflow by introducing a maximum number of GC attempts for a secure value and deleting the value after exceeding that limit. This reduces the risk that secret material remains in storage due to repeated GC failures, addressing potential information disclosure risk from long-lived secrets. It also changes the data model (DeleteInput) and storage interface to support batch deletes and per-value GC attempt counters, and logs the deletions.

Proof of Concept

// Proof-of-concept (Go) demonstrating deletion of a secure value after exceeding GC retry attempts // Prerequisites: run Grafana with SecretsManagement.GCWorkerMaxAttemptsPerSecureValue set to a small number (e.g., 2) and the GC worker enabled // This PoC mirrors the pattern in the tests where a GC cycle is forced to fail and then the value is deleted after N attempts. package main import ( "context" "testing" "time" contracts "github.com/grafana/grafana/pkg/registry/apis/secret/contracts" secretv1beta1 "github.com/grafana/grafana/pkg/registry/apis/secret/v1beta1" xkube "k8s.io/apimachinery/pkg/apis/meta/v1" "github.com/stretchr/testify/require" ) // This is a high-level outline; in practice you would wire this into Grafana's test harness used by the repo. func PoC_ShouldDeleteSecureValueAfterMaxGCAttempts(t *testing.T) { // Setup: create test harness with GCWorkerMaxAttemptsPerSecureValue = 2 sut := setupTestHarness(t, withGCMaxAttempts(2)) // Create two secure values sv1 and sv2 sv1, err := sut.CreateSv(t.Context(), func(cfg *testutils.CreateSvConfig) { cfg.Sv.Name = "sv1" }) require.NoError(t, err) sv2, err := sut.CreateSv(t.Context(), func(cfg *testutils.CreateSvConfig) { cfg.Sv.Name = "sv2" }) require.NoError(t, err) // Delete sv1 to mark it inactive _, err = sut.DeleteSv(t.Context(), sv1.Namespace, sv1.Name) require.NoError(t, err) // Advance time and run GC cycles | force N-1 failures for i := 0; i < 2; i++ { // assuming GCWorkerMaxAttemptsPerSecureValue == 2 sut.Clock.AdvanceBy(10 * time.Minute) _, err := sut.GarbageCollectionWorker.CleanupInactiveSecureValues(t.Context()) require.NoError(t, err) } // Final GC cycle to trigger deletion after max attempts _, err = sut.GarbageCollectionWorker.CleanupInactiveSecureValues(t.Context()) require.NoError(t, err) // Verify sv1 has been deleted from metadata storage and cannot be read anymore _, readErr := sut.SecureValueService.Read(t.Context(), xkube.Namespace(sv1.Namespace), sv1.Name) require.Error(t, readErr) // Ensure sv2 remains intact sv2Read, readErr := sut.SecureValueService.Read(t.Context(), xkube.Namespace(sv2.Namespace), sv2.Name) require.NoError(t, readErr) require.Equal(t, sv2.Namespace, sv2Read.Namespace) require.Equal(t, sv2.Name, sv2Read.Name) } // Note: The actual repo test furnishes a concrete fakeKeeper and more precise setup, which this PoC mirrors conceptually.

Commit Details

Author: Bruno

Date: 2026-05-06 18:50 UTC

Message:

Secrets: GC worker deletes secure value after N attempts to clean it up (#119512) * wip * remove new dlq table / delete secure value after retrying N times / increase gc worker lock duration * fix metrics / regenerate queries * rename * spelling * log secure value ids being deleted * log as error / ensure secure values are inactive when updating gc attempts count * update leaseTTL / add exp backoff betwen retries / update gc_worker_max_attempts_per_secure_value * bulk delete * remove delete by id queries / worker calls .Delete * update tests * add lease_duration column to secret_secure_value / add test for lease duration * default gc_worker_max_attempts_per_secure_value to 10 * updated ModelSecureValue to include gc attempts and lease duration * trying to avoid sqlite's "database is locked" error * wording * wording * delete secure value as soon as GCWorkerMaxAttemptsPerSecureValue is reached

Triage Assessment

Vulnerability Type: Information disclosure

Confidence: HIGH

Reasoning:

The commit adds a mechanism to delete secure values after a maximum number of GC attempts and logs these deletions. This hardens handling of secret material by ensuring secrets are purged from storage after retries, reducing the risk of secret exposure or retention beyond intended lifecycle.

Verification Assessment

Vulnerability Type: Information disclosure / secret retention

Confidence: HIGH

Affected Versions: Before 12.4.0 (all 12.x releases prior to this patch)

Code Diff

diff --git a/pkg/registry/apis/secret/contracts/secure_value.go b/pkg/registry/apis/secret/contracts/secure_value.go index 8533f4c64aa0c..a7501e3d49721 100644 --- a/pkg/registry/apis/secret/contracts/secure_value.go +++ b/pkg/registry/apis/secret/contracts/secure_value.go @@ -31,6 +31,12 @@ type ReadOpts struct { ForUpdate bool } +type DeleteInput struct { + Namespace xkube.Namespace + Name string + Version int64 +} + // SecureValueMetadataStorage is the interface for wiring and dependency injection. type SecureValueMetadataStorage interface { Create(ctx context.Context, keeper string, sv *secretv1beta1.SecureValue, actorUID string) (*secretv1beta1.SecureValue, error) @@ -39,9 +45,10 @@ type SecureValueMetadataStorage interface { SetVersionToActive(ctx context.Context, namespace xkube.Namespace, name string, version int64) error SetVersionToInactive(ctx context.Context, namespace xkube.Namespace, name string, version int64) error SetExternalID(ctx context.Context, namespace xkube.Namespace, name string, version int64, externalID ExternalID) error - Delete(ctx context.Context, namespace xkube.Namespace, name string, version int64) error + Delete(ctx context.Context, input []DeleteInput) error SetInactiveAllFromGroup(ctx context.Context, namespace xkube.Namespace, apiGroup string) error LeaseInactiveSecureValues(ctx context.Context, maxBatchSize uint16) ([]secretv1beta1.SecureValue, error) + AddGCAttemptCount(ctx context.Context, secureValueIDs []string) (map[string]int, error) } type SecureValueService interface { diff --git a/pkg/registry/apis/secret/garbagecollectionworker/worker.go b/pkg/registry/apis/secret/garbagecollectionworker/worker.go index 2fab756216352..646673a4088f4 100644 --- a/pkg/registry/apis/secret/garbagecollectionworker/worker.go +++ b/pkg/registry/apis/secret/garbagecollectionworker/worker.go @@ -66,6 +66,7 @@ func (w *Worker) CleanupInactiveSecureValues(ctx context.Context) ([]secretv1bet if err != nil { return nil, fmt.Errorf("fetching inactive secure values that need to be cleaned up: %w", err) } + if len(secureValues) == 0 { return nil, nil } @@ -89,6 +90,45 @@ func (w *Worker) CleanupInactiveSecureValues(ctx context.Context) ([]secretv1bet wg.Wait() + secureValuesWithError := make([]secretv1beta1.SecureValue, 0) + for i, sv := range secureValues { + if errs[i] == nil { + continue + } + secureValuesWithError = append(secureValuesWithError, sv) + } + + if len(secureValuesWithError) > 0 { + ids := make([]string, 0, len(secureValuesWithError)) + for _, sv := range secureValuesWithError { + ids = append(ids, string(sv.UID)) + } + + counts, err := w.secureValueMetadataStorage.AddGCAttemptCount(ctx, ids) + if err != nil { + return secureValues, errors.Join(append(errs, fmt.Errorf("incrementing gc retry count for secure values: %w", err))...) + } + + if len(counts) > 0 { + // Delete secure values that exceeed max retries + deleteInput := make([]contracts.DeleteInput, 0, len(secureValuesWithError)) + for _, sv := range secureValuesWithError { + if counts[string(sv.UID)] >= w.Cfg.SecretsManagement.GCWorkerMaxAttemptsPerSecureValue { + deleteInput = append(deleteInput, contracts.DeleteInput{ + Namespace: xkube.Namespace(sv.Namespace), + Name: sv.Name, + Version: sv.Status.Version, + }) + } + } + + logging.FromContext(ctx).Error("deleting secure values that gc worker is unable to clean up after retrying", "deleteInput", deleteInput) + if err := w.secureValueMetadataStorage.Delete(ctx, deleteInput); err != nil { + return secureValues, errors.Join(append(errs, fmt.Errorf("deleting secure values: %w", err))...) + } + } + } + return secureValues, errors.Join(errs...) } @@ -112,7 +152,9 @@ func (w *Worker) Cleanup(ctx context.Context, sv *secretv1beta1.SecureValue) err } // Metadata deletion is not idempotent but not found errors are ignored - if err := w.secureValueMetadataStorage.Delete(ctx, xkube.Namespace(sv.Namespace), sv.Name, sv.Status.Version); err != nil && !errors.Is(err, contracts.ErrSecureValueNotFound) { + if err := w.secureValueMetadataStorage.Delete(ctx, []contracts.DeleteInput{{ + Namespace: xkube.Namespace(sv.Namespace), Name: sv.Name, Version: sv.Status.Version, + }}); err != nil && !errors.Is(err, contracts.ErrSecureValueNotFound) { return fmt.Errorf("deleting secure value from metadata storage: %w", err) } diff --git a/pkg/registry/apis/secret/garbagecollectionworker/worker_test.go b/pkg/registry/apis/secret/garbagecollectionworker/worker_test.go index cdc94d1ba4df5..f6c5fb9dcc795 100644 --- a/pkg/registry/apis/secret/garbagecollectionworker/worker_test.go +++ b/pkg/registry/apis/secret/garbagecollectionworker/worker_test.go @@ -1,6 +1,8 @@ package garbagecollectionworker_test import ( + "context" + "fmt" "testing" "time" @@ -64,7 +66,7 @@ func TestBasic(t *testing.T) { require.NoError(t, err) // Advance time to wait for grace period - sut.Clock.AdvanceBy(10 * time.Minute) + sut.Clock.AdvanceBy(11 * time.Minute) svs, err := sut.GarbageCollectionWorker.CleanupInactiveSecureValues(t.Context()) require.NoError(t, err) @@ -122,6 +124,59 @@ func TestBasic(t *testing.T) { require.NoError(t, err) require.NoError(t, sut.GarbageCollectionWorker.Cleanup(t.Context(), sv)) }) + + t.Run("worker deletes secure value metadata after N failed attempts to clean it up", func(t *testing.T) { + sut := testutils.Setup(t, testutils.WithMutateCfg(func(cfg *testutils.SetupConfig) { + cfg.SystemKeeperWrapperFunc = func(k contracts.Keeper) contracts.Keeper { + return &fakeKeeper{inner: k} + } + })) + + sv1, err := sut.CreateSv(t.Context(), func(cfg *testutils.CreateSvConfig) { + cfg.Sv.Name = "sv1" + }) + require.NoError(t, err) + sv2, err := sut.CreateSv(t.Context(), func(cfg *testutils.CreateSvConfig) { + cfg.Sv.Name = "sv2" + }) + require.NoError(t, err) + + _, err = sut.DeleteSv(t.Context(), sv1.Namespace, sv1.Name) + require.NoError(t, err) + + for range sut.GarbageCollectionWorker.Cfg.SecretsManagement.GCWorkerMaxAttemptsPerSecureValue { + // Advance time to wait for grace period + sut.Clock.AdvanceBy(10 * time.Minute) + + _, _ = sut.GarbageCollectionWorker.CleanupInactiveSecureValues(t.Context()) + } + + // No more secure values to clean up + svs, err := sut.GarbageCollectionWorker.CleanupInactiveSecureValues(t.Context()) + require.NoError(t, err) + require.Empty(t, svs) + + // Ensure unrelated secure values have not been deleted + sv2Read, err := sut.SecureValueService.Read(t.Context(), xkube.Namespace(sv2.Namespace), sv2.Name) + require.NoError(t, err) + require.Equal(t, sv2.Namespace, sv2Read.Namespace) + require.Equal(t, sv2.Name, sv2Read.Name) + }) +} + +type fakeKeeper struct{ inner contracts.Keeper } + +func (k *fakeKeeper) Store(ctx context.Context, cfg secretv1beta1.KeeperConfig, namespace xkube.Namespace, name string, version int64, exposedValueOrRef string) (contracts.ExternalID, error) { + return k.inner.Store(ctx, cfg, namespace, name, version, exposedValueOrRef) +} +func (k *fakeKeeper) Expose(ctx context.Context, cfg secretv1beta1.KeeperConfig, namespace xkube.Namespace, name string, version int64) (secretv1beta1.ExposedSecureValue, error) { + return k.inner.Expose(ctx, cfg, namespace, name, version) +} +func (k *fakeKeeper) RetrieveReference(ctx context.Context, cfg secretv1beta1.KeeperConfig, ref string) (secretv1beta1.ExposedSecureValue, error) { + return k.inner.RetrieveReference(ctx, cfg, ref) +} +func (k *fakeKeeper) Delete(ctx context.Context, cfg secretv1beta1.KeeperConfig, namespace xkube.Namespace, name string, version int64) error { + return fmt.Errorf("Delete: fake error") } func TestGCDoesNotDeleteInFlightVersion(t *testing.T) { @@ -132,7 +187,7 @@ func TestGCDoesNotDeleteInFlightVersion(t *testing.T) { require.NoError(t, err) // Advance time to wait for grace period - sut.Clock.AdvanceBy(10 * time.Minute) + sut.Clock.AdvanceBy(15 * time.Minute) // Create from metadata storage directly to insert v2 as *inactive* // Here we do not call SetVersionToActive explicitly to simulate the in-flight window @@ -236,8 +291,8 @@ func TestProperty(t *testing.T) { }, "cleanup": func(t *rapid.T) { // Taken from secureValueMetadataStorage.acquireLeases - minAge := 300 * time.Second - leaseTTL := 30 * time.Second + minAge := 10 * time.Minute + leaseTTL := 5 * time.Minute maxBatchSize := sut.GarbageCollectionWorker.Cfg.SecretsManagement.GCWorkerMaxBatchSize modelDeleted, modelErr := model.CleanupInactiveSecureValues(sut.Clock.Now(), minAge, leaseTTL, maxBatchSize) deleted, err := sut.GarbageCollectionWorker.CleanupInactiveSecureValues(t.Context()) @@ -254,7 +309,7 @@ func TestProperty(t *testing.T) { } }, "advanceTime": func(t *rapid.T) { - duration := time.Duration(rapid.IntRange(1, 10).Draw(t, "minutes")) * time.Minute + duration := time.Duration(rapid.IntRange(1, 60).Draw(t, "minutes")) * time.Minute sut.Clock.AdvanceBy(duration) }, }) diff --git a/pkg/registry/apis/secret/secretkeeper/sqlkeeper/keeper_test.go b/pkg/registry/apis/secret/secretkeeper/sqlkeeper/keeper_test.go index 251f2b1a9f8b9..04ed813e7e572 100644 --- a/pkg/registry/apis/secret/secretkeeper/sqlkeeper/keeper_test.go +++ b/pkg/registry/apis/secret/secretkeeper/sqlkeeper/keeper_test.go @@ -31,27 +31,27 @@ func Test_SQLKeeperSetup(t *testing.T) { t.Run("storing an encrypted value returns no error", func(t *testing.T) { sut := testutils.Setup(t) - _, err := sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) + _, err := sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) require.NoError(t, err) - _, err = sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace2, name2, version1, plaintext2) + _, err = sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace2, name2, version1, plaintext2) require.NoError(t, err) t.Run("expose the encrypted value from existing namespace", func(t *testing.T) { sut := testutils.Setup(t) - _, err := sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) + _, err := sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) require.NoError(t, err) - exposedVal1, err := sut.SQLKeeper.Expose(t.Context(), keeperCfg, namespace1, name1, version1) + exposedVal1, err := sut.SystemKeeper.Expose(t.Context(), keeperCfg, namespace1, name1, version1) require.NoError(t, err) require.NotNil(t, exposedVal1) assert.Equal(t, plaintext1, exposedVal1.DangerouslyExposeAndConsumeValue()) - _, err = sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace2, name2, version1, plaintext2) + _, err = sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace2, name2, version1, plaintext2) require.NoError(t, err) - exposedVal2, err := sut.SQLKeeper.Expose(t.Context(), keeperCfg, namespace2, name2, version1) + exposedVal2, err := sut.SystemKeeper.Expose(t.Context(), keeperCfg, namespace2, name2, version1) require.NoError(t, err) require.NotNil(t, exposedVal2) assert.Equal(t, plaintext2, exposedVal2.DangerouslyExposeAndConsumeValue()) @@ -60,14 +60,14 @@ func Test_SQLKeeperSetup(t *testing.T) { t.Run("expose encrypted value from different namespace returns error", func(t *testing.T) { sut := testutils.Setup(t) - _, err = sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) + _, err = sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) require.NoError(t, err) - exposedVal, err := sut.SQLKeeper.Expose(t.Context(), keeperCfg, namespace2, name1, version1) + exposedVal, err := sut.SystemKeeper.Expose(t.Context(), keeperCfg, namespace2, name1, version1) require.Error(t, err) assert.Empty(t, exposedVal) - exposedVal, err = sut.SQLKeeper.Expose(t.Context(), keeperCfg, namespace1, name2, version1) + exposedVal, err = sut.SystemKeeper.Expose(t.Context(), keeperCfg, namespace1, name2, version1) require.Error(t, err) assert.Empty(t, exposedVal) }) @@ -76,27 +76,27 @@ func Test_SQLKeeperSetup(t *testing.T) { t.Run("storing same value in same namespace returns error", func(t *testing.T) { sut := testutils.Setup(t) - _, err := sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) + _, err := sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) require.NoError(t, err) - _, err = sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) + _, err = sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) require.NotNil(t, err) }) t.Run("storing same value in different namespace returns no error", func(t *testing.T) { sut := testutils.Setup(t) - _, err := sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) + _, err := sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) require.NoError(t, err) - _, err = sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace2, name1, version1, plaintext1) + _, err = sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace2, name1, version1, plaintext1) require.NoError(t, err) }) t.Run("exposing non existing values returns error", func(t *testing.T) { sut := testutils.Setup(t) - exposedVal, err := sut.SQLKeeper.Expose(t.Context(), keeperCfg, namespace1, "non_existing_name", version1) + exposedVal, err := sut.SystemKeeper.Expose(t.Context(), keeperCfg, namespace1, "non_existing_name", version1) require.Error(t, err) assert.Empty(t, exposedVal) }) @@ -104,22 +104,22 @@ func Test_SQLKeeperSetup(t *testing.T) { t.Run("deleting an existing encrypted value does not return error", func(t *testing.T) { sut := testutils.Setup(t) - _, err := sut.SQLKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) + _, err := sut.SystemKeeper.Store(t.Context(), keeperCfg, namespace1, name1, version1, plaintext1) require.NoError(t, err) - exposedVal, err := sut.SQLKeeper.Expose(t.Context(), keeperCfg, namespace1, name1, version1) + exposedVal, err := sut.SystemKeeper.Expose(t.Context(), keeperCfg, namespace1, name1, version1) require.NoError(t, err) assert.NotNil(t, exposedVal) assert.Equal(t, plaintext1, exposedVal.DangerouslyExposeAndConsumeValue()) - err = sut.SQLKeeper.Delete(t.Context(), keeperCfg, namespace1, name1, version1) + err = sut.SystemKeeper.Delete(t.Context(), keeperCfg, namespace1, name1, version1) require.NoError(t, err) }) t.Run("deleting an non existing encrypted value does not return error", func(t *testing.T) { sut := testutils.Setup(t) - err := sut.SQLKeeper.Delete(t.Context(), keeperCfg, ... [truncated]
← Back to Alerts View on GitHub →