Race Condition

MEDIUM
grafana/grafana
Commit: b30779005b66
Affected: Pre-12.4.0 Grafana 12.x releases; the fix is included in 12.4.0 and later.
2026-04-30 16:25 UTC

Description

The commit adds a server-wide advisory lock around OpenFGA migrations so that they are serialized with Grafana's main migrator. OpenFGA's goose migrator opens its own database connection and has no cross-process locking, so pods starting up concurrently could race to create or modify the database schema. The change acquires the advisory lock for the database before running the OpenFGA migrations and releases it afterward, which prevents concurrent migration runs from interleaving. Because it is the same lock Grafana's main migrator uses, OpenFGA and Grafana migrations are also mutually exclusive per database. This reduces the risk of an inconsistent or partially initialized schema, which could in turn lead to security misconfigurations (e.g., incorrectly initialized access controls). This is a correctness and consistency fix with potential security implications rather than a traditional vulnerability patch.

Proof of Concept

PoC scenario demonstrating the race condition without the fix:

1) Environment setup (PostgreSQL assumed):
   - Create a database for Grafana/OpenFGA migrations (e.g., grafana_migs).
   - Ensure that two independent app instances (or two concurrent migration runners) can reach the same database.
   - Prepare a minimal migration that performs a non-idempotent DDL operation, such as creating a table without IF NOT EXISTS.
2) Simulate concurrent migrations (without advisory locking):
   - In two separate shells (or two container pods), run the same migration at nearly the same time. For example, both execute:
       BEGIN; CREATE TABLE openfga_demo_migration (id SERIAL PRIMARY KEY); COMMIT;
   - Observe the race: the first transaction commits and creates the table, and the second fails rather than also succeeding. Depending on timing, PostgreSQL reports either relation "openfga_demo_migration" already exists or a duplicate-key violation on a system catalog index. Whether this leaves pods with an inconsistent view of the schema depends on the migration framework rather than on the transaction isolation level.
3) Expected outcome without locking:
   - Errors during the second migration (e.g., duplicate-object errors), or partially applied schema state if migrations are non-transactional or combine several non-idempotent steps.
   - This can lead to inconsistent initialization of OpenFGA-related tables and, in turn, a weakened security posture (e.g., incorrect authorization rules).
4) Expected outcome with the fix (as implemented in the commit):
   - Both migration runs are serialized through a server-wide advisory lock keyed per database. The second process waits for the lock and proceeds only after the first completes, so only one migration runs at a time.
   - Once it holds the lock, a version-tracking runner such as goose finds the migrations already recorded as applied and skips them, so the resulting schema is consistent across pods, reducing the risk of misconfiguration or insecure initialization.
Concrete reproduction commands (PostgreSQL; adjust the connection string for your environment):
- Start two sessions that run the same migration block concurrently, then wait for both:
    psql postgres://grafana:password@localhost:5432/grafana_migs -c "BEGIN; CREATE TABLE openfga_demo_migration (id SERIAL PRIMARY KEY); COMMIT;" &
    psql postgres://grafana:password@localhost:5432/grafana_migs -c "BEGIN; CREATE TABLE openfga_demo_migration (id SERIAL PRIMARY KEY); COMMIT;" &
    wait
- Observe the error in one of the sessions, which shows the race when no advisory lock is used. Drop the table before repeating the test.

Expected PoC Observations:
- Without locking, one of the concurrent sessions fails with a duplicate-table (or catalog duplicate-key) error. Because each block runs in a single transaction, the failing session rolls back cleanly in this demo; partially applied state is a risk only for multi-step or non-transactional migrations.
- With the advisory lock in place (as in the patch), the second process waits for the lock and attempts its migration only after the first completes. This raw-SQL demo has no version table, so the second session would still report "already exists" once it gets the lock; a runner such as goose instead skips migrations already recorded as applied, which yields a clean, serialized application of migrations.

Commit Details

Author: Alexander Zobnin

Date: 2026-04-30 16:07 UTC

Message:

Zanzana: Lock database during OpenFGA migrations (#123906)

Wrap the OpenFGA goose migrator in the same advisory lock Grafana's main migrator uses so concurrent pod startups don't race on schema creation. The OpenFGA goose driver opens its own connection and has no cross-process locking of its own.

Triage Assessment

Vulnerability Type: Race Condition

Confidence: MEDIUM

Reasoning:

The commit wraps OpenFGA migrations in a server-wide advisory lock to prevent concurrent pod startups from racing to create or modify the database schema. This addresses a race condition in database migrations that could lead to inconsistent/misconfigured schemas and potential security implications (e.g., improper access control initialization or exposure).

Verification Assessment

Vulnerability Type: Race Condition

Confidence: MEDIUM

Affected Versions: Pre-12.4.0 Grafana 12.x releases; the fix is included in 12.4.0 and later.

Code Diff

```diff
diff --git a/pkg/services/authz/zanzana/store/migration/migrator.go b/pkg/services/authz/zanzana/store/migration/migrator.go
index 0bd475475f06b..a11f9402bad1b 100644
--- a/pkg/services/authz/zanzana/store/migration/migrator.go
+++ b/pkg/services/authz/zanzana/store/migration/migrator.go
@@ -7,14 +7,15 @@ import (
 	"fmt"
 	"strings"
 
+	"github.com/openfga/openfga/pkg/storage/migrate"
+	"github.com/pressly/goose/v3"
+
 	"github.com/grafana/grafana/pkg/infra/log"
 	"github.com/grafana/grafana/pkg/services/sqlstore"
 	"github.com/grafana/grafana/pkg/services/sqlstore/migrator"
 	"github.com/grafana/grafana/pkg/setting"
 	"github.com/grafana/grafana/pkg/util"
 	"github.com/grafana/grafana/pkg/util/xorm"
-	"github.com/openfga/openfga/pkg/storage/migrate"
-	"github.com/pressly/goose/v3"
 )
 
 var (
@@ -51,7 +52,7 @@ func Run(cfg *setting.Cfg, dbType string, grafanaDBConfig *sqlstore.DatabaseConf
 		Engine: dbType,
 	}
 
-	if err := runOpenFGAMigrations(migrationConfig, logger); err != nil {
+	if err := runOpenFGAMigrationsLocked(engine, m.Dialect, cfg, migrationConfig, logger); err != nil {
 		return fmt.Errorf("failed to run openfga migrations: %w", err)
 	}
 
@@ -62,6 +63,59 @@ func Run(cfg *setting.Cfg, dbType string, grafanaDBConfig *sqlstore.DatabaseConf
 	return nil
 }
 
+// runOpenFGAMigrationsLocked acquires the same advisory lock that Grafana's
+// main migrator uses for this database and runs the openfga migrations under
+// it. openfga's goose migrator has no cross-process locking of its own, so
+// without this wrapper concurrent pod startups race on schema creation.
+//
+// The lock is server-wide (MySQL GET_LOCK / Postgres pg_advisory_lock), so it
+// serializes correctly even though the openfga goose driver opens its own
+// database connection. For SQLite Dialect.Lock is a no-op, which is fine
+// because SQLite deployments are single-process.
+func runOpenFGAMigrationsLocked(
+	engine *xorm.Engine,
+	dialect migrator.Dialect,
+	cfg *setting.Cfg,
+	migrationConfig migrate.MigrationConfig,
+	logger log.Logger,
+) error {
+	sec := cfg.Raw.Section("database")
+	if !sec.Key("migration_locking").MustBool(true) {
+		return runOpenFGAMigrations(migrationConfig, logger)
+	}
+
+	dbName, err := dialect.GetDBName(engine.DataSourceName())
+	if err != nil {
+		return fmt.Errorf("failed to derive db name for advisory lock: %w", err)
+	}
+	// Reuse the same key as Grafana's main migrator (no additional name) so
+	// openfga and Grafana migrations are mutually exclusive per database.
+	key, err := migrator.GenerateAdvisoryLockID(dbName)
+	if err != nil {
+		return fmt.Errorf("failed to generate advisory lock id: %w", err)
+	}
+
+	lockCfg := migrator.LockCfg{
+		Session: engine.NewSession(),
+		Key:     key,
+		Timeout: sec.Key("locking_attempt_timeout_sec").MustInt(30),
+	}
+	defer lockCfg.Session.Close()
+
+	logger.Info("Locking database for openfga migrations", "key", key)
+	if err := dialect.Lock(lockCfg); err != nil {
+		return fmt.Errorf("failed to acquire openfga migration lock: %w", err)
+	}
+	defer func() {
+		logger.Info("Unlocking database after openfga migrations")
+		if err := dialect.Unlock(lockCfg); err != nil {
+			logger.Warn("failed to release openfga migration lock", "error", err)
+		}
+	}()
+
+	return runOpenFGAMigrations(migrationConfig, logger)
+}
+
 func runOpenFGAMigrations(migrationConfig migrate.MigrationConfig, logger log.Logger) error {
 	err := migrate.RunMigrations(migrationConfig)
 	if err == nil {
```
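The diff reuses `migrator.GenerateAdvisoryLockID(dbName)` with no additional name. The sketch below illustrates why that matters. It is modeled on golang-migrate's `GenerateAdvisoryLockId` (CRC-32 of the name multiplied by a fixed salt); that Grafana's implementation matches it exactly is an assumption, and `advisoryLockID` is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"strings"
)

// advisoryLockIDSalt is the salt golang-migrate uses; that Grafana uses the
// same value is an assumption.
const advisoryLockIDSalt uint32 = 1486364155

// advisoryLockID is a hypothetical helper modeled on golang-migrate's
// GenerateAdvisoryLockId: CRC-32 (IEEE) of the name, multiplied by a salt.
// Extra names are joined with the database name, producing a different key.
func advisoryLockID(dbName string, extra ...string) string {
	name := dbName
	if len(extra) > 0 {
		parts := append(append([]string{}, extra...), dbName)
		name = strings.Join(parts, "\x00")
	}
	sum := crc32.ChecksumIEEE([]byte(name)) * advisoryLockIDSalt // wraps mod 2^32
	return fmt.Sprint(sum)
}

func main() {
	grafanaKey := advisoryLockID("grafana")          // Grafana's main migrator
	openfgaKey := advisoryLockID("grafana")          // openfga run, no extra name
	otherKey := advisoryLockID("grafana", "openfga") // hypothetical extra name
	fmt.Println(grafanaKey == openfgaKey, grafanaKey == otherKey)
}
```

Because the key is a pure function of the database name, every pod and both migrators compute the same value. Passing an extra name, which the comment in the diff deliberately avoids, would give OpenFGA its own lock that no longer excludes Grafana's main migrator.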