Race condition (use-after-free risk in CPU hotplug and per-CPU crypto acomp_ctx lifecycle)

MEDIUM

torvalds/linux

Commit: ef3c0f6cb798

Affected: <= v7.0-rc6

2026-04-25 13:14 UTC

Description

The patch fixes a race/liveness issue in zswap's per-CPU crypto accelerator context (acomp_ctx) lifecycle by tying the per-CPU acomp_ctx resources to the zswap pool lifetime. Previously, acomp_ctx resources could be allocated and referenced across CPU hotplug cycles and pool creation/destruction, which opened a window where CPU offline/online transitions and pool lifecycle could lead to use-after-free or inconsistent crypto state if acomp_ctx resources were freed or partially initialized during hotplug handling. The fix: - Allocates per-CPU acomp_ctx on pool creation (or CPU hotplug for CPUs online later) and keeps them allocated until pool destruction, preventing premature teardown during CPU offline events. - Introduces acomp_ctx_free() and moves cleanup logic there, ensuring proper deallocation of acomp_ctx resources even in migration scenarios. - Uses cpuhp_state to serialize pool preparation with CPU hotplug so CPUs aren’t offlined until the preparation completes. - Removes the old per-CPU acquire/release helpers in favor of direct mutex usage on acomp_ctx->mutex, and ensures proper cleanup paths on failures during pool creation. These changes mitigate a race between CPU hotplug, pool lifecycle, and per-CPU crypto context management, reducing the risk of use-after-free or memory/crypto state corruption during hotplug events.

Proof of Concept

High-level PoC (conceptual, not a weaponizable exploit):
1) Build and boot a vulnerable kernel state (pre-patch) where zswap allocates per-CPU acomp_ctx on pool creation but frees them on pool destruction or CPU hotunplug, and where cpuhp_prepare does not serialize properly with CPU offline.
2) Trigger pool creation for a zswap pool on CPU0. The code allocates per-CPU acomp_ctx for CPUs including CPU0 and registers the cpuhp_state for pool preparation.
3) Concurrently issue a CPU hotplug offline request for CPU0 (e.g., echo 0 > /sys/devices/system/cpu/cpu0/online).
4) If the race exists, acomp_ctx resources could be freed or partially initialized while a zswap operation is in progress on CPU0, potentially leading to use-after-free or memory/crypto state corruption when compress/decompress paths access acomp_ctx data.
5) Observe an oops/crash or memory corruption if the race is hit. This should be detectable with kernel oops logs, KASAN, or mem corruption detectors in a controlled test environment.
Notes:
- Do not perform this on production systems. Reproduce only in a controlled lab with a kernel built from vulnerable code paths.
- The PoC is for demonstration of the race concept; it is not a weaponizable exploit and does not provide specific exploit steps beyond describing the triggering condition.

Commit Details

Author: Kanchana P. Sridhar

Date: 2026-03-31 18:33 UTC

Message:

mm: zswap: tie per-CPU acomp_ctx lifetime to the pool

Currently, per-CPU acomp_ctx are allocated on pool creation and/or CPU
hotplug, and destroyed on pool destruction or CPU hotunplug.  This
complicates the lifetime management to save memory while a CPU is
offlined, which is not very common.

Simplify lifetime management by allocating per-CPU acomp_ctx once on pool
creation (or CPU hotplug for CPUs onlined later), and keeping them
allocated until the pool is destroyed.

Refactor cleanup code from zswap_cpu_comp_dead() into acomp_ctx_free() to
be used elsewhere.

The main benefit of using the CPU hotplug multi state instance startup
callback to allocate the acomp_ctx resources is that it prevents the cores
from being offlined until the multi state instance addition call returns.

  From Documentation/core-api/cpu_hotplug.rst:

    "The node list add/remove operations and the callback invocations are
     serialized against CPU hotplug operations."

Furthermore, zswap_[de]compress() cannot contend with
zswap_cpu_comp_prepare() because:

  - During pool creation/deletion, the pool is not in the zswap_pools
    list.

  - During CPU hot[un]plug, the CPU is not yet online, as Yosry pointed
    out. zswap_cpu_comp_prepare() will be run on a control CPU,
    since CPUHP_MM_ZSWP_POOL_PREPARE is in the PREPARE section of "enum
    cpuhp_state".

  In both these cases, any recursions into zswap reclaim from
  zswap_cpu_comp_prepare() will be handled by the old pool.

The above two observations enable the following simplifications:

 1) zswap_cpu_comp_prepare():

    a) acomp_ctx mutex locking:

       If the process gets migrated while zswap_cpu_comp_prepare() is
       running, it will complete on the new CPU. In case of failures, we
       pass the acomp_ctx pointer obtained at the start of
       zswap_cpu_comp_prepare() to acomp_ctx_free(), which again, can
       only undergo migration. There appear to be no contention
       scenarios that might cause inconsistent values of acomp_ctx's
       members. Hence, it seems there is no need for
       mutex_lock(&acomp_ctx->mutex) in zswap_cpu_comp_prepare().

    b) acomp_ctx mutex initialization:

       Since the pool is not yet on zswap_pools list, we don't need to
       initialize the per-CPU acomp_ctx mutex in
       zswap_pool_create(). This has been restored to occur in
       zswap_cpu_comp_prepare().

    c) Subsequent CPU offline-online transitions:

       zswap_cpu_comp_prepare() checks upfront if acomp_ctx->acomp is
       valid. If so, it returns success. This should handle any CPU
       hotplug online-offline transitions after pool creation is done.

 2) CPU offline vis-a-vis zswap ops:

    Let's suppose the process is migrated to another CPU before the
    current CPU is dysfunctional. If zswap_[de]compress() holds the
    acomp_ctx->mutex lock of the offlined CPU, that mutex will be
    released once it completes on the new CPU. Since there is no
    teardown callback, there is no possibility of UAF.

 3) Pool creation/deletion and process migration to another CPU:

    During pool creation/deletion, the pool is not in the zswap_pools
    list. Hence it cannot contend with zswap ops on that CPU. However,
    the process can get migrated.

    a) Pool creation --> zswap_cpu_comp_prepare()
                                --> process migrated:
                                    * Old CPU offline: no-op.
                                    * zswap_cpu_comp_prepare() continues
                                      to run on the new CPU to finish
                                      allocating acomp_ctx resources for
                                      the offlined CPU.

    b) Pool deletion --> acomp_ctx_free()
                                --> process migrated:
                                    * Old CPU offline: no-op.
                                    * acomp_ctx_free() continues
                                      to run on the new CPU to finish
                                      de-allocating acomp_ctx resources
                                      for the offlined CPU.

 4) Pool deletion vis-a-vis CPU onlining:

    The call to cpuhp_state_remove_instance() cannot race with
    zswap_cpu_comp_prepare() because of hotplug synchronization.

The current acomp_ctx_get_cpu_lock()/acomp_ctx_put_unlock() are deleted. 
Instead, zswap_[de]compress() directly call
mutex_[un]lock(&acomp_ctx->mutex).

The per-CPU memory cost of not deleting the acomp_ctx resources upon CPU
offlining, and only deleting them when the pool is destroyed, is 8.28 KB
on x86_64.  This cost is only paid when a CPU is offlined, until it is
onlined again.

Link: https://lore.kernel.org/20260331183351.29844-3-kanchanapsridhar2026@gmail.com
Co-developed-by: Kanchana P. Sridhar <kanchanapsridhar2026@gmail.com>
Signed-off-by: Kanchana P. Sridhar <kanchanapsridhar2026@gmail.com>
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Acked-by: Yosry Ahmed <yosry@kernel.org>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Triage Assessment

Vulnerability Type: Race condition

Confidence: MEDIUM

Reasoning:

The commit ties per-CPU crypto accelerator context lifetimes to the zswap pool and adjusts CPU hotplug handling to ensure acomp_ctx resources are allocated with pool creation and not torn down during CPU hot-unplug. This addresses potential race conditions between CPU hotplug, pool lifecycle, and crypto acomp resource management, which could otherwise lead to use-after-free or inconsistent crypto state during hotplug events. While not a classic vulnerability like RCE/SQLi, it mitigates a race/timeout scenario that could have security implications in memory safety and crypto handling.

Verification Assessment

Vulnerability Type: Race condition (use-after-free risk in CPU hotplug and per-CPU crypto acomp_ctx lifecycle)

Confidence: MEDIUM

Affected Versions: <= v7.0-rc6

Code Diff

diff --git a/mm/zswap.c b/mm/zswap.c
index c59045b59ffe05..4b5149173b0ec5 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -242,6 +242,34 @@ static inline struct xarray *swap_zswap_tree(swp_entry_t swp)
 **********************************/
 static void __zswap_pool_empty(struct percpu_ref *ref);
 
+static void acomp_ctx_free(struct crypto_acomp_ctx *acomp_ctx)
+{
+	if (!acomp_ctx)
+		return;
+
+	/*
+	 * If there was an error in allocating @acomp_ctx->req, it
+	 * would be set to NULL.
+	 */
+	if (acomp_ctx->req)
+		acomp_request_free(acomp_ctx->req);
+
+	acomp_ctx->req = NULL;
+
+	/*
+	 * We have to handle both cases here: an error pointer return from
+	 * crypto_alloc_acomp_node(); and a) NULL initialization by zswap, or
+	 * b) NULL assignment done in a previous call to acomp_ctx_free().
+	 */
+	if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
+		crypto_free_acomp(acomp_ctx->acomp);
+
+	acomp_ctx->acomp = NULL;
+
+	kfree(acomp_ctx->buffer);
+	acomp_ctx->buffer = NULL;
+}
+
 static struct zswap_pool *zswap_pool_create(char *compressor)
 {
 	struct zswap_pool *pool;
@@ -263,19 +291,27 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
 
 	strscpy(pool->tfm_name, compressor, sizeof(pool->tfm_name));
 
-	pool->acomp_ctx = alloc_percpu(*pool->acomp_ctx);
+	/* Many things rely on the zero-initialization. */
+	pool->acomp_ctx = alloc_percpu_gfp(*pool->acomp_ctx,
+					   GFP_KERNEL | __GFP_ZERO);
 	if (!pool->acomp_ctx) {
 		pr_err("percpu alloc failed\n");
 		goto error;
 	}
 
-	for_each_possible_cpu(cpu)
-		mutex_init(&per_cpu_ptr(pool->acomp_ctx, cpu)->mutex);
-
+	/*
+	 * This is serialized against CPU hotplug operations. Hence, cores
+	 * cannot be offlined until this finishes.
+	 */
 	ret = cpuhp_state_add_instance(CPUHP_MM_ZSWP_POOL_PREPARE,
 				       &pool->node);
+
+	/*
+	 * cpuhp_state_add_instance() will not cleanup on failure since
+	 * we don't register a hotunplug callback.
+	 */
 	if (ret)
-		goto error;
+		goto cpuhp_add_fail;
 
 	/* being the current pool takes 1 ref; this func expects the
 	 * caller to always add the new pool as the current pool
@@ -292,6 +328,10 @@ static struct zswap_pool *zswap_pool_create(char *compressor)
 
 ref_fail:
 	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
+
+cpuhp_add_fail:
+	for_each_possible_cpu(cpu)
+		acomp_ctx_free(per_cpu_ptr(pool->acomp_ctx, cpu));
 error:
 	if (pool->acomp_ctx)
 		free_percpu(pool->acomp_ctx);
@@ -322,9 +362,15 @@ static struct zswap_pool *__zswap_pool_create_fallback(void)
 
 static void zswap_pool_destroy(struct zswap_pool *pool)
 {
+	int cpu;
+
 	zswap_pool_debug("destroying", pool);
 
 	cpuhp_state_remove_instance(CPUHP_MM_ZSWP_POOL_PREPARE, &pool->node);
+
+	for_each_possible_cpu(cpu)
+		acomp_ctx_free(per_cpu_ptr(pool->acomp_ctx, cpu));
+
 	free_percpu(pool->acomp_ctx);
 
 	zs_destroy_pool(pool->zs_pool);
@@ -738,44 +784,41 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 {
 	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
 	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
-	struct crypto_acomp *acomp = NULL;
-	struct acomp_req *req = NULL;
-	u8 *buffer = NULL;
-	int ret;
+	int ret = -ENOMEM;
 
-	buffer = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
-	if (!buffer) {
-		ret = -ENOMEM;
-		goto fail;
+	/*
+	 * To handle cases where the CPU goes through online-offline-online
+	 * transitions, we return if the acomp_ctx has already been initialized.
+	 */
+	if (acomp_ctx->acomp) {
+		WARN_ON_ONCE(IS_ERR(acomp_ctx->acomp));
+		return 0;
 	}
 
+	acomp_ctx->buffer = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
+	if (!acomp_ctx->buffer)
+		return ret;
+
 	/*
 	 * In case of an error, crypto_alloc_acomp_node() returns an
 	 * error pointer, never NULL.
 	 */
-	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
-	if (IS_ERR(acomp)) {
+	acomp_ctx->acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
+	if (IS_ERR(acomp_ctx->acomp)) {
 		pr_err("could not alloc crypto acomp %s : %pe\n",
-				pool->tfm_name, acomp);
-		ret = PTR_ERR(acomp);
+				pool->tfm_name, acomp_ctx->acomp);
+		ret = PTR_ERR(acomp_ctx->acomp);
 		goto fail;
 	}
 
 	/* acomp_request_alloc() returns NULL in case of an error. */
-	req = acomp_request_alloc(acomp);
-	if (!req) {
+	acomp_ctx->req = acomp_request_alloc(acomp_ctx->acomp);
+	if (!acomp_ctx->req) {
 		pr_err("could not alloc crypto acomp_request %s\n",
 		       pool->tfm_name);
-		ret = -ENOMEM;
 		goto fail;
 	}
 
-	/*
-	 * Only hold the mutex after completing allocations, otherwise we may
-	 * recurse into zswap through reclaim and attempt to hold the mutex
-	 * again resulting in a deadlock.
-	 */
-	mutex_lock(&acomp_ctx->mutex);
 	crypto_init_wait(&acomp_ctx->wait);
 
 	/*
@@ -783,83 +826,17 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
 	 * crypto_wait_req(); if the backend of acomp is scomp, the callback
 	 * won't be called, crypto_wait_req() will return without blocking.
 	 */
-	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+	acomp_request_set_callback(acomp_ctx->req, CRYPTO_TFM_REQ_MAY_BACKLOG,
 				   crypto_req_done, &acomp_ctx->wait);
 
-	acomp_ctx->buffer = buffer;
-	acomp_ctx->acomp = acomp;
-	acomp_ctx->req = req;
-	mutex_unlock(&acomp_ctx->mutex);
+	mutex_init(&acomp_ctx->mutex);
 	return 0;
 
 fail:
-	if (!IS_ERR_OR_NULL(acomp))
-		crypto_free_acomp(acomp);
-	kfree(buffer);
+	acomp_ctx_free(acomp_ctx);
 	return ret;
 }
 
-static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
-{
-	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
-	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
-	struct acomp_req *req;
-	struct crypto_acomp *acomp;
-	u8 *buffer;
-
-	if (!acomp_ctx)
-		return 0;
-
-	mutex_lock(&acomp_ctx->mutex);
-	req = acomp_ctx->req;
-	acomp = acomp_ctx->acomp;
-	buffer = acomp_ctx->buffer;
-	acomp_ctx->req = NULL;
-	acomp_ctx->acomp = NULL;
-	acomp_ctx->buffer = NULL;
-	mutex_unlock(&acomp_ctx->mutex);
-
-	/*
-	 * Do the actual freeing after releasing the mutex to avoid subtle
-	 * locking dependencies causing deadlocks.
-	 *
-	 * If there was an error in allocating @acomp_ctx->req, it
-	 * would be set to NULL.
-	 */
-	if (req)
-		acomp_request_free(req);
-	if (!IS_ERR_OR_NULL(acomp))
-		crypto_free_acomp(acomp);
-	kfree(buffer);
-
-	return 0;
-}
-
-static struct crypto_acomp_ctx *acomp_ctx_get_cpu_lock(struct zswap_pool *pool)
-{
-	struct crypto_acomp_ctx *acomp_ctx;
-
-	for (;;) {
-		acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
-		mutex_lock(&acomp_ctx->mutex);
-		if (likely(acomp_ctx->req))
-			return acomp_ctx;
-		/*
-		 * It is possible that we were migrated to a different CPU after
-		 * getting the per-CPU ctx but before the mutex was acquired. If
-		 * the old CPU got offlined, zswap_cpu_comp_dead() could have
-		 * already freed ctx->req (among other things) and set it to
-		 * NULL. Just try again on the new CPU that we ended up on.
-		 */
-		mutex_unlock(&acomp_ctx->mutex);
-	}
-}
-
-static void acomp_ctx_put_unlock(struct crypto_acomp_ctx *acomp_ctx)
-{
-	mutex_unlock(&acomp_ctx->mutex);
-}
-
 static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 			   struct zswap_pool *pool)
 {
@@ -872,7 +849,9 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	u8 *dst;
 	bool mapped = false;
 
-	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
+	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+	mutex_lock(&acomp_ctx->mutex);
+
 	dst = acomp_ctx->buffer;
 	sg_init_table(&input, 1);
 	sg_set_page(&input, page, PAGE_SIZE, 0);
@@ -938,7 +917,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	else if (alloc_ret)
 		zswap_reject_alloc_fail++;
 
-	acomp_ctx_put_unlock(acomp_ctx);
+	mutex_unlock(&acomp_ctx->mutex);
 	return comp_ret == 0 && alloc_ret == 0;
 }
 
@@ -950,7 +929,8 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	struct crypto_acomp_ctx *acomp_ctx;
 	int ret = 0, dlen;
 
-	acomp_ctx = acomp_ctx_get_cpu_lock(pool);
+	acomp_ctx = raw_cpu_ptr(pool->acomp_ctx);
+	mutex_lock(&acomp_ctx->mutex);
 	zs_obj_read_sg_begin(pool->zs_pool, entry->handle, input, entry->length);
 
 	/* zswap entries of length PAGE_SIZE are not compressed. */
@@ -975,7 +955,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	}
 
 	zs_obj_read_sg_end(pool->zs_pool, entry->handle);
-	acomp_ctx_put_unlock(acomp_ctx);
+	mutex_unlock(&acomp_ctx->mutex);
 
 	if (!ret && dlen == PAGE_SIZE)
 		return true;
@@ -1795,7 +1775,7 @@ static int zswap_setup(void)
 	ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
 				      "mm/zswap_pool:prepare",
 				      zswap_cpu_comp_prepare,
-				      zswap_cpu_comp_dead);
+				      NULL);
 	if (ret)
 		goto hp_fail;
 

← Back to Alerts View on GitHub →