Buffer overflow / memory-safety issue in device-mapper ioctl processing
Description
The commit set contains device-mapper core and target fixes that include memory-safety improvements around dynamic allocations and ioctl handling (notably in dm-bufio.c and related dm-cache metadata paths). The key change replaces a potentially unsafe hand-computed allocation size for a dynamically sized structure with a safe flexible-array allocation helper (kzalloc_flex), alongside related ioctl and error-path hardening. This points to a genuine memory-safety bug (buffer overflow risk) in ioctl and data-structure handling within the device-mapper subsystem, rather than a pure dependency bump or cosmetic cleanup.
Commit Details
Author: Linus Torvalds
Date: 2026-04-15 22:11 UTC
Message:
Merge tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Benjamin Marzinski:
"There are fixes for some corner case crashes in dm-cache and
dm-mirror, new setup functionality for dm-vdo, and miscellaneous minor
fixes and cleanups, especially to dm-verity.
dm-vdo:
- Make dm-vdo able to format the device itself, like other dm
targets, instead of needing a userspace formatting program
- Add some sanity checks and code cleanup
dm-cache:
- Fix crashes and hangs when operating in passthrough mode (which
have been around, unnoticed, since 4.12), as well as a late
arriving fix for an error path bug in the passthrough fix
- Fix a corner case memory leak
dm-verity:
- Another set of minor bugfixes and code cleanups to the forward
error correction code
dm-mirror:
- Fix minor initialization bug
- Fix overflow crash on large devices with small region sizes
dm-crypt:
- Reimplement elephant diffuser using AES library and minor cleanups
dm-core:
- Claude found a buffer overflow in /dev/mapper/control ioctl handling
- Make dm_mod.wait_for correctly wait for partitions
- minor code fixes and cleanups"
* tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
dm cache: fix missing return in invalidate_committed's error path
dm: fix a buffer overflow in ioctl processing
dm-crypt: Make crypt_iv_operations::post return void
dm vdo: Fix spelling mistake "postive" -> "positive"
dm: provide helper to set stacked limits
dm-integrity: always set the io hints
dm-integrity: fix mismatched queue limits
dm-bufio: use kzalloc_flex
dm vdo: save the formatted metadata to disk
dm vdo: add formatting logic and initialization
dm vdo: add synchronous metadata I/O submission helper
dm vdo: add geometry block structure
dm vdo: add geometry block encoding
dm vdo: add upfront validation for logical size
dm vdo: add formatting parameters to table line
dm vdo: add super block initialization to encodings.c
dm vdo: add geometry block initialization to encodings.c
dm-crypt: Make crypt_iv_operations::wipe return void
dm-crypt: Reimplement elephant diffuser using AES library
dm-verity-fec: warn even when there were no errors
...
Triage Assessment
Vulnerability Type: Buffer Overflow
Confidence: HIGH
Reasoning:
The commit message explicitly mentions a fix for a buffer overflow in /dev/mapper/control ioctl handling, which is a memory-safety vulnerability. The diff includes related changes to ioctl handling plus several robustness fixes across device-mapper components, reinforcing that this is a security fix rather than mere refactoring or feature work.
Verification Assessment
Vulnerability Type: Buffer overflow / memory-safety issue in device-mapper ioctl processing
Confidence: HIGH
Affected Versions: v7.0-rc6 and earlier (prior to the for-7.1/dm-changes merge)
Code Diff
diff --git a/Documentation/admin-guide/device-mapper/verity.rst b/Documentation/admin-guide/device-mapper/verity.rst
index 3ecab1cff9c64c..eb9475d7e1965a 100644
--- a/Documentation/admin-guide/device-mapper/verity.rst
+++ b/Documentation/admin-guide/device-mapper/verity.rst
@@ -102,29 +102,42 @@ ignore_zero_blocks
that are not guaranteed to contain zeroes.
use_fec_from_device <fec_dev>
- Use forward error correction (FEC) to recover from corruption if hash
- verification fails. Use encoding data from the specified device. This
- may be the same device where data and hash blocks reside, in which case
- fec_start must be outside data and hash areas.
+ Use forward error correction (FEC) parity data from the specified device to
+ try to automatically recover from corruption and I/O errors.
- If the encoding data covers additional metadata, it must be accessible
- on the hash device after the hash blocks.
+ If this option is given, then <fec_roots> and <fec_blocks> must also be
+ given. <hash_block_size> must also be equal to <data_block_size>.
- Note: block sizes for data and hash devices must match. Also, if the
- verity <dev> is encrypted the <fec_dev> should be too.
+ <fec_dev> can be the same as <dev>, in which case <fec_start> must be
+ outside the data area. It can also be the same as <hash_dev>, in which case
+ <fec_start> must be outside the hash and optional additional metadata areas.
+
+ If the data <dev> is encrypted, the <fec_dev> should be too.
+
+ For more information, see `Forward error correction`_.
fec_roots <num>
- Number of generator roots. This equals to the number of parity bytes in
- the encoding data. For example, in RS(M, N) encoding, the number of roots
- is M-N.
+ The number of parity bytes in each 255-byte Reed-Solomon codeword. The
+ Reed-Solomon code used will be an RS(255, k) code where k = 255 - fec_roots.
+
+ The supported values are 2 through 24 inclusive. Higher values provide
+ stronger error correction. However, the minimum value of 2 already provides
+ strong error correction due to the use of interleaving, so 2 is the
+ recommended value for most users. fec_roots=2 corresponds to an
+ RS(255, 253) code, which has a space overhead of about 0.8%.
fec_blocks <num>
- The number of encoding data blocks on the FEC device. The block size for
- the FEC device is <data_block_size>.
+ The total number of <data_block_size> blocks that are error-checked using
+ FEC. This must be at least the sum of <num_data_blocks> and the number of
+ blocks needed by the hash tree. It can include additional metadata blocks,
+ which are assumed to be accessible on <hash_dev> following the hash blocks.
+
+ Note that this is *not* the number of parity blocks. The number of parity
+ blocks is inferred from <fec_blocks>, <fec_roots>, and <data_block_size>.
fec_start <offset>
- This is the offset, in <data_block_size> blocks, from the start of the
- FEC device to the beginning of the encoding data.
+ This is the offset, in <data_block_size> blocks, from the start of <fec_dev>
+ to the beginning of the parity data.
check_at_most_once
Verify data blocks only the first time they are read from the data device,
@@ -180,11 +193,6 @@ per-block basis. This allows for a lightweight hash computation on first read
into the page cache. Block hashes are stored linearly, aligned to the nearest
block size.
-If forward error correction (FEC) support is enabled any recovery of
-corrupted data will be verified using the cryptographic hash of the
-corresponding data. This is why combining error correction with
-integrity checking is essential.
-
Hash Tree
---------
@@ -212,6 +220,80 @@ The tree looks something like:
/ ... \ / . . . \ / \
blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767
+Forward error correction
+------------------------
+
+dm-verity's optional forward error correction (FEC) support adds strong error
+correction capabilities to dm-verity. It allows systems that would be rendered
+inoperable by errors to continue operating, albeit with reduced performance.
+
+FEC uses Reed-Solomon (RS) codes that are interleaved across the entire
+device(s), allowing long bursts of corrupt or unreadable blocks to be recovered.
+
+dm-verity validates any FEC-corrected block against the wanted hash before using
+it. Therefore, FEC doesn't affect the security properties of dm-verity.
+
+The integration of FEC with dm-verity provides significant benefits over a
+separate error correction layer:
+
+- dm-verity invokes FEC only when a block's hash doesn't match the wanted hash
+ or the block cannot be read at all. As a result, FEC doesn't add overhead to
+ the common case where no error occurs.
+
+- dm-verity hashes are also used to identify erasure locations for RS decoding.
+ This allows correcting twice as many errors.
+
+FEC uses an RS(255, k) code where k = 255 - fec_roots. fec_roots is usually 2.
+This means that each k (usually 253) message bytes have fec_roots (usually 2)
+bytes of parity data added to get a 255-byte codeword. (Many external sources
+call RS codewords "blocks". Since dm-verity already uses the term "block" to
+mean something else, we'll use the clearer term "RS codeword".)
+
+FEC checks fec_blocks blocks of message data in total, consisting of:
+
+1. The data blocks from the data device
+2. The hash blocks from the hash device
+3. Optional additional metadata that follows the hash blocks on the hash device
+
+dm-verity assumes that the FEC parity data was computed as if the following
+procedure were followed:
+
+1. Concatenate the message data from the above sources.
+2. Zero-pad to the next multiple of k blocks. Let msg be the resulting byte
+ array, and msglen its length in bytes.
+3. For 0 <= i < msglen / k (for each RS codeword):
+ a. Select msg[i + j * msglen / k] for 0 <= j < k.
+ Consider these to be the 'k' message bytes of an RS codeword.
+ b. Compute the corresponding 'fec_roots' parity bytes of the RS codeword,
+ and concatenate them to the FEC parity data.
+
+Step 3a interleaves the RS codewords across the entire device using an
+interleaving degree of data_block_size * ceil(fec_blocks / k). This is the
+maximal interleaving, such that the message data consists of a region containing
+byte 0 of all the RS codewords, then a region containing byte 1 of all the RS
+codewords, and so on up to the region for byte 'k - 1'. Note that the number of
+codewords is set to a multiple of data_block_size; thus, the regions are
+block-aligned, and there is an implicit zero padding of up to 'k - 1' blocks.
+
+This interleaving allows long bursts of errors to be corrected. It provides
+much stronger error correction than storage devices typically provide, while
+keeping the space overhead low.
+
+The cost is slow decoding: correcting a single block usually requires reading
+254 extra blocks spread evenly across the device(s). However, that is
+acceptable because dm-verity uses FEC only when there is actually an error.
+
+The list below contains additional details about the RS codes used by
+dm-verity's FEC. Userspace programs that generate the parity data need to use
+these parameters for the parity data to match exactly:
+
+- Field used is GF(256)
+- Bytes are mapped to/from GF(256) elements in the natural way, where bits 0
+ through 7 (low-order to high-order) map to the coefficients of x^0 through x^7
+- Field generator polynomial is x^8 + x^4 + x^3 + x^2 + 1
+- The codes used are systematic, BCH-view codes
+- Primitive element alpha is 'x'
+- First consecutive root of code generator polynomial is 'x^0'
On-disk format
==============
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index c58a9a8ea54e92..a3fcdca7e6db31 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -226,6 +226,7 @@ config BLK_DEV_DM
select BLOCK_HOLDER_DEPRECATED if SYSFS
select BLK_DEV_DM_BUILTIN
select BLK_MQ_STACKING
+ select CRYPTO_LIB_SHA256 if IMA
depends on DAX || DAX=n
help
Device-mapper is a low level volume manager. It works by allowing
@@ -299,6 +300,7 @@ config DM_CRYPT
select CRYPTO
select CRYPTO_CBC
select CRYPTO_ESSIV
+ select CRYPTO_LIB_AES
select CRYPTO_LIB_MD5 # needed by lmk IV mode
help
This device-mapper target allows you to create a device that
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 60f7badec91f23..26fedf5883eff6 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -391,7 +391,7 @@ struct dm_buffer_cache {
*/
unsigned int num_locks;
bool no_sleep;
- struct buffer_tree trees[];
+ struct buffer_tree trees[] __counted_by(num_locks);
};
static DEFINE_STATIC_KEY_FALSE(no_sleep_enabled);
@@ -2511,7 +2511,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
}
num_locks = dm_num_hash_locks();
- c = kzalloc(sizeof(*c) + (num_locks * sizeof(struct buffer_tree)), GFP_KERNEL);
+ c = kzalloc_flex(*c, cache.trees, num_locks);
if (!c) {
r = -ENOMEM;
goto bad_client;
diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index 57158c02d096ed..acd9b179fcb3f2 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -1023,6 +1023,12 @@ static bool cmd_write_lock(struct dm_cache_metadata *cmd)
return; \
} while (0)
+#define WRITE_LOCK_OR_GOTO(cmd, label) \
+ do { \
+ if (!cmd_write_lock((cmd))) \
+ goto label; \
+ } while (0)
+
#define WRITE_UNLOCK(cmd) \
up_write(&(cmd)->root_lock)
@@ -1714,17 +1720,6 @@ int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *
return r;
}
-int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result)
-{
- int r;
-
- READ_LOCK(cmd);
- r = blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result);
- READ_UNLOCK(cmd);
-
- return r;
-}
-
void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd)
{
WRITE_LOCK_VOID(cmd);
@@ -1791,11 +1786,8 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
CACHE_MAX_CONCURRENT_LOCKS);
- WRITE_LOCK(cmd);
- if (cmd->fail_io) {
- WRITE_UNLOCK(cmd);
- goto out;
- }
+ /* cmd_write_lock() already checks fail_io with cmd->root_lock held */
+ WRITE_LOCK_OR_GOTO(cmd, out);
__destroy_persistent_data_objects(cmd, false);
old_bm = cmd->bm;
@@ -1824,3 +1816,12 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
return r;
}
+
+int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result)
+{
+ READ_LOCK(cmd);
+ *result = cmd->clean_when_opened;
+ READ_UNLOCK(cmd);
+
+ return 0;
+}
diff --git a/drivers/md/dm-cache-metadata.h b/drivers/md/dm-cache-metadata.h
index 5f77890207fede..91f8706b41fdde 100644
--- a/drivers/md/dm-cache-metadata.h
+++ b/drivers/md/dm-cache-metadata.h
@@ -135,17 +135,17 @@ int dm_cache_get_metadata_dev_size(struct dm_cache_metadata *cmd,
*/
int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *p);
-/*
- * Query method. Are all the blocks in the cache clean?
- */
-int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result);
-
int dm_cache_metadata_needs_check(struct dm_cache_metadata *cmd, bool *result);
int dm_cache_metadata_set_needs_check(struct dm_cache_metadata *cmd);
void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd);
void dm_cache_metadata_set_read_write(struct dm_cache_metadata *cmd);
int dm_cache_metadata_abort(struct dm_cache_metadata *cmd);
+/*
+ * Query method. Was the metadata cleanly shut down when opened?
+ */
+int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result);
+
/*----------------------------------------------------------------*/
#endif /* DM_CACHE_METADATA_H */
diff --git a/drivers/md/dm-cache-policy-smq.c b/drivers/md/dm-cache-policy-smq.c
index b328d9601046b2..dd77a93fd68d2d 100644
--- a/drivers/md/dm-cache-policy-smq.c
+++ b/drivers/md/dm-cache-policy-smq.c
@@ -1589,14 +1589,18 @@ static int smq_invalidate_mapping(struct dm_cache_policy *p, dm_cblock_t cblock)
{
struct smq_policy *mq = to_smq_policy(p);
struct entry *e = get_entry(&mq->cache_alloc, from_cblock(cblock));
+ unsigned long flags;
if (!e->allocated)
return -ENODATA;
+ spin_lock_irqsave(&mq->lock, flags);
// FIXME: what if this block has pending background work?
del_queue(mq, e);
h_remove(&mq->table, e);
free_entry(&mq->cache_alloc, e);
+ spin_unlock_irqrestore(&mq->lock, flags);
+
return 0;
}
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 935ab79b1d0cd4..097315a9bf0f13 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -1462,11 +1462,19 @@ static void invalidate_complete(struct dm_cache_migration *mg, bool success)
struct cache *cache = mg->cache;
bio_list_init(&bios);
- if (dm_cell_unlock_v2(cache->prison, mg->cell, &bios))
- free_prison_cell(cache, mg->cell);
+ if (mg->cell) {
+ if (dm_cell_unlock_v2(cache->prison, mg->cell, &bios))
+ free_prison_cell(cache, mg->cell);
+ }
- if (!success && mg->overwrite_bio)
- bio_io_error(mg->overwrite_bio);
+ if (mg->overwrite_bio) {
+ // Set generic error if the bio hasn't been issued yet,
+ // e.g., invalidation or metadata commit failed before bio
+ // submission. Otherwise preserve the bio's own error status.
+ if (!success && !mg->overwrite_bio->bi_status)
+ mg->overwrite_bio->bi_status = BLK_STS_IOERR;
+ bio_endio(mg->overwrite_bio);
+ }
free_migration(mg);
defer_bios(cache, &bios);
@@ -1506,6 +1514,24 @@ static int invalidate_cblock(struct cache *cache, dm_cblock_t cblock)
return r;
}
+static void invalidate_committed(struct work_struct *ws)
+{
+ struct dm_cache_migration *mg = ws_to_mg(ws);
+ struct cache *cache = mg->cache;
+ struct bio *bio = mg->overwrite_bio;
+ struct per_bio_data *pb = get_per_bio_data(bio);
+
+ if (mg->k.input) {
+ invalidate_complete(mg, false);
+ return;
+ }
+
+ init_continuation(&mg->k, invalidate_completed);
+ remap_to_origin_clear_discard(cache, bio, mg->invalidate_oblock);
+ dm_hook_bio(&pb->hook_info, bio, overwrite_endio, mg);
+ dm_submit_bio_remap(bio, NULL);
+}
+
static void invalidate_remove(struct work_struct *ws)
{
int r;
@@ -1518,10 +1544,8 @@ static void invalidate_remove(struct work_struct *ws)
return;
}
- init_continuation(&mg->k, invalidate_completed);
+ init_continuation(&mg->k, invalidate_committed);
continue_after_commit(&cache->committer, &mg->k);
- remap_to_origin_clear_discard(cache, mg->overwrite_bio, mg->invalidate_oblock);
- mg->overwrite_bio = NULL;
schedule_commit(&cache->committer);
}
@@ -1539,6 +1563,15 @@ static int invalidate_lock(str
... [truncated]