Buffer overflow / memory-safety issue in device-mapper ioctl processing
Description
The commit set contains device-mapper core and target fixes that include memory-safety improvements around dynamic allocations and ioctl handling (notably in dm-bufio.c and related dm-cache metadata paths). The key change replaces a potentially unsafe hand-computed allocation size for a dynamically sized structure with a safe flexible-array allocation helper (kzalloc_flex), alongside related ioctl and error-path hardening. This points to a genuine memory-safety bug (buffer overflow risk) in ioctl and data-structure handling within the device-mapper subsystem, rather than a pure dependency bump or cosmetic cleanup.
Commit Details
Author: Linus Torvalds
Date: 2026-04-15 22:11 UTC
Message:
Merge tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Benjamin Marzinski:
"There are fixes for some corner case crashes in dm-cache and
dm-mirror, new setup functionality for dm-vdo, and miscellaneous minor
fixes and cleanups, especially to dm-verity.
dm-vdo:
- Make dm-vdo able to format the device itself, like other dm
targets, instead of needing a userspace formatting program
- Add some sanity checks and code cleanup
dm-cache:
- Fix crashes and hangs when operating in passthrough mode (which
have been around, unnoticed, since 4.12), as well as a late
arriving fix for an error path bug in the passthrough fix
- Fix a corner case memory leak
dm-verity:
- Another set of minor bugfixes and code cleanups to the forward
error correction code
dm-mirror:
- Fix minor initialization bug
- Fix overflow crash on large devices with small region sizes
dm-crypt:
- Reimplement elephant diffuser using AES library and minor cleanups
dm-core:
- Claude found a buffer overflow in /dev/mapper/control ioctl handling
- Make dm_mod.wait_for correctly wait for partitions
- minor code fixes and cleanups"
* tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits)
dm cache: fix missing return in invalidate_committed's error path
dm: fix a buffer overflow in ioctl processing
dm-crypt: Make crypt_iv_operations::post return void
dm vdo: Fix spelling mistake "postive" -> "positive"
dm: provide helper to set stacked limits
dm-integrity: always set the io hints
dm-integrity: fix mismatched queue limits
dm-bufio: use kzalloc_flex
dm vdo: save the formatted metadata to disk
dm vdo: add formatting logic and initialization
dm vdo: add synchronous metadata I/O submission helper
dm vdo: add geometry block structure
dm vdo: add geometry block encoding
dm vdo: add upfront validation for logical size
dm vdo: add formatting parameters to table line
dm vdo: add super block initialization to encodings.c
dm vdo: add geometry block initialization to encodings.c
dm-crypt: Make crypt_iv_operations::wipe return void
dm-crypt: Reimplement elephant diffuser using AES library
dm-verity-fec: warn even when there were no errors
...
Triage Assessment
Vulnerability Type: Buffer Overflow
Confidence: HIGH
Reasoning:
The commit message explicitly mentions a fix for a buffer overflow in /dev/mapper/control ioctl handling, which is a memory-safety vulnerability. The diff includes related changes to ioctl handling plus several robustness fixes across device-mapper components, reinforcing that this is a security fix rather than mere refactoring or feature work.
Verification Assessment
Vulnerability Type: Buffer overflow / memory-safety issue in device-mapper ioctl processing
Confidence: HIGH
Affected Versions: v7.0-rc6 and earlier (prior to the for-7.1/dm-changes merge)
Code Diff
diff --git a/Documentation/admin-guide/device-mapper/verity.rst b/Documentation/admin-guide/device-mapper/verity.rst
index 3ecab1cff9c64c..eb9475d7e1965a 100644
--- a/Documentation/admin-guide/device-mapper/verity.rst
+++ b/Documentation/admin-guide/device-mapper/verity.rst
@@ -102,29 +102,42 @@ ignore_zero_blocks
that are not guaranteed to contain zeroes.
use_fec_from_device <fec_dev>
- Use forward error correction (FEC) to recover from corruption if hash
- verification fails. Use encoding data from the specified device. This
- may be the same device where data and hash blocks reside, in which case
- fec_start must be outside data and hash areas.
+ Use forward error correction (FEC) parity data from the specified device to
+ try to automatically recover from corruption and I/O errors.
- If the encoding data covers additional metadata, it must be accessible
- on the hash device after the hash blocks.
+ If this option is given, then <fec_roots> and <fec_blocks> must also be
+ given. <hash_block_size> must also be equal to <data_block_size>.
- Note: block sizes for data and hash devices must match. Also, if the
- verity <dev> is encrypted the <fec_dev> should be too.
+ <fec_dev> can be the same as <dev>, in which case <fec_start> must be
+ outside the data area. It can also be the same as <hash_dev>, in which case
+ <fec_start> must be outside the hash and optional additional metadata areas.
+
+ If the data <dev> is encrypted, the <fec_dev> should be too.
+
+ For more information, see `Forward error correction`_.
fec_roots <num>
- Number of generator roots. This equals to the number of parity bytes in
- the encoding data. For example, in RS(M, N) encoding, the number of roots
- is M-N.
+ The number of parity bytes in each 255-byte Reed-Solomon codeword. The
+ Reed-Solomon code used will be an RS(255, k) code where k = 255 - fec_roots.
+
+ The supported values are 2 through 24 inclusive. Higher values provide
+ stronger error correction. However, the minimum value of 2 already provides
+ strong error correction due to the use of interleaving, so 2 is the
+ recommended value for most users. fec_roots=2 corresponds to an
+ RS(255, 253) code, which has a space overhead of about 0.8%.
fec_blocks <num>
- The number of encoding data blocks on the FEC device. The block size for
- the FEC device is <data_block_size>.
+ The total number of <data_block_size> blocks that are error-checked using
+ FEC. This must be at least the sum of <num_data_blocks> and the number of
+ blocks needed by the hash tree. It can include additional metadata blocks,
+ which are assumed to be accessible on <hash_dev> following the hash blocks.
+
+ Note that this is *not* the number of parity blocks. The number of parity
+ blocks is inferred from <fec_blocks>, <fec_roots>, and <data_block_size>.
fec_start <offset>
- This is the offset, in <data_block_size> blocks, from the start of the
- FEC device to the beginning of the encoding data.
+ This is the offset, in <data_block_size> blocks, from the start of <fec_dev>
+ to the beginning of the parity data.
check_at_most_once
Verify data blocks only the first time they are read from the data device,
@@ -180,11 +193,6 @@ per-block basis. This allows for a lightweight hash computation on first read
into the page cache. Block hashes are stored linearly, aligned to the nearest
block size.
-If forward error correction (FEC) support is enabled any recovery of
-corrupted data will be verified using the cryptographic hash of the
-corresponding data. This is why combining error correction with
-integrity checking is essential.
-
Hash Tree
---------
@@ -212,6 +220,80 @@ The tree looks something like:
/ ... \ / . . . \ / \
blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767
+Forward error correction
+------------------------
+
+dm-verity's optional forward error correction (FEC) support adds strong error
+correction capabilities to dm-verity. It allows systems that would be rendered
+inoperable by errors to continue operating, albeit with reduced performance.
+
+FEC uses Reed-Solomon (RS) codes that are interleaved across the entire
+device(s), allowing long bursts of corrupt or unreadable blocks to be recovered.
+
+dm-verity validates any FEC-corrected block against the wanted hash before using
+it. Therefore, FEC doesn't affect the security properties of dm-verity.
+
+The integration of FEC with dm-verity provides significant benefits over a
+separate error correction layer:
+
+- dm-verity invokes FEC only when a block's hash doesn't match the wanted hash
+ or the block cannot be read at all. As a result, FEC doesn't add overhead to
+ the common case where no error occurs.
+
+- dm-verity hashes are also used to identify erasure locations for RS decoding.
+ This allows correcting twice as many errors.
+
+FEC uses an RS(255, k) code where k = 255 - fec_roots. fec_roots is usually 2.
+This means that each k (usually 253) message bytes have fec_roots (usually 2)
+bytes of parity data added to get a 255-byte codeword. (Many external sources
+call RS codewords "blocks". Since dm-verity already uses the term "block" to
+mean something else, we'll use the clearer term "RS codeword".)
+
+FEC checks fec_blocks blocks of message data in total, consisting of:
+
+1. The data blocks from the data device
+2. The hash blocks from the hash device
+3. Optional additional metadata that follows the hash blocks on the hash device
+
+dm-verity assumes that the FEC parity data was computed as if the following
+procedure were followed:
+
+1. Concatenate the message data from the above sources.
+2. Zero-pad to the next multiple of k blocks. Let msg be the resulting byte
+ array, and msglen its length in bytes.
+3. For 0 <= i < msglen / k (for each RS codeword):
+ a. Select msg[i + j * msglen / k] for 0 <= j < k.
+ Consider these to be the 'k' message bytes of an RS codeword.
+ b. Compute the corresponding 'fec_roots' parity bytes of the RS codeword,
+ and concatenate them to the FEC parity data.
+
+Step 3a interleaves the RS codewords across the entire device using an
+interleaving degree of data_block_size * ceil(fec_blocks / k). This is the
+maximal interleaving, such that the message data consists of a region containing
+byte 0 of all the RS codewords, then a region containing byte 1 of all the RS
+codewords, and so on up to the region for byte 'k - 1'. Note that the number of
+codewords is set to a multiple of data_block_size; thus, the regions are
+block-aligned, and there is an implicit zero padding of up to 'k - 1' blocks.
+
+This interleaving allows long bursts of errors to be corrected. It provides
+much stronger error correction than storage devices typically provide, while
+keeping the space overhead low.
+
+The cost is slow decoding: correcting a single block usually requires reading
+254 extra blocks spread evenly across the device(s). However, that is
+acceptable because dm-verity uses FEC only when there is actually an error.
+
+The list below contains additional details about the RS codes used by
+dm-verity's FEC. Userspace programs that generate the parity data need to use
+these parameters for the parity data to match exactly:
+
+- Field used is GF(256)
+- Bytes are mapped to/from GF(256) elements in the natural way, where bits 0
+ through 7 (low-order to high-order) map to the coefficients of x^0 through x^7
+- Field generator polynomial is x^8 + x^4 + x^3 + x^2 + 1
+- The codes used are systematic, BCH-view codes
+- Primitive element alpha is 'x'
+- First consecutive root of code generator polynomial is 'x^0'
On-disk format
==============
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index c58a9a8ea54e92..a3fcdca7e6db31 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -226,6 +226,7 @@ config BLK_DEV_DM
select BLOCK_HOLDER_DEPRECATED if SYSFS
select BLK_DEV_DM_BUILTIN
select BLK_MQ_STACKING
+ select CRYPTO_LIB_SHA256 if IMA
depends on DAX || DAX=n
help
Device-mapper is a low level volume manager. It works by allowing
@@ -299,6 +300,7 @@ config DM_CRYPT
select CRYPTO
select CRYPTO_CBC
select CRYPTO_ESSIV
+ select CRYPTO_LIB_AES
select CRYPTO_LIB_MD5 # needed by lmk IV mode
help
This device-mapper target allows you to create a device that
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 60f7badec91f23..26fedf5883eff6 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -391,7 +391,7 @@ struct dm_buffer_cache {
*/
unsigned int num_locks;
bool no_sleep;
- struct buffer_tree trees[];
+ struct buffer_tree trees[] __counted_by(num_locks);
};
static DEFINE_STATIC_KEY_FALSE(no_sleep_enabled);
@@ -2511,7 +2511,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
}
num_locks = dm_num_hash_locks();
- c = kzalloc(sizeof(*c) + (num_locks * sizeof(struct buffer_tree)), GFP_KERNEL);
+ c = kzalloc_flex(*c, cache.trees, num_locks);
if (!c) {
r = -ENOMEM;
goto bad_client;
diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
index 57158c02d096ed..acd9b179fcb3f2 100644
--- a/drivers/md/dm-cache-metadata.c
+++ b/drivers/md/dm-cache-metadata.c
@@ -1023,6 +1023,12 @@ static bool cmd_write_lock(struct dm_cache_metadata *cmd)
return; \
} while (0)
+#define WRITE_LOCK_OR_GOTO(cmd, label) \
+ do { \
+ if (!cmd_write_lock((cmd))) \
+ goto label; \
+ } while (0)
+
#define WRITE_UNLOCK(cmd) \
up_write(&(cmd)->root_lock)
@@ -1714,17 +1720,6 @@ int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *
return r;
}
-int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result)
-{
- int r;
-
- READ_LOCK(cmd);
- r = blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result);
- READ_UNLOCK(cmd);
-
- return r;
-}
-
void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd)
{
WRITE_LOCK_VOID(cmd);
@@ -1791,11 +1786,8 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
CACHE_MAX_CONCURRENT_LOCKS);
- WRITE_LOCK(cmd);
- if (cmd->fail_io) {
- WRITE_UNLOCK(cmd);
- goto out;
- }
+ /* cmd_write_lock() already checks fail_io with cmd->root_lock held */
+ WRITE_LOCK_OR_GOTO(cmd, out);
__destroy_persistent_data_objects(cmd, false);
old_bm = cmd->bm;
@@ -1824,3 +1816,12 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd)
return r;
}
+
+int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result)
+{
+ READ_LOCK(cmd);
+ *result = cmd->clean_when_opened;
+ READ_UNLOCK(cmd);
+
+ return 0;
+}
diff --git a/drivers/md/dm-cache-metadata.h b/drivers/md/dm-cache-metadata.h
index 5f77890207fede..91f8706b41fdde 100644
--- a/drivers/md/dm-cache-metadata.h
+++ b/drivers/md/dm-cache-metadata.h
@@ -135,17 +135,17 @@ int dm_cache_get_metadata_dev_size(struct dm_cache_metadata *cmd,
*/
int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *p);
-/*
- * Query method. Are all the blocks in the cache clean?
- */
-int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result);
-
int dm_cache_metadata_needs_check(struct dm_cache_metadata *cmd, bool *result);
int dm_cache_metadata_set_needs_check(struct dm_cache_metadata *cmd);
void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd);
void dm_cache_metadata_set_read_write(struct dm_cache_metadata *cmd);
int dm_cache_metadata_abort(struct dm_cache_metadata *cmd);
+/*
+ * Query method. Was the metadata cleanly shut down when opened?
+ */
+int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result);
+
/*----------------------------------------------------------------*/
#endif /* DM_CACHE_METADATA_H */
diff --git a/drivers/md/dm-cache-policy-smq.c b/drivers/md/dm-cache-policy-smq.c
index b328d9601046b2..dd77a93fd68d2d 100644
--- a/drivers/md/dm-cache-policy-smq.c
+++ b/drivers/md/dm-cache-policy-smq.c
@@ -1589,14 +1589,18 @@ static int smq_invalidate_mapping(struct dm_cache_policy *p, dm_cblock_t cblock)
{
struct smq_policy *mq = to_smq_policy(p);
struct entry *e = get_entry(&mq->cache_alloc, from_cblock(cblock));
+ unsigned long flags;
if (!e->allocated)
return -ENODATA;
+ spin_lock_irqsave(&mq->lock, flags);
// FIXME: what if this block has pending background work?
del_queue(mq, e);
h_remove(&mq->table, e);
free_entry(&mq->cache_alloc, e);
+ spin_unlock_irqrestore(&mq->lock, flags);
+
return 0;
}
diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 935ab79b1d0cd4..097315a9bf0f13 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -1462,11 +1462,19 @@ static void invalidate_complete(struct dm_cache_migration *mg, bool success)
struct cache *cache = mg->cache;
bio_list_init(&bios);
- if (dm_cell_unlock_v2(cache->prison, mg->cell, &bios))
- free_prison_cell(cache, mg->cell);
+ if (mg->cell) {
+ if (dm_cell_unlock_v2(cache->prison, mg->cell, &bios))
+ free_prison_cell(cache, mg->cell);
+ }
- if (!success && mg->overwrite_bio)
- bio_io_error(mg->overwrite_bio);
+ if (mg->overwrite_bio) {
+ // Set generic error if the bio hasn't been issued yet,
+ // e.g., invalidation or metadata commit failed before bio
+ // submission. Otherwise preserve the bio's own error status.
+ if (!success && !mg->overwrite_bio->bi_status)
+ mg->overwrite_bio->bi_status = BLK_STS_IOERR;
+ bio_endio(mg->overwrite_bio);
+ }
free_migration(mg);
defer_bios(cache, &bios);
@@ -1506,6 +1514,24 @@ static int invalidate_cblock(struct cache *cache, dm_cblock_t cblock)
return r;
}
+static void invalidate_committed(struct work_struct *ws)
+{
+ struct dm_cache_migration *mg = ws_to_mg(ws);
+ struct cache *cache = mg->cache;
+ struct bio *bio = mg->overwrite_bio;
+ struct per_bio_data *pb = get_per_bio_data(bio);
+
+ if (mg->k.input) {
+ invalidate_complete(mg, false);
+ return;
+ }
+
+ init_continuation(&mg->k, invalidate_completed);
+ remap_to_origin_clear_discard(cache, bio, mg->invalidate_oblock);
+ dm_hook_bio(&pb->hook_info, bio, overwrite_endio, mg);
+ dm_submit_bio_remap(bio, NULL);
+}
+
static void invalidate_remove(struct work_struct *ws)
{
int r;
@@ -1518,10 +1544,8 @@ static void invalidate_remove(struct work_struct *ws)
return;
}
- init_continuation(&mg->k, invalidate_completed);
+ init_continuation(&mg->k, invalidate_committed);
continue_after_commit(&cache->committer, &mg->k);
- remap_to_origin_clear_discard(cache, mg->overwrite_bio, mg->invalidate_oblock);
- mg->overwrite_bio = NULL;
schedule_commit(&cache->committer);
}
@@ -1539,6 +1563,15 @@ static int invalidate_lock(str
... [truncated]