Buffer overflow / memory-safety issue in device-mapper ioctl processing

HIGH
torvalds/linux
Commit: a5f998094fa3
Affected: <= v7.0-rc6 (prior to the for-7.1/dm-changes integration); includes 7.0-rc6 and earlier
2026-04-16 06:22 UTC

Description

The commit set contains device-mapper and DM core fixes that include memory-safety improvements around dynamic allocations and ioctl handling (notably in dm-bufio.c and related dm-cache/metadata paths). The key change is replacing a potentially unsafe fixed-size allocation for a dynamically-sized structure with a safe flexible-allocation approach (kzalloc_flex) and related ioctl/error-path hardening. This points to a genuine memory-safety bug (buffer overflow risk) in ioctl/Data-structure handling within the device-mapper subsystem, rather than a pure dependency bump or cosmetic cleanup.

Commit Details

Author: Linus Torvalds

Date: 2026-04-15 22:11 UTC

Message:

Merge tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Benjamin Marzinski: "There are fixes for some corner case crashes in dm-cache and dm-mirror, new setup functionality for dm-vdo, and miscellaneous minor fixes and cleanups, especially to dm-verity. dm-vdo: - Make dm-vdo able to format the device itself, like other dm targets, instead of needing a userspace formating program - Add some sanity checks and code cleanup dm-cache: - Fix crashes and hangs when operating in passthrough mode (which have been around, unnoticed, since 4.12), as well as a late arriving fix for an error path bug in the passthrough fix - Fix a corner case memory leak dm-verity: - Another set of minor bugfixes and code cleanups to the forward error correction code dm-mirror - Fix minor initialization bug - Fix overflow crash on a large devices with small region sizes dm-crypt - Reimplement elephant diffuser using AES library and minor cleanups dm-core: - Claude found a buffer overflow in /dev/mapper/contrl ioctl handling - make dm_mod.wait_for correctly wait for partitions - minor code fixes and cleanups" * tag 'for-7.1/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (62 commits) dm cache: fix missing return in invalidate_committed's error path dm: fix a buffer overflow in ioctl processing dm-crypt: Make crypt_iv_operations::post return void dm vdo: Fix spelling mistake "postive" -> "positive" dm: provide helper to set stacked limits dm-integrity: always set the io hints dm-integrity: fix mismatched queue limits dm-bufio: use kzalloc_flex dm vdo: save the formatted metadata to disk dm vdo: add formatting logic and initialization dm vdo: add synchronous metadata I/O submission helper dm vdo: add geometry block structure dm vdo: add geometry block encoding dm vdo: add upfront validation for logical size dm vdo: add formatting parameters to table line dm vdo: add super block initialization to encodings.c dm vdo: add geometry block initialization to encodings.c dm-crypt: Make crypt_iv_operations::wipe return void dm-crypt: Reimplement elephant diffuser using AES library dm-verity-fec: warn even when there were no errors ...

Triage Assessment

Vulnerability Type: Buffer Overflow

Confidence: HIGH

Reasoning:

The commit note explicitly mentions a fix for a buffer overflow in /dev/mapper/contrl ioctl handling, which is a memory-safety vulnerability. The diff includes related changes to ioctl handling and several robustness/fix commits across device-mapper components, reinforcing that this is a security fix rather than mere refactoring or feature work.

Verification Assessment

Vulnerability Type: Buffer overflow / memory-safety issue in device-mapper ioctl processing

Confidence: HIGH

Affected Versions: <= v7.0-rc6 (prior to the for-7.1/dm-changes integration); includes 7.0-rc6 and earlier

Code Diff

diff --git a/Documentation/admin-guide/device-mapper/verity.rst b/Documentation/admin-guide/device-mapper/verity.rst index 3ecab1cff9c64c..eb9475d7e1965a 100644 --- a/Documentation/admin-guide/device-mapper/verity.rst +++ b/Documentation/admin-guide/device-mapper/verity.rst @@ -102,29 +102,42 @@ ignore_zero_blocks that are not guaranteed to contain zeroes. use_fec_from_device <fec_dev> - Use forward error correction (FEC) to recover from corruption if hash - verification fails. Use encoding data from the specified device. This - may be the same device where data and hash blocks reside, in which case - fec_start must be outside data and hash areas. + Use forward error correction (FEC) parity data from the specified device to + try to automatically recover from corruption and I/O errors. - If the encoding data covers additional metadata, it must be accessible - on the hash device after the hash blocks. + If this option is given, then <fec_roots> and <fec_blocks> must also be + given. <hash_block_size> must also be equal to <data_block_size>. - Note: block sizes for data and hash devices must match. Also, if the - verity <dev> is encrypted the <fec_dev> should be too. + <fec_dev> can be the same as <dev>, in which case <fec_start> must be + outside the data area. It can also be the same as <hash_dev>, in which case + <fec_start> must be outside the hash and optional additional metadata areas. + + If the data <dev> is encrypted, the <fec_dev> should be too. + + For more information, see `Forward error correction`_. fec_roots <num> - Number of generator roots. This equals to the number of parity bytes in - the encoding data. For example, in RS(M, N) encoding, the number of roots - is M-N. + The number of parity bytes in each 255-byte Reed-Solomon codeword. The + Reed-Solomon code used will be an RS(255, k) code where k = 255 - fec_roots. + + The supported values are 2 through 24 inclusive. Higher values provide + stronger error correction. However, the minimum value of 2 already provides + strong error correction due to the use of interleaving, so 2 is the + recommended value for most users. fec_roots=2 corresponds to an + RS(255, 253) code, which has a space overhead of about 0.8%. fec_blocks <num> - The number of encoding data blocks on the FEC device. The block size for - the FEC device is <data_block_size>. + The total number of <data_block_size> blocks that are error-checked using + FEC. This must be at least the sum of <num_data_blocks> and the number of + blocks needed by the hash tree. It can include additional metadata blocks, + which are assumed to be accessible on <hash_dev> following the hash blocks. + + Note that this is *not* the number of parity blocks. The number of parity + blocks is inferred from <fec_blocks>, <fec_roots>, and <data_block_size>. fec_start <offset> - This is the offset, in <data_block_size> blocks, from the start of the - FEC device to the beginning of the encoding data. + This is the offset, in <data_block_size> blocks, from the start of <fec_dev> + to the beginning of the parity data. check_at_most_once Verify data blocks only the first time they are read from the data device, @@ -180,11 +193,6 @@ per-block basis. This allows for a lightweight hash computation on first read into the page cache. Block hashes are stored linearly, aligned to the nearest block size. -If forward error correction (FEC) support is enabled any recovery of -corrupted data will be verified using the cryptographic hash of the -corresponding data. This is why combining error correction with -integrity checking is essential. - Hash Tree --------- @@ -212,6 +220,80 @@ The tree looks something like: / ... \ / . . . \ / \ blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767 +Forward error correction +------------------------ + +dm-verity's optional forward error correction (FEC) support adds strong error +correction capabilities to dm-verity. It allows systems that would be rendered +inoperable by errors to continue operating, albeit with reduced performance. + +FEC uses Reed-Solomon (RS) codes that are interleaved across the entire +device(s), allowing long bursts of corrupt or unreadable blocks to be recovered. + +dm-verity validates any FEC-corrected block against the wanted hash before using +it. Therefore, FEC doesn't affect the security properties of dm-verity. + +The integration of FEC with dm-verity provides significant benefits over a +separate error correction layer: + +- dm-verity invokes FEC only when a block's hash doesn't match the wanted hash + or the block cannot be read at all. As a result, FEC doesn't add overhead to + the common case where no error occurs. + +- dm-verity hashes are also used to identify erasure locations for RS decoding. + This allows correcting twice as many errors. + +FEC uses an RS(255, k) code where k = 255 - fec_roots. fec_roots is usually 2. +This means that each k (usually 253) message bytes have fec_roots (usually 2) +bytes of parity data added to get a 255-byte codeword. (Many external sources +call RS codewords "blocks". Since dm-verity already uses the term "block" to +mean something else, we'll use the clearer term "RS codeword".) + +FEC checks fec_blocks blocks of message data in total, consisting of: + +1. The data blocks from the data device +2. The hash blocks from the hash device +3. Optional additional metadata that follows the hash blocks on the hash device + +dm-verity assumes that the FEC parity data was computed as if the following +procedure were followed: + +1. Concatenate the message data from the above sources. +2. Zero-pad to the next multiple of k blocks. Let msg be the resulting byte + array, and msglen its length in bytes. +3. For 0 <= i < msglen / k (for each RS codeword): + a. Select msg[i + j * msglen / k] for 0 <= j < k. + Consider these to be the 'k' message bytes of an RS codeword. + b. Compute the corresponding 'fec_roots' parity bytes of the RS codeword, + and concatenate them to the FEC parity data. + +Step 3a interleaves the RS codewords across the entire device using an +interleaving degree of data_block_size * ceil(fec_blocks / k). This is the +maximal interleaving, such that the message data consists of a region containing +byte 0 of all the RS codewords, then a region containing byte 1 of all the RS +codewords, and so on up to the region for byte 'k - 1'. Note that the number of +codewords is set to a multiple of data_block_size; thus, the regions are +block-aligned, and there is an implicit zero padding of up to 'k - 1' blocks. + +This interleaving allows long bursts of errors to be corrected. It provides +much stronger error correction than storage devices typically provide, while +keeping the space overhead low. + +The cost is slow decoding: correcting a single block usually requires reading +254 extra blocks spread evenly across the device(s). However, that is +acceptable because dm-verity uses FEC only when there is actually an error. + +The list below contains additional details about the RS codes used by +dm-verity's FEC. Userspace programs that generate the parity data need to use +these parameters for the parity data to match exactly: + +- Field used is GF(256) +- Bytes are mapped to/from GF(256) elements in the natural way, where bits 0 + through 7 (low-order to high-order) map to the coefficients of x^0 through x^7 +- Field generator polynomial is x^8 + x^4 + x^3 + x^2 + 1 +- The codes used are systematic, BCH-view codes +- Primitive element alpha is 'x' +- First consecutive root of code generator polynomial is 'x^0' On-disk format ============== diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index c58a9a8ea54e92..a3fcdca7e6db31 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -226,6 +226,7 @@ config BLK_DEV_DM select BLOCK_HOLDER_DEPRECATED if SYSFS select BLK_DEV_DM_BUILTIN select BLK_MQ_STACKING + select CRYPTO_LIB_SHA256 if IMA depends on DAX || DAX=n help Device-mapper is a low level volume manager. It works by allowing @@ -299,6 +300,7 @@ config DM_CRYPT select CRYPTO select CRYPTO_CBC select CRYPTO_ESSIV + select CRYPTO_LIB_AES select CRYPTO_LIB_MD5 # needed by lmk IV mode help This device-mapper target allows you to create a device that diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index 60f7badec91f23..26fedf5883eff6 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -391,7 +391,7 @@ struct dm_buffer_cache { */ unsigned int num_locks; bool no_sleep; - struct buffer_tree trees[]; + struct buffer_tree trees[] __counted_by(num_locks); }; static DEFINE_STATIC_KEY_FALSE(no_sleep_enabled); @@ -2511,7 +2511,7 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign } num_locks = dm_num_hash_locks(); - c = kzalloc(sizeof(*c) + (num_locks * sizeof(struct buffer_tree)), GFP_KERNEL); + c = kzalloc_flex(*c, cache.trees, num_locks); if (!c) { r = -ENOMEM; goto bad_client; diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c index 57158c02d096ed..acd9b179fcb3f2 100644 --- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -1023,6 +1023,12 @@ static bool cmd_write_lock(struct dm_cache_metadata *cmd) return; \ } while (0) +#define WRITE_LOCK_OR_GOTO(cmd, label) \ + do { \ + if (!cmd_write_lock((cmd))) \ + goto label; \ + } while (0) + #define WRITE_UNLOCK(cmd) \ up_write(&(cmd)->root_lock) @@ -1714,17 +1720,6 @@ int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy * return r; } -int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result) -{ - int r; - - READ_LOCK(cmd); - r = blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result); - READ_UNLOCK(cmd); - - return r; -} - void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd) { WRITE_LOCK_VOID(cmd); @@ -1791,11 +1786,8 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd) new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT, CACHE_MAX_CONCURRENT_LOCKS); - WRITE_LOCK(cmd); - if (cmd->fail_io) { - WRITE_UNLOCK(cmd); - goto out; - } + /* cmd_write_lock() already checks fail_io with cmd->root_lock held */ + WRITE_LOCK_OR_GOTO(cmd, out); __destroy_persistent_data_objects(cmd, false); old_bm = cmd->bm; @@ -1824,3 +1816,12 @@ int dm_cache_metadata_abort(struct dm_cache_metadata *cmd) return r; } + +int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result) +{ + READ_LOCK(cmd); + *result = cmd->clean_when_opened; + READ_UNLOCK(cmd); + + return 0; +} diff --git a/drivers/md/dm-cache-metadata.h b/drivers/md/dm-cache-metadata.h index 5f77890207fede..91f8706b41fdde 100644 --- a/drivers/md/dm-cache-metadata.h +++ b/drivers/md/dm-cache-metadata.h @@ -135,17 +135,17 @@ int dm_cache_get_metadata_dev_size(struct dm_cache_metadata *cmd, */ int dm_cache_write_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *p); -/* - * Query method. Are all the blocks in the cache clean? - */ -int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result); - int dm_cache_metadata_needs_check(struct dm_cache_metadata *cmd, bool *result); int dm_cache_metadata_set_needs_check(struct dm_cache_metadata *cmd); void dm_cache_metadata_set_read_only(struct dm_cache_metadata *cmd); void dm_cache_metadata_set_read_write(struct dm_cache_metadata *cmd); int dm_cache_metadata_abort(struct dm_cache_metadata *cmd); +/* + * Query method. Was the metadata cleanly shut down when opened? + */ +int dm_cache_metadata_clean_when_opened(struct dm_cache_metadata *cmd, bool *result); + /*----------------------------------------------------------------*/ #endif /* DM_CACHE_METADATA_H */ diff --git a/drivers/md/dm-cache-policy-smq.c b/drivers/md/dm-cache-policy-smq.c index b328d9601046b2..dd77a93fd68d2d 100644 --- a/drivers/md/dm-cache-policy-smq.c +++ b/drivers/md/dm-cache-policy-smq.c @@ -1589,14 +1589,18 @@ static int smq_invalidate_mapping(struct dm_cache_policy *p, dm_cblock_t cblock) { struct smq_policy *mq = to_smq_policy(p); struct entry *e = get_entry(&mq->cache_alloc, from_cblock(cblock)); + unsigned long flags; if (!e->allocated) return -ENODATA; + spin_lock_irqsave(&mq->lock, flags); // FIXME: what if this block has pending background work? del_queue(mq, e); h_remove(&mq->table, e); free_entry(&mq->cache_alloc, e); + spin_unlock_irqrestore(&mq->lock, flags); + return 0; } diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 935ab79b1d0cd4..097315a9bf0f13 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -1462,11 +1462,19 @@ static void invalidate_complete(struct dm_cache_migration *mg, bool success) struct cache *cache = mg->cache; bio_list_init(&bios); - if (dm_cell_unlock_v2(cache->prison, mg->cell, &bios)) - free_prison_cell(cache, mg->cell); + if (mg->cell) { + if (dm_cell_unlock_v2(cache->prison, mg->cell, &bios)) + free_prison_cell(cache, mg->cell); + } - if (!success && mg->overwrite_bio) - bio_io_error(mg->overwrite_bio); + if (mg->overwrite_bio) { + // Set generic error if the bio hasn't been issued yet, + // e.g., invalidation or metadata commit failed before bio + // submission. Otherwise preserve the bio's own error status. + if (!success && !mg->overwrite_bio->bi_status) + mg->overwrite_bio->bi_status = BLK_STS_IOERR; + bio_endio(mg->overwrite_bio); + } free_migration(mg); defer_bios(cache, &bios); @@ -1506,6 +1514,24 @@ static int invalidate_cblock(struct cache *cache, dm_cblock_t cblock) return r; } +static void invalidate_committed(struct work_struct *ws) +{ + struct dm_cache_migration *mg = ws_to_mg(ws); + struct cache *cache = mg->cache; + struct bio *bio = mg->overwrite_bio; + struct per_bio_data *pb = get_per_bio_data(bio); + + if (mg->k.input) { + invalidate_complete(mg, false); + return; + } + + init_continuation(&mg->k, invalidate_completed); + remap_to_origin_clear_discard(cache, bio, mg->invalidate_oblock); + dm_hook_bio(&pb->hook_info, bio, overwrite_endio, mg); + dm_submit_bio_remap(bio, NULL); +} + static void invalidate_remove(struct work_struct *ws) { int r; @@ -1518,10 +1544,8 @@ static void invalidate_remove(struct work_struct *ws) return; } - init_continuation(&mg->k, invalidate_completed); + init_continuation(&mg->k, invalidate_committed); continue_after_commit(&cache->committer, &mg->k); - remap_to_origin_clear_discard(cache, mg->overwrite_bio, mg->invalidate_oblock); - mg->overwrite_bio = NULL; schedule_commit(&cache->committer); } @@ -1539,6 +1563,15 @@ static int invalidate_lock(str ... [truncated]
← Back to Alerts View on GitHub →