| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| arduino-esp32 provides an Arduino core for the ESP32. Versions prior to 3.3.0-RC1 and 3.2.1 contain a HTTP Response Splitting vulnerability. The `sendHeader` function takes arbitrary input for the HTTP header name and value, concatenates them into an HTTP header line, and appends this to the outgoing HTTP response headers. There is no validation or sanitization of the `name` or `value` parameters before they are included in the HTTP response. If an attacker can control the input to `sendHeader` (either directly or indirectly), they could inject carriage return (`\r`) or line feed (`\n`) characters into either the header name or value. This could allow the attacker to inject additional headers, manipulate the structure of the HTTP response, potentially inject an entire new HTTP response (HTTP Response Splitting), and/or ause header confusion or other HTTP protocol attacks. Versions 3.3.0-RC1 and 3.2.1 contain a fix for the issue. |
| The Spring Security annotation detection mechanism may not correctly resolve annotations on methods within type hierarchies with a parameterized super type with unbounded generics. This can be an issue when using @PreAuthorize and other method security annotations, resulting in an authorization bypass.
Your application may be affected by this if you are using Spring Security's @EnableMethodSecurity feature.
You are not affected by this if you are not using @EnableMethodSecurity or if you do not use security annotations on methods in generic superclasses or generic interfaces.
This CVE is published in conjunction with CVE-2025-41249 https://spring.io/security/cve-2025-41249 . |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nft_objref: validate objref and objrefmap expressions
Referencing a synproxy stateful object from OUTPUT hook causes kernel
crash due to infinite recursive calls:
BUG: TASK stack guard page was hit at 000000008bda5b8c (stack is 000000003ab1c4a5..00000000494d8b12)
[...]
Call Trace:
__find_rr_leaf+0x99/0x230
fib6_table_lookup+0x13b/0x2d0
ip6_pol_route+0xa4/0x400
fib6_rule_lookup+0x156/0x240
ip6_route_output_flags+0xc6/0x150
__nf_ip6_route+0x23/0x50
synproxy_send_tcp_ipv6+0x106/0x200
synproxy_send_client_synack_ipv6+0x1aa/0x1f0
nft_synproxy_do_eval+0x263/0x310
nft_do_chain+0x5a8/0x5f0 [nf_tables
nft_do_chain_inet+0x98/0x110
nf_hook_slow+0x43/0xc0
__ip6_local_out+0xf0/0x170
ip6_local_out+0x17/0x70
synproxy_send_tcp_ipv6+0x1a2/0x200
synproxy_send_client_synack_ipv6+0x1aa/0x1f0
[...]
Implement objref and objrefmap expression validate functions.
Currently, only NFT_OBJECT_SYNPROXY object type requires validation.
This will also handle a jump to a chain using a synproxy object from the
OUTPUT hook.
Now when trying to reference a synproxy object in the OUTPUT hook, nft
will produce the following error:
synproxy_crash.nft: Error: Could not process rule: Operation not supported
synproxy name mysynproxy
^^^^^^^^^^^^^^^^^^^^^^^^ |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: avoid potential out-of-bounds in btrfs_encode_fh()
The function btrfs_encode_fh() does not properly account for the three
cases it handles.
Before writing to the file handle (fh), the function only returns to the
user BTRFS_FID_SIZE_NON_CONNECTABLE (5 dwords, 20 bytes) or
BTRFS_FID_SIZE_CONNECTABLE (8 dwords, 32 bytes).
However, when a parent exists and the root ID of the parent and the
inode are different, the function writes BTRFS_FID_SIZE_CONNECTABLE_ROOT
(10 dwords, 40 bytes).
If *max_len is not large enough, this write goes out of bounds because
BTRFS_FID_SIZE_CONNECTABLE_ROOT is greater than
BTRFS_FID_SIZE_CONNECTABLE originally returned.
This results in an 8-byte out-of-bounds write at
fid->parent_root_objectid = parent_root_id.
A previous attempt to fix this issue was made but was lost.
https://lore.kernel.org/all/4CADAEEC020000780001B32C@vpn.id2.novell.com/
Although this issue does not seem to be easily triggerable, it is a
potential memory corruption bug that should be fixed. This patch
resolves the issue by ensuring the function returns the appropriate size
for all three cases and validates that *max_len is large enough before
writing any data. |
| In the Linux kernel, the following vulnerability has been resolved:
kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths
The usage of task_lock(tsk->group_leader) in sys_prlimit64()->do_prlimit()
path is very broken.
sys_prlimit64() does get_task_struct(tsk) but this only protects task_struct
itself. If tsk != current and tsk is not a leader, this process can exit/exec
and task_lock(tsk->group_leader) may use the already freed task_struct.
Another problem is that sys_prlimit64() can race with mt-exec which changes
->group_leader. In this case do_prlimit() may take the wrong lock, or (worse)
->group_leader may change between task_lock() and task_unlock().
Change sys_prlimit64() to take tasklist_lock when necessary. This is not
nice, but I don't see a better fix for -stable. |
| In the Linux kernel, the following vulnerability has been resolved:
page_pool: Fix PP_MAGIC_MASK to avoid crashing on some 32-bit arches
Helge reported that the introduction of PP_MAGIC_MASK let to crashes on
boot on his 32-bit parisc machine. The cause of this is the mask is set
too wide, so the page_pool_page_is_pp() incurs false positives which
crashes the machine.
Just disabling the check in page_pool_is_pp() will lead to the page_pool
code itself malfunctioning; so instead of doing this, this patch changes
the define for PP_DMA_INDEX_BITS to avoid mistaking arbitrary kernel
pointers for page_pool-tagged pages.
The fix relies on the kernel pointers that alias with the pp_magic field
always being above PAGE_OFFSET. With this assumption, we can use the
lowest bit of the value of PAGE_OFFSET as the upper bound of the
PP_DMA_INDEX_MASK, which should avoid the false positives.
Because we cannot rely on PAGE_OFFSET always being a compile-time
constant, nor on it always being >0, we fall back to disabling the
dma_index storage when there are not enough bits available. This leaves
us in the situation we were in before the patch in the Fixes tag, but
only on a subset of architecture configurations. This seems to be the
best we can do until the transition to page types in complete for
page_pool pages.
v2:
- Make sure there's at least 8 bits available and that the PAGE_OFFSET
bit calculation doesn't wrap |
| In the Linux kernel, the following vulnerability has been resolved:
fs: quota: create dedicated workqueue for quota_release_work
There is a kernel panic due to WARN_ONCE when panic_on_warn is set.
This issue occurs when writeback is triggered due to sync call for an
opened file(ie, writeback reason is WB_REASON_SYNC). When f2fs balance
is needed at sync path, flush for quota_release_work is triggered.
By default quota_release_work is queued to "events_unbound" queue which
does not have WQ_MEM_RECLAIM flag. During f2fs balance "writeback"
workqueue tries to flush quota_release_work causing kernel panic due to
MEM_RECLAIM flag mismatch errors.
This patch creates dedicated workqueue with WQ_MEM_RECLAIM flag
for work quota_release_work.
------------[ cut here ]------------
WARNING: CPU: 4 PID: 14867 at kernel/workqueue.c:3721 check_flush_dependency+0x13c/0x148
Call trace:
check_flush_dependency+0x13c/0x148
__flush_work+0xd0/0x398
flush_delayed_work+0x44/0x5c
dquot_writeback_dquots+0x54/0x318
f2fs_do_quota_sync+0xb8/0x1a8
f2fs_write_checkpoint+0x3cc/0x99c
f2fs_gc+0x190/0x750
f2fs_balance_fs+0x110/0x168
f2fs_write_single_data_page+0x474/0x7dc
f2fs_write_data_pages+0x7d0/0xd0c
do_writepages+0xe0/0x2f4
__writeback_single_inode+0x44/0x4ac
writeback_sb_inodes+0x30c/0x538
wb_writeback+0xf4/0x440
wb_workfn+0x128/0x5d4
process_scheduled_works+0x1c4/0x45c
worker_thread+0x32c/0x3e8
kthread+0x11c/0x1b0
ret_from_fork+0x10/0x20
Kernel panic - not syncing: kernel: panic_on_warn set ... |
| In the Linux kernel, the following vulnerability has been resolved:
cpufreq: intel_pstate: Fix object lifecycle issue in update_qos_request()
The cpufreq_cpu_put() call in update_qos_request() takes place too early
because the latter subsequently calls freq_qos_update_request() that
indirectly accesses the policy object in question through the QoS request
object passed to it.
Fortunately, update_qos_request() is called under intel_pstate_driver_lock,
so this issue does not matter for changing the intel_pstate operation
mode, but it theoretically can cause a crash to occur on CPU device hot
removal (which currently can only happen in virt, but it is formally
supported nevertheless).
Address this issue by modifying update_qos_request() to drop the
reference to the policy later. |
| In the Linux kernel, the following vulnerability has been resolved:
ext4: guard against EA inode refcount underflow in xattr update
syzkaller found a path where ext4_xattr_inode_update_ref() reads an EA
inode refcount that is already <= 0 and then applies ref_change (often
-1). That lets the refcount underflow and we proceed with a bogus value,
triggering errors like:
EXT4-fs error: EA inode <n> ref underflow: ref_count=-1 ref_change=-1
EXT4-fs warning: ea_inode dec ref err=-117
Make the invariant explicit: if the current refcount is non-positive,
treat this as on-disk corruption, emit ext4_error_inode(), and fail the
operation with -EFSCORRUPTED instead of updating the refcount. Delete the
WARN_ONCE() as negative refcounts are now impossible; keep error reporting
in ext4_error_inode().
This prevents the underflow and the follow-on orphan/cleanup churn. |
| In the Linux kernel, the following vulnerability has been resolved:
net: usb: lan78xx: Fix lost EEPROM read timeout error(-ETIMEDOUT) in lan78xx_read_raw_eeprom
Syzbot reported read of uninitialized variable BUG with following call stack.
lan78xx 8-1:1.0 (unnamed net_device) (uninitialized): EEPROM read operation timeout
=====================================================
BUG: KMSAN: uninit-value in lan78xx_read_eeprom drivers/net/usb/lan78xx.c:1095 [inline]
BUG: KMSAN: uninit-value in lan78xx_init_mac_address drivers/net/usb/lan78xx.c:1937 [inline]
BUG: KMSAN: uninit-value in lan78xx_reset+0x999/0x2cd0 drivers/net/usb/lan78xx.c:3241
lan78xx_read_eeprom drivers/net/usb/lan78xx.c:1095 [inline]
lan78xx_init_mac_address drivers/net/usb/lan78xx.c:1937 [inline]
lan78xx_reset+0x999/0x2cd0 drivers/net/usb/lan78xx.c:3241
lan78xx_bind+0x711/0x1690 drivers/net/usb/lan78xx.c:3766
lan78xx_probe+0x225c/0x3310 drivers/net/usb/lan78xx.c:4707
Local variable sig.i.i created at:
lan78xx_read_eeprom drivers/net/usb/lan78xx.c:1092 [inline]
lan78xx_init_mac_address drivers/net/usb/lan78xx.c:1937 [inline]
lan78xx_reset+0x77e/0x2cd0 drivers/net/usb/lan78xx.c:3241
lan78xx_bind+0x711/0x1690 drivers/net/usb/lan78xx.c:3766
The function lan78xx_read_raw_eeprom failed to properly propagate EEPROM
read timeout errors (-ETIMEDOUT). In the fallthrough path, it first
attempted to restore the pin configuration for LED outputs and then
returned only the status of that restore operation, discarding the
original timeout error.
As a result, callers could mistakenly treat the data buffer as valid
even though the EEPROM read had actually timed out with no data or partial
data.
To fix this, handle errors in restoring the LED pin configuration separately.
If the restore succeeds, return any prior EEPROM timeout error correctly
to the caller. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix metadata_dst leak __bpf_redirect_neigh_v{4,6}
Cilium has a BPF egress gateway feature which forces outgoing K8s Pod
traffic to pass through dedicated egress gateways which then SNAT the
traffic in order to interact with stable IPs outside the cluster.
The traffic is directed to the gateway via vxlan tunnel in collect md
mode. A recent BPF change utilized the bpf_redirect_neigh() helper to
forward packets after the arrival and decap on vxlan, which turned out
over time that the kmalloc-256 slab usage in kernel was ever-increasing.
The issue was that vxlan allocates the metadata_dst object and attaches
it through a fake dst entry to the skb. The latter was never released
though given bpf_redirect_neigh() was merely setting the new dst entry
via skb_dst_set() without dropping an existing one first. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe/guc: Check GuC running state before deregistering exec queue
In normal operation, a registered exec queue is disabled and
deregistered through the GuC, and freed only after the GuC confirms
completion. However, if the driver is forced to unbind while the exec
queue is still running, the user may call exec_destroy() after the GuC
has already been stopped and CT communication disabled.
In this case, the driver cannot receive a response from the GuC,
preventing proper cleanup of exec queue resources. Fix this by directly
releasing the resources when GuC is not running.
Here is the failure dmesg log:
"
[ 468.089581] ---[ end trace 0000000000000000 ]---
[ 468.089608] pci 0000:03:00.0: [drm] *ERROR* GT0: GUC ID manager unclean (1/65535)
[ 468.090558] pci 0000:03:00.0: [drm] GT0: total 65535
[ 468.090562] pci 0000:03:00.0: [drm] GT0: used 1
[ 468.090564] pci 0000:03:00.0: [drm] GT0: range 1..1 (1)
[ 468.092716] ------------[ cut here ]------------
[ 468.092719] WARNING: CPU: 14 PID: 4775 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:298 ttm_vram_mgr_fini+0xf8/0x130 [xe]
"
v2: use xe_uc_fw_is_running() instead of xe_guc_ct_enabled().
As CT may go down and come back during VF migration.
(cherry picked from commit 9b42321a02c50a12b2beb6ae9469606257fbecea) |
| In the Linux kernel, the following vulnerability has been resolved:
xen/events: Return -EEXIST for bound VIRQs
Change find_virq() to return -EEXIST when a VIRQ is bound to a
different CPU than the one passed in. With that, remove the BUG_ON()
from bind_virq_to_irq() to propogate the error upwards.
Some VIRQs are per-cpu, but others are per-domain or global. Those must
be bound to CPU0 and can then migrate elsewhere. The lookup for
per-domain and global will probably fail when migrated off CPU 0,
especially when the current CPU is tracked. This now returns -EEXIST
instead of BUG_ON().
A second call to bind a per-domain or global VIRQ is not expected, but
make it non-fatal to avoid trying to look up the irq, since we don't
know which per_cpu(virq_to_irq) it will be in. |
| In the Linux kernel, the following vulnerability has been resolved:
xsk: Harden userspace-supplied xdp_desc validation
Turned out certain clearly invalid values passed in xdp_desc from
userspace can pass xp_{,un}aligned_validate_desc() and then lead
to UBs or just invalid frames to be queued for xmit.
desc->len close to ``U32_MAX`` with a non-zero pool->tx_metadata_len
can cause positive integer overflow and wraparound, the same way low
enough desc->addr with a non-zero pool->tx_metadata_len can cause
negative integer overflow. Both scenarios can then pass the
validation successfully.
This doesn't happen with valid XSk applications, but can be used
to perform attacks.
Always promote desc->len to ``u64`` first to exclude positive
overflows of it. Use explicit check_{add,sub}_overflow() when
validating desc->addr (which is ``u64`` already).
bloat-o-meter reports a little growth of the code size:
add/remove: 0/0 grow/shrink: 2/1 up/down: 60/-16 (44)
Function old new delta
xskq_cons_peek_desc 299 330 +31
xsk_tx_peek_release_desc_batch 973 1002 +29
xsk_generic_xmit 3148 3132 -16
but hopefully this doesn't hurt the performance much. |
| In the Linux kernel, the following vulnerability has been resolved:
EDAC/i10nm: Skip DIMM enumeration on a disabled memory controller
When loading the i10nm_edac driver on some Intel Granite Rapids servers,
a call trace may appear as follows:
UBSAN: shift-out-of-bounds in drivers/edac/skx_common.c:453:16
shift exponent -66 is negative
...
__ubsan_handle_shift_out_of_bounds+0x1e3/0x390
skx_get_dimm_info.cold+0x47/0xd40 [skx_edac_common]
i10nm_get_dimm_config+0x23e/0x390 [i10nm_edac]
skx_register_mci+0x159/0x220 [skx_edac_common]
i10nm_init+0xcb0/0x1ff0 [i10nm_edac]
...
This occurs because some BIOS may disable a memory controller if there
aren't any memory DIMMs populated on this memory controller. The DIMMMTR
register of this disabled memory controller contains the invalid value
~0, resulting in the call trace above.
Fix this call trace by skipping DIMM enumeration on a disabled memory
controller. |
| In the Linux kernel, the following vulnerability has been resolved:
iommu/vt-d: debugfs: Fix legacy mode page table dump logic
In legacy mode, SSPTPTR is ignored if TT is not 00b or 01b. SSPTPTR
maybe uninitialized or zero in that case and may cause oops like:
Oops: general protection fault, probably for non-canonical address
0xf00087d3f000f000: 0000 [#1] SMP NOPTI
CPU: 2 UID: 0 PID: 786 Comm: cat Not tainted 6.16.0 #191 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-5.fc42 04/01/2014
RIP: 0010:pgtable_walk_level+0x98/0x150
RSP: 0018:ffffc90000f279c0 EFLAGS: 00010206
RAX: 0000000040000000 RBX: ffffc90000f27ab0 RCX: 000000000000001e
RDX: 0000000000000003 RSI: f00087d3f000f000 RDI: f00087d3f0010000
RBP: ffffc90000f27a00 R08: ffffc90000f27a98 R09: 0000000000000002
R10: 0000000000000000 R11: 0000000000000000 R12: f00087d3f000f000
R13: 0000000000000000 R14: 0000000040000000 R15: ffffc90000f27a98
FS: 0000764566dcb740(0000) GS:ffff8881f812c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000764566d44000 CR3: 0000000109d81003 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<TASK>
pgtable_walk_level+0x88/0x150
domain_translation_struct_show.isra.0+0x2d9/0x300
dev_domain_translation_struct_show+0x20/0x40
seq_read_iter+0x12d/0x490
...
Avoid walking the page table if TT is not 00b or 01b. |
| In the Linux kernel, the following vulnerability has been resolved:
mm: hugetlb: avoid soft lockup when mprotect to large memory area
When calling mprotect() to a large hugetlb memory area in our customer's
workload (~300GB hugetlb memory), soft lockup was observed:
watchdog: BUG: soft lockup - CPU#98 stuck for 23s! [t2_new_sysv:126916]
CPU: 98 PID: 126916 Comm: t2_new_sysv Kdump: loaded Not tainted 6.17-rc7
Hardware name: GIGACOMPUTING R2A3-T40-AAV1/Jefferson CIO, BIOS 5.4.4.1 07/15/2025
pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mte_clear_page_tags+0x14/0x24
lr : mte_sync_tags+0x1c0/0x240
sp : ffff80003150bb80
x29: ffff80003150bb80 x28: ffff00739e9705a8 x27: 0000ffd2d6a00000
x26: 0000ff8e4bc00000 x25: 00e80046cde00f45 x24: 0000000000022458
x23: 0000000000000000 x22: 0000000000000004 x21: 000000011b380000
x20: ffff000000000000 x19: 000000011b379f40 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc875e0aa5e2c
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : fffffc01ce7a5c00 x4 : 00000000046cde00 x3 : fffffc0000000000
x2 : 0000000000000004 x1 : 0000000000000040 x0 : ffff0046cde7c000
Call trace:
mte_clear_page_tags+0x14/0x24
set_huge_pte_at+0x25c/0x280
hugetlb_change_protection+0x220/0x430
change_protection+0x5c/0x8c
mprotect_fixup+0x10c/0x294
do_mprotect_pkey.constprop.0+0x2e0/0x3d4
__arm64_sys_mprotect+0x24/0x44
invoke_syscall+0x50/0x160
el0_svc_common+0x48/0x144
do_el0_svc+0x30/0xe0
el0_svc+0x30/0xf0
el0t_64_sync_handler+0xc4/0x148
el0t_64_sync+0x1a4/0x1a8
Soft lockup is not triggered with THP or base page because there is
cond_resched() called for each PMD size.
Although the soft lockup was triggered by MTE, it should be not MTE
specific. The other processing which takes long time in the loop may
trigger soft lockup too.
So add cond_resched() for hugetlb to avoid soft lockup. |
| In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to avoid migrating empty section
It reports a bug from device w/ zufs:
F2FS-fs (dm-64): Inconsistent segment (173822) type [1, 0] in SSA and SIT
F2FS-fs (dm-64): Stopped filesystem due to reason: 4
Thread A Thread B
- f2fs_expand_inode_data
- f2fs_allocate_pinning_section
- f2fs_gc_range
- do_garbage_collect w/ segno #x
- writepage
- f2fs_allocate_data_block
- new_curseg
- allocate segno #x
The root cause is: fallocate on pinning file may race w/ block allocation
as above, result in do_garbage_collect() from fallocate() may migrate
segment which is just allocated by a log, the log will update segment type
in its in-memory structure, however GC will get segment type from on-disk
SSA block, once segment type changes by log, we can detect such
inconsistency, then shutdown filesystem.
In this case, on-disk SSA shows type of segno #173822 is 1 (SUM_TYPE_NODE),
however segno #173822 was just allocated as data type segment, so in-memory
SIT shows type of segno #173822 is 0 (SUM_TYPE_DATA).
Change as below to fix this issue:
- check whether current section is empty before gc
- add sanity checks on do_garbage_collect() to avoid any race case, result
in migrating segment used by log.
- btw, it fixes misc issue in printed logs: "SSA and SIT" -> "SIT and SSA". |
| In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Add NULL pointer checks in dc_stream cursor attribute functions
The function dc_stream_set_cursor_attributes() currently dereferences
the `stream` pointer and nested members `stream->ctx->dc->current_state`
without checking for NULL.
All callers of these functions, such as in
`dcn30_apply_idle_power_optimizations()` and
`amdgpu_dm_plane_handle_cursor_update()`, already perform NULL checks
before calling these functions.
Fixes below:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:336 dc_stream_program_cursor_attributes()
error: we previously assumed 'stream' could be null (see line 334)
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c
327 bool dc_stream_program_cursor_attributes(
328 struct dc_stream_state *stream,
329 const struct dc_cursor_attributes *attributes)
330 {
331 struct dc *dc;
332 bool reset_idle_optimizations = false;
333
334 dc = stream ? stream->ctx->dc : NULL;
^^^^^^
The old code assumed stream could be NULL.
335
--> 336 if (dc_stream_set_cursor_attributes(stream, attributes)) {
^^^^^^
The refactor added an unchecked dereference.
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c
313 bool dc_stream_set_cursor_attributes(
314 struct dc_stream_state *stream,
315 const struct dc_cursor_attributes *attributes)
316 {
317 bool result = false;
318
319 if (dc_stream_check_cursor_attributes(stream, stream->ctx->dc->current_state, attributes)) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Here.
This function used to check for if stream as NULL and return false at
the start. Probably we should add that back. |
| In the Linux kernel, the following vulnerability has been resolved:
blk-throttle: fix access race during throttle policy activation
On repeated cold boots we occasionally hit a NULL pointer crash in
blk_should_throtl() when throttling is consulted before the throttle
policy is fully enabled for the queue. Checking only q->td != NULL is
insufficient during early initialization, so blkg_to_pd() for the
throttle policy can still return NULL and blkg_to_tg() becomes NULL,
which later gets dereferenced.
Unable to handle kernel NULL pointer dereference
at virtual address 0000000000000156
...
pc : submit_bio_noacct+0x14c/0x4c8
lr : submit_bio_noacct+0x48/0x4c8
sp : ffff800087f0b690
x29: ffff800087f0b690 x28: 0000000000005f90 x27: ffff00068af393c0
x26: 0000000000080000 x25: 000000000002fbc0 x24: ffff000684ddcc70
x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000
x20: 0000000000080000 x19: ffff000684ddcd08 x18: ffffffffffffffff
x17: 0000000000000000 x16: ffff80008132a550 x15: 0000ffff98020fff
x14: 0000000000000000 x13: 1fffe000d11d7021 x12: ffff000688eb810c
x11: ffff00077ec4bb80 x10: ffff000688dcb720 x9 : ffff80008068ef60
x8 : 00000a6fb8a86e85 x7 : 000000000000111e x6 : 0000000000000002
x5 : 0000000000000246 x4 : 0000000000015cff x3 : 0000000000394500
x2 : ffff000682e35e40 x1 : 0000000000364940 x0 : 000000000000001a
Call trace:
submit_bio_noacct+0x14c/0x4c8
verity_map+0x178/0x2c8
__map_bio+0x228/0x250
dm_submit_bio+0x1c4/0x678
__submit_bio+0x170/0x230
submit_bio_noacct_nocheck+0x16c/0x388
submit_bio_noacct+0x16c/0x4c8
submit_bio+0xb4/0x210
f2fs_submit_read_bio+0x4c/0xf0
f2fs_mpage_readpages+0x3b0/0x5f0
f2fs_readahead+0x90/0xe8
Tighten blk_throtl_activated() to also require that the throttle policy
bit is set on the queue:
return q->td != NULL &&
test_bit(blkcg_policy_throtl.plid, q->blkcg_pols);
This prevents blk_should_throtl() from accessing throttle group state
until policy data has been attached to blkgs. |