| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
idpf: Fix RSS LUT NULL pointer crash on early ethtool operations
The RSS LUT is not initialized until the interface comes up, causing
the following NULL pointer crash when ethtool operations like rxhash on/off
are performed before the interface is brought up for the first time.
Move RSS LUT initialization from ndo_open to vport creation to ensure LUT
is always available. This enables RSS configuration via ethtool before
bringing the interface up. Simplify LUT management by maintaining all
changes in the driver's soft copy and programming zeros to the indirection
table when rxhash is disabled. Defer HW programming until the interface
comes up if it is down during rxhash and LUT configuration changes.
Steps to reproduce:
** Load idpf driver; interfaces will be created
modprobe idpf
** Before bringing the interfaces up, turn rxhash off
ethtool -K eth2 rxhash off
[89408.371875] BUG: kernel NULL pointer dereference, address: 0000000000000000
[89408.371908] #PF: supervisor read access in kernel mode
[89408.371924] #PF: error_code(0x0000) - not-present page
[89408.371940] PGD 0 P4D 0
[89408.371953] Oops: Oops: 0000 [#1] SMP NOPTI
<snip>
[89408.372052] RIP: 0010:memcpy_orig+0x16/0x130
[89408.372310] Call Trace:
[89408.372317] <TASK>
[89408.372326] ? idpf_set_features+0xfc/0x180 [idpf]
[89408.372363] __netdev_update_features+0x295/0xde0
[89408.372384] ethnl_set_features+0x15e/0x460
[89408.372406] genl_family_rcv_msg_doit+0x11f/0x180
[89408.372429] genl_rcv_msg+0x1ad/0x2b0
[89408.372446] ? __pfx_ethnl_set_features+0x10/0x10
[89408.372465] ? __pfx_genl_rcv_msg+0x10/0x10
[89408.372482] netlink_rcv_skb+0x58/0x100
[89408.372502] genl_rcv+0x2c/0x50
[89408.372516] netlink_unicast+0x289/0x3e0
[89408.372533] netlink_sendmsg+0x215/0x440
[89408.372551] __sys_sendto+0x234/0x240
[89408.372571] __x64_sys_sendto+0x28/0x30
[89408.372585] x64_sys_call+0x1909/0x1da0
[89408.372604] do_syscall_64+0x7a/0xfa0
[89408.373140] ? clear_bhb_loop+0x60/0xb0
[89408.373647] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[89408.378887] </TASK>
<snip> |
| In the Linux kernel, the following vulnerability has been resolved:
idpf: detach and close netdevs while handling a reset
Protect the reset path from callbacks by setting the netdevs to detached
state and close any netdevs in UP state until the reset handling has
completed. During a reset, the driver will de-allocate resources for the
vport, and there is no guarantee that those will recover, which is why the
existing vport_ctrl_lock does not provide sufficient protection.
idpf_detach_and_close() is called right before reset handling. If the
reset handling succeeds, the netdevs state is recovered via call to
idpf_attach_and_open(). If the reset handling fails the netdevs remain
down. The detach/down calls are protected with RTNL lock to avoid racing
with callbacks. On the recovery side the attach can be done without
holding the RTNL lock as there are no callbacks expected at that point,
due to detach/close always being done first in that flow.
The previous logic restoring the netdevs state based on the
IDPF_VPORT_UP_REQUESTED flag in the init task is not needed anymore, hence
the removal of idpf_set_vport_state(). The IDPF_VPORT_UP_REQUESTED is
still being used to restore the state of the netdevs following the reset,
but has no use outside of the reset handling flow.
idpf_init_hard_reset() is converted to void, since it was used as such and
there is no error handling being done based on its return value.
Before this change, invoking hard and soft resets simultaneously will
cause the driver to lose the vport state:
ip -br a
<inf> UP
echo 1 > /sys/class/net/ens801f0/device/reset& \
ethtool -L ens801f0 combined 8
ip -br a
<inf> DOWN
ip link set <inf> up
ip -br a
<inf> DOWN
Also in case of a failure in the reset path, the netdev is left
exposed to external callbacks, while vport resources are not
initialized, leading to a crash on subsequent ifup/down:
[408471.398966] idpf 0000:83:00.0: HW reset detected
[408471.411744] idpf 0000:83:00.0: Device HW Reset initiated
[408472.277901] idpf 0000:83:00.0: The driver was unable to contact the device's firmware. Check that the FW is running. Driver state= 0x2
[408508.125551] BUG: kernel NULL pointer dereference, address: 0000000000000078
[408508.126112] #PF: supervisor read access in kernel mode
[408508.126687] #PF: error_code(0x0000) - not-present page
[408508.127256] PGD 2aae2f067 P4D 0
[408508.127824] Oops: Oops: 0000 [#1] SMP NOPTI
...
[408508.130871] RIP: 0010:idpf_stop+0x39/0x70 [idpf]
...
[408508.139193] Call Trace:
[408508.139637] <TASK>
[408508.140077] __dev_close_many+0xbb/0x260
[408508.140533] __dev_change_flags+0x1cf/0x280
[408508.140987] netif_change_flags+0x26/0x70
[408508.141434] dev_change_flags+0x3d/0xb0
[408508.141878] devinet_ioctl+0x460/0x890
[408508.142321] inet_ioctl+0x18e/0x1d0
[408508.142762] ? _copy_to_user+0x22/0x70
[408508.143207] sock_do_ioctl+0x3d/0xe0
[408508.143652] sock_ioctl+0x10e/0x330
[408508.144091] ? find_held_lock+0x2b/0x80
[408508.144537] __x64_sys_ioctl+0x96/0xe0
[408508.144979] do_syscall_64+0x79/0x3d0
[408508.145415] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[408508.145860] RIP: 0033:0x7f3e0bb4caff |
| In the Linux kernel, the following vulnerability has been resolved:
landlock: Fix handling of disconnected directories
Disconnected files or directories can appear when they are visible and
opened from a bind mount, but have been renamed or moved from the source
of the bind mount in a way that makes them inaccessible from the mount
point (i.e. out of scope).
Previously, access rights tied to files or directories opened through a
disconnected directory were collected by walking the related hierarchy
down to the root of the filesystem, without taking into account the
mount point because it couldn't be found. This could lead to
inconsistent access results, potential access right widening, and
hard-to-debug renames, especially since such paths cannot be printed.
For a sandboxed task to create a disconnected directory, it needs to
have write access (i.e. FS_MAKE_REG, FS_REMOVE_FILE, and FS_REFER) to
the underlying source of the bind mount, and read access to the related
mount point. Because a sandboxed task cannot acquire more access
rights than those defined by its Landlock domain, this could lead to
inconsistent access rights due to missing permissions that should be
inherited from the mount point hierarchy, while inheriting permissions
from the filesystem hierarchy hidden by this mount point instead.
Landlock now handles files and directories opened from disconnected
directories by taking into account the filesystem hierarchy when the
mount point is not found in the hierarchy walk, and also always taking
into account the mount point from which these disconnected directories
were opened. This ensures that a rename is not allowed if it would
widen access rights [1].
The rationale is that, even if disconnected hierarchies might not be
visible or accessible to a sandboxed task, relying on the collected
access rights from them improves the guarantee that access rights will
not be widened during a rename because of the access right comparison
between the source and the destination (see LANDLOCK_ACCESS_FS_REFER).
It may look like this would grant more access on disconnected files and
directories, but the security policies are always enforced for all the
evaluated hierarchies. This new behavior should be less surprising to
users and safer from an access control perspective.
Remove a wrong WARN_ON_ONCE() canary in collect_domain_accesses() and
fix the related comment.
Because opened files have their access rights stored in the related file
security properties, there is no impact for disconnected or unlinked
files. |
| In the Linux kernel, the following vulnerability has been resolved:
media: nxp: imx8-isi: Fix streaming cleanup on release
The current implementation unconditionally calls
mxc_isi_video_cleanup_streaming() in mxc_isi_video_release(). This can
lead to situations where any release call (like from a simple
"v4l2-ctl -l") may release a currently streaming queue when called on
such a device.
This is reproducible on an i.MX8MP board by streaming from an ISI
capture device using gstreamer:
gst-launch-1.0 -v v4l2src device=/dev/videoX ! \
video/x-raw,format=GRAY8,width=1280,height=800,framerate=1/120 ! \
fakesink
While this stream is running, querying the caps of the same device
provokes the error state:
v4l2-ctl -l -d /dev/videoX
This results in the following trace:
[ 155.452152] ------------[ cut here ]------------
[ 155.452163] WARNING: CPU: 0 PID: 1708 at drivers/media/platform/nxp/imx8-isi/imx8-isi-pipe.c:713 mxc_isi_pipe_irq_handler+0x19c/0x1b0 [imx8_isi]
[ 157.004248] Modules linked in: cfg80211 rpmsg_ctrl rpmsg_char rpmsg_tty virtio_rpmsg_bus rpmsg_ns rpmsg_core rfkill nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables mcp251x6
[ 157.053499] CPU: 0 UID: 0 PID: 1708 Comm: python3 Not tainted 6.15.4-00114-g1f61ca5cad76 #1 PREEMPT
[ 157.064369] Hardware name: imx8mp_board_01 (DT)
[ 157.068205] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 157.075169] pc : mxc_isi_pipe_irq_handler+0x19c/0x1b0 [imx8_isi]
[ 157.081195] lr : mxc_isi_pipe_irq_handler+0x38/0x1b0 [imx8_isi]
[ 157.087126] sp : ffff800080003ee0
[ 157.090438] x29: ffff800080003ee0 x28: ffff0000c3688000 x27: 0000000000000000
[ 157.097580] x26: 0000000000000000 x25: ffff0000c1e7ac00 x24: ffff800081b5ad50
[ 157.104723] x23: 00000000000000d1 x22: 0000000000000000 x21: ffff0000c25e4000
[ 157.111866] x20: 0000000060000200 x19: ffff80007a0608d0 x18: 0000000000000000
[ 157.119008] x17: ffff80006a4e3000 x16: ffff800080000000 x15: 0000000000000000
[ 157.126146] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 157.133287] x11: 0000000000000040 x10: ffff0000c01445f0 x9 : ffff80007a053a38
[ 157.140425] x8 : ffff0000c04004b8 x7 : 0000000000000000 x6 : 0000000000000000
[ 157.147567] x5 : ffff0000c0400490 x4 : ffff80006a4e3000 x3 : ffff0000c25e4000
[ 157.154706] x2 : 0000000000000000 x1 : ffff8000825c0014 x0 : 0000000060000200
[ 157.161850] Call trace:
[ 157.164296] mxc_isi_pipe_irq_handler+0x19c/0x1b0 [imx8_isi] (P)
[ 157.170319] __handle_irq_event_percpu+0x58/0x218
[ 157.175029] handle_irq_event+0x54/0xb8
[ 157.178867] handle_fasteoi_irq+0xac/0x248
[ 157.182968] handle_irq_desc+0x48/0x68
[ 157.186723] generic_handle_domain_irq+0x24/0x38
[ 157.191346] gic_handle_irq+0x54/0x120
[ 157.195098] call_on_irq_stack+0x24/0x30
[ 157.199027] do_interrupt_handler+0x88/0x98
[ 157.203212] el0_interrupt+0x44/0xc0
[ 157.206792] __el0_irq_handler_common+0x18/0x28
[ 157.211328] el0t_64_irq_handler+0x10/0x20
[ 157.215429] el0t_64_irq+0x198/0x1a0
[ 157.219009] ---[ end trace 0000000000000000 ]---
Address this issue by moving the streaming preparation and cleanup to
the vb2 .prepare_streaming() and .unprepare_streaming() operations. This
also simplifies the driver by allowing direct usage of the
vb2_ioctl_streamon() and vb2_ioctl_streamoff() helpers, and removal of
the manual cleanup from mxc_isi_video_release(). |
| In the Linux kernel, the following vulnerability has been resolved:
gfs2: Fix unlikely race in gdlm_put_lock
In gdlm_put_lock(), there is a small window of time in which the
DFL_UNMOUNT flag has been set but the lockspace hasn't been released,
yet. In that window, dlm may still call gdlm_ast() and gdlm_bast().
To prevent it from dereferencing freed glock objects, only free the
glock if the lockspace has actually been released. |
| In the Linux kernel, the following vulnerability has been resolved:
ice: fix using untrusted value of pkt_len in ice_vc_fdir_parse_raw()
Fix using the untrusted value of proto->raw.pkt_len in function
ice_vc_fdir_parse_raw() by verifying if it does not exceed the
VIRTCHNL_MAX_SIZE_RAW_PACKET value. |
| In the Linux kernel, the following vulnerability has been resolved:
idpf: check error for register_netdev() on init
Current init logic ignores the error code from register_netdev(),
which will cause WARN_ON() on attempt to unregister it, if there was one,
and there is no info for the user that the creation of the netdev failed.
WARNING: CPU: 89 PID: 6902 at net/core/dev.c:11512 unregister_netdevice_many_notify+0x211/0x1a10
...
[ 3707.563641] unregister_netdev+0x1c/0x30
[ 3707.563656] idpf_vport_dealloc+0x5cf/0xce0 [idpf]
[ 3707.563684] idpf_deinit_task+0xef/0x160 [idpf]
[ 3707.563712] idpf_vc_core_deinit+0x84/0x320 [idpf]
[ 3707.563739] idpf_remove+0xbf/0x780 [idpf]
[ 3707.563769] pci_device_remove+0xab/0x1e0
[ 3707.563786] device_release_driver_internal+0x371/0x530
[ 3707.563803] driver_detach+0xbf/0x180
[ 3707.563816] bus_remove_driver+0x11b/0x2a0
[ 3707.563829] pci_unregister_driver+0x2a/0x250
Introduce an error check and log the vport number and error code.
On removal make sure to check VPORT_REG_NETDEV flag prior to calling
unregister and free on the netdev.
Add local variables for idx, vport_config and netdev for readability. |
| In the Linux kernel, the following vulnerability has been resolved:
fs/xattr: missing fdput() in fremovexattr error path
In the Linux kernel, the fremovexattr() syscall calls fdget() to acquire a
file reference but returns early without calling fdput() when
strncpy_from_user() fails on the name argument. In multi-threaded processes
where fdget() takes the slow path, this permanently leaks one
file reference per call, pinning the struct file and associated kernel
objects in memory. An unprivileged local user can exploit this to cause
kernel memory exhaustion. The issue was inadvertently fixed by commit
a71874379ec8 ("xattr: switch to CLASS(fd)"). |
| In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: L2CAP: Fix accepting multiple L2CAP_ECRED_CONN_REQ
Currently the code attempts to accept requests regardless of the
command identifier which may cause multiple requests to be marked
as pending (FLAG_DEFER_SETUP) which can cause more than
L2CAP_ECRED_MAX_CID(5) to be allocated in l2cap_ecred_rsp_defer
causing an overflow.
The spec is quite clear that the same identifier shall not be used on
subsequent requests:
'Within each signaling channel a different Identifier shall be used
for each successive request or indication.'
https://www.bluetooth.com/wp-content/uploads/Files/Specification/HTML/Core-62/out/en/host/logical-link-control-and-adaptation-protocol-specification.html#UUID-32a25a06-4aa4-c6c7-77c5-dcfe3682355d
So this attempts to check if there are any channels pending with the
same identifier and rejects if any are found. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: always walk all pending catchall elements
During transaction processing we might have more than one catchall element:
1 live catchall element and 1 pending element that is coming as part of the
new batch.
If the map holding the catchall elements is also going away, its
required to toggle all catchall elements and not just the first viable
candidate.
Otherwise, we get:
WARNING: ./include/net/netfilter/nf_tables.h:1281 at nft_data_release+0xb7/0xe0 [nf_tables], CPU#2: nft/1404
RIP: 0010:nft_data_release+0xb7/0xe0 [nf_tables]
[..]
__nft_set_elem_destroy+0x106/0x380 [nf_tables]
nf_tables_abort_release+0x348/0x8d0 [nf_tables]
nf_tables_abort+0xcf2/0x3ac0 [nf_tables]
nfnetlink_rcv_batch+0x9c9/0x20e0 [..] |
| In the Linux kernel, the following vulnerability has been resolved:
iommu: disable SVA when CONFIG_X86 is set
Patch series "Fix stale IOTLB entries for kernel address space", v7.
This proposes a fix for a security vulnerability related to IOMMU Shared
Virtual Addressing (SVA). In an SVA context, an IOMMU can cache kernel
page table entries. When a kernel page table page is freed and
reallocated for another purpose, the IOMMU might still hold stale,
incorrect entries. This can be exploited to cause a use-after-free or
write-after-free condition, potentially leading to privilege escalation or
data corruption.
This solution introduces a deferred freeing mechanism for kernel page
table pages, which provides a safe window to notify the IOMMU to
invalidate its caches before the page is reused.
This patch (of 8):
In the IOMMU Shared Virtual Addressing (SVA) context, the IOMMU hardware
shares and walks the CPU's page tables. The x86 architecture maps the
kernel's virtual address space into the upper portion of every process's
page table. Consequently, in an SVA context, the IOMMU hardware can walk
and cache kernel page table entries.
The Linux kernel currently lacks a notification mechanism for kernel page
table changes, specifically when page table pages are freed and reused.
The IOMMU driver is only notified of changes to user virtual address
mappings. This can cause the IOMMU's internal caches to retain stale
entries for kernel VA.
Use-After-Free (UAF) and Write-After-Free (WAF) conditions arise when
kernel page table pages are freed and later reallocated. The IOMMU could
misinterpret the new data as valid page table entries. The IOMMU might
then walk into attacker-controlled memory, leading to arbitrary physical
memory DMA access or privilege escalation. This is also a
Write-After-Free issue, as the IOMMU will potentially continue to write
Accessed and Dirty bits to the freed memory while attempting to walk the
stale page tables.
Currently, SVA contexts are unprivileged and cannot access kernel
mappings. However, the IOMMU will still walk kernel-only page tables all
the way down to the leaf entries, where it realizes the mapping is for the
kernel and errors out. This means the IOMMU still caches these
intermediate page table entries, making the described vulnerability a real
concern.
Disable SVA on x86 architecture until the IOMMU can receive notification
to flush the paging cache before freeing the CPU kernel page table pages. |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: ipc: fix use-after-free in ipc_msg_send_request
ipc_msg_send_request() waits for a generic netlink reply using an
ipc_msg_table_entry on the stack. The generic netlink handler
(handle_generic_event()/handle_response()) fills entry->response under
ipc_msg_table_lock, but ipc_msg_send_request() used to validate and free
entry->response without holding the same lock.
Under high concurrency this allows a race where handle_response() is
copying data into entry->response while ipc_msg_send_request() has just
freed it, leading to a slab-use-after-free reported by KASAN in
handle_generic_event():
BUG: KASAN: slab-use-after-free in handle_generic_event+0x3c4/0x5f0 [ksmbd]
Write of size 12 at addr ffff888198ee6e20 by task pool/109349
...
Freed by task:
kvfree
ipc_msg_send_request [ksmbd]
ksmbd_rpc_open -> ksmbd_session_rpc_open [ksmbd]
Fix by:
- Taking ipc_msg_table_lock in ipc_msg_send_request() while validating
entry->response, freeing it when invalid, and removing the entry from
ipc_msg_table.
- Returning the final entry->response pointer to the caller only after
the hash entry is removed under the lock.
- Returning NULL in the error path, preserving the original API
semantics.
This makes all accesses to entry->response consistent with
handle_response(), which already updates and fills the response buffer
under ipc_msg_table_lock, and closes the race that allowed the UAF. |
| In the Linux kernel, the following vulnerability has been resolved:
tls: make sure to abort the stream if headers are bogus
Normally we wait for the socket to buffer up the whole record
before we service it. If the socket has a tiny buffer, however,
we read out the data sooner, to prevent connection stalls.
Make sure that we abort the connection when we find out late
that the record is actually invalid. Retrying the parsing is
fine in itself but since we copy some more data each time
before we parse we can overflow the allocated skb space.
Constructing a scenario in which we're under pressure without
enough data in the socket to parse the length upfront is quite
hard. syzbot figured out a way to do this by serving us the header
in small OOB sends, and then filling in the recvbuf with a large
normal send.
Make sure that tls_rx_msg_size() aborts strp, if we reach
an invalid record there's really no way to recover. |
| In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: l2cap: Check encryption key size on incoming connection
This is required for passing GAP/SEC/SEM/BI-04-C PTS test case:
Security Mode 4 Level 4, Responder - Invalid Encryption Key Size
- 128 bit
This tests the security key with size from 1 to 15 bytes while the
Security Mode 4 Level 4 requests 16 bytes key size.
Currently PTS fails with the following logs:
- expected:Connection Response:
Code: [3 (0x03)] Code
Identifier: (lt)WildCard: Exists(gt)
Length: [8 (0x0008)]
Destination CID: (lt)WildCard: Exists(gt)
Source CID: [64 (0x0040)]
Result: [3 (0x0003)] Connection refused - Security block
Status: (lt)WildCard: Exists(gt),
but received:Connection Response:
Code: [3 (0x03)] Code
Identifier: [1 (0x01)]
Length: [8 (0x0008)]
Destination CID: [64 (0x0040)]
Source CID: [64 (0x0040)]
Result: [0 (0x0000)] Connection Successful
Status: [0 (0x0000)] No further information available
And HCI logs:
< HCI Command: Read Encrypti.. (0x05|0x0008) plen 2
Handle: 14 Address: 00:1B:DC:F2:24:10 (Vencer Co., Ltd.)
> HCI Event: Command Complete (0x0e) plen 7
Read Encryption Key Size (0x05|0x0008) ncmd 1
Status: Success (0x00)
Handle: 14 Address: 00:1B:DC:F2:24:10 (Vencer Co., Ltd.)
Key size: 7
> ACL Data RX: Handle 14 flags 0x02 dlen 12
L2CAP: Connection Request (0x02) ident 1 len 4
PSM: 4097 (0x1001)
Source CID: 64
< ACL Data TX: Handle 14 flags 0x00 dlen 16
L2CAP: Connection Response (0x03) ident 1 len 8
Destination CID: 64
Source CID: 64
Result: Connection successful (0x0000)
Status: No further information available (0x0000) |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix use-after-free in kerberos authentication
Setting sess->user = NULL was introduced to fix the dangling pointer
created by ksmbd_free_user. However, it is possible another thread could
be operating on the session and make use of sess->user after it has been
passed to ksmbd_free_user but before sess->user is set to NULL. |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix use-after-free in ksmbd_sessions_deregister()
In multichannel mode, UAF issue can occur in session_deregister
when the second channel sets up a session through the connection of
the first channel. session that is freed through the global session
table can be accessed again through ->sessions of connection. |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix session use-after-free in multichannel connection
There is a race condition between session setup and
ksmbd_sessions_deregister. The session can be freed before the connection
is added to channel list of session.
This patch check reference count of session before freeing it. |
| In the Linux kernel, the following vulnerability has been resolved:
ksmbd: fix type confusion via race condition when using ipc_msg_send_request
req->handle is allocated using ksmbd_acquire_id(&ipc_ida), based on
ida_alloc. req->handle from ksmbd_ipc_login_request and
FSCTL_PIPE_TRANSCEIVE ioctl can be same and it could lead to type confusion
between messages, resulting in access to unexpected parts of memory after
an incorrect delivery. ksmbd check type of ipc response but missing add
continue to check next ipc reponse. |
| In the Linux kernel, the following vulnerability has been resolved:
pfifo_tail_enqueue: Drop new packet when sch->limit == 0
Expected behaviour:
In case we reach scheduler's limit, pfifo_tail_enqueue() will drop a
packet in scheduler's queue and decrease scheduler's qlen by one.
Then, pfifo_tail_enqueue() enqueue new packet and increase
scheduler's qlen by one. Finally, pfifo_tail_enqueue() return
`NET_XMIT_CN` status code.
Weird behaviour:
In case we set `sch->limit == 0` and trigger pfifo_tail_enqueue() on a
scheduler that has no packet, the 'drop a packet' step will do nothing.
This means the scheduler's qlen still has value equal 0.
Then, we continue to enqueue new packet and increase scheduler's qlen by
one. In summary, we can leverage pfifo_tail_enqueue() to increase qlen by
one and return `NET_XMIT_CN` status code.
The problem is:
Let's say we have two qdiscs: Qdisc_A and Qdisc_B.
- Qdisc_A's type must have '->graft()' function to create parent/child relationship.
Let's say Qdisc_A's type is `hfsc`. Enqueue packet to this qdisc will trigger `hfsc_enqueue`.
- Qdisc_B's type is pfifo_head_drop. Enqueue packet to this qdisc will trigger `pfifo_tail_enqueue`.
- Qdisc_B is configured to have `sch->limit == 0`.
- Qdisc_A is configured to route the enqueued's packet to Qdisc_B.
Enqueue packet through Qdisc_A will lead to:
- hfsc_enqueue(Qdisc_A) -> pfifo_tail_enqueue(Qdisc_B)
- Qdisc_B->q.qlen += 1
- pfifo_tail_enqueue() return `NET_XMIT_CN`
- hfsc_enqueue() check for `NET_XMIT_SUCCESS` and see `NET_XMIT_CN` => hfsc_enqueue() don't increase qlen of Qdisc_A.
The whole process lead to a situation where Qdisc_A->q.qlen == 0 and Qdisc_B->q.qlen == 1.
Replace 'hfsc' with other type (for example: 'drr') still lead to the same problem.
This violate the design where parent's qlen should equal to the sum of its childrens'qlen.
Bug impact: This issue can be used for user->kernel privilege escalation when it is reachable. |
| In the Linux kernel, the following vulnerability has been resolved:
vsock: Keep the binding until socket destruction
Preserve sockets bindings; this includes both resulting from an explicit
bind() and those implicitly bound through autobind during connect().
Prevents socket unbinding during a transport reassignment, which fixes a
use-after-free:
1. vsock_create() (refcnt=1) calls vsock_insert_unbound() (refcnt=2)
2. transport->release() calls vsock_remove_bound() without checking if
sk was bound and moved to bound list (refcnt=1)
3. vsock_bind() assumes sk is in unbound list and before
__vsock_insert_bound(vsock_bound_sockets()) calls
__vsock_remove_bound() which does:
list_del_init(&vsk->bound_table); // nop
sock_put(&vsk->sk); // refcnt=0
BUG: KASAN: slab-use-after-free in __vsock_bind+0x62e/0x730
Read of size 4 at addr ffff88816b46a74c by task a.out/2057
dump_stack_lvl+0x68/0x90
print_report+0x174/0x4f6
kasan_report+0xb9/0x190
__vsock_bind+0x62e/0x730
vsock_bind+0x97/0xe0
__sys_bind+0x154/0x1f0
__x64_sys_bind+0x6e/0xb0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Allocated by task 2057:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x10/0x30
__kasan_slab_alloc+0x85/0x90
kmem_cache_alloc_noprof+0x131/0x450
sk_prot_alloc+0x5b/0x220
sk_alloc+0x2c/0x870
__vsock_create.constprop.0+0x2e/0xb60
vsock_create+0xe4/0x420
__sock_create+0x241/0x650
__sys_socket+0xf2/0x1a0
__x64_sys_socket+0x6e/0xb0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 2057:
kasan_save_stack+0x1e/0x40
kasan_save_track+0x10/0x30
kasan_save_free_info+0x37/0x60
__kasan_slab_free+0x4b/0x70
kmem_cache_free+0x1a1/0x590
__sk_destruct+0x388/0x5a0
__vsock_bind+0x5e1/0x730
vsock_bind+0x97/0xe0
__sys_bind+0x154/0x1f0
__x64_sys_bind+0x6e/0xb0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 2057 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
RIP: 0010:refcount_warn_saturate+0xce/0x150
__vsock_bind+0x66d/0x730
vsock_bind+0x97/0xe0
__sys_bind+0x154/0x1f0
__x64_sys_bind+0x6e/0xb0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
refcount_t: underflow; use-after-free.
WARNING: CPU: 7 PID: 2057 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
RIP: 0010:refcount_warn_saturate+0xee/0x150
vsock_remove_bound+0x187/0x1e0
__vsock_release+0x383/0x4a0
vsock_release+0x90/0x120
__sock_release+0xa3/0x250
sock_close+0x14/0x20
__fput+0x359/0xa80
task_work_run+0x107/0x1d0
do_exit+0x847/0x2560
do_group_exit+0xb8/0x250
__x64_sys_exit_group+0x3a/0x50
x64_sys_call+0xfec/0x14f0
do_syscall_64+0x93/0x1b0
entry_SYSCALL_64_after_hwframe+0x76/0x7e |