* [BUG] BUG: scheduling while atomic in throttle_direct_reclaim
@ 2025-05-26 15:49 Xianying Wang
  2025-05-27  0:04 ` Harry Yoo
  0 siblings, 1 reply; 3+ messages in thread
From: Xianying Wang @ 2025-05-26 15:49 UTC
  To: akpm; +Cc: linux-mm, linux-kernel

Hi,

I discovered a kernel crash described as "BUG: scheduling while atomic
in throttle_direct_reclaim." This issue occurs in the memory reclaim
path, specifically in the throttle_direct_reclaim function
(mm/vmscan.c), where the kernel attempts to perform a potentially
blocking operation (schedule_timeout) while still in an atomic or
non-preemptible context, leading to an invalid scheduling state and
triggering __schedule_bug().

The crash trace shows that this condition can occur when the kernel
mounts a specially crafted ISO9660 image via syz_mount_image$iso9660.
During image parsing, the VFS initiates page readahead through
read_pages, which issues block I/O backed by a loop device. This leads
to a SCSI read path where scsi_alloc_sgtables
(drivers/scsi/scsi_lib.c) attempts to allocate memory for a
scatterlist using mempool_alloc. If memory pressure is present,
mempool_alloc triggers try_to_free_pages, and subsequently
throttle_direct_reclaim.

At this point, the kernel is likely in an atomic context due to
earlier direct reclaim or preemption disabling within the block layer
or SCSI stack. As a result, schedule_timeout is not allowed and
triggers a BUG.
I recommend reviewing the reclaim context propagation in:

scsi_alloc_sgtables and sg_alloc_table_chained
mempool_alloc in SCSI I/O paths
throttle_direct_reclaim to ensure blocking calls are not made from
atomic contexts

This can be reproduced on:

HEAD commit:

commit e8f897f4afef0031fe618a8e94127a0934896aba

report: https://pastebin.com/raw/bxuLHCgu

console output : https://pastebin.com/raw/mCZ4Ap8Q

kernel config : https://pastebin.com/raw/aJ9rUnhG

C reproducer : https://pastebin.com/raw/1dku01DG

Best regards,

Xianying
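The check that the report runs into can be modelled outside the kernel.
The following is a minimal user-space sketch, not kernel code: every
toy_* name is hypothetical, and the counter only stands in for the
preempt count that the kernel raises while a spinlock is held (on
kernels built with preempt counting). It shows why a call that may
sleep is rejected while that count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's per-task preempt count. */
static int toy_preempt_count;
/* Number of times the toy scheduler was entered from atomic context. */
static int toy_sched_bugs;

/* Taking a (toy) spinlock enters atomic context. */
static void toy_spin_lock(void)
{
    toy_preempt_count++;
}

/* Releasing the (toy) spinlock leaves atomic context. */
static void toy_spin_unlock(void)
{
    assert(toy_preempt_count > 0);
    toy_preempt_count--;
}

/*
 * Models the sanity check behind "BUG: scheduling while atomic":
 * sleeping is only legal when the preempt count is zero.
 * Returns true if the sleep was allowed.
 */
static bool toy_schedule(const char *caller)
{
    if (toy_preempt_count != 0) {
        toy_sched_bugs++;
        fprintf(stderr, "toy BUG: scheduling while atomic in %s (count=%d)\n",
                caller, toy_preempt_count);
        return false;
    }
    return true;
}

/* Models a reclaim throttle that may need to sleep. */
static bool toy_throttle_direct_reclaim(void)
{
    return toy_schedule("toy_throttle_direct_reclaim");
}
```

Calling toy_throttle_direct_reclaim() between toy_spin_lock() and
toy_spin_unlock() is reported as a bug in this model; calling it with
no lock held is not.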
* Re: [BUG] BUG: scheduling while atomic in throttle_direct_reclaim
  2025-05-26 15:49 [BUG] BUG: scheduling while atomic in throttle_direct_reclaim Xianying Wang
@ 2025-05-27  0:04 ` Harry Yoo
  2025-05-27  0:19   ` Harry Yoo
  0 siblings, 1 reply; 3+ messages in thread
From: Harry Yoo @ 2025-05-27  0:04 UTC
  To: Xianying Wang; +Cc: akpm, linux-mm, linux-kernel

On Mon, May 26, 2025 at 11:49:30PM +0800, Xianying Wang wrote:
> Hi,
>
> I discovered a kernel crash described as "BUG: scheduling while atomic
> in throttle_direct_reclaim." This issue occurs in the memory reclaim
> path, specifically in the throttle_direct_reclaim function
> (mm/vmscan.c), where the kernel attempts to perform a potentially
> blocking operation (schedule_timeout) while still in an atomic or
> non-preemptible context, leading to an invalid scheduling state and
> triggering __schedule_bug().
>
> The crash trace shows that this condition can occur when the kernel
> mounts a specially crafted ISO9660 image via syz_mount_image$iso9660.
> During image parsing, the VFS initiates page readahead through
> read_pages, which issues block I/O backed by a loop device. This leads
> to a SCSI read path where scsi_alloc_sgtables
> (drivers/scsi/scsi_lib.c) attempts to allocate memory for a
> scatterlist using mempool_alloc. If memory pressure is present,
> mempool_alloc triggers try_to_free_pages, and subsequently
> throttle_direct_reclaim.
>
> At this point, the kernel is likely in an atomic context due to
> earlier direct reclaim or preemption disabling within the block layer
> or SCSI stack. As a result, schedule_timeout is not allowed and
> triggers a BUG.
>
> I recommend reviewing the reclaim context propagation in:
>
> scsi_alloc_sgtables and sg_alloc_table_chained
> mempool_alloc in SCSI I/O paths
> throttle_direct_reclaim to ensure blocking calls are not made from
> atomic contexts
>
> This can be reproduced on:
>
> HEAD commit:
>
> commit e8f897f4afef0031fe618a8e94127a0934896aba

Well, that's Linux v6.8, which is already end of life.
Please DO NOT REPORT bugs from kernels that are past their EOL.

I spent an hour only to realize this had already been fixed.
https://lore.kernel.org/all/20240614143238.60323-1-andrey.konovalov@linux.dev/T/#u

This is KASAN passing incorrect gfp flag to stackdepot, triggering
memory reclamation while mempool is holding a spinlock.

> report: https://pastebin.com/raw/bxuLHCgu
>
> console output : https://pastebin.com/raw/mCZ4Ap8Q
>
> kernel config : https://pastebin.com/raw/aJ9rUnhG
>
> C reproducer : https://pastebin.com/raw/1dku01DG
>
> Best regards,
>
> Xianying

-- 
Cheers,
Harry / Hyeonggon
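The diagnosis above can be illustrated with a small user-space sketch.
It is not the actual fix and does not reproduce the KASAN, stackdepot,
or mempool code; all toy_* names and TOY_GFP_* values are hypothetical
and only loosely modelled on the kernel's gfp flags. It shows the
general rule involved: an allocation made while a spinlock is held must
not carry a flag that permits direct reclaim, because direct reclaim
may sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real kernel values and names differ. */
#define TOY_GFP_DIRECT_RECLAIM 0x1u /* caller may enter reclaim and sleep */
#define TOY_GFP_KSWAPD_RECLAIM 0x2u /* caller may only wake background reclaim */
#define TOY_GFP_KERNEL (TOY_GFP_DIRECT_RECLAIM | TOY_GFP_KSWAPD_RECLAIM)
#define TOY_GFP_NOWAIT TOY_GFP_KSWAPD_RECLAIM

/* Non-zero while the (toy) pool spinlock is held. */
static int toy_lock_depth;
/* Counts allocations that could have slept under the lock. */
static int toy_violations;

/* Returns true if an allocation with these flags is safe right now. */
static bool toy_alloc(unsigned int gfp)
{
    bool may_sleep = (gfp & TOY_GFP_DIRECT_RECLAIM) != 0;

    if (may_sleep && toy_lock_depth > 0) {
        toy_violations++;
        return false;
    }
    return true;
}

/* A debugging hook that allocates metadata with the flags it is handed. */
static bool toy_record_stack(unsigned int gfp)
{
    return toy_alloc(gfp);
}

/* Takes an element from the pool; the hook runs with the lock held. */
static bool toy_pool_take(unsigned int hook_gfp)
{
    bool ok;

    toy_lock_depth++;                 /* lock the pool */
    ok = toy_record_stack(hook_gfp);  /* hook allocates under the lock */
    toy_lock_depth--;                 /* unlock the pool */
    return ok;
}
```

In this model, toy_pool_take(TOY_GFP_KERNEL) is flagged as a violation,
while toy_pool_take(TOY_GFP_NOWAIT), or any mask with
TOY_GFP_DIRECT_RECLAIM cleared, is not.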
* Re: [BUG] BUG: scheduling while atomic in throttle_direct_reclaim
  2025-05-27  0:04 ` Harry Yoo
@ 2025-05-27  0:19   ` Harry Yoo
  0 siblings, 0 replies; 3+ messages in thread
From: Harry Yoo @ 2025-05-27  0:19 UTC
  To: Xianying Wang; +Cc: akpm, linux-mm, linux-kernel

On Tue, May 27, 2025 at 09:04:58AM +0900, Harry Yoo wrote:
> On Mon, May 26, 2025 at 11:49:30PM +0800, Xianying Wang wrote:
> > Hi,
> >
> > I discovered a kernel crash described as "BUG: scheduling while atomic
> > in throttle_direct_reclaim." This issue occurs in the memory reclaim
> > path, specifically in the throttle_direct_reclaim function
> > (mm/vmscan.c), where the kernel attempts to perform a potentially
> > blocking operation (schedule_timeout) while still in an atomic or
> > non-preemptible context, leading to an invalid scheduling state and
> > triggering __schedule_bug().
> >
> > The crash trace shows that this condition can occur when the kernel
> > mounts a specially crafted ISO9660 image via syz_mount_image$iso9660.
> > During image parsing, the VFS initiates page readahead through
> > read_pages, which issues block I/O backed by a loop device. This leads
> > to a SCSI read path where scsi_alloc_sgtables
> > (drivers/scsi/scsi_lib.c) attempts to allocate memory for a
> > scatterlist using mempool_alloc. If memory pressure is present,
> > mempool_alloc triggers try_to_free_pages, and subsequently
> > throttle_direct_reclaim.
> >
> > At this point, the kernel is likely in an atomic context due to
> > earlier direct reclaim or preemption disabling within the block layer
> > or SCSI stack. As a result, schedule_timeout is not allowed and
> > triggers a BUG.
> >
> > I recommend reviewing the reclaim context propagation in:
> >
> > scsi_alloc_sgtables and sg_alloc_table_chained
> > mempool_alloc in SCSI I/O paths
> > throttle_direct_reclaim to ensure blocking calls are not made from
> > atomic contexts
> >
> > This can be reproduced on:
> >
> > HEAD commit:
> >
> > commit e8f897f4afef0031fe618a8e94127a0934896aba
>
> Well, that's Linux v6.8, which is already end of life.
> Please DO NOT REPORT bugs from kernels that are past their EOL.

I mean, it is fine to report bugs from the following:

1) The latest stable version (e.g., v6.14.8, v6.12.30, v6.6.92, ... etc)
   which can be found at [1] [2], or

2) The latest mainline (Currently v6.15), or

3) Development trees like linux-next and Andrew's mm.git.

FYI, kernel.org [1] lists kernel versions that are supported.

I appreciate your effort to test kernels, but testing EOL kernels might
be a waste of time as the bug you're encountering might have already been
fixed in a newer version but wasn't backported due to the kernel being EOL.

[1] https://kernel.org
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

-- 
Cheers,
Harry / Hyeonggon