[BUG] BUG: scheduling while atomic in throttle_direct_reclaim

* [BUG] BUG: scheduling while atomic in throttle_direct_reclaim
@ 2025-05-26 15:49 Xianying Wang
  2025-05-27  0:04 ` Harry Yoo
  0 siblings, 1 reply; 3+ messages in thread
From: Xianying Wang @ 2025-05-26 15:49 UTC
  To: akpm; +Cc: linux-mm, linux-kernel

Hi,

I found a kernel bug reported as "BUG: scheduling while atomic" in
throttle_direct_reclaim() (mm/vmscan.c). The memory reclaim path calls
schedule_timeout(), which may block, while the task is still in an
atomic (non-preemptible) context. The scheduler detects the invalid
state and calls __schedule_bug().

The crash trace shows that this condition can occur when the kernel
mounts a specially crafted ISO9660 image via syz_mount_image$iso9660.
During image parsing, the VFS initiates page readahead through
read_pages, which issues block I/O backed by a loop device. This leads
to a SCSI read path where scsi_alloc_sgtables
(drivers/scsi/scsi_lib.c) attempts to allocate memory for a
scatterlist using mempool_alloc. If memory pressure is present,
mempool_alloc triggers try_to_free_pages, and subsequently
throttle_direct_reclaim.

At this point, the kernel is likely in an atomic context due to
earlier direct reclaim or preemption disabling within the block layer
or SCSI stack. As a result, schedule_timeout is not allowed and
triggers a BUG.
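
To make the failure mode concrete, here is a minimal user-space sketch of
the check that fires. It is not kernel code: every model_* name is
hypothetical, and the counter only loosely stands in for the kernel's
preempt count. It shows that a call which may sleep is fine in normal
context but is flagged once a (modelled) spinlock is held.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
static int model_preempt_count; /* > 0 means "atomic": sleeping is forbidden */
static int model_sched_bugs;    /* number of "scheduling while atomic" reports */

/* Taking a spinlock disables preemption; model that as a counter bump. */
void model_spin_lock(void)   { model_preempt_count++; }
void model_spin_unlock(void) { model_preempt_count--; }

bool model_in_atomic(void)
{
    return model_preempt_count > 0;
}

/*
 * Stand-in for entering the scheduler. Returns true if sleeping was
 * allowed, false if the call was made from atomic context.
 */
bool model_schedule(const char *caller)
{
    if (model_in_atomic()) {
        model_sched_bugs++;
        fprintf(stderr, "BUG: scheduling while atomic: %s, count=%d\n",
                caller, model_preempt_count);
        return false;
    }
    return true; /* a real scheduler would block here */
}

/* Stand-in for a reclaim throttle that may sleep while waiting. */
bool model_throttle_direct_reclaim(void)
{
    return model_schedule("throttle_direct_reclaim");
}
```

Calling model_throttle_direct_reclaim() between model_spin_lock() and
model_spin_unlock() returns false and bumps model_sched_bugs; outside the
lock it returns true.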

I recommend reviewing the reclaim context propagation in:

- scsi_alloc_sgtables and sg_alloc_table_chained
- mempool_alloc in SCSI I/O paths
- throttle_direct_reclaim, to ensure blocking calls are not made from
  atomic contexts

This can be reproduced on HEAD commit
e8f897f4afef0031fe618a8e94127a0934896aba.

report: https://pastebin.com/raw/bxuLHCgu
console output: https://pastebin.com/raw/mCZ4Ap8Q
kernel config: https://pastebin.com/raw/aJ9rUnhG
C reproducer: https://pastebin.com/raw/1dku01DG

Best regards,

Xianying


* Re: [BUG] BUG: scheduling while atomic in throttle_direct_reclaim
  2025-05-26 15:49 [BUG] BUG: scheduling while atomic in throttle_direct_reclaim Xianying Wang
@ 2025-05-27  0:04 ` Harry Yoo
  2025-05-27  0:19   ` Harry Yoo
  0 siblings, 1 reply; 3+ messages in thread
From: Harry Yoo @ 2025-05-27  0:04 UTC
  To: Xianying Wang; +Cc: akpm, linux-mm, linux-kernel

On Mon, May 26, 2025 at 11:49:30PM +0800, Xianying Wang wrote:
> This can be reproduced on:
> 
> HEAD commit:
> 
> commit e8f897f4afef0031fe618a8e94127a0934896aba

Well, that's Linux v6.8, which is already end of life.
Please DO NOT REPORT bugs from kernels that are past their EOL.

I spent an hour only to realize this had already been fixed.

https://lore.kernel.org/all/20240614143238.60323-1-andrey.konovalov@linux.dev/T/#u

The cause is KASAN passing an incorrect gfp flag to stackdepot, which
triggers memory reclaim while mempool is holding a spinlock.
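
As a rough illustration of why the flag matters, here is a user-space
sketch. It is not the actual fix from the linked thread: the MODEL_* bits
and model_* functions are hypothetical, and the real gfp_t values and
KASAN/stackdepot call paths differ. It only shows the rule that an
allocation nested under a spinlock must not carry a flag that permits
direct reclaim.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; real gfp_t values differ. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u /* may enter direct reclaim and sleep */
#define MODEL_GFP_KSWAPD_RECLAIM 0x2u /* may only wake background reclaim */
#define MODEL_GFP_NOWAIT (MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_KERNEL (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)

static int model_lock_depth; /* > 0 while a (modelled) spinlock is held */

/* An allocation can sleep only if its flags allow direct reclaim. */
bool model_alloc_may_sleep(unsigned int flags)
{
    return (flags & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/* A nested allocation is unsafe if it may sleep while a lock is held. */
bool model_nested_alloc_is_safe(unsigned int flags)
{
    return !(model_lock_depth > 0 && model_alloc_may_sleep(flags));
}

/*
 * Stand-in for recording a stack trace (which may allocate) while the
 * pool lock is held. Returns whether the given flags were safe to use.
 */
bool model_record_stack_under_lock(unsigned int flags)
{
    bool safe;

    model_lock_depth++; /* stands in for taking the pool spinlock */
    safe = model_nested_alloc_is_safe(flags);
    model_lock_depth--; /* stands in for releasing it */
    return safe;
}
```

In this model, model_record_stack_under_lock(MODEL_GFP_KERNEL) is unsafe,
while the same call with MODEL_GFP_NOWAIT is safe because it cannot enter
direct reclaim.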

-- 
Cheers,
Harry / Hyeonggon



* Re: [BUG] BUG: scheduling while atomic in throttle_direct_reclaim
  2025-05-27  0:04 ` Harry Yoo
@ 2025-05-27  0:19   ` Harry Yoo
  0 siblings, 0 replies; 3+ messages in thread
From: Harry Yoo @ 2025-05-27  0:19 UTC
  To: Xianying Wang; +Cc: akpm, linux-mm, linux-kernel

On Tue, May 27, 2025 at 09:04:58AM +0900, Harry Yoo wrote:
> Well, that's Linux v6.8, which is already end of life.
> Please DO NOT REPORT bugs from kernels that are past their EOL.

I mean, it is fine to report bugs from the following:

1) The latest stable versions (e.g., v6.14.8, v6.12.30, v6.6.92, etc.),
   which can be found at [1] [2], or
2) The latest mainline (currently v6.15), or
3) Development trees like linux-next and Andrew's mm.git.

FYI, kernel.org [1] lists kernel versions that are supported.

I appreciate your effort to test kernels, but testing EOL kernels may be
a waste of time: the bug you hit may already be fixed in a newer version
and simply not backported because the kernel is EOL.
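
A quick pre-report check along these lines could look like the sketch
below. The model_* names are hypothetical, and the series table is only a
snapshot of the examples named in this mail (6.15, 6.14, 6.12, 6.6); it is
not a complete or current list, so kernel.org [1] remains the source of
truth.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* A kernel release series such as 6.12 (hypothetical helper type). */
struct model_series {
    int major;
    int minor;
};

/* Snapshot of the series named in this mail; assumed incomplete. */
static const struct model_series model_named_series[] = {
    { 6, 15 }, { 6, 14 }, { 6, 12 }, { 6, 6 },
};

/* Parse "v6.14.8" or "6.14.8" into its major.minor series. */
bool model_parse_series(const char *version, struct model_series *out)
{
    if (*version == 'v')
        version++;
    return sscanf(version, "%d.%d", &out->major, &out->minor) == 2;
}

/* True if the version belongs to one of the series in the table above. */
bool model_series_is_named(const char *version)
{
    struct model_series s;
    size_t i;
    size_t n = sizeof(model_named_series) / sizeof(model_named_series[0]);

    if (!model_parse_series(version, &s))
        return false;
    for (i = 0; i < n; i++) {
        if (model_named_series[i].major == s.major &&
            model_named_series[i].minor == s.minor)
            return true;
    }
    return false;
}
```

With this table, "v6.14.8" and "6.6.92" match a named series, while
"v6.8" does not.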

[1] https://kernel.org
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git

-- 
Cheers,
Harry / Hyeonggon

