linux-mm.kvack.org archive mirror
From: Kent Overstreet <kent.overstreet@linux.dev>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Chengming Zhou <chengming.zhou@linux.dev>,
	 Yosry Ahmed <yosryahmed@google.com>,
	Nhat Pham <nphamcs@gmail.com>,
	linux-mm@kvack.org
Subject: Re: zswap doing io in GFP_NOIO reclaim context
Date: Thu, 21 Mar 2024 12:45:36 -0400
Message-ID: <dvhajcs3ywifxhbho6p2zj3jjh4ro6wwoieajs2lxjv77bdgqc@5vaxwpqcyxim>
In-Reply-To: <20240321151757.GC777580@cmpxchg.org>

On Thu, Mar 21, 2024 at 11:17:57AM -0400, Johannes Weiner wrote:
> On Thu, Mar 21, 2024 at 01:16:23PM +0800, Chengming Zhou wrote:
> > On 2024/3/21 11:54, Kent Overstreet wrote:
> > > Just got this bug report: things were wildly backed up in bcachefs. I did
> > > some digging, and it looks like zswap is to blame.
> > > 
> > > [10264.128242] sysrq: Show Blocked State
> > > [10264.128268] task:kworker/20:0H   state:D stack:0     pid:143   tgid:143   ppid:2      flags:0x00004000
> > > [10264.128271] Workqueue: bcachefs_io btree_write_submit [bcachefs]
> > > [10264.128295] Call Trace:
> > > [10264.128295]  <TASK>
> > > [10264.128297]  __schedule+0x3e6/0x1520
> > > [10264.128301]  ? ttwu_do_activate+0x64/0x200
> > > [10264.128303]  schedule+0x32/0xd0
> > > [10264.128304]  schedule_timeout+0x98/0x160
> > > [10264.128306]  ? __pfx_process_timeout+0x10/0x10
> > > [10264.128308]  io_schedule_timeout+0x50/0x80
> > > [10264.128309]  wait_for_completion_io_timeout+0x7f/0x180
> > > [10264.128310]  submit_bio_wait+0x78/0xb0
> > > [10264.128313]  swap_writepage_bdev_sync+0xf6/0x150
> > > [10264.128315]  ? __pfx_submit_bio_wait_endio+0x10/0x10
> > > [10264.128317]  zswap_writeback_entry+0xf2/0x180
> > > [10264.128319]  shrink_memcg_cb+0xe7/0x2f0
> > > [10264.128320]  ? xa_load+0x8c/0xe0
> > > [10264.128321]  ? __pfx_shrink_memcg_cb+0x10/0x10
> > > [10264.128322]  __list_lru_walk_one+0xb9/0x1d0
> > > [10264.128324]  ? __pfx_shrink_memcg_cb+0x10/0x10
> > > [10264.128325]  list_lru_walk_one+0x5d/0x90
> > > [10264.128326]  zswap_shrinker_scan+0xc4/0x130
> > > [10264.128327]  do_shrink_slab+0x13f/0x360
> > > [10264.128328]  shrink_slab+0x28e/0x3c0
> > > [10264.128329]  shrink_one+0x123/0x1b0
> > > [10264.128331]  shrink_node+0x97e/0xbc0
> > > [10264.128332]  do_try_to_free_pages+0xe7/0x5b0
> > > [10264.128333]  try_to_free_pages+0xe1/0x200
> > > [10264.128334]  __alloc_pages_slowpath.constprop.0+0x343/0xde0
> > > [10264.128337]  __alloc_pages+0x32d/0x350
> > > [10264.128338]  allocate_slab+0x400/0x460
> > > [10264.128339]  ___slab_alloc+0x40d/0xa40
> > > [10264.128341]  ? mempool_alloc+0x86/0x1b0
> > > [10264.128343]  ? finish_task_switch.isra.0+0x94/0x2f0
> > > [10264.128345]  ? __schedule+0x3ee/0x1520
> > > [10264.128345]  kmem_cache_alloc+0x2e7/0x330
> > > [10264.128347]  ? mempool_alloc+0x86/0x1b0
> > > [10264.128348]  mempool_alloc+0x86/0x1b0
> > > [10264.128349]  bio_alloc_bioset+0x200/0x4f0
> > > [10264.128351]  ? __queue_work.part.0+0x1a5/0x390
> > > [10264.128352]  bio_alloc_clone+0x23/0x60
> > > [10264.128354]  alloc_io+0x26/0xf0 [dm_mod 7e9e6b44df4927f93fb3e4b5c782767396f58382]
> > > [10264.128361]  dm_submit_bio+0xb8/0x580 [dm_mod 7e9e6b44df4927f93fb3e4b5c782767396f58382]
> > > [10264.128366]  __submit_bio+0xb0/0x170
> > > [10264.128367]  submit_bio_noacct_nocheck+0x159/0x370
> > > [10264.128368]  bch2_submit_wbio_replicas+0x21c/0x3a0 [bcachefs 85f1b9a7a824f272eff794653a06dde1a94439f2]
> > > [10264.128391]  btree_write_submit+0x1cf/0x220 [bcachefs 85f1b9a7a824f272eff794653a06dde1a94439f2]
> > > [10264.128406]  process_one_work+0x178/0x350
> > > [10264.128408]  worker_thread+0x30f/0x450
> > > [10264.128409]  ? __pfx_worker_thread+0x10/0x10
> > > [10264.128409]  kthread+0xe5/0x120
> > > 
> > > dm is using GFP_NOIO for that allocation, so zswap is clearly busted.
> > 
> > You are right, and the shrink_control->gfp_mask is not even used in zswap,
> > which would just use GFP_KERNEL in its zswap_writeback_entry().
> 
> I'm not sure the gfp_mask of the allocation is (fully?) applicable to
> the allocation of the swapcache.
> 
> The reclaim-related ones are not. We're already in reclaim and won't
> recurse.
> 
> Things like __GFP_THISNODE, __GFP_ACCOUNT are definitely not
> applicable to the swapcache allocation on writeback.
> 
> See for reference also the gfp_mask in add_to_swap() ->
> add_to_swap_cache() when it's called from reclaim context.
> 
> But the shrinker directly calls __swap_writepage(), which will submit
> IO, and may even enter the fs. We definitely have to filter for that:

Are you applying the fix? You're listed as maintainer.

> 
> diff --git a/mm/zswap.c b/mm/zswap.c
> index b31c977f53e9..535c907345e0 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1303,6 +1303,14 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
>  	if (!zswap_shrinker_enabled || !mem_cgroup_zswap_writeback_enabled(memcg))
>  		return 0;
>  
> +	/*
> +	 * The shrinker resumes swap writeback, which will enter block
> +	 * and may enter fs. XXX: Harmonize with vmscan.c __GFP_FS
> +	 * rules (may_enter_fs()), which apply on a per-folio basis.
> +	 */
> +	if (!gfp_has_io_fs(sc->gfp_mask))
> +		return 0;
> +
>  #ifdef CONFIG_MEMCG_KMEM
>  	mem_cgroup_flush_stats(memcg);
>  	nr_backing = memcg_page_state(memcg, MEMCG_ZSWAP_B) >> PAGE_SHIFT;
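
For anyone following along, here is a minimal userspace sketch of the check the patch above adds. It is a model under stated assumptions, not kernel code: the bit values are made up for illustration (the real __GFP_* bits differ between kernel versions), and model_shrinker_count() is a hypothetical stand-in for zswap_shrinker_count(). Only the gfp_has_io_fs() test and the relationship between GFP_NOIO, GFP_NOFS and GFP_KERNEL are meant to mirror the kernel.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
typedef unsigned int gfp_t;
#define __GFP_IO             0x1u
#define __GFP_FS             0x2u
#define __GFP_DIRECT_RECLAIM 0x4u
#define __GFP_KSWAPD_RECLAIM 0x8u
#define __GFP_RECLAIM (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)

/* Same composition as in the kernel: NOIO lacks IO and FS, NOFS lacks FS. */
#define GFP_NOIO   (__GFP_RECLAIM)
#define GFP_NOFS   (__GFP_RECLAIM | __GFP_IO)
#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS)

/* True only if the caller allows both block IO and entering the fs. */
static inline bool gfp_has_io_fs(gfp_t gfp)
{
	return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
}

/*
 * Hypothetical stand-in for zswap_shrinker_count(): report nothing to
 * reclaim when the allocation context forbids IO or fs entry, so that
 * writeback is never attempted from that context.
 */
static unsigned long model_shrinker_count(gfp_t gfp_mask,
					  unsigned long nr_freeable)
{
	if (!gfp_has_io_fs(gfp_mask))
		return 0;
	return nr_freeable;
}
```

With these definitions, the GFP_NOIO allocation that dm makes in the trace above would see a count of 0, while a GFP_KERNEL caller still sees the real count. Note that GFP_NOFS is filtered as well, which matches the "may enter fs" part of the comment in the patch.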


