linux-mm.kvack.org archive mirror
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
	 jefflexu@linux.alibaba.com, josef@toxicpanda.com,
	linux-mm@kvack.org,  bernd.schubert@fastmail.fm,
	kernel-team@meta.com
Subject: Re: [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block
Date: Fri, 8 Nov 2024 16:16:22 -0800
Message-ID: <tlwkh4pxerhijg5uz2556d4jwswctfyftpxvfjgb3tw64mewek@fy4cto4t5nmj>
In-Reply-To: <20241107235614.3637221-3-joannelkoong@gmail.com>

On Thu, Nov 07, 2024 at 03:56:10PM -0800, Joanne Koong wrote:
> Currently in shrink_folio_list(), reclaim for folios under writeback
> falls into 3 different cases:
> 1) Reclaim is encountering an excessive number of folios under
>    writeback and this folio has both the writeback and reclaim flags
>    set
> 2) Dirty throttling is enabled (this happens if reclaim through cgroup
>    is not enabled, if reclaim through cgroupv2 memcg is enabled, or
>    if reclaim is on the root cgroup), the folio is not marked for
>    immediate reclaim, or the caller does not have __GFP_FS (or
>    __GFP_IO if it's going to swap) set
> 3) Legacy cgroupv1 encounters a folio that already has the reclaim flag
>    set and the caller did not have __GFP_FS (or __GFP_IO if swap) set
> 
> In cases 1) and 2), we activate the folio and skip reclaiming it, while
> in case 3), we wait for writeback to finish on the folio and then try
> to reclaim the folio again. In case 3, we wait on writeback because
> cgroupv1 does not have dirty folio throttling; as such, this is a
> mitigation against the case where there are too many folios in writeback
> with nothing else to reclaim.
> 
> The issue is that for filesystems where writeback may block, sub-optimal
> workarounds may need to be put in place to avoid a potential deadlock
> that may arise from reclaim waiting on writeback. (Even though case 3
> above is rare given that legacy cgroupv1 is on its way to being
> deprecated, this case still needs to be accounted for.) For example, for
> FUSE filesystems, a temp page gets allocated per dirty page and the
> contents of the dirty page are copied over to the temp page so that
> writeback can be immediately cleared on the dirty page in order to avoid
> the following deadlock:
> * single-threaded FUSE server is in the middle of handling a request that
>   needs a memory allocation
> * memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback (e.g. it falls into
>   case 3 above) that needs to be written back to the FUSE server
> * the FUSE server can't write back the folio since it's stuck in direct
>   reclaim
> 
> In this commit, if legacy memcg encounters a folio with the reclaim flag
> set (i.e. case 3) and the folio belongs to a mapping that has the
> AS_WRITEBACK_MAY_BLOCK flag set, the folio will be activated and reclaim
> will skip it (i.e. it falls back to the case 2 behavior) instead.
> 
> This allows for the suboptimal workarounds added to address the
> "reclaim wait on writeback" deadlock scenario to be removed.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>

This looks good, just one nit below.

Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>

> ---
>  mm/vmscan.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 749cdc110c74..e9755cb7211b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1110,6 +1110,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  		if (writeback && folio_test_reclaim(folio))
>  			stat->nr_congested += nr_pages;
>  
> +		mapping = folio_mapping(folio);

Move the above line within the folio_test_writeback() check block.

> +
>  		/*
>  		 * If a folio at the tail of the LRU is under writeback, there
>  		 * are three cases to consider.
> @@ -1129,8 +1131,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  		 * 2) Global or new memcg reclaim encounters a folio that is
>  		 *    not marked for immediate reclaim, or the caller does not
>  		 *    have __GFP_FS (or __GFP_IO if it's simply going to swap,
> -		 *    not to fs). In this case mark the folio for immediate
> -		 *    reclaim and continue scanning.
> +		 *    not to fs), or writebacks in the mapping may block.
> +		 *    In this case mark the folio for immediate reclaim and
> +		 *    continue scanning.
>  		 *
>  		 *    Require may_enter_fs() because we would wait on fs, which
>  		 *    may not have submitted I/O yet. And the loop driver might
> @@ -1165,7 +1168,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
>  			/* Case 2 above */
>  			} else if (writeback_throttling_sane(sc) ||
>  			    !folio_test_reclaim(folio) ||
> -			    !may_enter_fs(folio, sc->gfp_mask)) {
> +			    !may_enter_fs(folio, sc->gfp_mask) ||
> +			    (mapping && mapping_writeback_may_block(mapping))) {
>  				/*
>  				 * This is slightly racy -
>  				 * folio_end_writeback() might have
> -- 
> 2.43.5
> 



Thread overview: 32+ messages
2024-11-07 23:56 [PATCH v4 0/6] fuse: remove temp page copies in writeback Joanne Koong
2024-11-07 23:56 ` [PATCH v4 1/6] mm: add AS_WRITEBACK_MAY_BLOCK mapping flag Joanne Koong
2024-11-09  0:10   ` Shakeel Butt
2024-11-11 21:11     ` Joanne Koong
2024-11-15 19:33       ` Joanne Koong
2024-11-15 20:17         ` Joanne Koong
2024-11-07 23:56 ` [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block Joanne Koong
2024-11-09  0:16   ` Shakeel Butt [this message]
2024-11-07 23:56 ` [PATCH v4 3/6] fs/writeback: in wait_sb_inodes(), skip wait for AS_WRITEBACK_MAY_BLOCK mappings Joanne Koong
2024-11-07 23:56 ` [PATCH v4 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails Joanne Koong
2024-11-08 17:33   ` SeongJae Park
2024-11-08 18:56     ` David Hildenbrand
2024-11-08 19:00       ` David Hildenbrand
2024-11-08 21:27         ` Shakeel Butt
2024-11-08 21:42           ` Joanne Koong
2024-11-08 22:16             ` Shakeel Butt
2024-11-08 22:20               ` Joanne Koong
2024-11-08 21:59     ` Joanne Koong
2024-11-07 23:56 ` [PATCH v4 5/6] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_MAY_BLOCK mappings Joanne Koong
2024-11-07 23:56 ` [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree Joanne Koong
2024-11-08  8:48   ` Jingbo Xu
2024-11-08 22:33     ` Joanne Koong
2024-11-11  8:32   ` Jingbo Xu
2024-11-11 21:30     ` Joanne Koong
2024-11-12  2:31       ` Jingbo Xu
2024-11-13 19:11         ` Joanne Koong
2024-11-12  9:25   ` Jingbo Xu
2024-11-14  0:39     ` Joanne Koong
2024-11-14  1:46       ` Jingbo Xu
2024-11-14 18:19         ` Joanne Koong
2024-11-15  2:18           ` Jingbo Xu
2024-11-15 18:29             ` Joanne Koong
