From: Shakeel Butt <shakeel.butt@linux.dev>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: miklos@szeredi.hu, linux-fsdevel@vger.kernel.org,
jefflexu@linux.alibaba.com, josef@toxicpanda.com,
linux-mm@kvack.org, bernd.schubert@fastmail.fm,
kernel-team@meta.com
Subject: Re: [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block
Date: Fri, 8 Nov 2024 16:16:22 -0800 [thread overview]
Message-ID: <tlwkh4pxerhijg5uz2556d4jwswctfyftpxvfjgb3tw64mewek@fy4cto4t5nmj> (raw)
In-Reply-To: <20241107235614.3637221-3-joannelkoong@gmail.com>
On Thu, Nov 07, 2024 at 03:56:10PM -0800, Joanne Koong wrote:
> Currently in shrink_folio_list(), reclaim for folios under writeback
> falls into 3 different cases:
> 1) Reclaim is encountering an excessive number of folios under
> writeback and this folio has both the writeback and reclaim flags
> set
> 2) Dirty throttling is enabled (this happens if reclaim through cgroup
> is not enabled, if reclaim through cgroupv2 memcg is enabled, or
> if reclaim is on the root cgroup), or if the folio is not marked for
> immediate reclaim, or if the caller does not have __GFP_FS (or
> __GFP_IO if it's going to swap) set
> 3) Legacy cgroupv1 encounters a folio that already has the reclaim flag
> set and the caller did not have __GFP_FS (or __GFP_IO if swap) set
>
> In cases 1) and 2), we activate the folio and skip reclaiming it while
> in case 3), we wait for writeback to finish on the folio and then try
> to reclaim the folio again. In case 3, we wait on writeback because
> cgroupv1 does not have dirty folio throttling, as such this is a
> mitigation against the case where there are too many folios in writeback
> with nothing else to reclaim.
>
> The issue is that for filesystems where writeback may block, sub-optimal
> workarounds may need to be put in place to avoid a potential deadlock
> that may arise from reclaim waiting on writeback. (Even though case 3
> above is rare given that legacy cgroupv1 is on its way to being
> deprecated, this case still needs to be accounted for). For example, for
> FUSE filesystems, a temp page gets allocated per dirty page and the
> contents of the dirty page are copied over to the temp page so that
> writeback can be immediately cleared on the dirty page in order to avoid
> the following deadlock:
> * single-threaded FUSE server is in the middle of handling a request that
> needs a memory allocation
> * memory allocation triggers direct reclaim
> * direct reclaim waits on a folio under writeback (eg falls into case 3
> above) that needs to be written back to the FUSE server
> * the FUSE server can't write back the folio since it's stuck in direct
> reclaim
>
> In this commit, if legacy memcg encounters a folio with the reclaim flag
> set (eg case 3) and the folio belongs to a mapping that has the
> AS_WRITEBACK_MAY_BLOCK flag set, the folio will be activated and skip
> reclaim (eg default to behavior in case 2) instead.
>
> This allows for the suboptimal workarounds added to address the
> "reclaim wait on writeback" deadlock scenario to be removed.
>
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
This looks good just one nit below.
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> ---
> mm/vmscan.c | 10 +++++++---
> 1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 749cdc110c74..e9755cb7211b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1110,6 +1110,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> if (writeback && folio_test_reclaim(folio))
> stat->nr_congested += nr_pages;
>
> + mapping = folio_mapping(folio);
Move the above line within folio_test_writeback() check block.
> +
> /*
> * If a folio at the tail of the LRU is under writeback, there
> * are three cases to consider.
> @@ -1129,8 +1131,9 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> * 2) Global or new memcg reclaim encounters a folio that is
> * not marked for immediate reclaim, or the caller does not
> * have __GFP_FS (or __GFP_IO if it's simply going to swap,
> - * not to fs). In this case mark the folio for immediate
> - * reclaim and continue scanning.
> + * not to fs), or writebacks in the mapping may block.
> + * In this case mark the folio for immediate reclaim and
> + * continue scanning.
> *
> * Require may_enter_fs() because we would wait on fs, which
> * may not have submitted I/O yet. And the loop driver might
> @@ -1165,7 +1168,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> /* Case 2 above */
> } else if (writeback_throttling_sane(sc) ||
> !folio_test_reclaim(folio) ||
> - !may_enter_fs(folio, sc->gfp_mask)) {
> + !may_enter_fs(folio, sc->gfp_mask) ||
> + (mapping && mapping_writeback_may_block(mapping))) {
> /*
> * This is slightly racy -
> * folio_end_writeback() might have
> --
> 2.43.5
>
next prev parent reply other threads:[~2024-11-09 0:16 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-07 23:56 [PATCH v4 0/6] fuse: remove temp page copies in writeback Joanne Koong
2024-11-07 23:56 ` [PATCH v4 1/6] mm: add AS_WRITEBACK_MAY_BLOCK mapping flag Joanne Koong
2024-11-09 0:10 ` Shakeel Butt
2024-11-11 21:11 ` Joanne Koong
2024-11-15 19:33 ` Joanne Koong
2024-11-15 20:17 ` Joanne Koong
2024-11-07 23:56 ` [PATCH v4 2/6] mm: skip reclaiming folios in legacy memcg writeback contexts that may block Joanne Koong
2024-11-09 0:16 ` Shakeel Butt [this message]
2024-11-07 23:56 ` [PATCH v4 3/6] fs/writeback: in wait_sb_inodes(), skip wait for AS_WRITEBACK_MAY_BLOCK mappings Joanne Koong
2024-11-07 23:56 ` [PATCH v4 4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails Joanne Koong
2024-11-08 17:33 ` SeongJae Park
2024-11-08 18:56 ` David Hildenbrand
2024-11-08 19:00 ` David Hildenbrand
2024-11-08 21:27 ` Shakeel Butt
2024-11-08 21:42 ` Joanne Koong
2024-11-08 22:16 ` Shakeel Butt
2024-11-08 22:20 ` Joanne Koong
2024-11-08 21:59 ` Joanne Koong
2024-11-07 23:56 ` [PATCH v4 5/6] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_MAY_BLOCK mappings Joanne Koong
2024-11-07 23:56 ` [PATCH v4 6/6] fuse: remove tmp folio for writebacks and internal rb tree Joanne Koong
2024-11-08 8:48 ` Jingbo Xu
2024-11-08 22:33 ` Joanne Koong
2024-11-11 8:32 ` Jingbo Xu
2024-11-11 21:30 ` Joanne Koong
2024-11-12 2:31 ` Jingbo Xu
2024-11-13 19:11 ` Joanne Koong
2024-11-12 9:25 ` Jingbo Xu
2024-11-14 0:39 ` Joanne Koong
2024-11-14 1:46 ` Jingbo Xu
2024-11-14 18:19 ` Joanne Koong
2024-11-15 2:18 ` Jingbo Xu
2024-11-15 18:29 ` Joanne Koong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=tlwkh4pxerhijg5uz2556d4jwswctfyftpxvfjgb3tw64mewek@fy4cto4t5nmj \
--to=shakeel.butt@linux.dev \
--cc=bernd.schubert@fastmail.fm \
--cc=jefflexu@linux.alibaba.com \
--cc=joannelkoong@gmail.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@meta.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox