From: Zhongkun He <hezhongkun.hzk@bytedance.com>
To: akpm@linux-foundation.org, tytso@mit.edu, jack@suse.com,
	hannes@cmpxchg.org, mhocko@kernel.org
Cc: muchun.song@linux.dev, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	cgroups@vger.kernel.org,
	Zhongkun He <hezhongkun.hzk@bytedance.com>,
	Muchun Song <songmuchun@bytedance.com>
Subject: [PATCH 2/2] jbd2: mark the transaction scope with the PF_MEMALLOC_ACCOUNTFORCE context
Date: Wed, 18 Jun 2025 19:39:58 +0800
Message-ID: <81b1f3df0379b0e34bdf239d36d4d9aeb4bee9cf.1750234270.git.hezhongkun.hzk@bytedance.com>
In-Reply-To: <cover.1750234270.git.hezhongkun.hzk@bytedance.com>

The jbd2 handle, associated with filesystem metadata,
can be held during direct reclaim when a memcg limit is hit.
This prevents other tasks from writing pages, resulting
in shrink failures due to dirty pages that cannot be written
back. These shrink failures may leave many tasks stuck in
the uninterruptible (D) state. The OOM killer may select a
victim and return success, allowing the current thread to
retry the memory charge. However, the selected task cannot
respond to the SIGKILL because it is also stuck in the
uninterruptible state. As a result, the charging task resets
nr_retries and attempts reclaim again, but the victim never
exits. This leads to a prolonged retry loop in direct reclaim
with the jbd2 handle held, significantly extending its hold
time and potentially causing a system-wide block.

We found that a related issue has been reported and
partially addressed in previous fixes [1][2].
However, those fixes only skip direct reclaim and
return a failure for some cases like readahead
requests. Since sb_getblk() is called multiple
times in __ext4_get_inode_loc() with the NOFAIL flag,
the problem still persists.

So call memalloc_account_force_save() to force-charge
the pages and defer direct reclaim until the task returns
to userland, by which point the jbd2 handle, a global
resource, has been released.

[1]:https://lore.kernel.org/linux-fsdevel/20230811071519.1094-1-teawaterz@linux.alibaba.com/
[2]:https://lore.kernel.org/all/20230914150011.843330-1-willy@infradead.org/T/#u

Co-developed-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
---
 fs/jbd2/transaction.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index c7867139af69..d05847301a8f 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -448,6 +448,13 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
 	 * going to recurse back to the fs layer.
 	 */
 	handle->saved_alloc_context = memalloc_nofs_save();
+
+	/*
+	 * Avoid blocking on the jbd2 handle in memcg direct reclaim,
+	 * which may otherwise lead to system-wide stalls.
+	 */
+	handle->saved_alloc_context |= memalloc_account_force_save();
+
 	return 0;
 }
 
@@ -733,10 +740,10 @@ static void stop_this_handle(handle_t *handle)
 
 	rwsem_release(&journal->j_trans_commit_map, _THIS_IP_);
 	/*
-	 * Scope of the GFP_NOFS context is over here and so we can restore the
-	 * original alloc context.
+	 * Scope of the GFP_NOFS and PF_MEMALLOC_ACCOUNTFORCE context
+	 * is over here and so we can restore the original alloc context.
 	 */
-	memalloc_nofs_restore(handle->saved_alloc_context);
+	memalloc_flags_restore(handle->saved_alloc_context);
 }
 
 /**
@@ -1838,7 +1845,7 @@ int jbd2_journal_stop(handle_t *handle)
 		 * Handle is already detached from the transaction so there is
 		 * nothing to do other than free the handle.
 		 */
-		memalloc_nofs_restore(handle->saved_alloc_context);
+		memalloc_flags_restore(handle->saved_alloc_context);
 		goto free_and_exit;
 	}
 	journal = transaction->t_journal;
-- 
2.39.5
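
Editorial note: the diff ORs two "save" results into
handle->saved_alloc_context and undoes both with a single
memalloc_flags_restore(). The userspace sketch below models why that
can work, assuming memalloc_account_force_save() (added by patch 1/2,
not shown here) follows the same convention as the kernel's
memalloc_flags_save(): it returns only the bits it newly set. The bit
values, task_flags, struct handle, start_handle() and stop_handle()
are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>

/* Hypothetical bit values; the kernel's real PF_* constants differ. */
#define PF_MEMALLOC_NOFS         (1u << 0)
#define PF_MEMALLOC_ACCOUNTFORCE (1u << 1)

/* Stand-in for current->flags in this userspace model. */
static unsigned int task_flags;

/* Set the requested bits; return only those that were not set before. */
static unsigned int flags_save(unsigned int flags)
{
	unsigned int newly_set = ~task_flags & flags;

	task_flags |= flags;
	return newly_set;
}

/* Clear exactly the bits a matching save reported as newly set. */
static void flags_restore(unsigned int saved)
{
	task_flags &= ~saved;
}

/* Minimal stand-in for handle_t: only the saved allocation context. */
struct handle {
	unsigned int saved_alloc_context;
};

/* Mirrors the end of start_this_handle() in the diff. */
static void start_handle(struct handle *h)
{
	h->saved_alloc_context = flags_save(PF_MEMALLOC_NOFS);
	h->saved_alloc_context |= flags_save(PF_MEMALLOC_ACCOUNTFORCE);
}

/* Mirrors stop_this_handle(): one restore undoes both scopes. */
static void stop_handle(struct handle *h)
{
	flags_restore(h->saved_alloc_context);
}
```

Under these assumptions, a scope opened while both bits are already
set records 0 in saved_alloc_context, so closing it leaves the
enclosing scope's flags in place; only the scope that actually set
the bits clears them.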


