linux-mm.kvack.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: Shivank Garg <shivankg@amd.com>
Cc: David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Zi Yan <ziy@nvidia.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Nico Pache <npache@redhat.com>,
	Ryan Roberts <ryan.roberts@arm.com>, Dev Jain <dev.jain@arm.com>,
	Barry Song <baohua@kernel.org>, Lance Yang <lance.yang@linux.dev>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	<linux-trace-kernel@vger.kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Zach O'Keefe <zokeefe@google.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>,
	Stephen Rothwell <sfr@canb.auug.org.au>
Subject: Re: [PATCH V5 0/2] mm/khugepaged: fix dirty page handling for MADV_COLLAPSE
Date: Sun, 18 Jan 2026 12:22:29 -0800
Message-ID: <20260118122229.dcdda884bbb19a9c30ec6f1e@linux-foundation.org>
In-Reply-To: <20260118190939.8986-2-shivankg@amd.com>

On Sun, 18 Jan 2026 19:09:38 +0000 Shivank Garg <shivankg@amd.com> wrote:

> MADV_COLLAPSE on file-backed mappings fails with -EINVAL when TEXT pages
> are dirty. This affects scenarios like package/container updates or
> executing binaries immediately after writing them, etc.
> 
> The issue is that collapse_file() triggers async writeback and returns
> SCAN_FAIL (maps to -EINVAL), expecting khugepaged to revisit later. But
> MADV_COLLAPSE is synchronous and userspace expects immediate success or
> a clear retry signal.
> 
> Reproduction:
>  - Compile or copy 2MB-aligned executable to XFS/ext4 FS
>  - Call MADV_COLLAPSE on .text section
>  - First call fails with -EINVAL (text pages dirty from copy)
>  - Second call succeeds (async writeback completed)
> 
> Issue Report:
> https://lore.kernel.org/all/4e26fe5e-7374-467c-a333-9dd48f85d7cc@amd.com
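
For readers following along from userspace, here is a minimal sketch of what a caller might do with the "clear retry signal" described above. It assumes the dirty-page failure is reported as EAGAIN. The helper name collapse_with_retry(), the advise_fn type and the fake_dirty_once() stand-in are hypothetical and are not part of the series. The stand-in only mimics "first call fails, second call succeeds" so the retry policy can be exercised on a machine without THP support.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value from the Linux uapi headers (6.1+) */
#endif

/* Hypothetical: same shape as madvise(2), so madvise itself can be passed. */
typedef int (*advise_fn)(void *addr, size_t len, int advice);

/*
 * Call advise(addr, len, MADV_COLLAPSE) up to max_tries times.
 * Retries only when the call fails with EAGAIN (assumed here to mean
 * "pages were dirty or under writeback"). Any other error is returned
 * to the caller at once, with errno left as the callee set it.
 */
static int collapse_with_retry(advise_fn advise, void *addr, size_t len,
			       int max_tries)
{
	int ret = -1;
	int i;

	if (max_tries < 1) {
		errno = EINVAL;
		return -1;
	}

	for (i = 0; i < max_tries; i++) {
		ret = advise(addr, len, MADV_COLLAPSE);
		if (ret == 0 || errno != EAGAIN)
			break;
	}
	return ret;
}

/* Hypothetical stand-in for madvise(): EAGAIN on the first call only. */
static int fake_calls;

static int fake_dirty_once(void *addr, size_t len, int advice)
{
	(void)addr;
	(void)len;
	(void)advice;

	if (fake_calls++ == 0) {
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

A real caller would pass madvise in place of the stand-in, e.g. collapse_with_retry(madvise, text_start, text_len, 2), where text_start and text_len are placeholders for a PMD-aligned .text range.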

Updated, thanks.

Please tolerate a little whining about the timeliness here.  We're at
-rc6, v4 was added to mm.git over a month ago, had quite a lot of
review, this is very close to being moved into the mm-stable branch and now
we get v5.  Argh.

> V5:
> - In patch 2/2, Simplify dirty writeback retry logic (David)

Are you sure this is the only change?  It looks like a lot for a
simplification and I'm wondering if we should retain the v4 series and
defer a simplification for separate consideration during the next
cycle.

Below is how this update altered mm.git.  Could reviewers please check
this fairly soon?


--- a/mm/khugepaged.c~b
+++ a/mm/khugepaged.c
@@ -2788,11 +2788,11 @@ int madvise_collapse(struct vm_area_stru
 	hend = end & HPAGE_PMD_MASK;
 
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
-		bool retried = false;
 		int result = SCAN_FAIL;
+		bool triggered_wb = false;
 
-		if (!mmap_locked) {
 retry:
+		if (!mmap_locked) {
 			cond_resched();
 			mmap_read_lock(mm);
 			mmap_locked = true;
@@ -2812,52 +2812,27 @@ retry:
 
 			mmap_read_unlock(mm);
 			mmap_locked = false;
+			*lock_dropped = true;
 			result = hpage_collapse_scan_file(mm, addr, file, pgoff,
 							  cc);
-			fput(file);
-		} else {
-			result = hpage_collapse_scan_pmd(mm, vma, addr,
-							 &mmap_locked, cc);
-		}
-		if (!mmap_locked)
-			*lock_dropped = true;
-
-		/*
-		 * If the file-backed VMA has dirty pages, the scan triggers
-		 * async writeback and returns SCAN_PAGE_DIRTY_OR_WRITEBACK.
-		 * Since MADV_COLLAPSE is sync, we force sync writeback and
-		 * retry once.
-		 */
-		if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !retried) {
-			/*
-			 * File scan drops the lock. We must re-acquire it to
-			 * safely inspect the VMA and hold the file reference.
-			 */
-			if (!mmap_locked) {
-				cond_resched();
-				mmap_read_lock(mm);
-				mmap_locked = true;
-				result = hugepage_vma_revalidate(mm, addr, false, &vma, cc);
-				if (result != SCAN_SUCCEED)
-					goto handle_result;
-			}
 
-			if (!vma_is_anonymous(vma) && vma->vm_file &&
-			    mapping_can_writeback(vma->vm_file->f_mapping)) {
-				struct file *file = get_file(vma->vm_file);
-				pgoff_t pgoff = linear_page_index(vma, addr);
+			if (result == SCAN_PAGE_DIRTY_OR_WRITEBACK && !triggered_wb &&
+			    mapping_can_writeback(file->f_mapping)) {
 				loff_t lstart = (loff_t)pgoff << PAGE_SHIFT;
 				loff_t lend = lstart + HPAGE_PMD_SIZE - 1;
 
-				mmap_read_unlock(mm);
-				mmap_locked = false;
-				*lock_dropped = true;
 				filemap_write_and_wait_range(file->f_mapping, lstart, lend);
+				triggered_wb = true;
 				fput(file);
-				retried = true;
 				goto retry;
 			}
+			fput(file);
+		} else {
+			result = hpage_collapse_scan_pmd(mm, vma, addr,
+							 &mmap_locked, cc);
 		}
+		if (!mmap_locked)
+			*lock_dropped = true;
 
 handle_result:
 		switch (result) {
_
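
To make the v5 control flow easier to review, here is a small userspace model of the file-backed path: scan, and if the scan reports dirty or writeback pages, flush the PMD-sized byte range once and rescan. This is only a sketch. Every sketch_* name is hypothetical, the page shift of 12 and the 2 MiB PMD size are assumptions for x86-64 with 4 KiB pages, and the callbacks stand in for hpage_collapse_scan_file() and filemap_write_and_wait_range(). Locking, file references and VMA revalidation are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values (x86-64, 4 KiB base pages); other configurations differ. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_HPAGE_PMD_SIZE	(1LL << 21)

enum sketch_result {
	SKETCH_SUCCEED,
	SKETCH_DIRTY_OR_WRITEBACK,
};

struct sketch_range {
	int64_t lstart;		/* first byte to write back */
	int64_t lend;		/* last byte to write back (inclusive) */
};

/* Mirrors the lstart/lend arithmetic in the diff. */
static struct sketch_range sketch_wb_range(uint64_t pgoff)
{
	struct sketch_range r;

	r.lstart = (int64_t)pgoff << SKETCH_PAGE_SHIFT;
	r.lend = r.lstart + SKETCH_HPAGE_PMD_SIZE - 1;
	return r;
}

/* Hypothetical stand-ins for the scan and the synchronous flush. */
typedef enum sketch_result (*sketch_scan_fn)(void *ctx);
typedef void (*sketch_flush_fn)(void *ctx, struct sketch_range r);

/* One loop iteration: at most one flush, hence at most two scans. */
static enum sketch_result sketch_collapse_one(sketch_scan_fn scan,
					      sketch_flush_fn flush,
					      void *ctx, uint64_t pgoff,
					      bool can_writeback)
{
	bool triggered_wb = false;
	enum sketch_result result;

retry:
	result = scan(ctx);
	if (result == SKETCH_DIRTY_OR_WRITEBACK && !triggered_wb &&
	    can_writeback) {
		flush(ctx, sketch_wb_range(pgoff));
		triggered_wb = true;
		goto retry;
	}
	return result;
}

/* Toy state: the first dirty_scans scans report dirty pages. */
struct sketch_ctx {
	int dirty_scans;
	int scans;
	int flushes;
};

static enum sketch_result sketch_scan(void *p)
{
	struct sketch_ctx *c = p;

	c->scans++;
	if (c->dirty_scans > 0) {
		c->dirty_scans--;
		return SKETCH_DIRTY_OR_WRITEBACK;
	}
	return SKETCH_SUCCEED;
}

static void sketch_flush(void *p, struct sketch_range r)
{
	struct sketch_ctx *c = p;

	(void)r;
	c->flushes++;
}
```

In this model a range that is still dirty after the single flush is reported as dirty to the caller rather than retried again, which is the behaviour the triggered_wb flag is meant to give.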


