linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Dan Carpenter <dan.carpenter@linaro.org>,
	syzbot+48011b86c8ea329af1b9@syzkaller.appspotmail.com,
	Christoph Hellwig <hch@lst.de>, Peter Xu <peterx@redhat.com>
Subject: Re: [PATCH] filemap: Handle error return from __filemap_get_folio()
Date: Tue, 9 May 2023 15:19:18 -0400	[thread overview]
Message-ID: <20230509191918.GB18828@cmpxchg.org> (raw)
In-Reply-To: <CAHk-=wiZ0GaAdqyke-egjBRaqP-QdLcX=8gNk7m6Hx7rXjcXVQ@mail.gmail.com>

On Sat, May 06, 2023 at 10:04:48AM -0700, Linus Torvalds wrote:
> On Sat, May 6, 2023 at 9:35 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > And yes, the simplest fix for the "wrong test" would be to just add a
> > new "out_nofolio" error case after "out_retry", and use that.
> >
> > However, even that seems wrong, because the return value for that path
> > is the wrong one.
> 
> Actually, my suggested patch is _also_ wrong.
> 
> The problem is that we do need to return VM_FAULT_RETRY to let the
> caller know that we released the mmap_lock.
> 
> And once we return VM_FAULT_RETRY, the other error bits don't even matter.
> 
> So while I think the *right* thing to do is to return VM_FAULT_OOM |
> VM_FAULT_RETRY, that doesn't actually end up working, because if
> VM_FAULT_RETRY is set, the caller will know that "yes, mmap_lock was
> dropped", but the callers will also just ignore the other bits and
> unconditionally retry.
> 
> How very very annoying.
> 
> This was introduced several years ago by commit 6b4c9f446981
> ("filemap: drop the mmap_sem for all blocking operations").
> 
> Looking at that, we have at least one other similar error case wrong
> too: the "page_not_uptodate" case carefully checks for IO errors and
> retries only if there was no error (or for the AOP_TRUNCATED_PAGE)
> case.
> 
> For an actual IO error on page reading, it returns VM_FAULT_SIGBUS.
> 
> Except - again - for that "if (fpin) goto out_retry" case, which will
> just return VM_FAULT_RETRY and retry the fault.
> 
> I do not believe that retrying the fault is the right thing to do when
> we ran out of memory, or when we had an IO error, and I do not think
> it was intentional that the error handling was changed.

This is a while ago and the code has changed quite a bit since, so
bear with me.

Originally, we only ever did a maximum of two tries: one where the
lock might be dropped to kick off IO, then a synchronous one. IIRC the
thinking at the time was that events like OOMs and IO failures are
rare enough that doing the retry anyway (even if somewhat pointless)
and reacting to the issue then (if it persisted) was a tradeoff to
keep the retry logic simple.

Since 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times") we
don't clear FAULT_FLAG_ALLOW_RETRY anymore though, and we might see
more than one loop. At least outside the page cache. So I agree it
makes sense to look at the return value more carefully and act on
errors more timely in the arch handler.

Draft patch below. It survives a boot and a will-it-scale smoke test,
but I haven't put it through the grinder yet.

One thing that gave me pause is this comment:

	/*
	 * If we need to retry the mmap_lock has already been released,
	 * and if there is a fatal signal pending there is no guarantee
	 * that we made any progress. Handle this case first.
	 */

I think it made sense when it was added in 26178ec11ef3 ("x86: mm:
consolidate VM_FAULT_RETRY handling"). But after 39678191cd89
("x86/mm: use helper fault_signal_pending()") it's in a misleading
location, since the signal handling is above it.

So I'm removing it, but please let me know if I missed something.

---
 arch/x86/mm/fault.c | 40 +++++++++++++++++++++++-----------------
 mm/filemap.c        | 18 +++++++++++-------
 2 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e4399983c50c..f1d242be723f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1456,20 +1456,15 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 
 	/*
-	 * If we need to retry the mmap_lock has already been released,
-	 * and if there is a fatal signal pending there is no guarantee
-	 * that we made any progress. Handle this case first.
+	 * If we need to retry the mmap_lock has already been released.
 	 */
-	if (unlikely(fault & VM_FAULT_RETRY)) {
-		flags |= FAULT_FLAG_TRIED;
-		goto retry;
-	}
+	if (likely(!(fault & VM_FAULT_RETRY)))
+		mmap_read_unlock(mm);
 
-	mmap_read_unlock(mm);
 #ifdef CONFIG_PER_VMA_LOCK
 done:
 #endif
-	if (likely(!(fault & VM_FAULT_ERROR)))
+	if (likely(!(fault & (VM_FAULT_ERROR|VM_FAULT_RETRY))))
 		return;
 
 	if (fatal_signal_pending(current) && !user_mode(regs)) {
@@ -1493,15 +1488,26 @@ void do_user_addr_fault(struct pt_regs *regs,
 		 * oom-killed):
 		 */
 		pagefault_out_of_memory();
-	} else {
-		if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
-			     VM_FAULT_HWPOISON_LARGE))
-			do_sigbus(regs, error_code, address, fault);
-		else if (fault & VM_FAULT_SIGSEGV)
-			bad_area_nosemaphore(regs, error_code, address);
-		else
-			BUG();
+		return;
+	}
+
+	if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
+		     VM_FAULT_HWPOISON_LARGE)) {
+		do_sigbus(regs, error_code, address, fault);
+		return;
 	}
+
+	if (fault & VM_FAULT_SIGSEGV) {
+		bad_area_nosemaphore(regs, error_code, address);
+		return;
+	}
+
+	if (fault & VM_FAULT_RETRY) {
+		flags |= FAULT_FLAG_TRIED;
+		goto retry;
+	}
+
+	BUG();
 }
 NOKPROBE_SYMBOL(do_user_addr_fault);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e..f97ca5045c19 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3290,10 +3290,11 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 					  FGP_CREAT|FGP_FOR_MMAP,
 					  vmf->gfp_mask);
 		if (IS_ERR(folio)) {
+			ret = VM_FAULT_OOM;
 			if (fpin)
 				goto out_retry;
 			filemap_invalidate_unlock_shared(mapping);
-			return VM_FAULT_OOM;
+			return ret;
 		}
 	}
 
@@ -3362,15 +3363,18 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 */
 	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 	error = filemap_read_folio(file, mapping->a_ops->read_folio, folio);
-	if (fpin)
-		goto out_retry;
 	folio_put(folio);
-
-	if (!error || error == AOP_TRUNCATED_PAGE)
+	folio = NULL;
+	if (!error || error == AOP_TRUNCATED_PAGE) {
+		if (fpin)
+			goto out_retry;
 		goto retry_find;
+	}
+	ret = VM_FAULT_SIGBUS;
+	if (fpin)
+		goto out_retry;
 	filemap_invalidate_unlock_shared(mapping);
-
-	return VM_FAULT_SIGBUS;
+	return ret;
 
 out_retry:
 	/*
-- 
2.40.1


  parent reply	other threads:[~2023-05-09 19:19 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-06 16:04 Matthew Wilcox (Oracle)
2023-05-06 16:09 ` Linus Torvalds
2023-05-06 16:35   ` Linus Torvalds
2023-05-06 17:04     ` Linus Torvalds
2023-05-06 17:10       ` Linus Torvalds
2023-05-06 17:34         ` Linus Torvalds
2023-05-06 17:41           ` Andrew Morton
2023-05-08 13:56             ` Dan Carpenter
2023-05-09  7:43               ` Dan Carpenter
2023-05-09 17:37                 ` Linus Torvalds
2023-05-09 20:49                   ` Christoph Hellwig
2023-05-11  9:44                   ` Dan Carpenter
2023-05-09 19:19       ` Johannes Weiner [this message]
2023-05-10 20:27         ` Peter Xu
2023-05-10 21:33           ` Linus Torvalds
2023-05-10 21:44             ` Linus Torvalds
2023-05-11  4:45               ` Peter Xu
2023-05-12  0:14                 ` Peter Xu
2023-05-12  3:28                   ` [PATCH 1/3] mm: handle_mm_fault_one() kernel test robot
2023-05-12  3:52                   ` kernel test robot
2023-05-12  3:52                   ` kernel test robot
2023-05-12  4:49                   ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230509191918.GB18828@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=dan.carpenter@linaro.org \
    --cc=hch@lst.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterx@redhat.com \
    --cc=syzbot+48011b86c8ea329af1b9@syzkaller.appspotmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox