Date: Tue, 9 May 2023 15:19:18 -0400
From: Johannes Weiner
To: Linus Torvalds
Cc: "Matthew Wilcox (Oracle)", Josef Bacik, Andrew Morton,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Dan Carpenter,
    syzbot+48011b86c8ea329af1b9@syzkaller.appspotmail.com,
    Christoph Hellwig, Peter Xu
Subject: Re: [PATCH] filemap: Handle error return from __filemap_get_folio()
Message-ID: <20230509191918.GB18828@cmpxchg.org>
References: <20230506160415.2992089-1-willy@infradead.org>

On Sat, May 06, 2023 at 10:04:48AM -0700, Linus Torvalds wrote:
> On Sat, May 6, 2023 at 9:35 AM Linus Torvalds
> wrote:
> >
> > And yes, the simplest fix for the "wrong test" would be to just add a
> > new "out_nofolio" error case after "out_retry", and use that.
> >
> > However, even that seems wrong, because the return value for that path
> > is the wrong one.
>
> Actually, my suggested patch is _also_ wrong.
>
> The problem is that we do need to return VM_FAULT_RETRY to let the
> caller know that we released the mmap_lock.
>
> And once we return VM_FAULT_RETRY, the other error bits don't even matter.
>
> So while I think the *right* thing to do is to return VM_FAULT_OOM |
> VM_FAULT_RETRY, that doesn't actually end up working, because if
> VM_FAULT_RETRY is set, the caller will know that "yes, mmap_lock was
> dropped", but the callers will also just ignore the other bits and
> unconditionally retry.
>
> How very very annoying.
>
> This was introduced several years ago by commit 6b4c9f446981
> ("filemap: drop the mmap_sem for all blocking operations").
>
> Looking at that, we have at least one other similar error case wrong
> too: the "page_not_uptodate" case carefully checks for IO errors and
> retries only if there was no error (or for the AOP_TRUNCATED_PAGE)
> case.
>
> For an actual IO error on page reading, it returns VM_FAULT_SIGBUS.
>
> Except - again - for that "if (fpin) goto out_retry" case, which will
> just return VM_FAULT_RETRY and retry the fault.
>
> I do not believe that retrying the fault is the right thing to do when
> we ran out of memory, or when we had an IO error, and I do not think
> it was intentional that the error handling was changed.

This was a while ago, and the code has changed quite a bit since, so
bear with me.

Originally, we only ever did a maximum of two tries: one where the
lock might be dropped to kick off IO, then a synchronous one.

IIRC the thinking at the time was that events like OOMs and IO
failures are rare enough that doing the retry anyway (even if somewhat
pointless) and reacting to the issue on the retry (if it persisted)
was a tradeoff that kept the retry logic simple.
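To illustrate the ordering problem Linus describes, here is a minimal user-space sketch in C. It is not kernel code: the FAULT_* bits and the enum action values are hypothetical stand-ins for vm_fault_t flags and for what an arch fault handler does next, and fatal-signal handling is left out. handle_old() tests the retry bit first, as the pre-patch x86 handler does, so an OOM or SIGBUS that arrives together with the retry bit is never looked at; handle_new() checks the error bits first and only retries when none of them is set.

```c
/*
 * Hypothetical user-space model, not kernel code. The FAULT_* bits are
 * stand-ins for vm_fault_t flags; their values are illustrative only.
 */
enum {
	FAULT_OOM    = 1u << 0,
	FAULT_SIGBUS = 1u << 1,
	FAULT_RETRY  = 1u << 2,
};

/* What the modelled arch fault handler decides to do next. */
enum action { ACT_DONE, ACT_RETRY, ACT_OOM, ACT_SIGBUS };

/*
 * Retry-first ordering: the retry bit is tested before anything else,
 * so error bits that arrive together with it are silently dropped.
 */
static enum action handle_old(unsigned int fault)
{
	if (fault & FAULT_RETRY)
		return ACT_RETRY;
	if (fault & FAULT_OOM)
		return ACT_OOM;
	if (fault & FAULT_SIGBUS)
		return ACT_SIGBUS;
	return ACT_DONE;
}

/* Error-first ordering: retry only when no error bit is set. */
static enum action handle_new(unsigned int fault)
{
	if (fault & FAULT_OOM)
		return ACT_OOM;
	if (fault & FAULT_SIGBUS)
		return ACT_SIGBUS;
	if (fault & FAULT_RETRY)
		return ACT_RETRY;
	return ACT_DONE;
}
```

For example, handle_old(FAULT_OOM | FAULT_RETRY) yields ACT_RETRY, so the OOM is dropped, while handle_new() yields ACT_OOM for the same input. A plain FAULT_RETRY still yields ACT_RETRY with either ordering.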

Since 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times"),
though, we no longer clear FAULT_FLAG_ALLOW_RETRY, so we might loop
more than once, at least outside the page cache.

So I agree it makes sense to look at the return value more carefully
and to act on errors in a more timely manner in the arch handler.

Draft patch below. It survives a boot and a will-it-scale smoke test,
but I haven't put it through the grinder yet.

One thing that gave me pause is this comment:

	/*
	 * If we need to retry the mmap_lock has already been released,
	 * and if there is a fatal signal pending there is no guarantee
	 * that we made any progress. Handle this case first.
	 */

I think it made sense when it was added in 26178ec11ef3 ("x86: mm:
consolidate VM_FAULT_RETRY handling"). But after 39678191cd89
("x86/mm: use helper fault_signal_pending()") it's in a misleading
location, since the signal handling is above it.

So I'm removing it, but please let me know if I missed something.

---
 arch/x86/mm/fault.c | 40 +++++++++++++++++++++++-----------------
 mm/filemap.c        | 18 +++++++++++-------
 2 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e4399983c50c..f1d242be723f 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1456,20 +1456,15 @@ void do_user_addr_fault(struct pt_regs *regs,
 		return;
 
 	/*
-	 * If we need to retry the mmap_lock has already been released,
-	 * and if there is a fatal signal pending there is no guarantee
-	 * that we made any progress. Handle this case first.
+	 * If we need to retry the mmap_lock has already been released.
 	 */
-	if (unlikely(fault & VM_FAULT_RETRY)) {
-		flags |= FAULT_FLAG_TRIED;
-		goto retry;
-	}
+	if (likely(!(fault & VM_FAULT_RETRY)))
+		mmap_read_unlock(mm);
 
-	mmap_read_unlock(mm);
 #ifdef CONFIG_PER_VMA_LOCK
 done:
 #endif
-	if (likely(!(fault & VM_FAULT_ERROR)))
+	if (likely(!(fault & (VM_FAULT_ERROR|VM_FAULT_RETRY))))
 		return;
 
 	if (fatal_signal_pending(current) && !user_mode(regs)) {
@@ -1493,15 +1488,26 @@ void do_user_addr_fault(struct pt_regs *regs,
 		 * oom-killed):
 		 */
 		pagefault_out_of_memory();
-	} else {
-		if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
-			     VM_FAULT_HWPOISON_LARGE))
-			do_sigbus(regs, error_code, address, fault);
-		else if (fault & VM_FAULT_SIGSEGV)
-			bad_area_nosemaphore(regs, error_code, address);
-		else
-			BUG();
+		return;
+	}
+
+	if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
+		     VM_FAULT_HWPOISON_LARGE)) {
+		do_sigbus(regs, error_code, address, fault);
+		return;
 	}
+
+	if (fault & VM_FAULT_SIGSEGV) {
+		bad_area_nosemaphore(regs, error_code, address);
+		return;
+	}
+
+	if (fault & VM_FAULT_RETRY) {
+		flags |= FAULT_FLAG_TRIED;
+		goto retry;
+	}
+
+	BUG();
 }
 NOKPROBE_SYMBOL(do_user_addr_fault);
 
diff --git a/mm/filemap.c b/mm/filemap.c
index b4c9bd368b7e..f97ca5045c19 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3290,10 +3290,11 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 					  FGP_CREAT|FGP_FOR_MMAP,
 					  vmf->gfp_mask);
 		if (IS_ERR(folio)) {
+			ret = VM_FAULT_OOM;
 			if (fpin)
 				goto out_retry;
 			filemap_invalidate_unlock_shared(mapping);
-			return VM_FAULT_OOM;
+			return ret;
 		}
 	}
 
@@ -3362,15 +3363,18 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 */
 	fpin = maybe_unlock_mmap_for_io(vmf, fpin);
 	error = filemap_read_folio(file, mapping->a_ops->read_folio, folio);
-	if (fpin)
-		goto out_retry;
 	folio_put(folio);
-
-	if (!error || error == AOP_TRUNCATED_PAGE)
+	folio = NULL;
+	if (!error || error == AOP_TRUNCATED_PAGE) {
+		if (fpin)
+			goto out_retry;
 		goto retry_find;
+	}
+	ret = VM_FAULT_SIGBUS;
+	if (fpin)
+		goto out_retry;
 	filemap_invalidate_unlock_shared(mapping);
-
-	return VM_FAULT_SIGBUS;
+	return ret;
 
 out_retry:
 	/*
-- 
2.40.1
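The filemap_fault() hunks rely on one pattern: the error code is stored in ret before jumping to out_retry, so it can travel back to the caller alongside the retry bit instead of being replaced by it. The patch context stops at the out_retry label, so the user-space sketch below assumes that label returns ret | VM_FAULT_RETRY; that is an assumption, not something shown in the patch. The functions model_alloc_failure() and model_read_result() and the FAULT_* bits are hypothetical stand-ins, and lock_dropped plays the role of fpin being non-NULL.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for vm_fault_t bits; values are illustrative only. */
enum {
	FAULT_OOM    = 1u << 0,
	FAULT_SIGBUS = 1u << 1,
	FAULT_RETRY  = 1u << 2,
};

/*
 * Models the folio-allocation failure path. lock_dropped stands in for
 * "fpin != NULL", i.e. mmap_lock was released earlier in the fault.
 */
static unsigned int model_alloc_failure(bool lock_dropped)
{
	unsigned int ret = FAULT_OOM;	/* record the error first ... */

	if (lock_dropped)
		goto out_retry;		/* ... so it survives the retry exit */
	return ret;

out_retry:
	/* ASSUMPTION: the real out_retry returns ret | VM_FAULT_RETRY. */
	return ret | FAULT_RETRY;
}

/*
 * Models the path after filemap_read_folio(). read_failed stands in for
 * an error other than AOP_TRUNCATED_PAGE; a return value of 0 stands in
 * for "goto retry_find" (look the folio up again).
 */
static unsigned int model_read_result(bool lock_dropped, bool read_failed)
{
	unsigned int ret = 0;

	if (!read_failed) {
		if (lock_dropped)
			goto out_retry;	/* lock is gone: plain retry, no error */
		return 0;		/* stand-in for "goto retry_find" */
	}
	ret = FAULT_SIGBUS;
	if (lock_dropped)
		goto out_retry;
	return ret;

out_retry:
	return ret | FAULT_RETRY;
}
```

Under that assumption, a read error after the lock was dropped comes back as FAULT_SIGBUS | FAULT_RETRY rather than a bare FAULT_RETRY, which is what lets an error-first arch handler, like the reordered do_user_addr_fault() in the patch, deliver the SIGBUS instead of retrying.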