From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A023C77B7C for ; Wed, 10 May 2023 20:27:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8796F6B0072; Wed, 10 May 2023 16:27:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8294C6B0074; Wed, 10 May 2023 16:27:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6F0F06B0075; Wed, 10 May 2023 16:27:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 6038A6B0072 for ; Wed, 10 May 2023 16:27:40 -0400 (EDT) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id 0F9B740B5B for ; Wed, 10 May 2023 20:27:40 +0000 (UTC) X-FDA: 80775481080.15.D484294 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf16.hostedemail.com (Postfix) with ESMTP id B689C18000D for ; Wed, 10 May 2023 20:27:37 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ftTZYbIj; spf=pass (imf16.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1683750457; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=9GYo8N3GVBW0oULuq3gkozsZNhPHI8kWzSKmHgQI1Vo=; b=MnB0eYXWlMGgGY++9yp5onl3gUq/C2Vh17U7f6PDjRc6og4KTSk2SymW2Ls1I10KYUDetn IugXViqvolvSiK6r5MIW0bawzSa8OVOc7M4cOciFsM1Vqu94auylwE0Rfm1xDD5+9BvnJn mUusETRcVhu9UZtCMnTz35WnMnKkluU= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1683750457; a=rsa-sha256; cv=none; b=qjGVtlBL8w7LbeczrwuvAWM/W0y7msgVhMrbmEJ7ZW7ZLHpi+/x/fOCgnYemXhI0QbJvik l/LPoUTDflka65TG3bn82v+IlX/YKrZBq1JZLAkBZvriAt9GWX/C1rjS/vKkijcQwZpiD4 mNBfLMdrAWnh123v/kfqs6foZuMjX3U= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ftTZYbIj; spf=pass (imf16.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1683750456; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9GYo8N3GVBW0oULuq3gkozsZNhPHI8kWzSKmHgQI1Vo=; b=ftTZYbIjUPCHUZOHTKpYPhwd+H+eisItbPD7Q5ADJhtEtSo/aJg9kfJojfceY5cCJoWaS6 9L3bPsYYNenqJLM1/pO9VFTYxj0uZaN7PLF+WLHS+oiqxEqGs1wtIUNjTcBhIu0w9xKfyr muX0thqa6+/EgmU1xfmy6+sLQBw1JXg= Received: from mail-pl1-f199.google.com (mail-pl1-f199.google.com [209.85.214.199]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-400-zWO2Me22OcaIXg5InNGHTQ-1; Wed, 10 May 2023 16:27:34 -0400 X-MC-Unique: zWO2Me22OcaIXg5InNGHTQ-1 Received: by mail-pl1-f199.google.com with SMTP id d9443c01a7336-1ab062521f9so8966925ad.1 for ; Wed, 10 May 2023 13:27:34 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683750453; x=1686342453; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9GYo8N3GVBW0oULuq3gkozsZNhPHI8kWzSKmHgQI1Vo=; b=WZhXASrT0Ku+Uy2fsPfnrVC/gflpwNMZB0ACqGk9TnFnE8StIZLhw1nGwoA4BzmTTf otKjrJ2haNzmzcuYU+vfoUiwqYSie6DgwhEHfgUh/dV0JzdXwX2xWmijktRdxkTMAR52 760tWl3xfOqkWQX7WUc5AIRgtZp7QFkaDM+4MkTc9Hj8aEOpbRk5ITMlIfIqryeGEtrz eg7s4AImulgOcNNeH6g3p6sZiY/NwyVFwra5iLKng3Yq5vsNpYWgifiVKNXF/ygIaPYW vszzfhpDOmkLVMpvXVFS96IIZoJi3HrEJElMv97wv4Ez0+ZsatacuSmLtcGdq0h1wi2n E9Tg== X-Gm-Message-State: AC+VfDz7GLRxvvOb2UQwCBTThiNT6SZ270eWZDJoWSnO6Zok4gRL7H5v mhWs48WGSXv1cTsvUoxCuKqz5sx/sds5C1tuRJx+vplytZ8n9c1vIU944eI+ojBND134cUuVM9f AtwX7QalU2PA= X-Received: by 2002:a17:902:ea0b:b0:1ac:69ba:e159 with SMTP id s11-20020a170902ea0b00b001ac69bae159mr17043167plg.5.1683750453422; Wed, 10 May 2023 13:27:33 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7vWW0nbm4b6CKppwdgUldxkv86ysGUVfro14rqsXNjjup9ieVZhY479DLefU70ByzdIyIOyQ== X-Received: by 2002:a17:902:ea0b:b0:1ac:69ba:e159 with SMTP id s11-20020a170902ea0b00b001ac69bae159mr17043137plg.5.1683750452968; Wed, 10 May 2023 13:27:32 -0700 (PDT) Received: from x1n ([2001:4958:15a0:30:ea6e:cae6:2dab:2897]) by smtp.gmail.com with ESMTPSA id f10-20020a17090274ca00b001ab28f620d0sm4199783plt.290.2023.05.10.13.27.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 May 2023 13:27:32 -0700 (PDT) Date: Wed, 10 May 2023 13:27:31 -0700 From: Peter Xu To: Johannes Weiner Cc: Linus Torvalds , "Matthew Wilcox (Oracle)" , Josef Bacik , Andrew Morton , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, Dan Carpenter , syzbot+48011b86c8ea329af1b9@syzkaller.appspotmail.com, Christoph Hellwig Subject: Re: [PATCH] filemap: Handle error return from __filemap_get_folio() Message-ID: References: <20230506160415.2992089-1-willy@infradead.org> <20230509191918.GB18828@cmpxchg.org> MIME-Version: 1.0 In-Reply-To: <20230509191918.GB18828@cmpxchg.org> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: B689C18000D X-Stat-Signature: kpcd7hwfpnj3izgokhkm6uq9eopdedj5 X-Rspam-User: X-Rspamd-Server: rspam09 X-HE-Tag: 1683750457-401710 X-HE-Meta: U2FsdGVkX18dZ5ZDlXT4j6utF+bo7YXLDKxRsOg06nmCs/OLjajQa5lflZBHQb0WrCuLxiiqieJxvr6lvsO0FcxqXrw+izgHZakRJgI9zaG7C8VQC0PWxzKHEKAkDvqZSRata/Du/W5V//3SWk3cWESGx7gA/LJTub6CBKXVOA2Jls5aBOGkac9nA9Y73YNoFEAmaYj11WDDlU/7iRKyrRdV4yR9tkwQOQrttsWYQSzYq0K/7uHOnstZYWsWjZwsjEGqO/URpxI2XlfXFQUL7kC/nHI7Ohew0yCxyGCGuQtQGuD4d0cFnWBAT24jNLlcw15wh93TJHAnOk0XGWEirlnweJT7jUiU5H9nEhDM7YD3/7TXWTAl6pno6JXQLk32yT8379NvtMMsCW4i+rF8ptpHHLfRo3NKcJ6dEqE9xXqgVjjpib6eHfMLsdd0cemjNC7REwkMMCvZO8B6WMdKMEogqoQVv3DPSl0t9RNPYkizjh81G9ixpUuatBKpdMUrq30zwX9qOvFNMQjS+uSUbKBX0t7tHIKl4p5x73PBktosJbkHErhdn/aOC6vx2FPmDe+ssYzd/J/8JWX2YzPMiYmziIF9EZO6HmFqN3OA2dcXZ1Jeo8ecmKfJSYfs2lmaYnG20BNsmmie+yJWw/9nPjmJWPBC758OAdA/Qsn4L5N2rtc5PDrXTjLuBcpRJ3BiK7nrnPkx/dUznpkTS/qtiYk+UeEUKe8KzKT7d1xEF6y3s3+46ryhdB5e/6yVZ5ZpFrGxlHC1GOy6QUwclAgAqNfmKnUWM9cdy2fnlbDxHjxAkYiknN4euSH0zJgUtNZZV3dP2A9MuiztHi/RzhsTY24H/TNxd1aYCM7B0qvKg+P1uTXXyZqd8m72qUm3Ni/Hx6vSsWGqmxSIDPiaryuIzU/YE16xlR1X0JYJyQTtQ1TGX/NLkg3u8civaWmCx8SgeIgwUw2rhByli53XSij HpF3yNba O+Ljs57IVfAQuIvSZ6rijQ1q+KaW5412Yewn4KlaQfaM8uickMXyu/UNeM1BoZ/10r8A4LTnX+sWPA9Sy9WlJVq1JdDxAqhrROurET7GZnYeKpyzysynzQ3WqXZALE6AtBNe0qQC9axowuVNtDY72tf3DCYo6FiTcfFA4Gk+gURjlRvY+h+rZSyOQvxwypUUqs8NrPnyh1SvIAHFAUJ5jWZjtWnd+Qd9Hfy+0PeFuAPO+O3++Ngt/TiFhdIuD+PKUwJ+W84Td9GgIYhdHIMJeixErYKZZr+D+iqVxrNgEI5aYLIyNUpa4MTUZHkvs54/nuEReXBogeaHZvC02y7CfraPqFpDcFxBXfx9H7arfREFLMiJEOiGe8vdCYjrz706Fj4ylCIuD97cAr5JNVE7NWl6Q1QYRKsJjkYweOdQlb4wGWTSzNlJDb2wTuBlFzXK1f02METwMYya2+BAOxgfQLWYaN4hWsyUkt3samVHhirQGGCz0vttpcNbMxFcDNzuOo/zv5/89vWR7MNxpepbrXNzyBmGjYC21AtRX3M5PSgVMXRe8gL4KNQwmEjj4ef3OMQEtEdlxk0dLXa20yzjtM1/GPg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 09, 2023 at 03:19:18PM -0400, Johannes Weiner wrote: > On Sat, May 06, 2023 at 10:04:48AM -0700, Linus Torvalds wrote: > > On Sat, May 6, 2023 at 9:35 AM Linus Torvalds > > wrote: > > > > > > And yes, the simplest fix for the "wrong test" would be to just add a > > > new "out_nofolio" error case after "out_retry", and use that. > > > > > > However, even that seems wrong, because the return value for that path > > > is the wrong one. > > > > Actually, my suggested patch is _also_ wrong. > > > > The problem is that we do need to return VM_FAULT_RETRY to let the > > caller know that we released the mmap_lock. > > > > And once we return VM_FAULT_RETRY, the other error bits don't even matter. > > > > So while I think the *right* thing to do is to return VM_FAULT_OOM | > > VM_FAULT_RETRY, that doesn't actually end up working, because if > > VM_FAULT_RETRY is set, the caller will know that "yes, mmap_lock was > > dropped", but the callers will also just ignore the other bits and > > unconditionally retry. > > > > How very very annoying. > > > > This was introduced several years ago by commit 6b4c9f446981 > > ("filemap: drop the mmap_sem for all blocking operations"). > > > > Looking at that, we have at least one other similar error case wrong > > too: the "page_not_uptodate" case carefully checks for IO errors and > > retries only if there was no error (or for the AOP_TRUNCATED_PAGE) > > case. > > > > For an actual IO error on page reading, it returns VM_FAULT_SIGBUS. > > > > Except - again - for that "if (fpin) goto out_retry" case, which will > > just return VM_FAULT_RETRY and retry the fault. > > > > I do not believe that retrying the fault is the right thing to do when > > we ran out of memory, or when we had an IO error, and I do not think > > it was intentional that the error handling was changed. > > This is a while ago and the code has changed quite a bit since, so > bear with me. > > Originally, we only ever did a maximum of two tries: one where the > lock might be dropped to kick off IO, then a synchronous one. IIRC the > thinking at the time was that events like OOMs and IO failures are > rare enough that doing the retry anyway (even if somewhat pointless) > and reacting to the issue then (if it persisted) was a tradeoff to > keep the retry logic simple. > > Since 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times") we > don't clear FAULT_FLAG_ALLOW_RETRY anymore though, and we might see > more than one loop. At least outside the page cache. So I agree it > makes sense to look at the return value more carefully and act on > errors more timely in the arch handler. > > Draft patch below. It survives a boot and a will-it-scale smoke test, > but I haven't put it through the grinder yet. > > One thing that gave me pause is this comment: > > /* > * If we need to retry the mmap_lock has already been released, > * and if there is a fatal signal pending there is no guarantee > * that we made any progress. Handle this case first. > */ > > I think it made sense when it was added in 26178ec11ef3 ("x86: mm: > consolidate VM_FAULT_RETRY handling"). But after 39678191cd89 > ("x86/mm: use helper fault_signal_pending()") it's in a misleading > location, since the signal handling is above it. > > So I'm removing it, but please let me know if I missed something. > > --- > arch/x86/mm/fault.c | 40 +++++++++++++++++++++++----------------- > mm/filemap.c | 18 +++++++++++------- > 2 files changed, 34 insertions(+), 24 deletions(-) > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c > index e4399983c50c..f1d242be723f 100644 > --- a/arch/x86/mm/fault.c > +++ b/arch/x86/mm/fault.c > @@ -1456,20 +1456,15 @@ void do_user_addr_fault(struct pt_regs *regs, > return; > > /* > - * If we need to retry the mmap_lock has already been released, > - * and if there is a fatal signal pending there is no guarantee > - * that we made any progress. Handle this case first. > + * If we need to retry the mmap_lock has already been released. > */ > - if (unlikely(fault & VM_FAULT_RETRY)) { > - flags |= FAULT_FLAG_TRIED; > - goto retry; > - } > + if (likely(!(fault & VM_FAULT_RETRY))) > + mmap_read_unlock(mm); > > - mmap_read_unlock(mm); > #ifdef CONFIG_PER_VMA_LOCK > done: > #endif > - if (likely(!(fault & VM_FAULT_ERROR))) > + if (likely(!(fault & (VM_FAULT_ERROR|VM_FAULT_RETRY)))) > return; > > if (fatal_signal_pending(current) && !user_mode(regs)) { > @@ -1493,15 +1488,26 @@ void do_user_addr_fault(struct pt_regs *regs, > * oom-killed): > */ > pagefault_out_of_memory(); > - } else { > - if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| > - VM_FAULT_HWPOISON_LARGE)) > - do_sigbus(regs, error_code, address, fault); > - else if (fault & VM_FAULT_SIGSEGV) > - bad_area_nosemaphore(regs, error_code, address); > - else > - BUG(); > + return; > + } > + > + if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON| > + VM_FAULT_HWPOISON_LARGE)) { > + do_sigbus(regs, error_code, address, fault); > + return; > } > + > + if (fault & VM_FAULT_SIGSEGV) { > + bad_area_nosemaphore(regs, error_code, address); > + return; > + } > + > + if (fault & VM_FAULT_RETRY) { > + flags |= FAULT_FLAG_TRIED; > + goto retry; > + } > + > + BUG(); > } > NOKPROBE_SYMBOL(do_user_addr_fault); > > diff --git a/mm/filemap.c b/mm/filemap.c > index b4c9bd368b7e..f97ca5045c19 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -3290,10 +3290,11 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) > FGP_CREAT|FGP_FOR_MMAP, > vmf->gfp_mask); > if (IS_ERR(folio)) { > + ret = VM_FAULT_OOM; > if (fpin) > goto out_retry; > filemap_invalidate_unlock_shared(mapping); > - return VM_FAULT_OOM; > + return ret; > } > } > > @@ -3362,15 +3363,18 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) > */ > fpin = maybe_unlock_mmap_for_io(vmf, fpin); > error = filemap_read_folio(file, mapping->a_ops->read_folio, folio); > - if (fpin) > - goto out_retry; > folio_put(folio); > - > - if (!error || error == AOP_TRUNCATED_PAGE) > + folio = NULL; > + if (!error || error == AOP_TRUNCATED_PAGE) { > + if (fpin) > + goto out_retry; > goto retry_find; > + } > + ret = VM_FAULT_SIGBUS; > + if (fpin) > + goto out_retry; > filemap_invalidate_unlock_shared(mapping); > - > - return VM_FAULT_SIGBUS; > + return ret; > > out_retry: > /* > -- > 2.40.1 > The change looks all right to me. Acked-by: Peter Xu For the long term maybe we want to cleanup a bit on the VM_FAULT_* entries, e.g., here VM_FAULT_RETRY doesn't really mean "we should retry the fault" but instead only the hint to show that we'ver released the mmap lock when any error is set. Meanwhile it's also not the only one to express that because now we also have VM_FAULT_COMPLETED, so it's debatable which one we should use here purely from the logic and definition of the retvals. One thing we can make this slightly cleaner in the future is we can have a flag sololy for "we have released the mmap lock", so fundamentally we can consider renaming COMPLETE->MM_RELEASED, then we use RETRY to only hint "whether we really want to retry" and leave "whether mmap lock released" for the new flag. It could look like: MM_RELEASED (new) RETRY 0 0 -> page fault resolved (old "retval=0") 0 1 -> retry with mmap held (currently invalid, so let's ignore this) 1 0 -> resolved and lock released (old COMPLETE) 1 1 -> lock releaesd,need to retry (old RETRY) Then IIUC for error cases we can return MM_RELEASED|ERROR for whatever error (without setting RETRY). Not sure whether it helps in any form. Even if it would, it can definitely be done on top. Thanks, -- Peter Xu