From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E8C6C77B76 for ; Mon, 17 Apr 2023 21:21:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B20A68E0002; Mon, 17 Apr 2023 17:21:33 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id AD0918E0001; Mon, 17 Apr 2023 17:21:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9BF9D8E0002; Mon, 17 Apr 2023 17:21:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 8D54B8E0001 for ; Mon, 17 Apr 2023 17:21:33 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 40B73160499 for ; Mon, 17 Apr 2023 21:21:33 +0000 (UTC) X-FDA: 80692154466.21.6D65B5A Received: from mail-yw1-f172.google.com (mail-yw1-f172.google.com [209.85.128.172]) by imf14.hostedemail.com (Postfix) with ESMTP id 750EB100003 for ; Mon, 17 Apr 2023 21:21:31 +0000 (UTC) Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=gPqjm7ba; spf=pass (imf14.hostedemail.com: domain of surenb@google.com designates 209.85.128.172 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1681766491; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=skxzpscAsv87Qax/0bgvCnRhsQO+KKAWjy3PoG439VQ=; b=NcBftSmdcXWHUs2DuPB+28sBtP8yaVh94zIBd1yY8oBLMcyJ8jH5NhSHhf+4e8moJx/Y6h hoipxDE9rQzsCZZfgewJ/I6Il3WsMCddjzHJTnpWkPiNF7P7suZKw1TYHcOS/UivECDkL0 f9geenpVxmML6nei2XN1nrWYFzN1B0U= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=gPqjm7ba; spf=pass (imf14.hostedemail.com: domain of surenb@google.com designates 209.85.128.172 as permitted sender) smtp.mailfrom=surenb@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1681766491; a=rsa-sha256; cv=none; b=Oa4XpPt8kl36iM/J/JXhiVcuWH5Tj+mybS8m4A/oJWgzW3Jpg28QPXNZDm0F1RomKB7acI JcDL+c5BxTXcVDW01nIKSJIrjNBtSTMVfS2ZWPi+Iz7NWaXpvX5LPJL5YvuCLhQyCytRsk IIYkxp1UkMPvZlysFDC5DIPQIfILX2g= Received: by mail-yw1-f172.google.com with SMTP id 00721157ae682-54c12009c30so540382927b3.9 for ; Mon, 17 Apr 2023 14:21:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1681766490; x=1684358490; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=skxzpscAsv87Qax/0bgvCnRhsQO+KKAWjy3PoG439VQ=; b=gPqjm7ba7fTmREghp9AWYVRFKr58sh4PS4CfZ/Fnori3qEbcxS/KfNJS0XZwsW6GMt TXVFCHSJ7Fn8o4+mQUwJuzw8Xp2NiDUo628lJRW5a9v7nuVqXxflBpIPPu+E2tPFnIyN W6IiBRfuAApLpYfoMZRm+fp/BqLRT05Yn8x+14P7AKCmF8kchA2oGoMy60I2IKYzf3To uIyxyZYfmT9LqIOpcM/u1oXW0kRh5iF94aoioa3wL1b52JC8lpyn0G+URi0+xcWo6fbE 5HhM/x6usH5b2BLb48d6c5ugHQCrYB2Y6aVkotUvnHFddkvcoumGa35c786bME/oi6p6 UFkg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681766490; x=1684358490; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=skxzpscAsv87Qax/0bgvCnRhsQO+KKAWjy3PoG439VQ=; b=DOPPH71lW6CVr11Ujt9evQoDIPLDV+xMlMHmkI6BZaI6zpwFIs2wG6qY7d75uGE0xI SxNNqmnEqbJsycmV8MK/gbvYuMg7gMbgtvdNIPMa6mO05enEsGQiMsHxk4TXCPHEybg6 CnKuEtkoSF70/Kj0uncE11v0LRy9qUChdJ3V63RaMdO4CLEafc+VjTt7mRUHs9qrY22S kxwUEjBVzV7bC9xTjOrE+nQ0PJkAPG/NFtRnqqBRSB+W6tcaIceSGNcOOMRm3PXGG/3G 8TW1+ZXF5RF8ntaITfHh9XevGVnt3PfalQWMQEW1u0LxvUPuFhJ2RdhevOlz4vo7w88u 1B3w== X-Gm-Message-State: AAQBX9dtfZMnZELKg+N+/bMVRkOfon+fL9PmYQDzcPBxlW6Tm4g5pAO+ lt23QTFMoq6G8grzqSB5+XZBL+ILbau/LjZWxUZEcw== X-Google-Smtp-Source: AKy350blLU+TA2bW6CBbSNbqjHX8LQL1OTHe90yUAeABat6gSKf2OL44RLi4ryGaIDgRHCVtrb0qdgDt2Dwdph7UlPE= X-Received: by 2002:a81:d441:0:b0:54f:a60c:12eb with SMTP id g1-20020a81d441000000b0054fa60c12ebmr9831050ywl.1.1681766490351; Mon, 17 Apr 2023 14:21:30 -0700 (PDT) MIME-Version: 1.0 References: <20230415000818.1955007-1-surenb@google.com> In-Reply-To: From: Suren Baghdasaryan Date: Mon, 17 Apr 2023 14:21:19 -0700 Message-ID: Subject: Re: [PATCH v2 1/1] mm: do not increment pgfault stats when page fault handler retries To: Peter Xu Cc: akpm@linux-foundation.org, willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 750EB100003 X-Stat-Signature: 9m9xp7ckd39g9hegbgkzfqctqzbc1ekp X-HE-Tag: 1681766491-263147 X-HE-Meta: U2FsdGVkX1/o02CPuW5J0seQdxCzCVUwIumSmMvAKN2aLHcSE0YoWiU3bhBk1RX9ZVK14vY4a0HJzUqvkdHt9rWPw84x8qEjITm4+B7POBsTjupupb03/4Tkdc6xB5ndSvjIlij5DtluVg1w+Oa1uyqL6kfI2pDrkD+6cxiAqGTebeb+FSEdEd+gStC/4LU2cWhPu6guR4eBz5lkStvzT7OKRE9JwqhQHaLKgRgtodJktTeii9cSnMokbM5/TBVexNUJnAPuYA1jFpHsLtMMQ6hHc0LfPgfCSum76vGsgS9WQF+Y9CqwsIXw+95rg1kmMmeIPwU1cBx8eSbFBXt+zDIFr+UHVxcKiZWhxiJSOqH6wPawNXKgRxJPGIeAS1OqMkB1cLVGTxQ5kMvL0dugUG9rLXD1Krmskwcu25IIxvgUq4pgSM00nS8gFo/J194ZBVhq6am+gFrpiblcoOZdsnmtLiwsK8iutHFiABog4ZYC3LvWx5gJLHdd2bHSokevkjVauUUFsJphtKJ8kh8Xi413QfNHxeY7KWh5SJ+DOjY1fOnF97peKiZ9X2VgwNsCZvTxagEHenDarNK071Ox/Fv9aAWfQGDdOutO1rYSENtPNxLRX8Esd9YF15s6JDTR9H+EdNYPATQP77x8h9YP+WOh29CYcTxaX1pHAOUPkxguTKJFB29lppw5hQTIq573lJFaoxoIiA/goYhMaqjVKdodOHh0LA2TAOq89H0pCmj78xpxnqqI+5MZZIL9Ih1XrmdRy42hMXjV8G+TIrgnnT6GWyRF0Z3DCYc3EH2w6Ecx3+4BF46sFHpm/zwqSGLiuyyH2BsGERwyKPKSA4dO4FbTOvz8xiT5daS21mUzMkZKsCuU4LJfvoA3eEzAIP5lhkh0lzTQ22KkQQ/n/+npHVf1hwEX9jG6rCdfg8G4VQtgl7T7CYTTqsHy6ajAExW7YthbvHuUl8FWrDcv0zT Bn63FhLn L5jRv6bu7l6RqZmXC3GBL0zGQljIyrXNlAtJzZmZt2LZb+QVZpHDDXOg9pPoxNeegK8hQpDG+aTBsA3656w8iHS5qsQr9C26eeOgNDvErbGh0wbrvkhEPaTYkGlpWjVYDQfAprC1oZIonyBFRLL3JJMfdQjT3F01Qw+UOlpEx53OmWn5u0jFH6zsmKVcm6zMoRvm783k3MuSWscCgiHHdTmfE/TnsNhlSAc5nrACzNrkP27oW0g6mxlBVrUprGZWSycDCMrAc0z7Qo8HkQujm7Vz5Jqulp7AiNpOTnyCNEIUnz0oWLjEmAGfKrua3kWWExFRL3OfzFu432+gWrXNQ7HruYBS6wFfO4hPk X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Apr 17, 2023 at 2:14=E2=80=AFPM Peter Xu wrote: > > On Mon, Apr 17, 2023 at 01:29:45PM -0700, Suren Baghdasaryan wrote: > > On Mon, Apr 17, 2023 at 12:40=E2=80=AFPM Peter Xu w= rote: > > > > > > On Fri, Apr 14, 2023 at 05:08:18PM -0700, Suren Baghdasaryan wrote: > > > > If the page fault handler requests a retry, we will count the fault > > > > multiple times. This is a relatively harmless problem as the retry = paths > > > > are not often requested, and the only user-visible problem is that = the > > > > fault counter will be slightly higher than it should be. Neverthel= ess, > > > > userspace only took one fault, and should not see the fact that the > > > > kernel had to retry the fault multiple times. > > > > Move page fault accounting into mm_account_fault() and skip incompl= ete > > > > faults which will be accounted upon completion. > > > > > > > > Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk tr= ansfer") > > > > Signed-off-by: Suren Baghdasaryan > > > > --- > > > > mm/memory.c | 45 ++++++++++++++++++++++++++------------------- > > > > 1 file changed, 26 insertions(+), 19 deletions(-) > > > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > > index 01a23ad48a04..c3b709ceeed7 100644 > > > > --- a/mm/memory.c > > > > +++ b/mm/memory.c > > > > @@ -5080,24 +5080,30 @@ static vm_fault_t __handle_mm_fault(struct = vm_area_struct *vma, > > > > * updates. However, note that the handling of PERF_COUNT_SW_PAGE= _FAULTS should > > > > * still be in per-arch page fault handlers at the entry of page f= ault. > > > > */ > > > > -static inline void mm_account_fault(struct pt_regs *regs, > > > > +static inline void mm_account_fault(struct mm_struct *mm, struct p= t_regs *regs, > > > > unsigned long address, unsigned i= nt flags, > > > > vm_fault_t ret) > > > > { > > > > bool major; > > > > > > > > /* > > > > - * We don't do accounting for some specific faults: > > > > - * > > > > - * - Unsuccessful faults (e.g. when the address wasn't valid)= . That > > > > - * includes arch_vma_access_permitted() failing before reac= hing here. > > > > - * So this is not a "this many hardware page faults" counte= r. We > > > > - * should use the hw profiling for that. > > > > - * > > > > - * - Incomplete faults (VM_FAULT_RETRY). They will only be c= ounted > > > > - * once they're completed. > > > > + * Do not account for incomplete faults (VM_FAULT_RETRY). The= y will be > > > > + * counted upon completion. > > > > */ > > > > - if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY)) > > > > + if (ret & VM_FAULT_RETRY) > > > > + return; > > > > + > > > > + /* Register both successful and failed faults in PGFAULT coun= ters. */ > > [1] > > > > > + count_vm_event(PGFAULT); > > > > + count_memcg_event_mm(mm, PGFAULT); > > > > > > Is there reason on why vm events accountings need to be explicitly > > > different from perf events right below on handling ERROR? > > > > > > I get the point if this is to make sure ERROR accountings untouched f= or > > > these two vm events after this patch. IOW probably the only concern r= ight > > > now is having RETRY counted much more than before (perhaps worse with= vma > > > locking applied). > > > > > > But since we're on this, I'm wondering whether we should also align t= he two > > > events (vm, perf) so they represent in an aligned manner if we'll cha= nge it > > > anyway. Any future reader will be confused on why they account > > > differently, IMHO, so if we need to differenciate we'd better add a c= omment > > > on why. > > > > > > I'm wildly guessing the error faults are indeed very rare and probabl= y not > > > matter much at all. I just think the code can be slightly cleaner if > > > vm/perf accountings match and easier if we treat everything the same.= E.g., > > > we can also drop the below "goto out"s too. What do you think? > > > > I think the rationale might be that vm accounting should account for > > *all* events, including failing page faults while for perf, the corner > > cases which rarely happen would not have tangible effect. > > Note that it's not only for perf, but also task_struct.maj_flt|min_flt. > > If we check the reasoning of "why ERROR shouldn't be accounted for perf > events", there're actually something valid in the comment: > > * - Unsuccessful faults (e.g. when the address wasn't valid). T= hat > * includes arch_vma_access_permitted() failing before reaching= here. > * So this is not a "this many hardware page faults" counter. = We > * should use the hw profiling for that. > > IMHO it suggests that if someone wants to trap either ERROR or RETRY one > can use the hardware counters instead. The same reasoning just sounds > applicable to vm events too, because vm events are not special in this ca= se > to me. > > > I don't have a strong position on this issue and kept it as is to > > avoid changing the current accounting approach. If we are fine with > > such consolidation which would miss failing faults in vm accounting, I > > can make the change. > > I don't have a strong opinion either. We used to change this path before > for perf events and task events and no one complains with the change. I'= d > just bet the same to vm events: > > https://lore.kernel.org/all/20200707225021.200906-1-peterx@redhat.com/ Ok, if these rare failures don't change anything then let's consolidate the code. It should simplify things a bit and will account faults in a consistent way. I'll post v3 shortly incorporating your and Matthew's feedbacks. Thanks for the input! > > Please feel free to keep it as-is if you still feel unsafe on changing > ERROR handling. If so, would you mind slightly modify [1] above explaini= ng > why we need ERROR to be accounted for vm accountings with the reasoning? > Current comment only says "what it does" rather than why. > > Thanks, > > -- > Peter Xu >