From: Suren Baghdasaryan <surenb@google.com>
Date: Mon, 17 Apr 2023 13:29:45 -0700
Subject: Re: [PATCH v2 1/1] mm: do not increment pgfault stats when page fault handler retries
To: Peter Xu
Cc: akpm@linux-foundation.org, willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz, minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com, lstoakes@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20230415000818.1955007-1-surenb@google.com>
On Mon, Apr 17, 2023 at 12:40 PM Peter Xu wrote:
>
> On Fri, Apr 14, 2023 at 05:08:18PM -0700, Suren Baghdasaryan wrote:
> > If the page fault handler requests a retry, we will count the fault
> > multiple times. This is a relatively harmless problem as the retry paths
> > are not often requested, and the only user-visible problem is that the
> > fault counter will be slightly higher than it should be. Nevertheless,
> > userspace only took one fault, and should not see the fact that the
> > kernel had to retry the fault multiple times.
> >
> > Move page fault accounting into mm_account_fault() and skip incomplete
> > faults which will be accounted upon completion.
> >
> > Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk transfer")
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/memory.c | 45 ++++++++++++++++++++++++++-------------------
> >  1 file changed, 26 insertions(+), 19 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 01a23ad48a04..c3b709ceeed7 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5080,24 +5080,30 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >   * updates.  However, note that the handling of PERF_COUNT_SW_PAGE_FAULTS should
> >   * still be in per-arch page fault handlers at the entry of page fault.
> >   */
> > -static inline void mm_account_fault(struct pt_regs *regs,
> > +static inline void mm_account_fault(struct mm_struct *mm, struct pt_regs *regs,
> >  				    unsigned long address, unsigned int flags,
> >  				    vm_fault_t ret)
> >  {
> >  	bool major;
> >
> >  	/*
> > -	 * We don't do accounting for some specific faults:
> > -	 *
> > -	 * - Unsuccessful faults (e.g. when the address wasn't valid).  That
> > -	 *   includes arch_vma_access_permitted() failing before reaching here.
> > -	 *   So this is not a "this many hardware page faults" counter.  We
> > -	 *   should use the hw profiling for that.
> > -	 *
> > -	 * - Incomplete faults (VM_FAULT_RETRY).  They will only be counted
> > -	 *   once they're completed.
> > +	 * Do not account for incomplete faults (VM_FAULT_RETRY). They will be
> > +	 * counted upon completion.
> >  	 */
> > -	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > +	if (ret & VM_FAULT_RETRY)
> > +		return;
> > +
> > +	/* Register both successful and failed faults in PGFAULT counters. */
> > +	count_vm_event(PGFAULT);
> > +	count_memcg_event_mm(mm, PGFAULT);
>
> Is there a reason why the vm event accounting needs to be explicitly
> different from the perf event accounting right below when handling ERROR?
>
> I get the point if this is to make sure the ERROR accounting stays
> untouched for these two vm events after this patch.  IOW, probably the
> only concern right now is having RETRY counted much more than before
> (perhaps worse with vma locking applied).
>
> But since we're on this, I'm wondering whether we should also align the
> two events (vm, perf) so they are accounted in the same manner if we'll
> change this anyway.  Any future reader will be confused about why they
> account differently, IMHO, so if we need to differentiate we'd better add
> a comment on why.
>
> I'm wildly guessing the error faults are indeed very rare and probably
> don't matter much at all.  I just think the code can be slightly cleaner
> if the vm/perf accounting matches, and easier if we treat everything the
> same.  E.g., we can also drop the below "goto out"s too.  What do you
> think?
I think the rationale might be that vm accounting should account for *all*
events, including failing page faults, while for perf the corner cases
which rarely happen would not have a tangible effect. I don't have a strong
position on this issue and kept it as is to avoid changing the current
accounting approach. If we are fine with such a consolidation, which would
miss failing faults in vm accounting, I can make the change.
Thanks,
Suren.

>
> Thanks,
>
> > +
> > +	/*
> > +	 * Do not account for unsuccessful faults (e.g. when the address wasn't
> > +	 * valid).  That includes arch_vma_access_permitted() failing before
> > +	 * reaching here.  So this is not a "this many hardware page faults"
> > +	 * counter.  We should use the hw profiling for that.
> > +	 */
> > +	if (ret & VM_FAULT_ERROR)
> >  		return;
> >
> >  	/*
> > @@ -5180,21 +5186,22 @@ static vm_fault_t sanitize_fault_flags(struct vm_area_struct *vma,
> >  vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >  			   unsigned int flags, struct pt_regs *regs)
> >  {
> > +	/* Copy vma->vm_mm in case mmap_lock is dropped and vma becomes unstable. */
> > +	struct mm_struct *mm = vma->vm_mm;
> >  	vm_fault_t ret;
> >
> >  	__set_current_state(TASK_RUNNING);
> >
> > -	count_vm_event(PGFAULT);
> > -	count_memcg_event_mm(vma->vm_mm, PGFAULT);
> > -
> >  	ret = sanitize_fault_flags(vma, &flags);
> >  	if (ret)
> > -		return ret;
> > +		goto out;
> >
> >  	if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> >  				       flags & FAULT_FLAG_INSTRUCTION,
> > -				       flags & FAULT_FLAG_REMOTE))
> > -		return VM_FAULT_SIGSEGV;
> > +				       flags & FAULT_FLAG_REMOTE)) {
> > +		ret = VM_FAULT_SIGSEGV;
> > +		goto out;
> > +	}
> >
> >  	/*
> >  	 * Enable the memcg OOM handling for faults triggered in user
> > @@ -5223,8 +5230,8 @@ vm_fault_t handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
> >  		if (task_in_memcg_oom(current) && !(ret & VM_FAULT_OOM))
> >  			mem_cgroup_oom_synchronize(false);
> >  	}
> > -
> > -	mm_account_fault(regs, address, flags, ret);
> > +out:
> > +	mm_account_fault(mm, regs, address, flags, ret);
> >
> >  	return ret;
> >  }
> > --
> > 2.40.0.634.g4ca3ef3211-goog
> >
> >
>
> --
> Peter Xu
>
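
A rough, untested sketch of the consolidated variant discussed above, in
which the vm counters and the perf counters skip VM_FAULT_RETRY and
VM_FAULT_ERROR faults alike. This is only an illustration of the idea, not
part of the posted patch; the major/minor classification and the
perf_sw_event() calls are assumed to remain as in the current mm/memory.c.

static inline void mm_account_fault(struct mm_struct *mm, struct pt_regs *regs,
				    unsigned long address, unsigned int flags,
				    vm_fault_t ret)
{
	bool major;

	/*
	 * Incomplete faults (VM_FAULT_RETRY) will be counted once they
	 * complete; unsuccessful faults (VM_FAULT_ERROR) are not counted
	 * at all, for either the vm events or the perf events.
	 */
	if (ret & (VM_FAULT_RETRY | VM_FAULT_ERROR))
		return;

	/* Both counters are now updated under exactly the same conditions. */
	count_vm_event(PGFAULT);
	count_memcg_event_mm(mm, PGFAULT);

	/* Major/minor accounting, unchanged from the posted patch. */
	major = (ret & VM_FAULT_MAJOR) || (flags & FAULT_FLAG_TRIED);
	if (major)
		current->maj_flt++;
	else
		current->min_flt++;

	/* regs is NULL for GUP faults; skip the perf event updates then. */
	if (!regs)
		return;

	if (major)
		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, address);
	else
		perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, address);
}

Compared to the posted patch, the only behavioral difference is that
VM_FAULT_ERROR faults would no longer bump the PGFAULT counters, which is
exactly the trade-off described above.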