Date: Mon, 17 Apr 2023 17:14:19 -0400
From: Peter Xu
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, willy@infradead.org, hannes@cmpxchg.org,
	mhocko@suse.com, josef@toxicpanda.com, jack@suse.cz,
	ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com,
	michel@lespinasse.org, liam.howlett@oracle.com, jglisse@google.com,
	vbabka@suse.cz, minchan@google.com, dave@stgolabs.net,
	punit.agrawal@bytedance.com, lstoakes@gmail.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v2 1/1] mm: do not increment pgfault stats when page
	fault handler retries
References: <20230415000818.1955007-1-surenb@google.com>

On Mon, Apr 17, 2023 at 01:29:45PM -0700, Suren Baghdasaryan wrote:
> On Mon, Apr 17, 2023 at 12:40 PM Peter Xu wrote:
> >
> > On Fri, Apr 14, 2023 at 05:08:18PM -0700, Suren Baghdasaryan wrote:
> > > If the page fault handler requests a retry, we will count the fault
> > > multiple times.
> > > This is a relatively harmless problem as the retry paths
> > > are not often requested, and the only user-visible problem is that the
> > > fault counter will be slightly higher than it should be. Nevertheless,
> > > userspace only took one fault, and should not see the fact that the
> > > kernel had to retry the fault multiple times.
> > > Move page fault accounting into mm_account_fault() and skip incomplete
> > > faults which will be accounted upon completion.
> > >
> > > Fixes: d065bd810b6d ("mm: retry page fault when blocking on disk transfer")
> > > Signed-off-by: Suren Baghdasaryan
> > > ---
> > >  mm/memory.c | 45 ++++++++++++++++++++++++++-------------------
> > >  1 file changed, 26 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 01a23ad48a04..c3b709ceeed7 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -5080,24 +5080,30 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> > >   * updates. However, note that the handling of PERF_COUNT_SW_PAGE_FAULTS should
> > >   * still be in per-arch page fault handlers at the entry of page fault.
> > >   */
> > > -static inline void mm_account_fault(struct pt_regs *regs,
> > > +static inline void mm_account_fault(struct mm_struct *mm, struct pt_regs *regs,
> > >  				    unsigned long address, unsigned int flags,
> > >  				    vm_fault_t ret)
> > >  {
> > >  	bool major;
> > >
> > >  	/*
> > > -	 * We don't do accounting for some specific faults:
> > > -	 *
> > > -	 * - Unsuccessful faults (e.g. when the address wasn't valid). That
> > > -	 *   includes arch_vma_access_permitted() failing before reaching here.
> > > -	 *   So this is not a "this many hardware page faults" counter. We
> > > -	 *   should use the hw profiling for that.
> > > -	 *
> > > -	 * - Incomplete faults (VM_FAULT_RETRY). They will only be counted
> > > -	 *   once they're completed.
> > > +	 * Do not account for incomplete faults (VM_FAULT_RETRY). They will be
> > > +	 * counted upon completion.
> > >  	 */
> > > -	if (ret & (VM_FAULT_ERROR | VM_FAULT_RETRY))
> > > +	if (ret & VM_FAULT_RETRY)
> > > +		return;
> > > +
> > > +	/* Register both successful and failed faults in PGFAULT counters. */ [1]
> > > +	count_vm_event(PGFAULT);
> > > +	count_memcg_event_mm(mm, PGFAULT);
> >
> > Is there reason on why vm events accountings need to be explicitly
> > different from perf events right below on handling ERROR?
> >
> > I get the point if this is to make sure ERROR accountings untouched for
> > these two vm events after this patch. IOW probably the only concern right
> > now is having RETRY counted much more than before (perhaps worse with vma
> > locking applied).
> >
> > But since we're on this, I'm wondering whether we should also align the two
> > events (vm, perf) so they represent in an aligned manner if we'll change it
> > anyway. Any future reader will be confused on why they account
> > differently, IMHO, so if we need to differentiate we'd better add a comment
> > on why.
> >
> > I'm wildly guessing the error faults are indeed very rare and probably not
> > matter much at all. I just think the code can be slightly cleaner if
> > vm/perf accountings match and easier if we treat everything the same. E.g.,
> > we can also drop the below "goto out"s too. What do you think?
>
> I think the rationale might be that vm accounting should account for
> *all* events, including failing page faults while for perf, the corner
> cases which rarely happen would not have tangible effect.

Note that it's not only for perf, but also task_struct.maj_flt|min_flt.

If we check the reasoning of "why ERROR shouldn't be accounted for perf
events", there is actually something valid in the comment:

	 * - Unsuccessful faults (e.g. when the address wasn't valid). That
	 *   includes arch_vma_access_permitted() failing before reaching here.
	 *   So this is not a "this many hardware page faults" counter. We
	 *   should use the hw profiling for that.

IMHO it suggests that if someone wants to trap either ERROR or RETRY, one can
use the hardware counters instead. The same reasoning sounds applicable to
vm events too, because to me vm events are not special in this case.

> I don't have a strong position on this issue and kept it as is to
> avoid changing the current accounting approach. If we are fine with
> such consolidation which would miss failing faults in vm accounting, I
> can make the change.

I don't have a strong opinion either. We changed this path before for perf
events and task events, and no one complained about the change. I'd bet the
same holds for vm events:

https://lore.kernel.org/all/20200707225021.200906-1-peterx@redhat.com/

Please feel free to keep it as-is if you still feel unsafe about changing
ERROR handling. If so, would you mind slightly modifying [1] above to explain
why ERROR needs to be accounted for in the vm events? The current comment
only says "what it does" rather than why.

Thanks,

--
Peter Xu
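
For illustration, the two accounting options discussed above can be modelled
in user space. The following is a minimal sketch under stated assumptions,
not kernel code: every name in it (sk_counters, sk_account_fault, sk_replay,
the SK_FAULT_* flags and their values) is hypothetical, and the major/minor
classification is reduced to a single flag. It only shows how the PGFAULT
count diverges from the task counters when failed faults are counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's VM_FAULT_* values. */
enum {
	SK_FAULT_ERROR = 1 << 0,
	SK_FAULT_RETRY = 1 << 1,
	SK_FAULT_MAJOR = 1 << 2,
};

struct sk_counters {
	unsigned long pgfault;	/* models the PGFAULT vm event */
	unsigned long maj_flt;	/* models task_struct.maj_flt */
	unsigned long min_flt;	/* models task_struct.min_flt */
};

/*
 * Model of the accounting step for one return value of the fault handler.
 *
 * count_errors_in_pgfault == true:  PGFAULT also counts failed faults,
 *                                   as in the quoted v2 hunk.
 * count_errors_in_pgfault == false: PGFAULT skips ERROR like the task
 *                                   counters do (the aligned variant).
 */
static void sk_account_fault(struct sk_counters *c, unsigned int ret,
			     bool count_errors_in_pgfault)
{
	/* Incomplete faults are counted once, when they finally complete. */
	if (ret & SK_FAULT_RETRY)
		return;

	if (count_errors_in_pgfault || !(ret & SK_FAULT_ERROR))
		c->pgfault++;

	/* Task counters never cover unsuccessful faults in this model. */
	if (ret & SK_FAULT_ERROR)
		return;

	if (ret & SK_FAULT_MAJOR)
		c->maj_flt++;
	else
		c->min_flt++;
}

/* Replay a sequence of handler return values and return the totals. */
static struct sk_counters sk_replay(const unsigned int *rets, size_t n,
				    bool count_errors_in_pgfault)
{
	struct sk_counters c = { 0, 0, 0 };

	assert(rets != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		sk_account_fault(&c, rets[i], count_errors_in_pgfault);
	return c;
}
```

Replaying { SK_FAULT_RETRY, SK_FAULT_RETRY, 0, SK_FAULT_ERROR } with
count_errors_in_pgfault set gives pgfault == 2 and min_flt == 1; with it
cleared, pgfault == 1 and min_flt == 1. In both cases the two retries add
nothing, which is the behaviour the patch is after.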