From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7D6CC001DF for ; Wed, 2 Aug 2023 15:08:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4820F280188; Wed, 2 Aug 2023 11:08:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3E321280143; Wed, 2 Aug 2023 11:08:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 28468280188; Wed, 2 Aug 2023 11:08:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 0F7D0280143 for ; Wed, 2 Aug 2023 11:08:46 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id B95B1120F83 for ; Wed, 2 Aug 2023 15:08:45 +0000 (UTC) X-FDA: 81079496610.13.3EECB36 Received: from outbound-smtp01.blacknight.com (outbound-smtp01.blacknight.com [81.17.249.7]) by imf22.hostedemail.com (Postfix) with ESMTP id CD197C003A for ; Wed, 2 Aug 2023 15:08:19 +0000 (UTC) Authentication-Results: imf22.hostedemail.com; dkim=none; spf=pass (imf22.hostedemail.com: domain of mgorman@techsingularity.net designates 81.17.249.7 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net; dmarc=none ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1690988900; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=77TFvZt16s+RV3s3IoNu7FzdTi9gZSggClr+R+eW9gc=; b=HS6aZJIZzI716fImAln35FECwX5hV0JWWzZnYWLEbvgXcWLuwTGNQvyNEQz0XGtGWI/p+n nGG3Ue6FhmgMcX1nKi+uJONi4PwuA5euRdRSZOsgVdAJnxp856CuBMUUfrfga8bikO8nTG wq0lNIrTxZIQTvtfpTIzlQPsJvNVAg4= ARC-Authentication-Results: i=1; imf22.hostedemail.com; dkim=none; spf=pass (imf22.hostedemail.com: domain of mgorman@techsingularity.net designates 81.17.249.7 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net; dmarc=none ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1690988900; a=rsa-sha256; cv=none; b=mROGDgwvXtmql6wkEyReh3vVpgA5+35ApqkGpeSdcyas1LAutITCir4Z3nr1cij+gy/cmF eQcWRTC7cN1xcJc2/1uiO3j2Z9w3ezN2FhmoWL3qZu0VtuoOTuT/QFw7Z+8kpC6U7u4Oc6 VTveos1ubw9GwJ98Q/Yn+IIpVX9Si4Y= Received: from mail.blacknight.com (pemlinmail04.blacknight.ie [81.17.254.17]) by outbound-smtp01.blacknight.com (Postfix) with ESMTPS id A5BA8C4AFA for ; Wed, 2 Aug 2023 16:08:17 +0100 (IST) Received: (qmail 16070 invoked from network); 2 Aug 2023 15:08:17 -0000 Received: from unknown (HELO techsingularity.net) (mgorman@techsingularity.net@[84.203.20.191]) by 81.17.254.9 with ESMTPSA (AES256-SHA encrypted, authenticated); 2 Aug 2023 15:08:17 -0000 Date: Wed, 2 Aug 2023 16:08:16 +0100 From: Mel Gorman To: David Hildenbrand Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, Andrew Morton , Linus Torvalds , liubo , Peter Xu , Matthew Wilcox , Hugh Dickins , Jason Gunthorpe , John Hubbard , Mel Gorman , Shuah Khan , Paolo Bonzini , stable@vger.kernel.org Subject: Re: [PATCH v2 1/8] mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT Message-ID: <20230802150816.aaubbx4t7745lqik@techsingularity.net> References: <20230801124844.278698-1-david@redhat.com> <20230801124844.278698-2-david@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20230801124844.278698-2-david@redhat.com> X-Rspamd-Queue-Id: CD197C003A X-Rspam-User: X-Stat-Signature: mbcnrmjosh1w3qpcw9fg4n9pjd449mhn X-Rspamd-Server: rspam01 X-HE-Tag: 1690988899-362315 X-HE-Meta: U2FsdGVkX19CBQbgKt2bH8CvWzni9/zVm0aCFblihE0IN7dkvrGxRF/VpkUmRPKro6lB1CGxN2nVk9IDwjeUpRBsfudT9uQOUKP/7dnUuep85f+3WOQvPJrYPpjxwMMaTau0mjwvp9SpQyUmrKhD/uCKlN972E1KOpT9mWSbCTL3mEHVHWS1bkg4fVPBZFvQYDAi7dInZq3/VZzTk61zlBzgM15RBeQ+Fzu+paK4lDPR2MuAM2I54xSYQ9XhFOlEMkEzt7RovS/R0dqdK9myPTsy93ydY0g9Zo1SckCrFTQSsMaGx5/b/paDMpweuzNdJkZsF/UJuBU19+BwuKc44fI62SOQfzNuNvvG/jferNXriygryHSxJjOyVdA2Fjb0j/NdFXx2drUJHsBwcM3dzaEqoy6xKZVhNYVDwFG/7Qwmf1n7x3IYxk0yIiqImLfeusDtFZplxL1MTWB6p3gHedVKaV2jaPEuV25wXGfCWr/4J1CTqo4OFutEMU5peSLQa1T+6RkkjF+Cs4gs8xruycZj0Rz2vnJmQzhG6eG0qz0+5XOvFKsjHQ7es6H0axoMwBS+y15uRc9FliL2EIbmCD/SQQM+IcgmVo63+uAhhQ3w9Sy2cG1oQFWp4t5QCYvPxEjxvcIgXqSEkuyqm5IV5WdGjth6qZT54LnCOVKSNQmgU418Z0qHLyTB94TTFj28opem7huXZPE+XHlew5FR+yGJiuG63BgFQ1jQ48NnEQwuP57834ZbeWe/31jWNcJ1xQV9jciMH6PsliIokdSYJrl/pTO2crrk7JAwMgnpZdLad7m5LwrO3wrfuF26EBjNf93ckVq3O8FEOLqlu2cX9Gqg4xswVlr0OORSoirf+S8BJPy/nzQ+46BbxpiyfnRM8cYSopZ2fLltTXl9UtNZkrJ7t30fD5hqCVoGXoZ7BdjH0sbodVsBECMWntg3q0UsAJG//W5tal3BkoKZkdq b6bkFbyA XfTtjEs++45546VOngCXYqP1qGEP+pNKOAUMQpPJHSN2AIuzRt4siii7NmhkzMlr0tNEgeDzQun4GefSDEAcYXvnp5Ycs8VeYqSn0KyFA+Z08rSqd2luKP58kEfeUVUoaq6hz70P3I2WnOY1XYS4LLmXxc9+aOnebxMBQca8lHqegjcJYMJRd3zlpZJ+ccQvOLK/aQbIXkPoFlL6M+EI2M2sqL9J4zaOkLPox2vZWQmZzzwi+ioDSCIYU9h53IZRDADP3cKB9JY3kBP9fxOifp2BQMzg+Yte5ZzyeJw2/KkfcjXKaKJcSRqlDj8T6nLxT861D0AZgKoluucHRL+WFhf6bOrhI3ph/kGRGMUIU4d4fseHEWpgu7dtP9g+E1Q3HQgWqAi8mQJZYZX4wbBftU8+MjhHeWoLt03SbF68j3xnUSXDpxVzxi+ARBoonA1dKZiyCuOpI9W/RLk3ILw6JaRhSEZDJ/FKjnkSQkc2E0332/V3ZnTR8K8DMWA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Aug 01, 2023 at 02:48:37PM +0200, David Hildenbrand wrote: > Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by > gup_can_follow_protnone()") missed that follow_page() and > follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really > don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting > or due to inaccessible (PROT_NONE) VMAs. > > As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page > faults from gup/gup_fast"): "Other follow_page callers like KSM should not > use FOLL_NUMA, or they would fail to get the pages if they use follow_page > instead of get_user_pages." > > liubo reported [1] that smaps_rollup results are imprecise, because they > miss accounting of pages that are mapped PROT_NONE. Further, it's easy > to reproduce that KSM no longer works on inaccessible VMAs on x86-64, > because pte_protnone()/pmd_protnone() also indictaes "true" in > inaccessible VMAs, and follow_page() refuses to return such pages right > now. > > As KVM really depends on these NUMA hinting faults, removing the > pte_protnone()/pmd_protnone() handling in GUP code completely is not really > an option. > > To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT > to restore the original behavior for now and add better comments. > > Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in > is_valid_gup_args(), to add that flag for all external GUP users. > > Note that there are three GUP-internal __get_user_pages() users that don't > end up calling is_valid_gup_args() and consequently won't get > FOLL_HONOR_NUMA_FAULT set. > > 1) get_dump_page(): we really don't want to handle NUMA hinting > faults. It specifies FOLL_FORCE and wouldn't have honored NUMA > hinting faults already. > 2) populate_vma_page_range(): we really don't want to handle NUMA hinting > faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have > honored NUMA hinting faults already. > 3) faultin_vma_page_range(): we similarly don't want to handle NUMA > hinting faults. > > To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in > inaccessible VMAs properly, we have to perform VMA accessibility checks in > gup_can_follow_protnone(). > > As GUP-fast should reject such pages either way in > pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and > arm64 that both implement pte_protnone() -- let's just always fallback > to ordinary GUP when stumbling over pte_protnone()/pmd_protnone(). > > As Linus notes [2], honoring NUMA faults might only make sense for > selected GUP users. > > So we should really see if we can instead let relevant GUP callers specify > it manually, and not trigger NUMA hinting faults from GUP as default. > Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag > and adding appropriate documenation. > > [1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com > [2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@mail.gmail.com > > Reported-by: liubo > Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com > Reported-by: Peter Xu > Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/ > Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") > Cc: > Signed-off-by: David Hildenbrand I agree that FOLL_REMOTE probably needs separate treatment but also agree that it's outside the context of this patch, particularly as a -stable candidate so Acked-by: Mel Gorman I've a minor nit below that would be nice to get fixed up, but not mandatory. > --- > include/linux/mm.h | 21 +++++++++++++++------ > include/linux/mm_types.h | 9 +++++++++ > mm/gup.c | 29 +++++++++++++++++++++++------ > mm/huge_memory.c | 2 +- > 4 files changed, 48 insertions(+), 13 deletions(-) > > diff --git a/mm/gup.c b/mm/gup.c > index 2493ffa10f4b..f463d3004ddc 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2240,6 +2244,12 @@ static bool is_valid_gup_args(struct page **pages, int *locked, > gup_flags |= FOLL_UNLOCKABLE; > } > > + /* > + * For now, always trigger NUMA hinting faults. Some GUP users like > + * KVM really require it to benefit from autonuma. > + */ > + gup_flags |= FOLL_HONOR_NUMA_FAULT; > + > /* FOLL_GET and FOLL_PIN are mutually exclusive. */ > if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) == > (FOLL_PIN | FOLL_GET))) Expand on *why* KVM requires it even though I suspect this changes later in the series. Maybe "Some GUP users like KVM require the hint to be as the calling context of GUP is functionally similar to a memory reference from task context"? Also minor nit -- s/autonuma/NUMA Balancing/ or numab. autonuma refers to a specific implementation of automatic balancing that operated similar to khugepaged but not merged. If you grep for it, you'll find that autonuma is only referenced in powerpc-specific code. It's not important these days but very early on, it was very confusing if AutoNUMA was mentioned when NUMAB was intended. -- Mel Gorman SUSE Labs