From mboxrd@z Thu Jan 1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Wed, 28 Feb 2024 12:29:22 +1300
Subject: Re: [PATCH v1] mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song, Vishal Moola, Ryan Roberts
In-Reply-To: <20240227201548.857831-1-david@redhat.com>
References: <20240227201548.857831-1-david@redhat.com>
On Wed, Feb 28, 2024 at 9:16 AM David Hildenbrand wrote:
>
> Callers of folio_estimated_sharers() only care about "mapped shared vs.
> mapped exclusively", not the exact estimate of sharers. Let's consolidate
> and unify the condition users are checking. While at it clarify the
> semantics and extend the discussion on the fuzziness.
>
> Use the "likely mapped shared" terminology to better express what the
> (adjusted) function actually checks.
>
> Whether a partially-mappable folio is more likely to not be partially
> mapped than partially mapped is debatable. In the future, we might be able
> to improve our estimate for partially-mappable folios, though.
>
> Note that we will now consistently detect "mapped shared" only if the
> first subpage is actually mapped multiple times. When the first subpage
> is not mapped, we will consistently detect it as "mapped exclusively".
> This change should currently only affect the usage in
> madvise_free_pte_range() and queue_folios_pte_range() for large folios: if
> the first page was already unmapped, we would have skipped the folio.
>
> Cc: Barry Song
> Cc: Vishal Moola (Oracle)
> Cc: Ryan Roberts
> Signed-off-by: David Hildenbrand

LGTM,

Acked-by: Barry Song

> ---
>  include/linux/mm.h | 46 ++++++++++++++++++++++++++++++++++++----------
>  mm/huge_memory.c   |  2 +-
>  mm/madvise.c       |  6 +++---
>  mm/memory.c        |  2 +-
>  mm/mempolicy.c     | 14 ++++++--------
>  mm/migrate.c       |  8 ++++----
>  6 files changed, 51 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6f4825d829656..795c89632265f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2147,21 +2147,47 @@ static inline size_t folio_size(struct folio *folio)
>  }
>
>  /**
> - * folio_estimated_sharers - Estimate the number of sharers of a folio.
> + * folio_likely_mapped_shared - Estimate if the folio is mapped into the page
> + *                              tables of more than one MM
>   * @folio: The folio.
>   *
> - * folio_estimated_sharers() aims to serve as a function to efficiently
> - * estimate the number of processes sharing a folio. This is done by
> - * looking at the precise mapcount of the first subpage in the folio, and
> - * assuming the other subpages are the same. This may not be true for large
> - * folios. If you want exact mapcounts for exact calculations, look at
> - * page_mapcount() or folio_total_mapcount().
> + * This function checks if the folio is currently mapped into more than one
> + * MM ("mapped shared"), or if the folio is only mapped into a single MM
> + * ("mapped exclusively").
>   *
> - * Return: The estimated number of processes sharing a folio.
> + * As precise information is not easily available for all folios, this function
> + * estimates the number of MMs ("sharers") that are currently mapping a folio
> + * using the number of times the first page of the folio is currently mapped
> + * into page tables.
> + *
> + * For small anonymous folios (except KSM folios) and anonymous hugetlb folios,
> + * the return value will be exactly correct, because they can only be mapped
> + * at most once into an MM, and they cannot be partially mapped.
> + *
> + * For other folios, the result can be fuzzy:
> + * (a) For partially-mappable large folios (THP), the return value can wrongly
> + *     indicate "mapped exclusively" (false negative) when the folio is
> + *     only partially mapped into at least one MM.
> + * (b) For pagecache folios (including hugetlb), the return value can wrongly
> + *     indicate "mapped shared" (false positive) when two VMAs in the same MM
> + *     cover the same file range.
> + * (c) For (small) KSM folios, the return value can wrongly indicate "mapped
> + *     shared" (false positive), when the folio is mapped multiple times into
> + *     the same MM.
> + *
> + * Further, this function only considers current page table mappings that
> + * are tracked using the folio mapcount(s). It does not consider:
> + * (1) If the folio might get mapped in the (near) future (e.g., swapcache,
> + *     pagecache, temporary unmapping for migration).
> + * (2) If the folio is mapped differently (VM_PFNMAP).
> + * (3) If hugetlb page table sharing applies. Callers might want to check
> + *     hugetlb_pmd_shared().
> + *
> + * Return: Whether the folio is estimated to be mapped into more than one MM.
>   */
> -static inline int folio_estimated_sharers(struct folio *folio)
> +static inline bool folio_likely_mapped_shared(struct folio *folio)
>  {
> -        return page_mapcount(folio_page(folio, 0));
> +        return page_mapcount(folio_page(folio, 0)) > 1;
>  }
>
>  #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 50d146eb248ff..4d10904fef70c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1829,7 +1829,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>           * If other processes are mapping this folio, we couldn't discard
>           * the folio unless they all do MADV_FREE so let's skip the folio.
>           */
> -        if (folio_estimated_sharers(folio) != 1)
> +        if (folio_likely_mapped_shared(folio))
>                  goto out;
>
>          if (!folio_trylock(folio))
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 44a498c94158c..32a534d200219 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -366,7 +366,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>                  folio = pfn_folio(pmd_pfn(orig_pmd));
>
>                  /* Do not interfere with other mappings of this folio */
> -                if (folio_estimated_sharers(folio) != 1)
> +                if (folio_likely_mapped_shared(folio))
>                          goto huge_unlock;
>
>                  if (pageout_anon_only_filter && !folio_test_anon(folio))
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>                  if (folio_test_large(folio)) {
>                          int err;
>
> -                        if (folio_estimated_sharers(folio) > 1)
> +                        if (folio_likely_mapped_shared(folio))
>                                  break;
>                          if (pageout_anon_only_filter && !folio_test_anon(folio))
>                                  break;
> @@ -677,7 +677,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>                  if (folio_test_large(folio)) {
>                          int err;
>
> -                        if (folio_estimated_sharers(folio) != 1)
> +                        if (folio_likely_mapped_shared(folio))
>                                  break;
>                          if (!folio_trylock(folio))
>                                  break;
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c45b6a42a1b9..8394a9843ca06 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5173,7 +5173,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>           * Flag if the folio is shared between multiple address spaces. This
>           * is later used when determining whether to group tasks together
>           */
> -        if (folio_estimated_sharers(folio) > 1 && (vma->vm_flags & VM_SHARED))
> +        if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
>                  flags |= TNF_SHARED;
>
>          nid = folio_nid(folio);
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f60b4c99f1302..0b92fde395182 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -642,12 +642,11 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
>           * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
>           * Choosing not to migrate a shared folio is not counted as a failure.
>           *
> -         * To check if the folio is shared, ideally we want to make sure
> -         * every page is mapped to the same process. Doing that is very
> -         * expensive, so check the estimated sharers of the folio instead.
> +         * See folio_likely_mapped_shared() on possible imprecision when we
> +         * cannot easily detect if a folio is shared.
>           */
>          if ((flags & MPOL_MF_MOVE_ALL) ||
> -            (folio_estimated_sharers(folio) == 1 && !hugetlb_pmd_shared(pte)))
> +            (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
>                  if (!isolate_hugetlb(folio, qp->pagelist))
>                          qp->nr_failed++;
>  unlock:
> @@ -1032,11 +1031,10 @@ static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
>           * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
>           * Choosing not to migrate a shared folio is not counted as a failure.
>           *
> -         * To check if the folio is shared, ideally we want to make sure
> -         * every page is mapped to the same process. Doing that is very
> -         * expensive, so check the estimated sharers of the folio instead.
> +         * See folio_likely_mapped_shared() on possible imprecision when we
> +         * cannot easily detect if a folio is shared.
>           */
> -        if ((flags & MPOL_MF_MOVE_ALL) || folio_estimated_sharers(folio) == 1) {
> +        if ((flags & MPOL_MF_MOVE_ALL) || !folio_likely_mapped_shared(folio)) {
>                  if (folio_isolate_lru(folio)) {
>                          list_add_tail(&folio->lru, foliolist);
>                          node_stat_mod_folio(folio,
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 73a052a382f13..35d376969f8b9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2568,11 +2568,11 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>          /*
>           * Don't migrate file folios that are mapped in multiple processes
>           * with execute permissions as they are probably shared libraries.
> -         * To check if the folio is shared, ideally we want to make sure
> -         * every page is mapped to the same process. Doing that is very
> -         * expensive, so check the estimated mapcount of the folio instead.
> +         *
> +         * See folio_likely_mapped_shared() on possible imprecision when we
> +         * cannot easily detect if a folio is shared.
>           */
> -        if (folio_estimated_sharers(folio) != 1 && folio_is_file_lru(folio) &&
> +        if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
>              (vma->vm_flags & VM_EXEC))
>                  goto out;
>
> --
> 2.43.2
>
>
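As a quick sanity check on the only behavioral change called out above (first
subpage unmapped), here is a throwaway userspace sketch comparing the old
folio_estimated_sharers(folio) != 1 test in madvise_free_pte_range() with the
new boolean. The two helpers are illustrative stand-ins with a plain int in
place of the first subpage's mapcount, not the kernel implementation:

#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the old helper: the first subpage's mapcount, as-is. */
static int old_estimated_sharers(int first_subpage_mapcount)
{
        return first_subpage_mapcount;
}

/* Stand-in for the new helper: "likely mapped into more than one MM". */
static bool new_likely_mapped_shared(int first_subpage_mapcount)
{
        return first_subpage_mapcount > 1;
}

int main(void)
{
        for (int mapcount = 0; mapcount <= 2; mapcount++) {
                /* madvise_free_pte_range() used to skip when != 1 ... */
                bool old_skip = old_estimated_sharers(mapcount) != 1;
                /* ... and now skips when "likely mapped shared". */
                bool new_skip = new_likely_mapped_shared(mapcount);

                printf("mapcount=%d old_skip=%d new_skip=%d\n",
                       mapcount, old_skip, new_skip);
        }
        return 0;
}

The only row that differs is mapcount == 0: it used to read as "shared"
(skip) and now reads as "mapped exclusively" (don't skip), and as the
changelog notes, the affected callers would have skipped such folios
earlier anyway.

Thanks
Barry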