From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20240227201548.857831-1-david@redhat.com>
In-Reply-To: <20240227201548.857831-1-david@redhat.com>
From: Vishal Moola
Date: Tue, 27 Feb 2024 15:46:19 -0800
Subject: Re: [PATCH v1] mm: convert folio_estimated_sharers() to folio_likely_mapped_shared()
To: David Hildenbrand
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song, Ryan Roberts
Content-Type: text/plain; charset="UTF-8"
On Tue, Feb 27, 2024 at 12:15 PM David Hildenbrand wrote:
>
> Callers of folio_estimated_sharers() only care about "mapped shared vs.
> mapped exclusively", not the exact estimate of sharers. Let's consolidate
> and unify the condition users are checking. While at it clarify the
> semantics and extend the discussion on the fuzziness.
>
> Use the "likely mapped shared" terminology to better express what the
> (adjusted) function actually checks.
>
> Whether a partially-mappable folio is more likely to not be partially
> mapped than partially mapped is debatable. In the future, we might be able
> to improve our estimate for partially-mappable folios, though.
>
> Note that we will now consistently detect "mapped shared" only if the
> first subpage is actually mapped multiple times. When the first subpage
> is not mapped, we will consistently detect it as "mapped exclusively".
> This change should currently only affect the usage in
> madvise_free_pte_range() and queue_folios_pte_range() for large folios: if
> the first page was already unmapped, we would have skipped the folio.
>
> Cc: Barry Song
> Cc: Vishal Moola (Oracle)
> Cc: Ryan Roberts
> Signed-off-by: David Hildenbrand

Reviewed-by: Vishal Moola (Oracle)

> ---
>  include/linux/mm.h | 46 ++++++++++++++++++++++++++++++++++++----------
>  mm/huge_memory.c   |  2 +-
>  mm/madvise.c       |  6 +++---
>  mm/memory.c        |  2 +-
>  mm/mempolicy.c     | 14 ++++++--------
>  mm/migrate.c       |  8 ++++----
>  6 files changed, 51 insertions(+), 27 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6f4825d829656..795c89632265f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2147,21 +2147,47 @@ static inline size_t folio_size(struct folio *folio)
>  }
>
>  /**
> - * folio_estimated_sharers - Estimate the number of sharers of a folio.
> + * folio_likely_mapped_shared - Estimate if the folio is mapped into the page
> + *                              tables of more than one MM
>   * @folio: The folio.
>   *
> - * folio_estimated_sharers() aims to serve as a function to efficiently
> - * estimate the number of processes sharing a folio. This is done by
> - * looking at the precise mapcount of the first subpage in the folio, and
> - * assuming the other subpages are the same. This may not be true for large
> - * folios. If you want exact mapcounts for exact calculations, look at
> - * page_mapcount() or folio_total_mapcount().
> + * This function checks if the folio is currently mapped into more than one
> + * MM ("mapped shared"), or if the folio is only mapped into a single MM
> + * ("mapped exclusively").
>   *
> - * Return: The estimated number of processes sharing a folio.
> + * As precise information is not easily available for all folios, this function
> + * estimates the number of MMs ("sharers") that are currently mapping a folio
> + * using the number of times the first page of the folio is currently mapped
> + * into page tables.
> + *
> + * For small anonymous folios (except KSM folios) and anonymous hugetlb folios,
> + * the return value will be exactly correct, because they can only be mapped
> + * at most once into an MM, and they cannot be partially mapped.
> + *
> + * For other folios, the result can be fuzzy:
> + * (a) For partially-mappable large folios (THP), the return value can wrongly
> + *     indicate "mapped exclusively" (false negative) when the folio is
> + *     only partially mapped into at least one MM.
> + * (b) For pagecache folios (including hugetlb), the return value can wrongly
> + *     indicate "mapped shared" (false positive) when two VMAs in the same MM
> + *     cover the same file range.
> + * (c) For (small) KSM folios, the return value can wrongly indicate "mapped
> + *     shared" (false positive) when the folio is mapped multiple times into
> + *     the same MM.
> + *
> + * Further, this function only considers current page table mappings that
> + * are tracked using the folio mapcount(s). It does not consider:
> + * (1) If the folio might get mapped in the (near) future (e.g., swapcache,
> + *     pagecache, temporary unmapping for migration).
> + * (2) If the folio is mapped differently (VM_PFNMAP).
> + * (3) If hugetlb page table sharing applies. Callers might want to check
> + *     hugetlb_pmd_shared().
> + *
> + * Return: Whether the folio is estimated to be mapped into more than one MM.
>   */
> -static inline int folio_estimated_sharers(struct folio *folio)
> +static inline bool folio_likely_mapped_shared(struct folio *folio)
>  {
> -	return page_mapcount(folio_page(folio, 0));
> +	return page_mapcount(folio_page(folio, 0)) > 1;
>  }
>
>  #ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 50d146eb248ff..4d10904fef70c 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1829,7 +1829,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  	 * If other processes are mapping this folio, we couldn't discard
>  	 * the folio unless they all do MADV_FREE so let's skip the folio.
>  	 */
> -	if (folio_estimated_sharers(folio) != 1)
> +	if (folio_likely_mapped_shared(folio))
>  		goto out;
>
>  	if (!folio_trylock(folio))
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 44a498c94158c..32a534d200219 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -366,7 +366,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		folio = pfn_folio(pmd_pfn(orig_pmd));
>
>  		/* Do not interfere with other mappings of this folio */
> -		if (folio_estimated_sharers(folio) != 1)
> +		if (folio_likely_mapped_shared(folio))
>  			goto huge_unlock;
>
>  		if (pageout_anon_only_filter && !folio_test_anon(folio))
> @@ -453,7 +453,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
>  		if (folio_test_large(folio)) {
>  			int err;
>
> -			if (folio_estimated_sharers(folio) > 1)
> +			if (folio_likely_mapped_shared(folio))
>  				break;
>  			if (pageout_anon_only_filter && !folio_test_anon(folio))
>  				break;
> @@ -677,7 +677,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  		if (folio_test_large(folio)) {
>  			int err;
>
> -			if (folio_estimated_sharers(folio) != 1)
> +			if (folio_likely_mapped_shared(folio))
>  				break;
>  			if (!folio_trylock(folio))
>  				break;
> diff --git a/mm/memory.c b/mm/memory.c
> index 1c45b6a42a1b9..8394a9843ca06 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5173,7 +5173,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>  	 * Flag if the folio is shared between multiple address spaces. This
>  	 * is later used when determining whether to group tasks together
>  	 */
> -	if (folio_estimated_sharers(folio) > 1 && (vma->vm_flags & VM_SHARED))
> +	if (folio_likely_mapped_shared(folio) && (vma->vm_flags & VM_SHARED))
>  		flags |= TNF_SHARED;
>
>  	nid = folio_nid(folio);
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f60b4c99f1302..0b92fde395182 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -642,12 +642,11 @@ static int queue_folios_hugetlb(pte_t *pte, unsigned long hmask,
>  	 * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
>  	 * Choosing not to migrate a shared folio is not counted as a failure.
>  	 *
> -	 * To check if the folio is shared, ideally we want to make sure
> -	 * every page is mapped to the same process. Doing that is very
> -	 * expensive, so check the estimated sharers of the folio instead.
> +	 * See folio_likely_mapped_shared() on possible imprecision when we
> +	 * cannot easily detect if a folio is shared.
>  	 */
>  	if ((flags & MPOL_MF_MOVE_ALL) ||
> -	    (folio_estimated_sharers(folio) == 1 && !hugetlb_pmd_shared(pte)))
> +	    (!folio_likely_mapped_shared(folio) && !hugetlb_pmd_shared(pte)))
>  		if (!isolate_hugetlb(folio, qp->pagelist))
>  			qp->nr_failed++;
> unlock:
> @@ -1032,11 +1031,10 @@ static bool migrate_folio_add(struct folio *folio, struct list_head *foliolist,
>  	 * Unless MPOL_MF_MOVE_ALL, we try to avoid migrating a shared folio.
>  	 * Choosing not to migrate a shared folio is not counted as a failure.
>  	 *
> -	 * To check if the folio is shared, ideally we want to make sure
> -	 * every page is mapped to the same process. Doing that is very
> -	 * expensive, so check the estimated sharers of the folio instead.
> +	 * See folio_likely_mapped_shared() on possible imprecision when we
> +	 * cannot easily detect if a folio is shared.
>  	 */
> -	if ((flags & MPOL_MF_MOVE_ALL) || folio_estimated_sharers(folio) == 1) {
> +	if ((flags & MPOL_MF_MOVE_ALL) || !folio_likely_mapped_shared(folio)) {
>  		if (folio_isolate_lru(folio)) {
>  			list_add_tail(&folio->lru, foliolist);
>  			node_stat_mod_folio(folio,
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 73a052a382f13..35d376969f8b9 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2568,11 +2568,11 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>  	/*
>  	 * Don't migrate file folios that are mapped in multiple processes
>  	 * with execute permissions as they are probably shared libraries.
> -	 * To check if the folio is shared, ideally we want to make sure
> -	 * every page is mapped to the same process. Doing that is very
> -	 * expensive, so check the estimated mapcount of the folio instead.
> +	 *
> +	 * See folio_likely_mapped_shared() on possible imprecision when we
> +	 * cannot easily detect if a folio is shared.
>  	 */
> -	if (folio_estimated_sharers(folio) != 1 && folio_is_file_lru(folio) &&
> +	if (folio_likely_mapped_shared(folio) && folio_is_file_lru(folio) &&
>  	    (vma->vm_flags & VM_EXEC))
>  		goto out;
>
> --
> 2.43.2
>