From: James Houghton <jthoughton@google.com>
Date: Thu, 2 Mar 2023 08:43:43 -0800
Subject: Re: [PATCH v2 05/46] rmap: hugetlb: switch from page_dup_file_rmap to page_add_file_rmap
To: Jiaqi Yan
Cc: Mike Kravetz, Muchun Song, Peter Xu, Andrew Morton, David Hildenbrand,
    David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe",
    Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert",
    "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Frank van der Linden, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
References: <20230218002819.1486479-1-jthoughton@google.com> <20230218002819.1486479-6-jthoughton@google.com>
On Thu, Mar 2, 2023 at 7:44 AM James Houghton wrote:
>
> On Wed, Mar 1, 2023 at 5:06 PM Jiaqi Yan wrote:
> >
> > On Fri, Feb 17, 2023 at 4:28 PM James Houghton wrote:
> > >
> > > This only applies to file-backed HugeTLB, and it should be a no-op until
> > > high-granularity mapping is possible. Also update page_remove_rmap to
> > > support the eventual case where !compound && folio_test_hugetlb().
> > >
> > > HugeTLB doesn't use LRU or mlock, so we avoid those bits. This also
> > > means we don't need to use subpage_mapcount; if we did, it would
> > > overflow with only a few mappings.
>
> This is wrong; I guess I misunderstood the code when I wrote this
> commit. subpages_mapcount (now called _nr_pages_mapped) won't overflow
> (unless HugeTLB pages could be greater than 16G). It is indeed a bug
> not to update _nr_pages_mapped the same way THPs do.
>
> >
> > >
> > > There is still one caller of page_dup_file_rmap left: copy_present_pte,
> > > and it is always called with compound=false in this case.
> > >
> > > Signed-off-by: James Houghton
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 08004371cfed..6c008c9de80e 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -5077,7 +5077,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> > >               * sleep during the process.
> > >               */
> > >              if (!PageAnon(ptepage)) {
> > > -                page_dup_file_rmap(ptepage, true);
> > > +                page_add_file_rmap(ptepage, src_vma, true);
> > >              } else if (page_try_dup_anon_rmap(ptepage, true,
> > >                                                src_vma)) {
> > >                  pte_t src_pte_old = entry;
> > > @@ -5910,7 +5910,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> > >      if (anon_rmap)
> > >          hugepage_add_new_anon_rmap(folio, vma, haddr);
> > >      else
> > > -        page_dup_file_rmap(&folio->page, true);
> > > +        page_add_file_rmap(&folio->page, vma, true);
> > >      new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
> > >                  && (vma->vm_flags & VM_SHARED)));
> > >      /*
> > > @@ -6301,7 +6301,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
> > >          goto out_release_unlock;
> > >
> > >      if (folio_in_pagecache)
> > > -        page_dup_file_rmap(&folio->page, true);
> > > +        page_add_file_rmap(&folio->page, dst_vma, true);
> > >      else
> > >          hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
> > >
> > > diff --git a/mm/migrate.c b/mm/migrate.c
> > > index d3964c414010..b0f87f19b536 100644
> > > --- a/mm/migrate.c
> > > +++ b/mm/migrate.c
> > > @@ -254,7 +254,7 @@ static bool remove_migration_pte(struct folio *folio,
> > >                  hugepage_add_anon_rmap(new, vma, pvmw.address,
> > >                                         rmap_flags);
> > >              else
> > > -                page_dup_file_rmap(new, true);
> > > +                page_add_file_rmap(new, vma, true);
> > >              set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
> > >          } else
> > > #endif
> > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > index 15ae24585fc4..c010d0af3a82 100644
> > > --- a/mm/rmap.c
> > > +++ b/mm/rmap.c
> >
> > Given you are making hugetlb's ref/mapcount mechanism to be consistent
> > with THP, I think the special folio_test_hugetlb checks you added in
> > this commit will break page_mapped() and folio_mapped() if page/folio
> > is HGMed. With these checks, folio->_nr_pages_mapped are not properly
> > increased/decreased.
>
> Thank you, Jiaqi! I didn't realize I broke
> folio_mapped()/page_mapped(). The end result is that page_mapped() may
> report that an HGMed page isn't mapped when it is. Not good!
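
To make the breakage concrete, here is a rough sketch of the large-folio
"mapped" check, paraphrased from memory from the mm code of that era (not
verbatim kernel source; treat the exact bodies as an approximation):

static inline bool folio_large_is_mapped(struct folio *folio)
{
        /*
         * For a large folio, "mapped" is derived from the two folio-level
         * counters; the per-subpage page->_mapcount is not consulted here.
         */
        return atomic_read(&folio->_nr_pages_mapped) > 0 ||
               atomic_read(&folio->_entire_mapcount) >= 0;
}

static inline bool folio_mapped(struct folio *folio)
{
        if (likely(!folio_test_large(folio)))
                return atomic_read(&folio->_mapcount) >= 0;
        return folio_large_is_mapped(folio);
}

With the folio_test_hugetlb() special cases in the patch above, an HGM
mapping installed with compound == false only raises page->_mapcount:
_entire_mapcount stays at -1 and _nr_pages_mapped stays at 0, so both
reads above see an "unmapped" folio even though part of it is mapped.
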
>
> >
> > > @@ -1318,21 +1318,21 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> > >      int nr = 0, nr_pmdmapped = 0;
> > >      bool first;
> > >
> > > -    VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
> > > +    VM_BUG_ON_PAGE(compound && !PageTransHuge(page)
> > > +                   && !folio_test_hugetlb(folio), page);
> > >
> > >      /* Is page being mapped by PTE? Is this its first map to be added? */
> > >      if (likely(!compound)) {
> > >          first = atomic_inc_and_test(&page->_mapcount);
> > >          nr = first;
> > > -        if (first && folio_test_large(folio)) {
> > > +        if (first && folio_test_large(folio)
> > > +                && !folio_test_hugetlb(folio)) {
> >
> > So we should still increment _nr_pages_mapped for hugetlb case here,
> > and decrement in the corresponding place in page_remove_rmap.
> >
> > >              nr = atomic_inc_return_relaxed(mapped);
> > >              nr = (nr < COMPOUND_MAPPED);
> > >          }
> > > -    } else if (folio_test_pmd_mappable(folio)) {
> > > -        /* That test is redundant: it's for safety or to optimize out */
> > > -
> > > +    } else {
> > >          first = atomic_inc_and_test(&folio->_entire_mapcount);
> > > -        if (first) {
> > > +        if (first && !folio_test_hugetlb(folio)) {
> >
> > Same here: we should still increase _nr_pages_mapped by
> > COMPOUND_MAPPED and decrease by COMPOUND_MAPPED in the corresponding
> > place in page_remove_rmap.
> >
> > >              nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> > >              if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> > >                  nr_pmdmapped = folio_nr_pages(folio);
> > > @@ -1347,6 +1347,9 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> > >          }
> > >      }
> > >
> > > +    if (folio_test_hugetlb(folio))
> > > +        return;
> > > +
> > >      if (nr_pmdmapped)
> > >          __lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> > >              NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
> > > @@ -1376,8 +1379,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> > >      VM_BUG_ON_PAGE(compound && !PageHead(page), page);
> > >
> > >      /* Hugetlb pages are not counted in NR_*MAPPED */
> > > -    if (unlikely(folio_test_hugetlb(folio))) {
> > > -        /* hugetlb pages are always mapped with pmds */
> > > +    if (unlikely(folio_test_hugetlb(folio)) && compound) {
> > >          atomic_dec(&folio->_entire_mapcount);
> > >          return;
> > >      }
> >
> > This entire if-block should be removed after you remove the
> > !folio_test_hugetlb checks in page_add_file_rmap.
>
> This is the not-so-obvious change that is needed. Thank you!
> >
> > > @@ -1386,15 +1388,14 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> > >      if (likely(!compound)) {
> > >          last = atomic_add_negative(-1, &page->_mapcount);
> > >          nr = last;
> > > -        if (last && folio_test_large(folio)) {
> > > +        if (last && folio_test_large(folio)
> > > +                && !folio_test_hugetlb(folio)) {
> >
> > ditto.
> >
> > >              nr = atomic_dec_return_relaxed(mapped);
> > >              nr = (nr < COMPOUND_MAPPED);
> > >          }
> > > -    } else if (folio_test_pmd_mappable(folio)) {
> > > -        /* That test is redundant: it's for safety or to optimize out */
> > > -
> > > +    } else {
> > >          last = atomic_add_negative(-1, &folio->_entire_mapcount);
> > > -        if (last) {
> > > +        if (last && !folio_test_hugetlb(folio)) {
> >
> > ditto.
>
> I agree with all of your suggestions. Testing with the hugetlb-hgm
> selftest, nothing seems to break. :)
>
> Given that this is at least the third or fourth major bug in this
> version of the series, I'll go ahead and send a v3 sooner rather than
> later.
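
For anyone following along, a hypothetical sketch (not the actual v3
patch) of what the agreed-upon direction would look like in
page_add_file_rmap, with the hugetlb special cases dropped so hugetlb
folios participate in _nr_pages_mapped exactly like THPs:

        /* Is page being mapped by PTE? Is this its first map to be added? */
        if (likely(!compound)) {
                first = atomic_inc_and_test(&page->_mapcount);
                nr = first;
                if (first && folio_test_large(folio)) {
                        /* hugetlb included: count first-time PTE maps of subpages */
                        nr = atomic_inc_return_relaxed(mapped);
                        nr = (nr < COMPOUND_MAPPED);
                }
        } else {
                first = atomic_inc_and_test(&folio->_entire_mapcount);
                if (first) {
                        /* hugetlb included: mark the folio compound-mapped */
                        nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
                        /* overflow/race handling as in the existing THP path */
                }
        }

page_remove_rmap would mirror this by dropping its early hugetlb return
and decrementing the same counters.
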
This solution limits the size of a HugeTLB page to 16G. I'm not sure if
any architectures support HugeTLB pages larger than 16G (it looks like
powerpc supports 16G). If one does, I think we can just increase the
value of COMPOUND_MAPPED. If that's not possible, we would need a
solution other than participating in _nr_pages_mapped the way THPs do.
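
For reference, the arithmetic behind the 16G figure, assuming 4 KiB base
pages and COMPOUND_MAPPED being bit 23 (0x800000) of _nr_pages_mapped, as
in mm/internal.h at the time; both values are assumptions of this sketch,
not taken from this thread:

#include <stdio.h>

#define COMPOUND_MAPPED  0x800000UL  /* assumed: bit 23 used as the "compound mapped" flag */
#define BASE_PAGE_SIZE   4096UL      /* assumed: 4 KiB base pages */

int main(void)
{
        unsigned long long size;

        /* PTE-mapping every subpage must keep the count below the flag bit. */
        for (size = 1ULL << 30; size <= 32ULL << 30; size <<= 1) {
                unsigned long long subpages = size / BASE_PAGE_SIZE;

                printf("%2lluG hugetlb page: %7llu subpages -> %s\n",
                       size >> 30, subpages,
                       subpages < COMPOUND_MAPPED ? "fits" :
                       "collides with COMPOUND_MAPPED");
        }
        return 0;
}

A 16G page has 2^22 subpages, which still fits below bit 23; the next
power-of-two size, 32G, has exactly 2^23 subpages and would spill into
the COMPOUND_MAPPED bit, which is where the 16G ceiling comes from.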