From: Yang Shi
Date: Fri, 12 Apr 2024 11:29:53 -0700
Subject: Re: [PATCH] mm/rmap: do not add fully unmapped large folio to deferred split list
To: Zi Yan
Cc: David Hildenbrand, linux-mm@kvack.org, Andrew Morton, Matthew Wilcox, Ryan Roberts, Barry Song <21cnbao@gmail.com>, linux-kernel@vger.kernel.org
References: <20240411153232.169560-1-zi.yan@sent.com> <86722546-1d54-4224-9f31-da4f368cd47e@redhat.com> <0A4D2CA1-E156-43E9-A1C9-E09E62E760A3@nvidia.com>
On Fri, Apr 12, 2024 at 7:31 AM Zi Yan wrote:
>
> On 12 Apr 2024, at 10:21, Zi Yan wrote:
>
> > On 11 Apr 2024, at 17:59, Yang Shi wrote:
> >
> >> On Thu, Apr 11, 2024 at 2:15 PM David Hildenbrand wrote:
> >>>
> >>> On 11.04.24 21:01, Yang Shi wrote:
> >>>> On Thu, Apr 11, 2024 at 8:46 AM David Hildenbrand wrote:
> >>>>>
> >>>>> On 11.04.24 17:32, Zi Yan wrote:
> >>>>>> From: Zi Yan
> >>>>>>
> >>>>>> In __folio_remove_rmap(), a large folio is added to the deferred split
> >>>>>> list if any page in the folio loses its final mapping. But it is
> >>>>>> possible that the folio is fully unmapped, in which case adding it to
> >>>>>> the deferred split list is unnecessary. Fix it by checking the folio
> >>>>>> mapcount before adding the folio to the deferred split list.
> >>>>>>
> >>>>>> Signed-off-by: Zi Yan
> >>>>>> ---
> >>>>>>  mm/rmap.c | 9 ++++++---
> >>>>>>  1 file changed, 6 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>>>> index 2608c40dffad..d599a772e282 100644
> >>>>>> --- a/mm/rmap.c
> >>>>>> +++ b/mm/rmap.c
> >>>>>> @@ -1494,7 +1494,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>>>>                 enum rmap_level level)
> >>>>>>  {
> >>>>>>         atomic_t *mapped = &folio->_nr_pages_mapped;
> >>>>>> -       int last, nr = 0, nr_pmdmapped = 0;
> >>>>>> +       int last, nr = 0, nr_pmdmapped = 0, mapcount = 0;
> >>>>>>         enum node_stat_item idx;
> >>>>>>
> >>>>>>         __folio_rmap_sanity_checks(folio, page, nr_pages, level);
> >>>>>> @@ -1506,7 +1506,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>>>>                         break;
> >>>>>>                 }
> >>>>>>
> >>>>>> -               atomic_sub(nr_pages, &folio->_large_mapcount);
> >>>>>> +               mapcount = atomic_sub_return(nr_pages,
> >>>>>> +                                            &folio->_large_mapcount) + 1;
> >>>>>
> >>>>> That becomes a new memory barrier on some archs. Rather just re-read it
> >>>>> below. Re-reading should be fine here.
> >>>>>
> >>>>>>                 do {
> >>>>>>                         last = atomic_add_negative(-1, &page->_mapcount);
> >>>>>>                         if (last) {
> >>>>>> @@ -1554,7 +1555,9 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>>>>                  * is still mapped.
> >>>>>>                  */
> >>>>>>                 if (folio_test_large(folio) && folio_test_anon(folio))
> >>>>>> -                       if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> >>>>>> +                       if ((level == RMAP_LEVEL_PTE &&
> >>>>>> +                            mapcount != 0) ||
> >>>>>> +                           (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped))
> >>>>>>                                 deferred_split_folio(folio);
> >>>>>>         }
> >>>>>
> >>>>> But I do wonder if we really care? Usually the folio will simply get
> >>>>> freed afterwards, where we simply remove it from the list.
> >>>>>
> >>>>> If it's pinned, we won't be able to free or reclaim it, but that's
> >>>>> rather a corner case ...
> >>>>>
> >>>>> Is it really worth the added code? Not convinced.
> >>>>
> >>>> It is actually not only an optimization; it also fixes the broken
> >>>> thp_deferred_split_page counter in /proc/vmstat.
> >>>>
> >>>> The counter used to count only partially unmapped huge pages (which
> >>>> are on the deferred split queue), but it now counts fully unmapped
> >>>> mTHPs as well. For example, when a 64K mTHP is fully unmapped,
> >>>> thp_deferred_split_page is not supposed to be incremented, but it is
> >>>> now.
> >>>>
> >>>> The counter is also useful for performance analysis, for example, to
> >>>> tell whether a workload did a lot of partial unmaps. So fixing the
> >>>> counter seems worthwhile. Zi Yan should have mentioned this in the
> >>>> commit log.
> >>>
> >>> Yes, all of that is information missing from the patch description.
> >>> If it's a fix, there should be a "Fixes:" tag.
> >>>
> >>> Likely we want a folio_large_mapcount() check in the code below.
> >>> (I have yet to digest the condition under which this happens -- can we
> >>> have an example where we used to do the wrong thing and now do the
> >>> right thing?)
> >>
> >> For example, map 1G of memory with 64K mTHPs, then unmap the whole 1G
> >> or some full 64K areas; you will see thp_deferred_split_page increase,
> >> but it shouldn't.
> >>
> >> It looks like __folio_remove_rmap() incorrectly detected whether the
> >> mTHP was fully or partially unmapped by comparing the number of
> >> still-mapped subpages to ENTIRELY_MAPPED, which only works for
> >> PMD-mappable THPs.
> >>
> >> However, I just realized this problem was kind of worked around by commit:
> >>
> >> commit 98046944a1597f3a02b792dbe9665e9943b77f28
> >> Author: Baolin Wang
> >> Date:   Fri Mar 29 14:59:33 2024 +0800
> >>
> >>     mm: huge_memory: add the missing folio_test_pmd_mappable() for THP
> >>     split statistics
> >>
> >>     Now the mTHP can also be split or added into the deferred list, so add
> >>     folio_test_pmd_mappable() validation for PMD mapped THP, to avoid
> >>     confusion with PMD mapped THP related statistics.
> >>
> >>     Link: https://lkml.kernel.org/r/a5341defeef27c9ac7b85c97f030f93e4368bbc1.1711694852.git.baolin.wang@linux.alibaba.com
> >>     Signed-off-by: Baolin Wang
> >>     Acked-by: David Hildenbrand
> >>     Cc: Muchun Song
> >>     Signed-off-by: Andrew Morton
> >>
> >> This commit made thp_deferred_split_page stop counting mTHPs, but it
> >> also made thp_split_page stop counting mTHPs.
> >>
> >> However, Zi Yan's patch does make the code more robust, and we won't
> >> need to worry about the miscounting issue anymore if we add
> >> deferred_split_page and split_page counters for mTHPs in the future.
> >
> > Actually, the patch above does not fix everything. A fully unmapped
> > PTE-mapped order-9 THP is also added to the deferred split list and
> > counted as THP_DEFERRED_SPLIT_PAGE without my patch, since nr is 512
> > (non-zero), level is RMAP_LEVEL_PTE, and inside deferred_split_folio()
> > the order-9 folio is folio_test_pmd_mappable().
> >
> > I will add this information in the next version.
>
> It might be
> Fixes: b06dc281aa99 ("mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()"),
> but before that commit, fully unmapping a PTE-mapped order-9 THP still
> increased THP_DEFERRED_SPLIT_PAGE, because the PTEs were unmapped
> individually and the first PTE unmapping added the THP to the deferred
> split list. This means commit b06dc281aa99 did not change anything, and
> the THP_DEFERRED_SPLIT_PAGE increase before it was an artifact of the
> implementation. I will add this to the commit log as well, without a
> Fixes tag.

Thanks for digging deeper. The problem may not have been that obvious
before mTHP, because a PMD-mappable THP is converted to PTE-mapped by a
partial unmap in most cases, whereas an mTHP is always PTE-mapped in the
first place. The other reason is that batched rmap removal was not
supported before David's optimization. Now we have a reasonable
motivation to make the counter precise, and it is also easier to do so
than before.

>
>
> --
> Best Regards,
> Yan, Zi