From: Yang Shi <shy828301@gmail.com>
Date: Thu, 11 Apr 2024 14:59:05 -0700
Subject: Re: [PATCH] mm/rmap: do not add fully unmapped large folio to deferred split list
To: David Hildenbrand
Cc: Zi Yan, linux-mm@kvack.org, Andrew Morton, "Matthew Wilcox (Oracle)", Ryan Roberts, Barry Song <21cnbao@gmail.com>, linux-kernel@vger.kernel.org
In-Reply-To: <86722546-1d54-4224-9f31-da4f368cd47e@redhat.com>
References: <20240411153232.169560-1-zi.yan@sent.com> <86722546-1d54-4224-9f31-da4f368cd47e@redhat.com>
On Thu, Apr 11, 2024 at 2:15 PM David Hildenbrand wrote:
>
> On 11.04.24 21:01, Yang Shi wrote:
> > On Thu, Apr 11, 2024 at 8:46 AM David Hildenbrand wrote:
> >>
> >> On 11.04.24 17:32, Zi Yan wrote:
> >>> From: Zi Yan
> >>>
> >>> In __folio_remove_rmap(), a large folio is added to deferred split list
> >>> if any page in a folio loses its final mapping. It is possible that
> >>> the folio is unmapped fully, but it is unnecessary to add the folio
> >>> to deferred split list at all. Fix it by checking folio mapcount before
> >>> adding a folio to deferred split list.
> >>>
> >>> Signed-off-by: Zi Yan
> >>> ---
> >>>   mm/rmap.c | 9 ++++++---
> >>>   1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>> index 2608c40dffad..d599a772e282 100644
> >>> --- a/mm/rmap.c
> >>> +++ b/mm/rmap.c
> >>> @@ -1494,7 +1494,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>                   enum rmap_level level)
> >>>   {
> >>>           atomic_t *mapped = &folio->_nr_pages_mapped;
> >>> -         int last, nr = 0, nr_pmdmapped = 0;
> >>> +         int last, nr = 0, nr_pmdmapped = 0, mapcount = 0;
> >>>           enum node_stat_item idx;
> >>>
> >>>           __folio_rmap_sanity_checks(folio, page, nr_pages, level);
> >>> @@ -1506,7 +1506,8 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>                           break;
> >>>           }
> >>>
> >>> -         atomic_sub(nr_pages, &folio->_large_mapcount);
> >>> +         mapcount = atomic_sub_return(nr_pages,
> >>> +                                      &folio->_large_mapcount) + 1;
> >>
> >> That becomes a new memory barrier on some archs. Rather just re-read it
> >> below. Re-reading should be fine here.
> >>
> >>>           do {
> >>>                   last = atomic_add_negative(-1, &page->_mapcount);
> >>>                   if (last) {
> >>> @@ -1554,7 +1555,9 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>            * is still mapped.
> >>>            */
> >>>           if (folio_test_large(folio) && folio_test_anon(folio))
> >>> -                 if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> >>> +                 if ((level == RMAP_LEVEL_PTE &&
> >>> +                      mapcount != 0) ||
> >>> +                     (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped))
> >>>                           deferred_split_folio(folio);
> >>>   }
> >>
> >> But I do wonder if we really care?
> >> Usually the folio will simply get
> >> freed afterwards, where we simply remove it from the list.
> >>
> >> If it's pinned, we won't be able to free or reclaim, but it's rather a
> >> corner case ...
> >>
> >> Is it really worth the added code? Not convinced.
> >
> > It is actually not only an optimization, but also fixed the broken
> > thp_deferred_split_page counter in /proc/vmstat.
> >
> > The counter actually counted the partially unmapped huge pages (so
> > they are on deferred split queue), but it counts the fully unmapped
> > mTHP as well now. For example, when a 64K THP is fully unmapped, the
> > thp_deferred_split_page is not supposed to get inc'ed, but it does
> > now.
> >
> > The counter is also useful for performance analysis, for example,
> > whether a workload did a lot of partial unmap or not. So fixing the
> > counter seems worthy. Zi Yan should have mentioned this in the commit
> > log.
>
> Yes, all that is information that is missing from the patch description.
> If it's a fix, there should be a "Fixes:".
>
> Likely we want to have a folio_large_mapcount() check in the code below.
> (I yet have to digest the condition where this happens -- can we have an
> example where we used to do the wrong thing and now would do the right
> thing as well?)

For example, map 1G of memory with 64K mTHP, then unmap the whole 1G or
some full 64K areas; you will see thp_deferred_split_page increase, but
it shouldn't.

It looks like __folio_remove_rmap() incorrectly detects whether the mTHP
is fully or partially unmapped by comparing the number of still-mapped
subpages to ENTIRELY_MAPPED, a check that only works for PMD-mappable
THP.
However, I just realized this problem was partially worked around by
this commit:

commit 98046944a1597f3a02b792dbe9665e9943b77f28
Author: Baolin Wang
Date:   Fri Mar 29 14:59:33 2024 +0800

    mm: huge_memory: add the missing folio_test_pmd_mappable() for THP
    split statistics

    Now the mTHP can also be split or added into the deferred list, so add
    folio_test_pmd_mappable() validation for PMD mapped THP, to avoid
    confusion with PMD mapped THP related statistics.

    Link: https://lkml.kernel.org/r/a5341defeef27c9ac7b85c97f030f93e4368bbc1.1711694852.git.baolin.wang@linux.alibaba.com
    Signed-off-by: Baolin Wang
    Acked-by: David Hildenbrand
    Cc: Muchun Song
    Signed-off-by: Andrew Morton

This commit stopped thp_deferred_split_page from counting mTHP, but it
also stopped thp_split_page from counting mTHP. Zi Yan's patch still
makes the code more robust, though, and we won't have to worry about
this miscounting again if we add deferred_split_page and split_page
counters for mTHP in the future.

> --
> Cheers,
>
> David / dhildenb
>