From: Barry Song <21cnbao@gmail.com>
Date: Fri, 26 Apr 2024 11:44:35 +0800
Subject: Re: [PATCH v4] mm/rmap: do not add fully unmapped large folio to deferred split list
To: Zi Yan
Cc: Andrew Morton, linux-mm@kvack.org, "Matthew Wilcox (Oracle)", Yang Shi,
 Ryan Roberts, David Hildenbrand, Lance Yang, linux-kernel@vger.kernel.org
References: <20240425211136.486184-1-zi.yan@sent.com>
 <6C31DF81-94FB-4D09-A3B8-0CED2AD8EDDB@nvidia.com>
 <730660D2-E1BA-4A2E-B99C-2F160F9D9A9B@nvidia.com>
On Fri, Apr 26, 2024 at 11:37 AM Zi Yan wrote:
>
> On 25 Apr 2024, at 23:28, Barry Song wrote:
>
> > On Fri, Apr 26, 2024 at 10:50 AM Zi Yan wrote:
> >>
> >> On 25 Apr 2024, at 22:23, Barry Song wrote:
> >>
> >>> On Fri, Apr 26, 2024 at 9:55 AM Zi Yan wrote:
> >>>>
> >>>> On 25 Apr 2024, at 21:45, Barry Song wrote:
> >>>>
> >>>>> On Fri, Apr 26, 2024 at 5:11 AM Zi Yan wrote:
> >>>>>>
> >>>>>> From: Zi Yan
> >>>>>>
> >>>>>> In __folio_remove_rmap(), a large folio is added to deferred split list
> >>>>>> if any page in a folio loses its final mapping. But it is possible that
> >>>>>> the folio is fully unmapped and adding it to deferred split list is
> >>>>>> unnecessary.
> >>>>>>
> >>>>>> For PMD-mapped THPs, that was not really an issue, because removing the
> >>>>>> last PMD mapping in the absence of PTE mappings would not have added the
> >>>>>> folio to the deferred split queue.
> >>>>>>
> >>>>>> However, for PTE-mapped THPs, which are now more prominent due to mTHP,
> >>>>>> they are always added to the deferred split queue. One side effect
> >>>>>> is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be
> >>>>>> unintentionally increased, making it look like there are many partially
> >>>>>> mapped folios -- although the whole folio is fully unmapped stepwise.
> >>>>>>
> >>>>>> Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs
> >>>>>> where possible starting from commit b06dc281aa99 ("mm/rmap: introduce
> >>>>>> folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped
> >>>>>> folio is unmapped in one go and can avoid being added to deferred split
> >>>>>> list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be
> >>>>>> noise when we cannot batch-unmap a complete PTE-mapped folio in one go
> >>>>>> -- or where this type of batching is not implemented yet, e.g., migration.
> >>>>>>
> >>>>>> To avoid the unnecessary addition, folio->_nr_pages_mapped is checked
> >>>>>> to tell if the whole folio is unmapped. If the folio is already on
> >>>>>> deferred split list, it will be skipped, too.
> >>>>>>
> >>>>>> Note: commit 98046944a159 ("mm: huge_memory: add the missing
> >>>>>> folio_test_pmd_mappable() for THP split statistics") tried to exclude
> >>>>>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
> >>>>>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
> >>>>>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
> >>>>>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
> >>>>>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
> >>>>>>
> >>>>>> Signed-off-by: Zi Yan
> >>>>>> Reviewed-by: Yang Shi
> >>>>>> ---
> >>>>>>  mm/rmap.c | 8 +++++---
> >>>>>>  1 file changed, 5 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>>>> index a7913a454028..220ad8a83589 100644
> >>>>>> --- a/mm/rmap.c
> >>>>>> +++ b/mm/rmap.c
> >>>>>> @@ -1553,9 +1553,11 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>>>>                          * page of the folio is unmapped and at least one page
> >>>>>>                          * is still mapped.
> >>>>>>                          */
> >>>>>> -               if (folio_test_large(folio) && folio_test_anon(folio))
> >>>>>> -                       if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> >>>>>> -                               deferred_split_folio(folio);
> >>>>>> +               if (folio_test_large(folio) && folio_test_anon(folio) &&
> >>>>>> +                   list_empty(&folio->_deferred_list) &&
> >>>>>> +                   ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
> >>>>>> +                    (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
> >>>>>> +                       deferred_split_folio(folio);
> >>>>>
> >>>>> Hi Zi Yan,
> >>>>> in case a mTHP is mapped by two processes (forked but not CoW yet), if we
> >>>>> unmap the whole folio by pte level in one process only, are we still adding this
> >>>>> folio into deferred list?
> >>>>
> >>>> No. Because the mTHP is still fully mapped by the other process. In terms of code,
> >>>> nr will be 0 in that case and this if condition is skipped. nr is only increased
> >>>> from 0 when one of the subpages in the mTHP has no mapping, namely page->_mapcount
> >>>> becomes negative and last is true in the case RMAP_LEVEL_PTE.
> >>>
> >>> Ok, I see, so "last" won't be true?
> >>>
> >>> case RMAP_LEVEL_PTE:
> >>>         do {
> >>>                 last = atomic_add_negative(-1, &page->_mapcount);
> >>>                 if (last && folio_test_large(folio)) {
> >>>                         last = atomic_dec_return_relaxed(mapped);
> >>>                         last = (last < ENTIRELY_MAPPED);
> >>>                 }
> >>>
> >>>                 if (last)
> >>>                         nr++;
> >>>         } while (page++, --nr_pages > 0);
> >>>         break;
> >>
> >> Right, because for every subpage its corresponding
> >> last = atomic_add_negative(-1, &page->_mapcount); is not true after the unmapping.
> >
> > if a mTHP is mapped only by one process, and we unmap it entirely, we will
> > get nr > 0, so we end up adding it to deferred_list? so it seems
> > atomic_read(mapped) is preventing this case from adding deferred_list?
>
> Yes, that is what this patch is doing. When a mTHP is mapped by one process
> and later unmapped fully, there is no need to add it to deferred_list.
> The mTHP will be freed right afterwards.

Thanks, I understand. I feel fixing up nr earlier can make the code
more readable:

 case RMAP_LEVEL_PTE:
              ...
+             if (!atomic_read(mapped))
+                     nr = 0;
              break;

as I have been struggling for a long time to understand the code, especially
the one with many conditions in the “if” :-)

+               if (folio_test_large(folio) && folio_test_anon(folio) &&
+                   list_empty(&folio->_deferred_list) &&
+                   ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
+                    (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
+                       deferred_split_folio(folio);
 }

>
> >
> > I wonder if it is possible to fixup nr to 0 from the first place?
> > for example
> > /* we are doing an entire unmapping */
> > if (page == &folio->page && nr_pages == folio_nr_pages(folio))
> > ...
> >
> >>
> >>
> >> --
> >> Best Regards,
> >> Yan, Zi
>
>
> --
> Best Regards,
> Yan, Zi
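
For readers following the bookkeeping that Barry and Zi Yan walk through above, here is a minimal, self-contained userspace sketch in plain C. It is not kernel code: the names NR_PAGES, page_mapcount, nr_pages_mapped, map_all and unmap_ptes are made up for illustration, and only the decision logic is meant to mirror the patched condition, i.e. a PTE-level unmap queues a folio for deferred split only when some subpages lost their last mapping (nr > 0) while others are still mapped (mapped != 0).

/*
 * Standalone sketch (assumption-laden, not kernel code): models the
 * per-subpage _mapcount and folio->_nr_pages_mapped ("mapped" in the
 * diff) to show why a fully unmapped PTE-mapped folio skips
 * deferred_split_folio() under the patched condition.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_PAGES        16        /* pretend mTHP with 16 subpages */
#define ENTIRELY_MAPPED 0x800000  /* PMD-mapped sentinel, never set here */

static int page_mapcount[NR_PAGES]; /* per-subpage mapcount, biased by -1 */
static int nr_pages_mapped;         /* models folio->_nr_pages_mapped */

static void map_all(void)
{
	for (int i = 0; i < NR_PAGES; i++)
		page_mapcount[i] = 0;   /* mapped once by a single process */
	nr_pages_mapped = NR_PAGES;
}

/* Unmap 'count' PTEs; returns nr, the number of subpages losing their last mapping. */
static int unmap_ptes(int start, int count)
{
	int nr = 0;

	for (int i = start; i < start + count; i++) {
		bool last = (--page_mapcount[i] < 0);  /* atomic_add_negative(-1, ...) */
		if (last) {
			int remaining = --nr_pages_mapped; /* atomic_dec_return_relaxed(mapped) */
			if (remaining < ENTIRELY_MAPPED)
				nr++;
		}
	}
	return nr;
}

int main(void)
{
	int nr;

	map_all();

	/* Unmap half the folio: partially mapped, so it would be queued. */
	nr = unmap_ptes(0, NR_PAGES / 2);
	printf("partial unmap: nr=%d mapped=%d -> %s\n", nr, nr_pages_mapped,
	       (nr && nr_pages_mapped) ? "queue for deferred split" : "skip");

	/* Unmap the rest: fully unmapped, mapped == 0, so it is skipped. */
	nr = unmap_ptes(NR_PAGES / 2, NR_PAGES / 2);
	printf("full unmap:    nr=%d mapped=%d -> %s\n", nr, nr_pages_mapped,
	       (nr && nr_pages_mapped) ? "queue for deferred split" : "skip");
	return 0;
}

Run as written, the half-unmap case prints "queue for deferred split" while the final unmap prints "skip" once mapped reaches zero, which is the THP_DEFERRED_SPLIT_PAGE noise reduction the patch is after.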