From: Barry Song <21cnbao@gmail.com>
Date: Fri, 26 Apr 2024 11:36:19 +0800
Subject: Re: [PATCH v4] mm/rmap: do not add fully unmapped large folio to deferred split list
To: Zi Yan
Cc: Andrew Morton, linux-mm@kvack.org, "Matthew Wilcox (Oracle)", Yang Shi, Ryan Roberts, David Hildenbrand, Lance Yang, linux-kernel@vger.kernel.org

On Fri, Apr 26, 2024 at 11:28 AM Barry Song <21cnbao@gmail.com> wrote:
>
> On Fri, Apr 26, 2024 at 10:50 AM Zi Yan wrote:
> >
> > On 25 Apr 2024, at 22:23, Barry Song wrote:
> >
> > > On Fri, Apr 26, 2024 at 9:55 AM Zi Yan wrote:
> > >>
> > >> On 25 Apr 2024, at 21:45, Barry Song wrote:
> > >>
> > >>> On Fri, Apr 26, 2024 at 5:11 AM Zi Yan wrote:
> > >>>>
> > >>>> From: Zi Yan
> > >>>>
> > >>>> In __folio_remove_rmap(), a large folio is added to deferred split list
> > >>>> if any page in a folio loses its final mapping. But it is possible that
> > >>>> the folio is fully unmapped and adding it to deferred split list is
> > >>>> unnecessary.
> > >>>>
> > >>>> For PMD-mapped THPs, that was not really an issue, because removing the
> > >>>> last PMD mapping in the absence of PTE mappings would not have added the
> > >>>> folio to the deferred split queue.
> > >>>>
> > >>>> However, for PTE-mapped THPs, which are now more prominent due to mTHP,
> > >>>> they are always added to the deferred split queue. One side effect
> > >>>> is that the THP_DEFERRED_SPLIT_PAGE stat for a PTE-mapped folio can be
> > >>>> unintentionally increased, making it look like there are many partially
> > >>>> mapped folios -- although the whole folio is fully unmapped stepwise.
> > >>>>
> > >>>> Core-mm now tries batch-unmapping consecutive PTEs of PTE-mapped THPs
> > >>>> where possible starting from commit b06dc281aa99 ("mm/rmap: introduce
> > >>>> folio_remove_rmap_[pte|ptes|pmd]()"). When it happens, a whole PTE-mapped
> > >>>> folio is unmapped in one go and can avoid being added to deferred split
> > >>>> list, reducing the THP_DEFERRED_SPLIT_PAGE noise. But there will still be
> > >>>> noise when we cannot batch-unmap a complete PTE-mapped folio in one go
> > >>>> -- or where this type of batching is not implemented yet, e.g., migration.
> > >>>>
> > >>>> To avoid the unnecessary addition, folio->_nr_pages_mapped is checked
> > >>>> to tell if the whole folio is unmapped. If the folio is already on
> > >>>> deferred split list, it will be skipped, too.
> > >>>>
> > >>>> Note: commit 98046944a159 ("mm: huge_memory: add the missing
> > >>>> folio_test_pmd_mappable() for THP split statistics") tried to exclude
> > >>>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
> > >>>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
> > >>>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
> > >>>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
> > >>>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
> > >>>>
> > >>>> Signed-off-by: Zi Yan
> > >>>> Reviewed-by: Yang Shi
> > >>>> ---
> > >>>>  mm/rmap.c | 8 +++++---
> > >>>>  1 file changed, 5 insertions(+), 3 deletions(-)
> > >>>>
> > >>>> diff --git a/mm/rmap.c b/mm/rmap.c
> > >>>> index a7913a454028..220ad8a83589 100644
> > >>>> --- a/mm/rmap.c
> > >>>> +++ b/mm/rmap.c
> > >>>> @@ -1553,9 +1553,11 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> > >>>>  		 * page of the folio is unmapped and at least one page
> > >>>>  		 * is still mapped.
> > >>>>  		 */
> > >>>> -		if (folio_test_large(folio) && folio_test_anon(folio))
> > >>>> -			if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> > >>>> -				deferred_split_folio(folio);
> > >>>> +		if (folio_test_large(folio) && folio_test_anon(folio) &&
> > >>>> +		    list_empty(&folio->_deferred_list) &&
> > >>>> +		    ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
> > >>>> +		     (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
> > >>>> +			deferred_split_folio(folio);
> > >>>
> > >>> Hi Zi Yan,
> > >>> in case a mTHP is mapped by two processes (forked but not CoW yet), if we
> > >>> unmap the whole folio by pte level in one process only, are we still adding this
> > >>> folio into the deferred list?
> > >>
> > >> No. Because the mTHP is still fully mapped by the other process. In terms of code,
> > >> nr will be 0 in that case and this if condition is skipped. nr is only increased
> > >> from 0 when one of the subpages in the mTHP has no mapping, namely page->_mapcount
> > >> becomes negative and last is true in the case RMAP_LEVEL_PTE.
> > >
> > > Ok, I see, so "last" won't be true?
> > >
> > > case RMAP_LEVEL_PTE:
> > > 	do {
> > > 		last = atomic_add_negative(-1, &page->_mapcount);
> > > 		if (last && folio_test_large(folio)) {
> > > 			last = atomic_dec_return_relaxed(mapped);
> > > 			last = (last < ENTIRELY_MAPPED);
> > > 		}
> > >
> > > 		if (last)
> > > 			nr++;
> > > 	} while (page++, --nr_pages > 0);
> > > 	break;
> >
> > Right, because for every subpage its corresponding
> > last = atomic_add_negative(-1, &page->_mapcount); is not true after the unmapping.
>
> if a mTHP is mapped only by one process, and we unmap it entirely, we will
> get nr > 0, so then we end up adding it into the deferred_list? so it seems
> atomic_read(mapped) is preventing this case from being added to the deferred_list?
>
> I wonder if it is possible to fix up nr to 0 in the first place?
> for example
> /* we are doing an entire unmapping */
> if (page == &folio->page && nr_pages == folio_nr_pages(folio))

or maybe

case RMAP_LEVEL_PTE:
	...
+	if (!atomic_read(mapped))
+		nr = 0;
	break;

> ...
> >
> >
> > --
> > Best Regards,
> > Yan, Zi
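
For readers who want to poke at the bookkeeping above without a kernel tree, here is a small user-space sketch of the exchange. It is not kernel code: the 16-page folio, the struct layout and the ENTIRELY_MAPPED value are simplified assumptions, and only the control flow follows the quoted RMAP_LEVEL_PTE loop together with the patched deferred-split condition (reduced to the list_empty and atomic_read(mapped) checks, with a large anon folio assumed). Running it prints the three cases discussed in the thread: a single-process folio unmapped in full (nr > 0 but mapped == 0, so it is not queued), a partial unmap (queued), and the forked-but-not-CoW case where nr never leaves 0.

/*
 * Build and run:  cc -Wall -o rmap_model rmap_model.c && ./rmap_model
 *
 * User-space model only -- not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

#define ENTIRELY_MAPPED	0x800000	/* sentinel far above any real page count */
#define NR_PAGES	16		/* pretend mTHP, e.g. 64KiB folio of 4KiB pages */

struct page {
	int mapcount;			/* stands in for page->_mapcount, -1 == unmapped */
};

struct folio {
	struct page pages[NR_PAGES];
	int nr_pages_mapped;		/* stands in for folio->_nr_pages_mapped ("mapped") */
	bool on_deferred_list;
};

/* Map every page of the folio once per process ("nproc" processes). */
static void map_folio(struct folio *folio, int nproc)
{
	for (int i = 0; i < NR_PAGES; i++)
		folio->pages[i].mapcount = nproc - 1;	/* -1 based, like _mapcount */
	folio->nr_pages_mapped = NR_PAGES;		/* each page mapped at least once */
	folio->on_deferred_list = false;
}

/* Mirrors the quoted RMAP_LEVEL_PTE loop plus the patched deferred-split check. */
static void remove_rmap_ptes(struct folio *folio, int nr_pages)
{
	int nr = 0;

	for (int i = 0; i < nr_pages; i++) {
		/* atomic_add_negative(-1, &page->_mapcount) */
		bool last = (--folio->pages[i].mapcount < 0);

		if (last) {				/* folio_test_large() assumed true */
			/* atomic_dec_return_relaxed(mapped) */
			int mapped = --folio->nr_pages_mapped;
			last = (mapped < ENTIRELY_MAPPED);
		}
		if (last)
			nr++;
	}

	/* Patched condition: queue only if some pages of the folio stay mapped. */
	if (nr && !folio->on_deferred_list && folio->nr_pages_mapped)
		folio->on_deferred_list = true;		/* deferred_split_folio() */

	printf("unmapped %2d PTEs: nr=%2d _nr_pages_mapped=%2d deferred=%s\n",
	       nr_pages, nr, folio->nr_pages_mapped,
	       folio->on_deferred_list ? "yes" : "no");
}

int main(void)
{
	struct folio folio;

	/* one process, whole folio unmapped: nr=16 but mapped==0 -> not queued */
	map_folio(&folio, 1);
	remove_rmap_ptes(&folio, NR_PAGES);

	/* one process, half the folio unmapped: nr=8, mapped=8 -> queued */
	map_folio(&folio, 1);
	remove_rmap_ptes(&folio, NR_PAGES / 2);

	/* two processes (fork, no CoW), one unmaps fully: nr stays 0 -> nothing queued */
	map_folio(&folio, 2);
	remove_rmap_ptes(&folio, NR_PAGES);

	return 0;
}

Within this model, Barry's suggested "if (!atomic_read(mapped)) nr = 0;" would only change the first scenario, forcing nr to 0 before the queueing check is reached and giving the same "not queued" outcome as the atomic_read(mapped) test in the patched condition.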