From: Lance Yang
Date: Thu, 6 Jun 2024 17:38:52 +0800
Subject: Re: [PATCH v6 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
To: David Hildenbrand
Cc: akpm@linux-foundation.org, willy@infradead.org, sj@kernel.org, baolin.wang@linux.alibaba.com, maskray@google.com, ziy@nvidia.com, ryan.roberts@arm.com, 21cnbao@gmail.com, mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com, shy828301@gmail.com, xiehuan09@gmail.com, libang.li@antgroup.com, wangkefeng.wang@huawei.com, songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Thu, Jun 6, 2024 at 4:06 PM David Hildenbrand wrote:
>
> On 06.06.24 10:01, David Hildenbrand wrote:
> > On 06.06.24 05:55, Lance Yang wrote:
> >> On Wed, Jun 5, 2024 at 10:28 PM David Hildenbrand wrote:
> >>>
> >>> On 05.06.24 16:20, Lance Yang wrote:
> >>>> Hi David,
> >>>>
> >>>> On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand wrote:
> >>>>>
> >>>>> On 21.05.24 06:02, Lance Yang wrote:
> >>>>>> In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
> >>>>>> folios, start the pagewalk first, then call split_huge_pmd_address() to
> >>>>>> split the folio.
> >>>>>>
> >>>>>> Since TTU_SPLIT_HUGE_PMD will no longer perform immediately, we might
> >>>>>> encounter a PMD-mapped THP missing the mlock in the VM_LOCKED range during
> >>>>>> the page walk. It's probably necessary to mlock this THP to prevent it from
> >>>>>> being picked up during page reclaim.
> >>>>>>
> >>>>>> Suggested-by: David Hildenbrand
> >>>>>> Suggested-by: Baolin Wang
> >>>>>> Signed-off-by: Lance Yang
> >>>>>> ---
> >>>>>
> >>>>> [...] again, sorry for the late review.
> >>>>
> >>>> No worries at all, thanks for taking time to review!
> >>>>
> >>>>>
> >>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>>>> index ddffa30c79fb..08a93347f283 100644
> >>>>>> --- a/mm/rmap.c
> >>>>>> +++ b/mm/rmap.c
> >>>>>> @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>>>>  	if (flags & TTU_SYNC)
> >>>>>>  		pvmw.flags = PVMW_SYNC;
> >>>>>>
> >>>>>> -	if (flags & TTU_SPLIT_HUGE_PMD)
> >>>>>> -		split_huge_pmd_address(vma, address, false, folio);
> >>>>>> -
> >>>>>>  	/*
> >>>>>>  	 * For THP, we have to assume the worse case ie pmd for invalidation.
> >>>>>>  	 * For hugetlb, it could be much worse if we need to do pud
> >>>>>> @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>>>>  	mmu_notifier_invalidate_range_start(&range);
> >>>>>>
> >>>>>>  	while (page_vma_mapped_walk(&pvmw)) {
> >>>>>> -		/* Unexpected PMD-mapped THP? */
> >>>>>> -		VM_BUG_ON_FOLIO(!pvmw.pte, folio);
> >>>>>> -
> >>>>>>  		/*
> >>>>>>  		 * If the folio is in an mlock()d vma, we must not swap it out.
> >>>>>>  		 */
> >>>>>>  		if (!(flags & TTU_IGNORE_MLOCK) &&
> >>>>>>  		    (vma->vm_flags & VM_LOCKED)) {
> >>>>>>  			/* Restore the mlock which got missed */
> >>>>>> -			if (!folio_test_large(folio))
> >>>>>> +			if (!folio_test_large(folio) ||
> >>>>>> +			    (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
> >>>>>>  				mlock_vma_folio(folio, vma);

Should we still keep the '!pvmw.pte' here? Something like:

	if (!folio_test_large(folio) || !pvmw.pte)
		mlock_vma_folio(folio, vma);

That way, we can mlock a PMD-mapped THP to prevent it from being picked up
during page reclaim.

David, I'd like to hear your thoughts on this ;)

Thanks,
Lance

> >>>>>
> >>>>> Can you elaborate why you think this would be required? If we would have
> >>>>> performed the split_huge_pmd_address() beforehand, we would still be
> >>>>> left with a large folio, no?
> >>>>
> >>>> Yep, there would still be a large folio, but it wouldn't be PMD-mapped.
> >>>>
> >>>> After Weifeng's series[1], the kernel supports mlock for PTE-mapped large
> >>>> folios, but there are a few scenarios where we don't mlock a large folio,
> >>>> such as when it crosses a VM_LOCKed VMA boundary.
> >>>>
> >>>> -			if (!folio_test_large(folio))
> >>>> +			if (!folio_test_large(folio) ||
> >>>> +			    (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
> >>>>
> >>>> And this check is just future-proofing and likely unnecessary. If we
> >>>> encounter a PMD-mapped THP that is missing the mlock for some reason, we
> >>>> can mlock this THP to prevent it from being picked up during page reclaim,
> >>>> since it is fully mapped and doesn't cross the VMA boundary, IIUC.
> >>>>
> >>>> What do you think?
> >>>> I would appreciate any suggestions regarding this check ;)
> >>>
> >>> Reading this patch only, I wonder if this change makes sense in the
> >>> context here.
> >>
> >> Allow me to try explaining it again ;)
> >>
> >>>
> >>> Before this patch, we would have PTE-mapped the PMD-mapped THP before
> >>> reaching this call and skipped it due to "!folio_test_large(folio)".
> >>
> >> Yes, there is only a PTE-mapped THP when doing the "!folio_test_large(folio)"
> >> check, as we will first conditionally split the PMD via
> >> split_huge_pmd_address().
> >>
> >>>
> >>> After this patch, we either
> >>
> >> Things will change. We'll first do the "!folio_test_large(folio)" check, then
> >> conditionally split the PMD via split_huge_pmd_address().
> >>
> >>>
> >>> a) PTE-remap the THP after this check, but retry and end-up here again,
> >>> whereby we would skip it due to "!folio_test_large(folio)".
> >>
> >> Hmm...
> >>
> >> IIUC, we will skip it after this check, stop the page walk, and not
> >> PTE-remap the THP.
> >>
> >>>
> >>> b) Discard the PMD-mapped THP due to lazyfree directly. Can that
> >>> co-exist with mlock and what would be the problem here with mlock?
> >>
> >> Before discarding a PMD-mapped THP as a whole, as patch #3 does,
> >> we also perform the "!folio_test_large(folio)" check. If the THP coexists
> >> with mlock, we will skip it, stop the page walk, and not discard it, IIUC.
> >
> > But "!folio_test_large(folio)" would *skip* the THP and not consider it
> > regarding mlock.
> >
> > I'm probably missing something
>
> I'm stupid, I missed that we still do the "goto walk_done_err;", only
> that we don't do the mlock_vma_folio(folio, vma);
>
> Yes, let's drop it for now! :)
>
> --
> Cheers,
>
> David / dhildenb
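A hypothetical userspace model may help pin down the point David concedes at the end. It reduces the VM_LOCKED branch of try_to_unmap_one() quoted above to booleans and compares three versions of the "Restore the mlock which got missed" condition: the original !folio_test_large(folio), the v6 condition that adds (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)), and the proposed "|| !pvmw.pte". Because the branch ends in "goto walk_done_err" in every version, the folio is never unmapped there; the variants differ only in whether mlock_vma_folio() would be called. Everything in the sketch (struct walk_state, check_mlock(), the RULE_* names and the MODEL_TTU_* flag values) is invented for illustration: it is not kernel code, and the flag values are not those of the kernel's enum ttu_flags. The inputs are treated as independent booleans, with no claim about which combinations the kernel can actually reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model flags. Placeholder values, NOT the kernel's
 * enum ttu_flags values.
 */
#define MODEL_TTU_SPLIT_HUGE_PMD 0x1u
#define MODEL_TTU_IGNORE_MLOCK   0x2u

/* Inputs to the mlock check, reduced to booleans (hypothetical). */
struct walk_state {
	bool folio_large;   /* models folio_test_large(folio) */
	bool pmd_mapped;    /* models !pvmw.pte, i.e. a PMD-mapped THP */
	bool vma_locked;    /* models vma->vm_flags & VM_LOCKED */
	unsigned int flags; /* models the try_to_unmap() flags */
};

/* The three versions of the "Restore the mlock" condition. */
enum mlock_rule {
	RULE_ORIGINAL, /* !folio_test_large(folio) */
	RULE_V6,       /* ... || (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)) */
	RULE_PROPOSED, /* ... || !pvmw.pte */
};

struct outcome {
	bool bail_out; /* models "goto walk_done_err" */
	bool mlocked;  /* models a call to mlock_vma_folio() */
};

static struct outcome check_mlock(const struct walk_state *s,
				  enum mlock_rule rule)
{
	struct outcome o = { .bail_out = false, .mlocked = false };

	assert(s != NULL);

	/* Check does not apply: models falling through to the rest of the loop. */
	if ((s->flags & MODEL_TTU_IGNORE_MLOCK) || !s->vma_locked)
		return o;

	switch (rule) {
	case RULE_ORIGINAL:
		o.mlocked = !s->folio_large;
		break;
	case RULE_V6:
		o.mlocked = !s->folio_large ||
			    (s->pmd_mapped &&
			     (s->flags & MODEL_TTU_SPLIT_HUGE_PMD));
		break;
	case RULE_PROPOSED:
		o.mlocked = !s->folio_large || s->pmd_mapped;
		break;
	}

	/* Whatever the rule decides about mlock, the walk stops here. */
	o.bail_out = true;
	return o;
}
```

With this model, a PMD-mapped large folio reached without the split flag, { .folio_large = true, .pmd_mapped = true, .vma_locked = true, .flags = 0 }, yields mlocked == true only under RULE_PROPOSED, while bail_out is true under all three rules. Dropping the extra condition, as agreed at the end of the thread, corresponds to RULE_ORIGINAL.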