From: Lance Yang <ioworker0@gmail.com>
Date: Wed, 5 Jun 2024 22:57:01 +0800
Subject: Re: [PATCH v6 2/3] mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop
To: David Hildenbrand
Cc: Yin Fengwei, akpm@linux-foundation.org, willy@infradead.org, sj@kernel.org, baolin.wang@linux.alibaba.com, maskray@google.com, ziy@nvidia.com, ryan.roberts@arm.com, 21cnbao@gmail.com, mhocko@suse.com, zokeefe@google.com, shy828301@gmail.com, xiehuan09@gmail.com, libang.li@antgroup.com, wangkefeng.wang@huawei.com, songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <8580a462-eadc-4fa5-b01a-c0b8c3ae644d@redhat.com>
References: <20240521040244.48760-1-ioworker0@gmail.com> <20240521040244.48760-3-ioworker0@gmail.com> <8580a462-eadc-4fa5-b01a-c0b8c3ae644d@redhat.com>
On Wed, Jun 5, 2024 at 10:39 PM David Hildenbrand wrote:
>
> On 05.06.24 16:28, David Hildenbrand wrote:
> > On 05.06.24 16:20, Lance Yang wrote:
> >> Hi David,
> >>
> >> On Wed, Jun 5, 2024 at 8:46 PM David Hildenbrand wrote:
> >>>
> >>> On 21.05.24 06:02, Lance Yang wrote:
> >>>> In preparation for supporting try_to_unmap_one() to
unmap PMD-mapped
> >>>> folios, start the pagewalk first, then call split_huge_pmd_address() to
> >>>> split the folio.
> >>>>
> >>>> Since TTU_SPLIT_HUGE_PMD will no longer be performed immediately, we might
> >>>> encounter a PMD-mapped THP missing the mlock in the VM_LOCKED range during
> >>>> the page walk. It's probably necessary to mlock this THP to prevent it from
> >>>> being picked up during page reclaim.
> >>>>
> >>>> Suggested-by: David Hildenbrand
> >>>> Suggested-by: Baolin Wang
> >>>> Signed-off-by: Lance Yang
> >>>> ---
> >>>
> >>> [...] again, sorry for the late review.
> >>
> >> No worries at all, thanks for taking time to review!
> >>
> >>>
> >>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>> index ddffa30c79fb..08a93347f283 100644
> >>>> --- a/mm/rmap.c
> >>>> +++ b/mm/rmap.c
> >>>> @@ -1640,9 +1640,6 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>>          if (flags & TTU_SYNC)
> >>>>                  pvmw.flags = PVMW_SYNC;
> >>>>
> >>>> -        if (flags & TTU_SPLIT_HUGE_PMD)
> >>>> -                split_huge_pmd_address(vma, address, false, folio);
> >>>> -
> >>>>          /*
> >>>>           * For THP, we have to assume the worse case ie pmd for invalidation.
> >>>>           * For hugetlb, it could be much worse if we need to do pud
> >>>> @@ -1668,20 +1665,35 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>>>          mmu_notifier_invalidate_range_start(&range);
> >>>>
> >>>>          while (page_vma_mapped_walk(&pvmw)) {
> >>>> -                /* Unexpected PMD-mapped THP? */
> >>>> -                VM_BUG_ON_FOLIO(!pvmw.pte, folio);
> >>>> -
> >>>>                  /*
> >>>>                   * If the folio is in an mlock()d vma, we must not swap it out.
> >>>>                   */
> >>>>                  if (!(flags & TTU_IGNORE_MLOCK) &&
> >>>>                      (vma->vm_flags & VM_LOCKED)) {
> >>>>                          /* Restore the mlock which got missed */
> >>>> -                        if (!folio_test_large(folio))
> >>>> +                        if (!folio_test_large(folio) ||
> >>>> +                            (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
> >>>>                                  mlock_vma_folio(folio, vma);
> >>>
> >>> Can you elaborate why you think this would be required? If we would have
> >>> performed the split_huge_pmd_address() beforehand, we would still be
> >>> left with a large folio, no?
> >>
> >> Yep, there would still be a large folio, but it wouldn't be PMD-mapped.
> >>
> >> After Weifeng's series[1], the kernel supports mlock for PTE-mapped large
> >> folios, but there are a few scenarios where we don't mlock a large folio, such
> >> as when it crosses a VM_LOCKED VMA boundary.
> >>
> >> -                        if (!folio_test_large(folio))
> >> +                        if (!folio_test_large(folio) ||
> >> +                            (!pvmw.pte && (flags & TTU_SPLIT_HUGE_PMD)))
> >>
> >> And this check is just future-proofing and likely unnecessary. If encountering a
> >> PMD-mapped THP missing the mlock for some reason, we can mlock this
> >> THP to prevent it from being picked up during page reclaim, since it is fully
> >> mapped and doesn't cross the VMA boundary, IIUC.
> >>
> >> What do you think?
> >> I would appreciate any suggestions regarding this check ;)
> >
> > Reading this patch only, I wonder if this change makes sense in the
> > context here.
> >
> > Before this patch, we would have PTE-mapped the PMD-mapped THP before
> > reaching this call and skipped it due to "!folio_test_large(folio)".
> >
> > After this patch, we either
> >
> > a) PTE-remap the THP after this check, but retry and end up here again,
> > whereby we would skip it due to "!folio_test_large(folio)".
> >
> > b) Discard the PMD-mapped THP due to lazyfree directly. Can that
> > co-exist with mlock, and what would be the problem here with mlock?

Thanks a lot for clarifying!
> > So if the check is required in this patch, we really have to understand
> > why. If not, we should better drop it from this patch.
> >
> > At least my opinion, still struggling to understand why it would be
> > required (I have 0 knowledge about mlock interaction with large folios :) ).
>
> Looking at that series, in folio_referenced_one(), we do
>
>         if (!folio_test_large(folio) || !pvmw.pte) {
>                 /* Restore the mlock which got missed */
>                 mlock_vma_folio(folio, vma);
>                 page_vma_mapped_walk_done(&pvmw);
>                 pra->vm_flags |= VM_LOCKED;
>                 return false; /* To break the loop */
>         }
>
> I wonder if we want that here as well now: in case of lazyfree we
> would not back off, right?
>
> But I'm not sure if lazyfree in mlocked areas is even possible.
>
> Adding the "!pvmw.pte" would be much clearer to me than the flag check.

Hmm... How about we drop it from this patch for now, and add it back if
needed in the future?

Thanks,
Lance

>
> --
> Cheers,
>
> David / dhildenb
>