From: Barry Song <21cnbao@gmail.com>
Date: Wed, 25 Jun 2025 23:42:36 +1200
Subject: Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large folios during reclamation
To: David Hildenbrand
Cc: Lance Yang, akpm@linux-foundation.org, baolin.wang@linux.alibaba.com, chrisl@kernel.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com, ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org, ying.huang@intel.com, zhengtangquan@oppo.com, Lance Yang
In-Reply-To: <6179dd30-5351-4a79-b0d6-f0e85650a926@redhat.com>
References: <2c19a6cf-0b42-477b-a672-ed8c1edd4267@redhat.com> <20250624162503.78957-1-ioworker0@gmail.com> <27d174e0-c209-4851-825a-0baeb56df86f@redhat.com> <938c4726-b93e-46df-bceb-65c7574714a6@linux.dev> <5ba95609-302b-456a-a863-2bd5df51baf2@redhat.com> <6179dd30-5351-4a79-b0d6-f0e85650a926@redhat.com>

On Wed, Jun 25, 2025 at 11:27 PM David Hildenbrand wrote:
>
> On 25.06.25 13:15, Barry Song wrote:
> > On Wed, Jun 25, 2025 at 11:01 PM David Hildenbrand wrote:
> >>
> >> On 25.06.25 12:57, Barry Song wrote:
> >>>>>
> >>>>> Note that I don't quite understand why we have to batch the whole
> >>>>> thing or fall back to individual pages. Why can't we perform other
> >>>>> batches that span only some PTEs? What's special about 1 PTE vs.
> >>>>> 2 PTEs vs. all PTEs?
> >>>>
> >>>> That's a good point about the "all-or-nothing" batching logic ;)
> >>>>
> >>>> It seems the "all-or-nothing" approach is specific to the lazyfree use
> >>>> case, which needs to unmap the entire folio for reclamation. If that's
> >>>> not possible, it falls back to the single-page slow path.
> >>>
> >>> Other cases advance the PTE themselves, while try_to_unmap_one() relies
> >>> on page_vma_mapped_walk() to advance the PTE. Unless we want to manually
> >>> modify pvmw.pte and pvmw.address outside of page_vma_mapped_walk(), which
> >>> to me seems like a violation of layers. :-)
> >>
> >> Please explain to me why the following is not clearer and better:
> >
> > This part is much clearer, but that doesn't necessarily improve the overall
> > picture. The main challenge is how to exit the iteration of
> > while (page_vma_mapped_walk(&pvmw)).
>
> Okay, I get what you mean now.
>
> >
> > Right now, we have it laid out quite straightforwardly:
> >                 /* We have already batched the entire folio */
> >                 if (nr_pages > 1)
> >                         goto walk_done;
>
>
> Given that the comment is completely confusing when seeing the check ... :)
>
>         /*
>          * If we are sure that we batched the entire folio and cleared all PTEs,
>          * we can just optimize and stop right here.
>          */
>         if (nr_pages == folio_nr_pages(folio))
>                 goto walk_done;
>
> would make the comment match.

Yes, that clarifies it.

> >
> > with any nr between 1 and folio_nr_pages(), we have to consider two issues:
> > 1. How to skip PTE checks inside page_vma_mapped_walk for entries that
> > were already handled in the previous batch;
>
> They are cleared if we reach that point. So the pte_none() checks will
> simply skip them?
>
> > 2. How to break the iteration when this batch has arrived at the end.
>
> page_vma_mapped_walk() should be doing that?

It seems you might have missed the part in my reply that says:
"Of course, we could avoid both, but that would mean performing unnecessary
checks inside page_vma_mapped_walk()."

That's true for both. But I'm wondering why we're still doing the check,
even when we're fairly sure they've already been cleared or we've reached
the end :-)

Somehow, I feel we could combine your cleanup code, which handles a batch
size of "nr" between 1 and nr_pages, with the
"if (nr_pages == folio_nr_pages(folio)) goto walk_done" check.
In practice, this would let us skip almost all unnecessary checks,
except for a few rare corner cases.

For those corner cases where "nr" truly falls between 1 and nr_pages,
we can just leave them as-is, performing the redundant check inside
page_vma_mapped_walk().

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
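
[Editorial illustration, not part of the original message.] The trade-off
discussed above can be sketched as a small userspace model. This is not kernel
code and not the patch under review: struct toy_pte, toy_batch_len() and
toy_unmap_folio() are hypothetical names invented here, and a "PTE" is reduced
to two flags. The sketch only shows the shape of the combined idea: batches of
any size are allowed, the walk stops early when one batch covered the whole
folio, and in the partial case the walk keeps going and re-checks the slots it
already cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page table entry; none of this is a kernel API. */
struct toy_pte {
    bool present;    /* false models a cleared entry (the pte_none() case) */
    bool batchable;  /* hypothetical: may be unmapped together with neighbours */
};

struct walk_stats {
    size_t slots_checked;  /* slots the walk had to inspect */
    size_t batches;        /* unmap batches issued */
};

/* Length of the run of present, batchable slots starting at idx (at least 1). */
static size_t toy_batch_len(const struct toy_pte *ptes, size_t n, size_t idx)
{
    size_t len = 0;

    while (idx + len < n && ptes[idx + len].present && ptes[idx + len].batchable)
        len++;
    return len ? len : 1;  /* a non-batchable slot is handled on its own */
}

/* Walk all n slots of one "folio", unmapping in batches of any size. */
static struct walk_stats toy_unmap_folio(struct toy_pte *ptes, size_t n)
{
    struct walk_stats st = { 0, 0 };

    assert(ptes != NULL);
    for (size_t i = 0; i < n; i++) {
        st.slots_checked++;
        if (!ptes[i].present)  /* cleared by an earlier batch: skip */
            continue;

        size_t nr = toy_batch_len(ptes, n, i);
        for (size_t j = 0; j < nr; j++)
            ptes[i + j].present = false;
        st.batches++;

        /* Fast path: the batch covered the entire folio, stop walking. */
        if (nr == n)
            break;
        /* Partial batch: keep walking; cleared slots get re-checked above. */
    }
    return st;
}
```

With four batchable slots, the model inspects one slot and issues one batch
before taking the early exit. If the third slot is marked non-batchable, it
inspects all four slots and issues three batches, which corresponds to the
corner case where the redundant checks are simply tolerated.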