From: Barry Song <21cnbao@gmail.com>
Date: Wed, 25 Jun 2025 22:49:57 +1200
Subject: Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large folios during reclamation
To: David Hildenbrand
Cc: Lance Yang, akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
	chrisl@kernel.org, kasong@tencent.com,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-riscv@lists.infradead.org,
	lorenzo.stoakes@oracle.com, ryan.roberts@arm.com, v-songbaohua@oppo.com,
	x86@kernel.org, ying.huang@intel.com, zhengtangquan@oppo.com
In-Reply-To: <8a157228-0b7e-479d-a224-ec85b458ea75@redhat.com>
References: <2c19a6cf-0b42-477b-a672-ed8c1edd4267@redhat.com>
	<20250624162503.78957-1-ioworker0@gmail.com>
	<27d174e0-c209-4851-825a-0baeb56df86f@redhat.com>
	<8a157228-0b7e-479d-a224-ec85b458ea75@redhat.com>

On Wed, Jun 25, 2025 at 10:43 PM David Hildenbrand wrote:
>
> On 25.06.25 12:38, Barry Song wrote:
> >>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>> index fb63d9256f09..241d55a92a47 100644
> >>> --- a/mm/rmap.c
> >>> +++ b/mm/rmap.c
> >>> @@ -1847,12 +1847,25 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
> >>>
> >>>  /* We support batch unmapping of PTEs for lazyfree large folios */
> >>>  static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
> >>> -			struct folio *folio, pte_t *ptep)
> >>> +			struct folio *folio, pte_t *ptep,
> >>> +			struct vm_area_struct *vma)
> >>>  {
> >>>  	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> >>> +	unsigned long next_pmd, vma_end, end_addr;
> >>>  	int max_nr = folio_nr_pages(folio);
> >>>  	pte_t pte = ptep_get(ptep);
> >>>
> >>> +	/*
> >>> +	 * Limit the batch scan within a single VMA and within a single
> >>> +	 * page table.
> >>> +	 */
> >>> +	vma_end = vma->vm_end;
> >>> +	next_pmd = ALIGN(addr + 1, PMD_SIZE);
> >>> +	end_addr = addr + (unsigned long)max_nr * PAGE_SIZE;
> >>> +
> >>> +	if (end_addr > min(next_pmd, vma_end))
> >>> +		return false;
> >>
> >> May I suggest that we clean all that up as we fix it?
> >>
> >> Maybe something like this:
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 3b74bb19c11dd..11fbddc6ad8d6 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -1845,23 +1845,38 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
> >>  #endif
> >>  }
> >>
> >> -/* We support batch unmapping of PTEs for lazyfree large folios */
> >> -static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
> >> -			struct folio *folio, pte_t *ptep)
> >> +static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> >> +			struct page_vma_mapped_walk *pvmw, enum ttu_flags flags,
> >> +			pte_t pte)
> >>  {
> >>  	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> >> -	int max_nr = folio_nr_pages(folio);
> >> -	pte_t pte = ptep_get(ptep);
> >> +	struct vm_area_struct *vma = pvmw->vma;
> >> +	unsigned long end_addr, addr = pvmw->address;
> >> +	unsigned int max_nr;
> >> +
> >> +	if (flags & TTU_HWPOISON)
> >> +		return 1;
> >> +	if (!folio_test_large(folio))
> >> +		return 1;
> >> +
> >> +	/* We may only batch within a single VMA and a single page table. */
> >> +	end_addr = min_t(unsigned long, ALIGN(addr + 1, PMD_SIZE), vma->vm_end);
> >
> > Is this pmd_addr_end()?
> >
>
> Yes, that could be reused as well here.
>
> >> +	max_nr = (end_addr - addr) >> PAGE_SHIFT;
> >>
> >> +	/* We only support lazyfree batching for now ... */
> >>  	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
> >> -		return false;
> >> +		return 1;
> >>  	if (pte_unused(pte))
> >> -		return false;
> >> -	if (pte_pfn(pte) != folio_pfn(folio))
> >> -		return false;
> >> +		return 1;
> >> +	/* ... where we must be able to batch the whole folio. */
> >> +	if (pte_pfn(pte) != folio_pfn(folio) || max_nr != folio_nr_pages(folio))
> >> +		return 1;
> >> +	max_nr = folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
> >> +				 NULL, NULL, NULL);
> >>
> >> -	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
> >> -			       NULL, NULL) == max_nr;
> >> +	if (max_nr != folio_nr_pages(folio))
> >> +		return 1;
> >> +	return max_nr;
> >>  }
> >>
> >>  /*
> >> @@ -2024,9 +2039,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>  			if (pte_dirty(pteval))
> >>  				folio_mark_dirty(folio);
> >>  		} else if (likely(pte_present(pteval))) {
> >> -			if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
> >> -			    can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
> >> -				nr_pages = folio_nr_pages(folio);
> >> +			nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
> >>  			end_addr = address + nr_pages * PAGE_SIZE;
> >>  			flush_cache_range(vma, address, end_addr);
> >>
> >>
> >> Note that I don't quite understand why we have to batch the whole thing or fall back to
> >> individual pages. Why can't we perform other batches that span only some PTEs? What's special
> >> about 1 PTE vs. 2 PTEs vs. all PTEs?
> >>
> >>
> >> Can someone enlighten me why that is required?
> >
> > It's probably not a strict requirement -- I thought cases where the
> > count is greater than 1 but less than nr_pages might not provide much
> > practical benefit, except perhaps in very rare edge cases, since
> > madv_free() already calls split_folio().
>
> Okay, but it makes the code more complicated. If there is no reason to
> prevent the batching, we should drop it.

It's not necessarily more complex, since page_vma_mapped_walk() still
has to check each PTE individually and can't skip ahead based on nr.
With nr_pages batched, we can exit the loop early in one go.
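
As a side note, the PMD/VMA clamp above is easy to sanity-check in
userspace. Below is a minimal, self-contained sketch; the PAGE_SHIFT and
PMD_SIZE values, the ALIGN() macro, and max_batch_ptes() are made-up
stand-ins for illustration, not the kernel definitions (in the kernel,
pmd_addr_end() yields the same bound, as noted above):

#include <stdio.h>

/* Made-up stand-ins for the kernel constants: 4 KiB pages, 2 MiB PMDs. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PMD_SIZE	(1UL << 21)
#define ALIGN(x, a)	(((x) + (a) - 1) & ~((a) - 1))

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Same clamp as in the proposed folio_unmap_pte_batch(): a batch may not
 * cross the next PMD boundary (i.e. leave the current page table) or run
 * past the end of the VMA, whichever limit comes first.
 */
static unsigned int max_batch_ptes(unsigned long addr, unsigned long vma_end)
{
	unsigned long end_addr = min_ul(ALIGN(addr + 1, PMD_SIZE), vma_end);

	return (end_addr - addr) >> PAGE_SHIFT;
}

int main(void)
{
	/* addr sits 16 pages below a PMD boundary; the VMA extends past it. */
	unsigned long addr = 3 * PMD_SIZE - 16 * PAGE_SIZE;

	printf("%u\n", max_batch_ptes(addr, 4 * PMD_SIZE));         /* 16 */

	/* A VMA that ends 4 pages in becomes the tighter limit. */
	printf("%u\n", max_batch_ptes(addr, addr + 4 * PAGE_SIZE)); /* 4 */
	return 0;
}

The second call is the case where max_nr comes out smaller than
folio_nr_pages(folio), which the max_nr != folio_nr_pages(folio) check
above turns into a plain single-page unmap.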

Thanks
Barry