From mboxrd@z Thu Jan 1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Wed, 25 Jun 2025 21:38:21 +1200
Subject: Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large folios during reclamation
To: Lance Yang
Cc: david@redhat.com, akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
 chrisl@kernel.org, kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-riscv@lists.infradead.org,
 lorenzo.stoakes@oracle.com, ryan.roberts@arm.com, v-songbaohua@oppo.com,
 x86@kernel.org, ying.huang@intel.com, zhengtangquan@oppo.com
In-Reply-To: <20250624162503.78957-1-ioworker0@gmail.com>
References: <2c19a6cf-0b42-477b-a672-ed8c1edd4267@redhat.com> <20250624162503.78957-1-ioworker0@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Wed, Jun 25, 2025 at 4:27 AM Lance Yang wrote:
>
> On 2025/6/24 23:34, David Hildenbrand wrote:
> > On 24.06.25 17:26, Lance Yang wrote:
> >> On 2025/6/24 20:55, David Hildenbrand wrote:
> >>> On 14.02.25 10:30, Barry Song wrote:
> >>>> From: Barry Song
> >> [...]
> >>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>> index 89e51a7a9509..8786704bd466 100644
> >>>> --- a/mm/rmap.c
> >>>> +++ b/mm/rmap.c
> >>>> @@ -1781,6 +1781,25 @@ void folio_remove_rmap_pud(struct folio *folio,
> >>>> struct page *page,
> >>>>  #endif
> >>>>  }
> >>>> +/* We support batch unmapping of PTEs for lazyfree large folios */
> >>>> +static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
> >>>> +			struct folio *folio, pte_t *ptep)
> >>>> +{
> >>>> +	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> >>>> +	int max_nr = folio_nr_pages(folio);
> >>>
> >>> Let's assume we have the first page of a folio mapped at the last page
> >>> table entry in our page table.
> >>
> >> Good point. I'm curious if it is something we've seen in practice ;)
> >
> > I challenge you to write a reproducer :P I assume it might be doable
> > through simple mremap().
> >
> >>
> >>>
> >>> What prevents folio_pte_batch() from reading outside the page table?
> >>
> >> Assuming such a scenario is possible, to prevent any chance of an
> >> out-of-bounds read, how about this change:
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index fb63d9256f09..9aeae811a38b 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -1852,6 +1852,25 @@ static inline bool
> >> can_batch_unmap_folio_ptes(unsigned long addr,
> >>  	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> >>  	int max_nr = folio_nr_pages(folio);
> >>  	pte_t pte = ptep_get(ptep);
> >> +	unsigned long end_addr;
> >> +
> >> +	/*
> >> +	 * To batch unmap, the entire folio's PTEs must be contiguous
> >> +	 * and mapped within the same PTE page table, which corresponds to
> >> +	 * a single PMD entry. Before calling folio_pte_batch(), which does
> >> +	 * not perform boundary checks itself, we must verify that the
> >> +	 * address range covered by the folio does not cross a PMD boundary.
> >> +	 */
> >> +	end_addr = addr + (max_nr * PAGE_SIZE) - 1;
> >> +
> >> +	/*
> >> +	 * A fast way to check for a PMD boundary cross is to align both
> >> +	 * the start and end addresses to the PMD boundary and see if they
> >> +	 * are different. If they are, the range spans across at least two
> >> +	 * different PMD-managed regions.
> >> +	 */
> >> +	if ((addr & PMD_MASK) != (end_addr & PMD_MASK))
> >> +		return false;
> >
> > You should not be messing with max_nr = folio_nr_pages(folio) here at
> > all. folio_pte_batch() takes care of that.
> >
> > Also, way too many comments ;)
> >
> > You may only batch within a single VMA and within a single page table.
> >
> > So simply align the addr up to the next PMD, and make sure it does not
> > exceed the vma end.
> >
> > ALIGN and friends can help avoiding excessive comments.
>
> Thanks! How about this updated version based on your suggestion:
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index fb63d9256f09..241d55a92a47 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1847,12 +1847,25 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
>
>  /* We support batch unmapping of PTEs for lazyfree large folios */
>  static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
> -			struct folio *folio, pte_t *ptep)
> +			struct folio *folio, pte_t *ptep,
> +			struct vm_area_struct *vma)
>  {
>  	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> +	unsigned long next_pmd, vma_end, end_addr;
>  	int max_nr = folio_nr_pages(folio);
>  	pte_t pte = ptep_get(ptep);
>
> +	/*
> +	 * Limit the batch scan within a single VMA and within a single
> +	 * page table.
> +	 */
> +	vma_end = vma->vm_end;
> +	next_pmd = ALIGN(addr + 1, PMD_SIZE);
> +	end_addr = addr + (unsigned long)max_nr * PAGE_SIZE;
> +
> +	if (end_addr > min(next_pmd, vma_end))
> +		return false;
> +

I had a similar check in do_swap_page() for both forward and backward
out-of-bounds page tables, but I forgot to add it for this unmap path.

This is do_swap_page():

	if (folio_test_large(folio) && folio_test_swapcache(folio)) {
		int nr = folio_nr_pages(folio);
		unsigned long idx = folio_page_idx(folio, page);
		unsigned long folio_start = address - idx * PAGE_SIZE;
		unsigned long folio_end = folio_start + nr * PAGE_SIZE;
		pte_t *folio_ptep;
		pte_t folio_pte;

		if (unlikely(folio_start < max(address & PMD_MASK, vma->vm_start)))
			goto check_folio;
		if (unlikely(folio_end > pmd_addr_end(address, vma->vm_end)))
			goto check_folio;
	}

So maybe something like folio_end > pmd_addr_end(address, vma->vm_end)?

Thanks
Barry