From: David Hildenbrand <david@redhat.com>
To: Lance Yang <ioworker0@gmail.com>
Cc: 21cnbao@gmail.com, akpm@linux-foundation.org,
	baolin.wang@linux.alibaba.com, chrisl@kernel.org,
	kasong@tencent.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
	ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
	ying.huang@intel.com, zhengtangquan@oppo.com
Subject: Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large folios during reclamation
Date: Tue, 24 Jun 2025 17:34:23 +0200
Message-ID: <2c19a6cf-0b42-477b-a672-ed8c1edd4267@redhat.com>
In-Reply-To: <20250624152654.38145-1-ioworker0@gmail.com>

On 24.06.25 17:26, Lance Yang wrote:
> On 2025/6/24 20:55, David Hildenbrand wrote:
>> On 14.02.25 10:30, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
> [...]
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 89e51a7a9509..8786704bd466 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1781,6 +1781,25 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
>>>    #endif
>>>    }
>>> +/* We support batch unmapping of PTEs for lazyfree large folios */
>>> +static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
>>> +            struct folio *folio, pte_t *ptep)
>>> +{
>>> +    const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>>> +    int max_nr = folio_nr_pages(folio);
>>
>> Let's assume we have the first page of a folio mapped at the last page
>> table entry in our page table.
> 
> Good point. I'm curious if it is something we've seen in practice ;)

I challenge you to write a reproducer :P I assume it might be doable 
through a simple mremap().
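
Untested sketch of roughly what I have in mind (assumes 4K pages, 2M THPs, 
and that the folio survives the move as a single, now PTE-mapped, large 
folio):

#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define SZ_2M	(2UL << 20)

int main(void)
{
	/* Reserve a large range so we can carve out aligned src/dst. */
	char *area = mmap(NULL, 8 * SZ_2M, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *src = (char *)(((unsigned long)area + SZ_2M - 1) & ~(SZ_2M - 1));
	/* One page before a PMD boundary, still inside the reservation. */
	char *dst = src + 4 * SZ_2M - 4096;

	madvise(src, SZ_2M, MADV_HUGEPAGE);
	memset(src, 1, SZ_2M);		/* fault in a PMD-mapped THP */

	/*
	 * Moving to a non-PMD-aligned dst splits the PMD mapping, so the
	 * large folio ends up PTE-mapped with its first page in the last
	 * PTE of one page table and the rest spilling into the next one.
	 */
	mremap(src, SZ_2M, SZ_2M, MREMAP_MAYMOVE | MREMAP_FIXED, dst);

	madvise(dst, SZ_2M, MADV_FREE);	/* lazyfree; reclaim does the rest */
	pause();
	return 0;
}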

> 
>>
>> What prevents folio_pte_batch() from reading outside the page table?
> 
> Assuming such a scenario is possible, to prevent any chance of an
> out-of-bounds read, how about this change:
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index fb63d9256f09..9aeae811a38b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1852,6 +1852,25 @@ static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
>   	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>   	int max_nr = folio_nr_pages(folio);
>   	pte_t pte = ptep_get(ptep);
> +	unsigned long end_addr;
> +
> +	/*
> +	 * To batch unmap, the entire folio's PTEs must be contiguous
> +	 * and mapped within the same PTE page table, which corresponds to
> +	 * a single PMD entry. Before calling folio_pte_batch(), which does
> +	 * not perform boundary checks itself, we must verify that the
> +	 * address range covered by the folio does not cross a PMD boundary.
> +	 */
> +	end_addr = addr + (max_nr * PAGE_SIZE) - 1;
> +
> +	/*
> +	 * A fast way to check for a PMD boundary cross is to align both
> +	 * the start and end addresses to the PMD boundary and see if they
> +	 * are different. If they are, the range spans across at least two
> +	 * different PMD-managed regions.
> +	 */
> +	if ((addr & PMD_MASK) != (end_addr & PMD_MASK))
> +		return false;

You should not be messing with max_nr = folio_nr_pages(folio) here at 
all. folio_pte_batch() takes care of that.

Also, way too many comments ;)

You may only batch within a single VMA and within a single page table.

So simply align addr up to the next PMD boundary, and make sure it does 
not exceed the VMA end.

ALIGN() and friends can help avoid excessive comments.
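
Something along these lines is what I have in mind (completely untested; 
it assumes the vma gets passed into that helper, which is not the case 
right now):

	/* Clamp the batch to the current page table and to the VMA. */
	unsigned long end_addr = pmd_addr_end(addr, vma->vm_end);
	int max_nr = (end_addr - addr) >> PAGE_SHIFT;

	/* ... and hand that max_nr to folio_pte_batch() as before. */

pmd_addr_end() already does the align-up-but-clamp-to-the-end dance for us.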

-- 
Cheers,

David / dhildenb


