linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: David Hildenbrand <david@redhat.com>
Cc: Zhang Qilong <zhangqilong3@huawei.com>,
	akpm@linux-foundation.org, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, jannh@google.com, pfalcato@suse.de,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	wangkefeng.wang@huawei.com, sunnanyong@huawei.com,
	Dev Jain <dev.jain@arm.com>, Ryan Roberts <ryan.roberts@arm.com>
Subject: Re: [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs batch optimization.
Date: Mon, 27 Oct 2025 19:51:14 +0000	[thread overview]
Message-ID: <0c15f911-cd4c-4ad3-97b7-e7153679f15e@lucifer.local> (raw)
In-Reply-To: <276b70aa-9853-40cc-8e7d-e790166706b5@redhat.com>

+Dev, Ryan

Please ensure to keep Dev + Ryan in the loop on all future iterations of this.

On Mon, Oct 27, 2025 at 08:24:40PM +0100, David Hildenbrand wrote:
> On 27.10.25 15:03, Zhang Qilong wrote:
> > Currently, the PTEs batch requires folio access, with the maximum
> > quantity limited to the PFNs contained within the folio. However,
> > in certain case (such as mremap_folio_pte_batch and mincore_pte_range),
> > accessing the folio is unnecessary and expensive.
> >
> > For scenarios that do not require folio access, this patch introduces
> > can_pte_batch_count(). With contiguous physical addresses and identical
> > PTE attribut bits, we can now process more page table entries at once,
> > in batch, not just limited to entries mapped within a single folio. On
> > the other hand, it avoid the folio access.
> >
> > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> > ---
> >   mm/internal.h | 76 +++++++++++++++++++++++++++++++++++++++------------
> >   1 file changed, 58 insertions(+), 18 deletions(-)
> >
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 1561fc2ff5b8..92034ca9092d 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -233,61 +233,62 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
> >   		pte = pte_wrprotect(pte);
> >   	return pte_mkold(pte);
> >   }
> >   /**
> > - * folio_pte_batch_flags - detect a PTE batch for a large folio
> > - * @folio: The large folio to detect a PTE batch for.
> > + * can_pte_batch_count - detect a PTE batch in range [ptep, to ptep + max_nr)
>
> I really don't like the name.
>
> Maybe it's just pte_batch().

Yeah the name's terrible.

But I'm iffy about this series as a whole.

'can' implies boolean, it should be something like get pte batch or count pte
batch or something like this. It's silly to partially replace other functions
also.

But I've doubtful as to whether any of this will work...

>
> >    * @vma: The VMA. Only relevant with FPB_MERGE_WRITE, otherwise can be NULL.
> >    * @ptep: Page table pointer for the first entry.
> >    * @ptentp: Pointer to a COPY of the first page table entry whose flags this
> >    *	    function updates based on @flags if appropriate.
> >    * @max_nr: The maximum number of table entries to consider.
> >    * @flags: Flags to modify the PTE batch semantics.
> >    *
> > - * Detect a PTE batch: consecutive (present) PTEs that map consecutive
> > - * pages of the same large folio in a single VMA and a single page table.
> > + * This interface is designed for this case that do not require folio access.
> > + * If folio consideration is needed, please call folio_pte_batch_flags instead.

I'm pretty certain we need to make sure we do not cross folio boundaries, which
kills this series if so does it not?

Ryan - can you confirm?

> > + *
> > + * Detect a PTE batch: consecutive (present) PTEs that map consecutive pages
> > + * in a single VMA and a single page table.
> >    *
> >    * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
> >    * the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY is set)
> >    * and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
> >    *
> > - * @ptep must map any page of the folio. max_nr must be at least one and
> > + * @ptep point to the first entry in range, max_nr must be at least one and
> >    * must be limited by the caller so scanning cannot exceed a single VMA and
> >    * a single page table.
> >    *
> >    * Depending on the FPB_MERGE_* flags, the pte stored at @ptentp will
> >    * be updated: it's crucial that a pointer to a COPY of the first
> >    * page table entry, obtained through ptep_get(), is provided as @ptentp.
> >    *
> > - * This function will be inlined to optimize based on the input parameters;
> > - * consider using folio_pte_batch() instead if applicable.
> > + * The following folio_pte_batch_flags() deal with PTEs that mapped in a
> > + * single folio. However can_pte_batch_count has the capability to handle
> > + * PTEs that mapped in consecutive folios. If flags is not set, it will ignore
> > + * the accessed, writable and dirty bits. Once the flags is set, the respect
> > + * bit(s) will be compared in pte_same(), if the advanced pte_batch_hint()
> > + * respect pte bit is different, pte_same() will return false and break. This
> > + * ensures the correctness of handling multiple folio PTEs.
> > + *
> > + * This function will be inlined to optimize based on the input parameters.
> >    *
> >    * Return: the number of table entries in the batch.
> >    */
>
> I recall trouble if we try batching across folios:

Yup pretty sure Ryan said we don't/can't in a previous thread. Now cc'd...

>
> commit 7b08b74f3d99f6b801250683c751d391128799ec (tag: mm-hotfixes-stable-2025-05-10-14-23)
> Author: Petr Vaněk <arkamar@atlas.cz>
> Date:   Fri May 2 23:50:19 2025 +0200
>
>     mm: fix folio_pte_batch() on XEN PV
>     On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
>     folio due to a corner case in pte_advance_pfn().  Specifically, when the
>     PFN following the folio maps to an invalidated MFN,
>             expected_pte = pte_advance_pfn(expected_pte, nr);
>     produces a pte_none().  If the actual next PTE in memory is also
>     pte_none(), the pte_same() succeeds,
>             if (!pte_same(pte, expected_pte))
>                     break;
>     the loop is not broken, and batching continues into unrelated memory.
> ...
>
>
> --
> Cheers
>
> David / dhildenb
>

Thanks, Lorenzk


  reply	other threads:[~2025-10-27 19:51 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-27 14:03 [RFC PATCH 0/3] mm: PTEs batch optimization in mincore and mremap Zhang Qilong
2025-10-27 14:03 ` [RFC PATCH 1/3] mm: Introduce can_pte_batch_count() for PTEs batch optimization Zhang Qilong
2025-10-27 19:24   ` David Hildenbrand
2025-10-27 19:51     ` Lorenzo Stoakes [this message]
2025-10-27 20:21       ` Ryan Roberts
2025-10-27 14:03 ` [RFC PATCH 2/3] mm/mincore: Use can_pte_batch_count() in mincore_pte_range() for pte batch mincore_pte_range() Zhang Qilong
2025-10-27 19:27   ` David Hildenbrand
2025-10-27 19:34   ` Lorenzo Stoakes
2025-10-27 14:03 ` [RFC PATCH 3/3] mm/mremap: Use can_pte_batch_count() instead of folio_pte_batch() for pte batch Zhang Qilong
2025-10-27 19:41   ` David Hildenbrand
2025-10-27 19:57   ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0c15f911-cd4c-4ad3-97b7-e7153679f15e@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=jannh@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=pfalcato@suse.de \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=sunnanyong@huawei.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    --cc=zhangqilong3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox