Re: [PATCH] mm: support large mapping building for tmpfs


From: David Hildenbrand <david@redhat.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
	akpm@linux-foundation.org, hughd@google.com
Cc: ziy@nvidia.com, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, vbabka@suse.cz,
	rppt@kernel.org, surenb@google.com, mhocko@suse.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: support large mapping building for tmpfs
Date: Wed, 2 Jul 2025 13:38:02 +0200
Message-ID: <7b17af10-b052-4719-bbce-ffad2d74006a@redhat.com>
In-Reply-To: <67c79f65-ca6d-43be-a4ec-decd08bbce0a@linux.alibaba.com>

>> So by mapping more in a single page fault, you end up increasing "RSS".
>> But I wouldn't call that "expected". I rather suspect that nobody will
>> really care :)
> 
> But tmpfs is a little special here. It uses the 'huge=' option to
> control large folio allocation. So, I think users should know they want
> to use large folios and build the whole mapping for the large folios.
> That is why I call it 'expected'.

Well, if your distribution decides to set huge= on /tmp or something
like that, your application might have very little say in that, right? :)

Again, I assume it's fine, but we might find surprises on the way.

>>
>> The thing is, when you *allocate* a new folio, it must adhere at least to
>> pagecache alignment (e.g., cannot place an order-2 folio at pgoff 1) --
> 
> Yes, agree.
> 
>> that is what
>> thp_vma_suitable_order() checks. Otherwise you cannot add it to the
>> pagecache.
> 
> But this alignment is not done by thp_vma_suitable_order().
> 
> For tmpfs, it will check the alignment in shmem_suitable_orders() via:
> "
> 	if (!xa_find(&mapping->i_pages, &aligned_index,
> 			aligned_index + pages - 1, XA_PRESENT))
> "

That's not really an alignment check; it just checks whether the range
a candidate folio order would span already contains present entries, no?

Finding suitable orders is still up to other code IIUC.
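
To illustrate the distinction, here is a small userspace model, not
kernel code. It assumes the index is rounded down to the candidate order
before the lookup; pick_order(), range_has_present() and the bool array
standing in for mapping->i_pages are all made up for this sketch. The
lookup only answers "is anything already present in this range?":

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 64	/* hypothetical toy pagecache size, in pages */

/* Round index down to a multiple of pages (pages must be a power of two). */
static unsigned long round_down_pow2(unsigned long index, unsigned long pages)
{
	assert(pages && (pages & (pages - 1)) == 0);
	return index & ~(pages - 1);
}

/* Toy stand-in for the range lookup: any present entry in [start, start + pages)? */
static bool range_has_present(const bool *present, unsigned long start,
			      unsigned long pages)
{
	for (unsigned long i = start; i < start + pages && i < NR_SLOTS; i++)
		if (present[i])
			return true;
	return false;
}

/*
 * Walk from max_order downwards and return the first order whose
 * naturally aligned range around index holds no present entry.
 * Returns -1 if even the single page at index is already present.
 */
static int pick_order(const bool *present, unsigned long index, int max_order)
{
	for (int order = max_order; order >= 0; order--) {
		unsigned long pages = 1UL << order;
		unsigned long aligned = round_down_pow2(index, pages);

		if (!range_has_present(present, aligned, pages))
			return order;
	}
	return -1;
}
```

In this model the alignment comes from the rounding step, and the range
lookup only rejects orders that would overlap existing entries.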

> 
> For other filesystems, it will check the alignment in
> __filemap_get_folio() via:
> "
> 	/* If we're not aligned, allocate a smaller folio */
> 	if (index & ((1UL << order) - 1))
> 		order = __ffs(index);
> "
> 
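
As an aside, the fallback quoted above is easy to play with in
userspace. A minimal sketch, assuming GCC or Clang for
__builtin_ctzl(); lowest_set_bit() and clamp_order_to_index() are
hypothetical names, with lowest_set_bit() standing in for the kernel's
__ffs():

```c
#include <assert.h>

/* Index of the lowest set bit; like __ffs(), undefined for word == 0. */
static unsigned int lowest_set_bit(unsigned long word)
{
	assert(word != 0);
	return (unsigned int)__builtin_ctzl(word);
}

/*
 * If index is not aligned to the requested order, fall back to the
 * largest order that index is naturally aligned to.
 */
static unsigned int clamp_order_to_index(unsigned long index, unsigned int order)
{
	if (index & ((1UL << order) - 1))
		order = lowest_set_bit(index);
	return order;
}
```

For example, an order-2 request at pgoff 1 is reduced to order 0, while
the same request at pgoff 4 keeps order 2.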
>> But once you *obtain* a folio from the pagecache and are supposed to map it
>> into the page tables, that must already hold true.
>>
>> So you should be able to just blindly map whatever is given to you here
>> AFAIKS.
>>
>> If you would get a pagecache folio that violates the linear page offset
>> requirement at that point, something else would have messed up the
>> pagecache.
> 
> Yes. But the comment in thp_vma_suitable_order() is not about the
> pagecache alignment, it says "the order-aligned addresses in the VMA map
> to order-aligned offsets within the file",

Let's dig, it's confusing.

The code in question is:

if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
		hpage_size >> PAGE_SHIFT))

So yes, I think this tells us: if we had a PMD THP in the pagecache,
would we be able to map it with a PMD? If not, then don't bother
allocating a PMD THP.

Of course, this also applies to other orders, but for PMD THPs it's
probably most relevant: if we cannot even map it through a PMD, then
it would probably be a wasted THP.
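
To make the check concrete, here is a userspace sketch of the same
arithmetic. It assumes 4 KiB base pages and a 2 MiB PMD;
vma_offsets_compatible() is a hypothetical name, and the addresses in
the examples are made up:

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT	12UL		/* assumption: 4 KiB base pages */
#define PMD_SIZE	(1UL << 21)	/* assumption: 2 MiB PMD */

/*
 * True if hpage_size-aligned virtual addresses in the VMA correspond to
 * hpage_size-aligned offsets in the file, i.e. (vm_start in pages) minus
 * vm_pgoff is a multiple of the huge page size in pages.
 */
static bool vma_offsets_compatible(unsigned long vm_start,
				   unsigned long vm_pgoff,
				   unsigned long hpage_size)
{
	unsigned long hpage_pages = hpage_size >> PAGE_SHIFT;
	unsigned long delta = (vm_start >> PAGE_SHIFT) - vm_pgoff;

	assert(hpage_pages && (hpage_pages & (hpage_pages - 1)) == 0);
	/* Unsigned wraparound is harmless: we only look at the low bits. */
	return (delta & (hpage_pages - 1)) == 0;
}
```

With these assumptions, a VMA starting at 2 MiB with vm_pgoff 0 passes
for PMD_SIZE, the same VMA with vm_pgoff 1 does not, and a VMA starting
at 1 MiB with vm_pgoff 0 fails for PMD_SIZE but passes for a 64 KiB
folio size.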

So yes, I agree: if we are both not missing something, then this is
primarily relevant for the PMD case.

And it's more about "optimization" than "correctness" I guess?

But when mapping a folio that is already in the pagecache, I assume this 
is not required.

Assume we have a 2 MiB THP in the pagecache.

If someone were to map it at virtual addr 1 MiB, we could still map 1 MiB
worth of PTEs into a single page table in one go, and not fall back to
individual PTEs.
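
A rough userspace sketch of that arithmetic, assuming 4 KiB pages and
512 PTEs per page table (as on x86-64); max_pte_batch() and min_ul()
are hypothetical helpers, not existing kernel functions:

```c
#include <assert.h>

#define PAGE_SHIFT	12UL	/* assumption: 4 KiB base pages */
#define PTRS_PER_PTE	512UL	/* assumption: 512 PTEs per page table */

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Number of consecutive folio pages that can be mapped starting at addr
 * without crossing the end of the current page table, the end of the
 * VMA, or the end of the folio.
 */
static unsigned long max_pte_batch(unsigned long addr, unsigned long vm_end,
				   unsigned long folio_pages_left)
{
	unsigned long pte_index, in_table, in_vma;

	assert(addr < vm_end);
	pte_index = (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
	in_table = PTRS_PER_PTE - pte_index;
	in_vma = (vm_end - addr) >> PAGE_SHIFT;

	return min_ul(folio_pages_left, min_ul(in_table, in_vma));
}
```

For a 512-page folio mapped from virtual address 1 MiB in a VMA ending
at 3 MiB, the first call yields 256 pages (1 MiB, up to the end of the
first page table), and a second call at 2 MiB covers the remaining 256.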

-- 
Cheers,

David / dhildenb

Thread overview: 9+ messages
2025-07-01  8:40 Baolin Wang
2025-07-01 13:08 ` David Hildenbrand
2025-07-02  2:03   ` Baolin Wang
2025-07-02  8:45     ` David Hildenbrand
2025-07-02  9:44       ` Baolin Wang
2025-07-02 11:38         ` David Hildenbrand [this message]
2025-07-02 11:55           ` David Hildenbrand
2025-07-04  2:35             ` Baolin Wang
2025-07-04  2:04           ` Baolin Wang