From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: akpm@linux-foundation.org, hughd@google.com, david@redhat.com
Cc: ziy@nvidia.com, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
dev.jain@arm.com, baohua@kernel.org, vbabka@suse.cz,
rppt@kernel.org, surenb@google.com, mhocko@suse.com,
baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: [PATCH] mm: support large mapping building for tmpfs
Date: Tue, 1 Jul 2025 16:40:55 +0800
Message-ID: <d4cb6e578bca8c430174d5972550cbeb530ec3fe.1751359073.git.baolin.wang@linux.alibaba.com>
After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
tmpfs can allocate large folios of various orders, not just PMD-sized large
folios.
However, when tmpfs is accessed via mmap(), mappings are still established at
base-page granularity even though the underlying folios are large, which is
suboptimal. We can instead establish mappings that match the size of the large
folio. On one hand, this reduces page-fault overhead; on the other hand, it
lets hardware optimizations such as contiguous PTEs on the ARM architecture
reduce TLB misses.
Moreover, since the user has already passed the 'huge=' option when mounting
tmpfs to allow large folio allocation, establishing large folio mappings is
expected behavior and will not surprise users by inflating the RSS of the
process.
In order to support large mappings for tmpfs, besides checking the VMA limits
and PMD pagetable limits, it is also necessary to check that the linear page
offset of the VMA is order-aligned within the file; otherwise the folio's
pages would not line up with a naturally aligned block of PTEs.
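To illustrate the alignment requirement (a hypothetical userspace sketch, not
part of the patch; the helper name and example addresses are made up), the
check reduces to: the file page index that would map to virtual page 0 must
be a multiple of the folio size in pages:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/*
 * Mirrors the check added in finish_fault(): the VMA's linear page
 * offset within the file must be aligned to the folio size in pages,
 * so that an order-aligned folio in the page cache also lands on an
 * order-aligned virtual address (e.g. for ARM contiguous PTEs).
 */
static bool folio_mapping_aligned(uintptr_t vm_start, uint64_t vm_pgoff,
				  unsigned int folio_order)
{
	return IS_ALIGNED((vm_start >> PAGE_SHIFT) - vm_pgoff,
			  1UL << folio_order);
}

int main(void)
{
	/* 64K folio (order 4) mapped at 1M with file offset 0: aligned. */
	printf("%d\n", folio_mapping_aligned(0x100000, 0, 4));	/* 1 */
	/* Same address but starting 3 pages into the file: misaligned. */
	printf("%d\n", folio_mapping_aligned(0x100000, 3, 4));	/* 0 */
	return 0;
}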
Performance test:
I created a 1G tmpfs file, populated it with 64K large folios, and accessed it
sequentially via mmap(). I observed a significant performance improvement:
Before the patch:
real 0m0.214s
user 0m0.012s
sys 0m0.203s
After the patch:
real 0m0.025s
user 0m0.000s
sys 0m0.024s
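The access pattern was roughly the following (a minimal sketch, not the exact
test program from this report; the file path and the assumption that the
tmpfs mount's 'huge=' policy yields 64K folios are mine):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	/* Assumes a tmpfs mount whose 'huge=' policy yields 64K folios,
	 * with a 1G file already created and populated. */
	const size_t len = 1UL << 30;
	int fd = open("/mnt/tmpfs/testfile", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) { perror("mmap"); return 1; }

	/* Sequential access: with this patch, one fault can map a whole
	 * 64K folio instead of taking 16 separate base-page faults. */
	volatile unsigned char sum = 0;
	for (size_t off = 0; off < len; off += 4096)
		sum += p[off];

	munmap(p, len);
	close(fd);
	return 0;
}

The real/user/sys figures above are consistent with timing such a run under
time(1).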
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/memory.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 0f9b32a20e5b..6385a9385a9b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5383,10 +5383,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 
 	/*
 	 * Using per-page fault to maintain the uffd semantics, and same
-	 * approach also applies to non-anonymous-shmem faults to avoid
+	 * approach also applies to non shmem/tmpfs faults to avoid
 	 * inflating the RSS of the process.
 	 */
-	if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
+	if (!vma_is_shmem(vma) || unlikely(userfaultfd_armed(vma)) ||
 	    unlikely(needs_fallback)) {
 		nr_pages = 1;
 	} else if (nr_pages > 1) {
@@ -5395,15 +5395,20 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 		pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
 		/* The index of the entry in the pagetable for fault page. */
 		pgoff_t pte_off = pte_index(vmf->address);
+		unsigned long hpage_size = PAGE_SIZE << folio_order(folio);
 
 		/*
 		 * Fallback to per-page fault in case the folio size in page
-		 * cache beyond the VMA limits and PMD pagetable limits.
+		 * cache beyond the VMA limits or PMD pagetable limits. And
+		 * also check if the linear page offset of vma is order-aligned
+		 * within the file for tmpfs.
 		 */
 		if (unlikely(vma_off < idx ||
 			     vma_off + (nr_pages - idx) > vma_pages(vma) ||
 			     pte_off < idx ||
-			     pte_off + (nr_pages - idx) > PTRS_PER_PTE)) {
+			     pte_off + (nr_pages - idx) > PTRS_PER_PTE) ||
+		    !IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
+				hpage_size >> PAGE_SHIFT)) {
 			nr_pages = 1;
 		} else {
 			/* Now we can set mappings for the whole large folio. */
--
2.43.5