* [RFC PATCH v2 0/2] Support large folios for tmpfs
From: Baolin Wang @ 2024-09-26 8:27 UTC (permalink / raw)
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, 21cnbao, ryan.roberts, ioworker0,
da.gomez, baolin.wang, linux-mm, linux-kernel
Hi,
This RFC patch series attempts to support large folios for tmpfs. The first
patch is based on Daniel's previous patches in [1], mainly using the length
in the write and fallocate paths to get a highest order hint for large
order allocation. The last patch adds mTHP filter control for tmpfs when the
mTHP interfaces are set, for the following reasons:
1. Maintain backward compatibility for the control interface. Tmpfs already
has a global 'huge=' mount option and '/sys/kernel/mm/transparent_hugepage/shmem_enabled'
interface to control large order allocations. mTHP extends this capability to a
per-size basis while maintaining good interface compatibility.
2. For the large order allocation of writable mmap() faults in tmpfs, we need
something like the mTHP interfaces to control large orders, as well as ensuring
consistent interfaces with shmem.
3. Ryan pointed out that large order allocations based on write length could
lead to memory fragmentation issues. Quoting Ryan's comment [2]:
"And it's possible (likely even, in my opinion) that allocating lots of different
folio sizes will exacerbate memory fragmentation, leading to more order-0
fallbacks, which would hurt the overall system performance in the long run, vs
restricting to a couple of folio sizes."
4. Some hardware has folio size preferences. The ARM64 architecture, for
example, can use its cont-pte feature with 64K folios to reduce TLB pressure
and improve performance. The mTHP interfaces make it easier to leverage such
hardware advantages.
Any comments and suggestions are appreciated. Thanks.
[1] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
[2] https://lore.kernel.org/all/e83e1687-3e3c-40d0-bf0e-225871647092@arm.com/
Changes from RFC v1:
- Drop patch 1.
- Use 'write_end' to calculate the length in shmem_allowable_huge_orders().
- Update shmem_mapping_size_order() per Daniel.
Baolin Wang (1):
mm: shmem: use mTHP interface to control huge orders for tmpfs
Daniel Gomez (1):
mm: shmem: add large folio support to the write and fallocate paths
mm/memory.c | 4 ++--
mm/shmem.c | 66 +++++++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 58 insertions(+), 12 deletions(-)
--
2.39.3
* [RFC PATCH v2 1/2] mm: shmem: add large folio support to the write and fallocate paths
From: Baolin Wang @ 2024-09-26 8:27 UTC (permalink / raw)
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, 21cnbao, ryan.roberts, ioworker0,
da.gomez, baolin.wang, linux-mm, linux-kernel
From: Daniel Gomez <da.gomez@samsung.com>
Add large folio support to the shmem write and fallocate paths, matching
the high order preference mechanism that the iomap buffered IO path uses
via __filemap_get_folio().
Add shmem_mapping_size_order() to get a hint for the folio order based on
the file size, taking the mapping's alignment requirements into account.
If the top level huge page option (controlled by
'/sys/kernel/mm/transparent_hugepage/shmem_enabled') is enabled, only
PMD sized THP is allowed, to keep the interface backward compatible.
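To illustrate the resulting order hint, here is a minimal userspace sketch
(not the kernel code itself; it assumes 4 KiB pages and
MAX_PAGECACHE_ORDER == 9, and open-codes get_order()'s round-up):

#include <stdio.h>

#define PAGE_SHIFT		12
#define PAGE_SIZE		(1UL << PAGE_SHIFT)
#define MAX_PAGECACHE_ORDER	9

static unsigned int size_order(unsigned long index, unsigned long size)
{
	unsigned int order = 0;

	/* get_order(): smallest order whose folio covers 'size' bytes */
	while ((PAGE_SIZE << order) < size)
		order++;

	/* an unaligned index caps the order at its lowest set bit */
	if (index & ((1UL << order) - 1))
		order = __builtin_ffsl(index) - 1;

	return order < MAX_PAGECACHE_ORDER ? order : MAX_PAGECACHE_ORDER;
}

int main(void)
{
	printf("%u\n", size_order(0, 2UL << 20));  /* 2M write at index 0 -> 9 */
	printf("%u\n", size_order(0, 64UL << 10)); /* 64K write at index 0 -> 4 */
	printf("%u\n", size_order(3, 64UL << 10)); /* 64K write at index 3 -> 0 */
	return 0;
}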
Co-developed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/shmem.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 48 insertions(+), 3 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 0613421e09e7..6dece90ff421 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1672,6 +1672,36 @@ bool shmem_hpage_pmd_enabled(void)
return false;
}
+/**
+ * shmem_mapping_size_order - Get maximum folio order for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @size: The suggested size of the folio to create.
+ *
+ * This returns the highest folio order (when large folios are supported)
+ * that the mapping currently allows at the given index, based on the file
+ * size. The index is relevant due to alignment considerations the mapping
+ * might have. The returned order may cover less than the size passed.
+ *
+ * This mirrors the order calculation in __filemap_get_folio().
+ *
+ * Return: The order.
+ */
+static inline unsigned int
+shmem_mapping_size_order(struct address_space *mapping, pgoff_t index, size_t size)
+{
+ unsigned int order = get_order(max_t(size_t, size, PAGE_SIZE));
+
+ if (!mapping_large_folio_support(mapping))
+ return 0;
+
+ /* If we're not aligned, allocate a smaller folio */
+ if (index & ((1UL << order) - 1))
+ order = __ffs(index);
+
+ return min_t(size_t, order, MAX_PAGECACHE_ORDER);
+}
+
unsigned long shmem_allowable_huge_orders(struct inode *inode,
struct vm_area_struct *vma, pgoff_t index,
loff_t write_end, bool shmem_huge_force)
@@ -1694,11 +1724,26 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
global_huge = shmem_huge_global_enabled(inode, index, write_end,
shmem_huge_force, vma, vm_flags);
if (!vma || !vma_is_anon_shmem(vma)) {
+ size_t len;
+
+ /*
+ * For tmpfs, if top level huge page is enabled, we just allow
+ * PMD sized THP to keep interface backward compatibility.
+ */
+ if (global_huge)
+ return BIT(HPAGE_PMD_ORDER);
+
+ if (!write_end)
+ return 0;
+
/*
- * For tmpfs, we now only support PMD sized THP if huge page
- * is enabled, otherwise fallback to order 0.
+ * Otherwise, get a highest order hint based on the length of
+ * the write or fallocate, then try each allowable huge order.
*/
- return global_huge ? BIT(HPAGE_PMD_ORDER) : 0;
+ len = write_end - (index << PAGE_SHIFT);
+ order = shmem_mapping_size_order(inode->i_mapping, index, len);
+ return order > 0 ? BIT(order + 1) - 1 : 0;
}
/*
--
2.39.3
* [RFC PATCH v2 2/2] mm: shmem: use mTHP interface to control huge orders for tmpfs
From: Baolin Wang @ 2024-09-26 8:27 UTC (permalink / raw)
To: akpm, hughd
Cc: willy, david, wangkefeng.wang, 21cnbao, ryan.roberts, ioworker0,
da.gomez, baolin.wang, linux-mm, linux-kernel
For the huge orders allowed by writable mmap() faults on tmpfs,
the mTHP interfaces are used to control the allowable huge orders,
while 'huge_shmem_orders_inherit' maintains backward compatibility
with the top-level interface.
For the huge orders allowed by the write() and fallocate() paths on
tmpfs, get a highest order hint based on the length of the write or
fallocate, then try each allowable huge order, filtered by the mTHP
interfaces if set.
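To make the filtering concrete, here is a minimal userspace sketch of the
mask arithmetic (illustrative values only; it assumes 4 KiB pages, so
order 4 is 64K and order 9 is 2M):

#include <stdio.h>

int main(void)
{
	unsigned int order = 4;				/* hint from the write length */
	unsigned long hint = (1UL << (order + 1)) - 1;	/* orders 0..4 -> 0x1f */
	unsigned long mthp = (1UL << 4) | (1UL << 9);	/* 64K and 2M enabled */

	/* with no mTHP interface set, the length-derived hint alone decides */
	printf("hint only: %#lx\n", hint);		/* 0x1f */

	/* otherwise only orders both hinted and mTHP-enabled survive */
	printf("filtered:  %#lx\n", hint & mthp);	/* 0x10, i.e. order 4 */
	return 0;
}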
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
mm/memory.c | 4 ++--
mm/shmem.c | 51 ++++++++++++++++++++++++++-------------------------
2 files changed, 28 insertions(+), 27 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index 2366578015ad..99dd75b84605 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5098,10 +5098,10 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
/*
* Using per-page fault to maintain the uffd semantics, and same
- * approach also applies to non-anonymous-shmem faults to avoid
+ * approach also applies to non shmem/tmpfs faults to avoid
* inflating the RSS of the process.
*/
- if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
+ if (!vma_is_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
nr_pages = 1;
} else if (nr_pages > 1) {
pgoff_t idx = folio_page_idx(folio, page);
diff --git a/mm/shmem.c b/mm/shmem.c
index 6dece90ff421..569d3ab37161 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1721,31 +1721,6 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
return 0;
- global_huge = shmem_huge_global_enabled(inode, index, write_end,
- shmem_huge_force, vma, vm_flags);
- if (!vma || !vma_is_anon_shmem(vma)) {
- size_t len;
-
- /*
- * For tmpfs, if top level huge page is enabled, we just allow
- * PMD sized THP to keep interface backward compatibility.
- */
- if (global_huge)
- return BIT(HPAGE_PMD_ORDER);
-
- if (!write_end)
- return 0;
-
- /*
- * Otherwise, get a highest order hint based on the length of
- * the write or fallocate, then try each allowable huge order.
- */
- len = write_end - (index << PAGE_SHIFT);
- order = shmem_mapping_size_order(inode->i_mapping, index, len);
- return order > 0 ? BIT(order + 1) - 1 : 0;
- }
-
/*
* Following the 'deny' semantics of the top level, force the huge
* option off from all mounts.
@@ -1776,9 +1751,35 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
if (vm_flags & VM_HUGEPAGE)
mask |= READ_ONCE(huge_shmem_orders_madvise);
+ global_huge = shmem_huge_global_enabled(inode, index, write_end,
+ shmem_huge_force, vma, vm_flags);
if (global_huge)
mask |= READ_ONCE(huge_shmem_orders_inherit);
+ /*
+ * For the huge orders allowed by writable mmap() faults on tmpfs,
+ * the mTHP interfaces are used to control the allowable huge orders,
+ * while 'huge_shmem_orders_inherit' maintains backward compatibility
+ * with the top-level interface.
+ *
+ * For the huge orders allowed by the write() and fallocate() paths on
+ * tmpfs, get a highest order hint based on the length of the write or
+ * fallocate, then try each allowable huge order, filtered by the mTHP
+ * interfaces if set.
+ */
+ if (!vma && !global_huge) {
+ size_t len;
+
+ if (!write_end)
+ return 0;
+
+ len = write_end - (index << PAGE_SHIFT);
+ order = shmem_mapping_size_order(inode->i_mapping, index, len);
+ if (!mask)
+ return order > 0 ? BIT(order + 1) - 1 : 0;
+
+ mask &= BIT(order + 1) - 1;
+ }
return THP_ORDERS_ALL_FILE_DEFAULT & mask;
}
--
2.39.3
* Re: [RFC PATCH v2 1/2] mm: shmem: add large folio support to the write and fallocate paths
From: Matthew Wilcox @ 2024-09-26 12:16 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, wangkefeng.wang, 21cnbao, ryan.roberts,
ioworker0, da.gomez, linux-mm, linux-kernel
On Thu, Sep 26, 2024 at 04:27:26PM +0800, Baolin Wang wrote:
> +static inline unsigned int
> +shmem_mapping_size_order(struct address_space *mapping, pgoff_t index, size_t size)
> +{
> + unsigned int order = get_order(max_t(size_t, size, PAGE_SIZE));
Why introduce the max_t() call here? Did nobody read the documentation
or implementation for get_order() before writing this patch?
Besides, get_order() is wrong (at least relative to other filesystems).
get_order() rounds up instead of down, so what should we do for a write()
of size 512 * 1024 + 1 byte? Other filesystems allocate an order-8 folio
plus an order-0 folio. This code would have us allocate an order-9 folio.
I think that's a bad idea.
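A minimal userspace sketch of the two roundings contrasted above (assuming
4 KiB pages; the 20 KiB size is illustrative, not taken from the thread):

#include <stdio.h>

#define PAGE_SHIFT 12

/* get_order()-style: round the byte count up to a power-of-two folio */
static unsigned int order_up(unsigned long size)
{
	unsigned int order = 0;

	while ((1UL << (PAGE_SHIFT + order)) < size)
		order++;
	return order;
}

/* fgf_set_order()-style: ilog2() rounds the byte count down (size > 0) */
static unsigned int order_down(unsigned long size)
{
	unsigned int shift = 63 - __builtin_clzl(size);	/* ilog2(size) */

	return shift > PAGE_SHIFT ? shift - PAGE_SHIFT : 0;
}

int main(void)
{
	unsigned long size = 20UL << 10;	/* a 20 KiB write */

	printf("round up:   order %u (32 KiB folio)\n", order_up(size));   /* 3 */
	printf("round down: order %u (16 KiB folio)\n", order_down(size)); /* 2 */
	return 0;
}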
* Re: [RFC PATCH v2 0/2] Support large folios for tmpfs
From: Matthew Wilcox @ 2024-09-26 12:20 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, david, wangkefeng.wang, 21cnbao, ryan.roberts,
ioworker0, da.gomez, linux-mm, linux-kernel
On Thu, Sep 26, 2024 at 04:27:25PM +0800, Baolin Wang wrote:
> This RFC patch series attempts to support large folios for tmpfs. The first
> patch is based on Daniel's previous patches in [1], mainly using the length
> in the write and fallocate paths to get a highest order hint for large
> order allocation. The last patch adds mTHP filter control for tmpfs when the
> mTHP interfaces are set, for the following reasons:
>
> 1. Maintain backward compatibility for the control interface. Tmpfs already
> has a global 'huge=' mount option and '/sys/kernel/mm/transparent_hugepage/shmem_enabled'
> interface to control large order allocations. mTHP extends this capability to a
> per-size basis while maintaining good interface compatibility.
... it's confusing as hell to anyone who tries to understand it and
you've made it more complicated. Well done.
> 2. For the large order allocation of writable mmap() faults in tmpfs, we need
> something like the mTHP interfaces to control large orders, as well as ensuring
> consistent interfaces with shmem.
tmpfs and shmem do NOT need to be consistent! I don't know why anyone
thinks this is a goal. tmpfs should be consistent with OTHER FILE
SYSTEMS. shmem should do the right thing for the shared anon use case.
> 3. Ryan pointed out that large order allocations based on write length could
> lead to memory fragmentation issues. Quoting Ryan's comment [2]:
> "And it's possible (likely even, in my opinion) that allocating lots of different
> folio sizes will exacerbate memory fragmentation, leading to more order-0
> fallbacks, which would hurt the overall system performance in the long run, vs
> restricting to a couple of folio sizes."
I disagree with this. It's a buddy allocator; it's resistant to this
kind of fragmentation.
* Re: [RFC PATCH v2 1/2] mm: shmem: add large folio support to the write and fallocate paths
From: Daniel Gomez @ 2024-09-26 12:58 UTC (permalink / raw)
To: Matthew Wilcox, Baolin Wang
Cc: akpm, hughd, david, wangkefeng.wang, 21cnbao, ryan.roberts,
ioworker0, linux-mm, linux-kernel
On 9/26/2024 2:16 PM, Matthew Wilcox wrote:
> On Thu, Sep 26, 2024 at 04:27:26PM +0800, Baolin Wang wrote:
>> +static inline unsigned int
>> +shmem_mapping_size_order(struct address_space *mapping, pgoff_t index, size_t size)
>> +{
>> + unsigned int order = get_order(max_t(size_t, size, PAGE_SIZE));
>
> Why introduce the max_t() call here? Did nobody read the documentation
> or implementation for get_order() before writing this patch?
The result of get_order() is undefined if the size is 0. I've used max_t()
here to avoid that case. Perhaps we should prevent that case before getting
here?
>
> Besides, get_order() is wrong (at least relative to other filesystems).
> get_order() rounds up instead of down, so what should we do for a write()
> of size 512 * 1024 + 1 byte? Other filesystems allocate an order-8 folio
> plus an order-0 folio. This code would have us allocate an order-9 folio.
> I think that's a bad idea.
>
I think one of my earlier attempts was to use fgf_set_order +
FGF_GET_ORDER() as in iomap. But the solution taken there was to share
code between shmem and filemap, and that wasn't considered a good idea.
Shall we just replicate iomap_get_folio()? Or else, what do you suggest
here?
Daniel
* Re: [RFC PATCH v2 1/2] mm: shmem: add large folio support to the write and fallocate paths
From: Matthew Wilcox @ 2024-09-26 13:40 UTC (permalink / raw)
To: Daniel Gomez
Cc: Baolin Wang, akpm, hughd, david, wangkefeng.wang, 21cnbao,
ryan.roberts, ioworker0, linux-mm, linux-kernel
On Thu, Sep 26, 2024 at 02:58:31PM +0200, Daniel Gomez wrote:
> On 9/26/2024 2:16 PM, Matthew Wilcox wrote:
> > On Thu, Sep 26, 2024 at 04:27:26PM +0800, Baolin Wang wrote:
> > > +static inline unsigned int
> > > +shmem_mapping_size_order(struct address_space *mapping, pgoff_t index, size_t size)
> > > +{
> > > + unsigned int order = get_order(max_t(size_t, size, PAGE_SIZE));
> >
> > Why introduce the max_t() call here? Did nobody read the documentation
> > or implementation for get_order() before writing this patch?
>
> The result of get_order() is undefined if the size is 0. I've used max_t()
> here to avoid that case. Perhaps we should prevent that case before getting
> here?
Surely we've handled a length-0 write before we get here?
> I think one of my earlier attempts was to use fgf_set_order + FGF_GET_ORDER()
> as in iomap. But the solution taken there was to share code between shmem
> and filemap, and that wasn't considered a good idea. Shall we just replicate
> iomap_get_folio()? Or else, what do you suggest here?
We could move three of the four lines from fgf_set_order() into a
new function and call it from both fgf_set_order() and shmem?
>
* Re: [RFC PATCH v2 1/2] mm: shmem: add large folio support to the write and fallocate paths
From: Baolin Wang @ 2024-09-27 2:12 UTC (permalink / raw)
To: Matthew Wilcox, Daniel Gomez
Cc: akpm, hughd, david, wangkefeng.wang, 21cnbao, ryan.roberts,
ioworker0, linux-mm, linux-kernel
On 2024/9/26 21:40, Matthew Wilcox wrote:
> On Thu, Sep 26, 2024 at 02:58:31PM +0200, Daniel Gomez wrote:
>> On 9/26/2024 2:16 PM, Matthew Wilcox wrote:
>>> On Thu, Sep 26, 2024 at 04:27:26PM +0800, Baolin Wang wrote:
>>>> +static inline unsigned int
>>>> +shmem_mapping_size_order(struct address_space *mapping, pgoff_t index, size_t size)
>>>> +{
>>>> + unsigned int order = get_order(max_t(size_t, size, PAGE_SIZE));
>>>
>>> Why introduce the max_t() call here? Did nobody read the documentation
>>> or implementation for get_order() before writing this patch?
>>
>> The result of get_order() is undefined if the size is 0. I've used max_t()
>> here to avoid that case. Perhaps we should prevent that case before
>> getting here?
>
> Surely we've handled a length-0 write before we get here?
>
>> I think one of my earlier attempts was to use fgf_set_order + FGF_GET_ORDER()
>> as in iomap. But the solution taken there was to share code between shmem
>> and filemap, and that wasn't considered a good idea. Shall we just replicate
>> iomap_get_folio()? Or else, what do you suggest here?
>
> We could move three of the four lines from fgf_set_order() into a
> new function and call it from both fgf_set_order() and shmem?
Sounds good. How about the following changes? Do you have a preferred
name for the new helper? Thanks.
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index d9c7edb6422b..ce418acd2737 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -629,6 +629,16 @@ typedef unsigned int __bitwise fgf_t;
#define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
+static inline unsigned int filemap_get_order(size_t size)
+{
+ unsigned int shift = ilog2(size);
+
+ if (shift <= PAGE_SHIFT)
+ return 0;
+
+ return shift - PAGE_SHIFT;
+}
+
/**
* fgf_set_order - Encode a length in the fgf_t flags.
* @size: The suggested size of the folio to create.
@@ -642,11 +652,11 @@ typedef unsigned int __bitwise fgf_t;
*/
static inline fgf_t fgf_set_order(size_t size)
{
- unsigned int shift = ilog2(size);
+ unsigned int order = filemap_get_order(size);
- if (shift <= PAGE_SHIFT)
+ if (!order)
return 0;
- return (__force fgf_t)((shift - PAGE_SHIFT) << 26);
+ return (__force fgf_t)(order << 26);
}
void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
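For illustration, shmem_mapping_size_order() from patch 1 could then round
down via the shared helper instead of rounding up with get_order(). A
hypothetical sketch, not part of the posted diff (it assumes the size == 0
case is filtered out by the caller, as discussed above):

static inline unsigned int
shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
			 size_t size)
{
	unsigned int order;

	if (!mapping_large_folio_support(mapping))
		return 0;

	/*
	 * Round down like other filesystems. A zero size would make
	 * ilog2() undefined, but the caller already returns early when
	 * there is no write/fallocate length.
	 */
	order = filemap_get_order(size);

	/* If we're not aligned, allocate a smaller folio */
	if (index & ((1UL << order) - 1))
		order = __ffs(index);

	return min_t(unsigned int, order, MAX_PAGECACHE_ORDER);
}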
* Re: [RFC PATCH v2 0/2] Support large folios for tmpfs
From: Baolin Wang @ 2024-09-27 2:36 UTC (permalink / raw)
To: Matthew Wilcox
Cc: akpm, hughd, david, wangkefeng.wang, 21cnbao, ryan.roberts,
ioworker0, da.gomez, linux-mm, linux-kernel
On 2024/9/26 20:20, Matthew Wilcox wrote:
> On Thu, Sep 26, 2024 at 04:27:25PM +0800, Baolin Wang wrote:
>> This RFC patch series attempts to support large folios for tmpfs. The first
>> patch is based on Daniel's previous patches in [1], mainly using the length
>> in the write and fallocate paths to get a highest order hint for large
>> order allocation. The last patch adds mTHP filter control for tmpfs when the
>> mTHP interfaces are set, for the following reasons:
>>
>> 1. Maintain backward compatibility for the control interface. Tmpfs already
>> has a global 'huge=' mount option and '/sys/kernel/mm/transparent_hugepage/shmem_enabled'
>> interface to control large order allocations. mTHP extends this capability to a
>> per-size basis while maintaining good interface compatibility.
>
> ... it's confusing as hell to anyone who tries to understand it and
> you've made it more complicated. Well done.
>
>> 2. For the large order allocation of writable mmap() faults in tmpfs, we need
>> something like the mTHP interfaces to control large orders, as well as ensuring
>> consistent interfaces with shmem.
>
> tmpfs and shmem do NOT need to be consistent! I don't know why anyone
> thinks this is a goal. tmpfs should be consistent with OTHER FILE
> SYSTEMS. shmem should do the right thing for the shared anon use case.
>
>> 3. Ryan pointed out that large order allocations based on write length could
>> lead to memory fragmentation issues. Quoting Ryan's comment [2]:
>> "And it's possible (likely even, in my opinion) that allocating lots of different
>> folio sizes will exacerbate memory fragmentation, leading to more order-0
>> fallbacks, which would hurt the overall system performance in the long run, vs
>> restricting to a couple of folio sizes."
>
> I disagree with this. It's a buddy allocator; it's resistant to this
> kind of fragmentation.
Fine. Thanks for sharing your opinion. For now, I can drop patch 2 and
stop adding mTHP interfaces for tmpfs.