* [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver
@ 2025-07-28 8:03 Baolin Wang
2025-07-28 21:44 ` Andrew Morton
2025-07-30 6:54 ` Hugh Dickins
0 siblings, 2 replies; 8+ messages in thread
From: Baolin Wang @ 2025-07-28 8:03 UTC (permalink / raw)
To: akpm, hughd
Cc: patryk, ville.syrjala, david, willy, maarten.lankhorst, mripard,
tzimmermann, airlied, simona, jani.nikula, joonas.lahtinen,
rodrigo.vivi, tursulin, christian.koenig, ray.huang,
matthew.auld, matthew.brost, dri-devel, linux-kernel, linux-mm
After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
the 'huge=' option was extended to allow large folios of any size for tmpfs,
which means tmpfs can derive a highest-order hint from the size of write()
and fallocate() requests, and will then try each allowable large order.

However, when the i915 driver allocates shmem memory, it provides no hint
about the size of the large folio to be allocated, making it impossible to
allocate PMD-sized shmem, which in turn hurts GPU performance.

To fix this issue, pass the 'end' position to shmem_read_folio_gfp() to help
allocate PMD-sized large folios. Additionally, use the maximum allocation
chunk (via mapping_max_folio_size()) to determine the size of the large
folios to allocate in the i915 driver.
Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
drivers/gpu/drm/drm_gem.c | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 7 ++++++-
drivers/gpu/drm/ttm/ttm_backup.c | 2 +-
include/linux/shmem_fs.h | 4 ++--
mm/shmem.c | 7 ++++---
5 files changed, 14 insertions(+), 8 deletions(-)
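For illustration, the per-iteration hint arithmetic in the i915 hunk below
can be modeled as plain userspace C. This is a sketch, not kernel code:
PAGE_SHIFT and the PMD-sized 2MB chunk (what mapping_max_folio_size() would
return on a typical x86-64 tmpfs mount) are assumed example values.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                   /* assumed: 4KB pages */
#define MAX_FOLIO_CHUNK (2UL << 20)     /* assumed: PMD-sized, 2MB */

/*
 * Model of the 'end' hint passed to shmem_read_folio_gfp():
 * cap the remaining length at the max folio chunk, then add it to
 * the current byte position, so shmem can try the highest order
 * that fits below 'end'.
 */
static int64_t end_hint(uint64_t page_count, uint64_t i)
{
	int64_t bytes = (int64_t)(page_count - i) << PAGE_SHIFT;
	int64_t pos = (int64_t)i << PAGE_SHIFT;

	if (bytes > (int64_t)MAX_FOLIO_CHUNK)   /* min_t(loff_t, chunk, bytes) */
		bytes = MAX_FOLIO_CHUNK;
	return pos + bytes;
}
```

With a 4MB object (1024 pages) at index 0 the hint is 2MB, enough for shmem
to attempt a PMD-sized folio; near the tail of the object the hint shrinks
to just the remaining length.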
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 4bf0a76bb35e..5ed34a9211a4 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -627,7 +627,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
i = 0;
while (i < npages) {
long nr;
- folio = shmem_read_folio_gfp(mapping, i,
+ folio = shmem_read_folio_gfp(mapping, i, 0,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
goto fail;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index f263615f6ece..778290f49853 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -69,6 +69,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
struct scatterlist *sg;
unsigned long next_pfn = 0; /* suppress gcc warning */
gfp_t noreclaim;
+ size_t chunk;
int ret;
if (overflows_type(size / PAGE_SIZE, page_count))
@@ -94,6 +95,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
mapping_set_unevictable(mapping);
noreclaim = mapping_gfp_constraint(mapping, ~__GFP_RECLAIM);
noreclaim |= __GFP_NORETRY | __GFP_NOWARN;
+ chunk = mapping_max_folio_size(mapping);
sg = st->sgl;
st->nents = 0;
@@ -105,10 +107,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915, struct sg_table *st,
0,
}, *s = shrink;
gfp_t gfp = noreclaim;
+ loff_t bytes = (page_count - i) << PAGE_SHIFT;
+ loff_t pos = i << PAGE_SHIFT;
+ bytes = min_t(loff_t, chunk, bytes);
do {
cond_resched();
- folio = shmem_read_folio_gfp(mapping, i, gfp);
+ folio = shmem_read_folio_gfp(mapping, i, pos + bytes, gfp);
if (!IS_ERR(folio))
break;
diff --git a/drivers/gpu/drm/ttm/ttm_backup.c b/drivers/gpu/drm/ttm/ttm_backup.c
index 6f2e58be4f3e..0c90ae338afb 100644
--- a/drivers/gpu/drm/ttm/ttm_backup.c
+++ b/drivers/gpu/drm/ttm/ttm_backup.c
@@ -100,7 +100,7 @@ ttm_backup_backup_page(struct file *backup, struct page *page,
struct folio *to_folio;
int ret;
- to_folio = shmem_read_folio_gfp(mapping, idx, alloc_gfp);
+ to_folio = shmem_read_folio_gfp(mapping, idx, 0, alloc_gfp);
if (IS_ERR(to_folio))
return PTR_ERR(to_folio);
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 6d0f9c599ff7..203eebad6b38 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -156,12 +156,12 @@ enum sgp_type {
int shmem_get_folio(struct inode *inode, pgoff_t index, loff_t write_end,
struct folio **foliop, enum sgp_type sgp);
struct folio *shmem_read_folio_gfp(struct address_space *mapping,
- pgoff_t index, gfp_t gfp);
+ pgoff_t index, loff_t end, gfp_t gfp);
static inline struct folio *shmem_read_folio(struct address_space *mapping,
pgoff_t index)
{
- return shmem_read_folio_gfp(mapping, index, mapping_gfp_mask(mapping));
+ return shmem_read_folio_gfp(mapping, index, 0, mapping_gfp_mask(mapping));
}
static inline struct page *shmem_read_mapping_page(
diff --git a/mm/shmem.c b/mm/shmem.c
index e6cdfda08aed..c79f5760cfc9 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -5960,6 +5960,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
* shmem_read_folio_gfp - read into page cache, using specified page allocation flags.
* @mapping: the folio's address_space
* @index: the folio index
+ * @end: end of a read if allocating a new folio
* @gfp: the page allocator flags to use if allocating
*
* This behaves as a tmpfs "read_cache_page_gfp(mapping, index, gfp)",
@@ -5972,14 +5973,14 @@ int shmem_zero_setup(struct vm_area_struct *vma)
* with the mapping_gfp_mask(), to avoid OOMing the machine unnecessarily.
*/
struct folio *shmem_read_folio_gfp(struct address_space *mapping,
- pgoff_t index, gfp_t gfp)
+ pgoff_t index, loff_t end, gfp_t gfp)
{
#ifdef CONFIG_SHMEM
struct inode *inode = mapping->host;
struct folio *folio;
int error;
- error = shmem_get_folio_gfp(inode, index, 0, &folio, SGP_CACHE,
+ error = shmem_get_folio_gfp(inode, index, end, &folio, SGP_CACHE,
gfp, NULL, NULL);
if (error)
return ERR_PTR(error);
@@ -5998,7 +5999,7 @@ EXPORT_SYMBOL_GPL(shmem_read_folio_gfp);
struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
pgoff_t index, gfp_t gfp)
{
- struct folio *folio = shmem_read_folio_gfp(mapping, index, gfp);
+ struct folio *folio = shmem_read_folio_gfp(mapping, index, 0, gfp);
struct page *page;
if (IS_ERR(folio))
--
2.43.5
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver
2025-07-28 8:03 [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver Baolin Wang
@ 2025-07-28 21:44 ` Andrew Morton
2025-07-28 21:59 ` Patryk Kowalczyk
2025-07-28 22:08 ` Patryk Kowalczyk
2025-07-30 6:54 ` Hugh Dickins
1 sibling, 2 replies; 8+ messages in thread
From: Andrew Morton @ 2025-07-28 21:44 UTC (permalink / raw)
To: Baolin Wang
Cc: hughd, patryk, ville.syrjala, david, willy, maarten.lankhorst,
mripard, tzimmermann, airlied, simona, jani.nikula,
joonas.lahtinen, rodrigo.vivi, tursulin, christian.koenig,
ray.huang, matthew.auld, matthew.brost, dri-devel, linux-kernel,
linux-mm
On Mon, 28 Jul 2025 16:03:53 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
> After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
> we extend the 'huge=' option to allow any sized large folios for tmpfs,
> which means tmpfs will allow getting a highest order hint based on the size
> of write() and fallocate() paths, and then will try each allowable large order.
>
> However, when the i915 driver allocates shmem memory, it doesn't provide hint
> information about the size of the large folio to be allocated, resulting in
> the inability to allocate PMD-sized shmem, which in turn affects GPU performance.
>
> To fix this issue, add the 'end' information for shmem_read_folio_gfp() to help
> allocate PMD-sized large folios. Additionally, use the maximum allocation chunk
> (via mapping_max_folio_size()) to determine the size of the large folios to
> allocate in the i915 driver.
What is the magnitude of the performance change?
> Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
> Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
This isn't a regression fix, is it? acd7ccb284b8 adds a new feature
and we have now found a flaw in it.
Still, we could bend the rules a little bit and backport this, depends
on how significant the runtime effect is.
* Re: [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver
2025-07-28 21:44 ` Andrew Morton
@ 2025-07-28 21:59 ` Patryk Kowalczyk
2025-07-28 22:16 ` Andrew Morton
2025-07-28 22:08 ` Patryk Kowalczyk
1 sibling, 1 reply; 8+ messages in thread
From: Patryk Kowalczyk @ 2025-07-28 21:59 UTC (permalink / raw)
To: Andrew Morton
Cc: Baolin Wang, hughd, ville.syrjala, david, willy,
maarten.lankhorst, mripard, tzimmermann, airlied, simona,
jani.nikula, joonas.lahtinen, rodrigo.vivi, tursulin,
christian.koenig, ray.huang, matthew.auld, matthew.brost,
dri-devel, linux-kernel, linux-mm
Hi,
In my tests, the performance drop ranges from a few percent up to 13% in
Unigine Superposition under heavy memory usage on a Core Ultra 155H CPU
with the Xe 128 EU GPU. Other users have reported performance impacts of
up to 30% on certain workloads. Please find more details in the
regression reports:
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14645
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845
I believe the change should be backported to all active kernel branches
after version 6.12.
best regards,
Patryk
pon., 28 lip 2025 o 23:44 Andrew Morton <akpm@linux-foundation.org>
napisał(a):
> On Mon, 28 Jul 2025 16:03:53 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
> > After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
> > we extend the 'huge=' option to allow any sized large folios for tmpfs,
> > which means tmpfs will allow getting a highest order hint based on the size
> > of write() and fallocate() paths, and then will try each allowable large order.
> >
> > However, when the i915 driver allocates shmem memory, it doesn't provide hint
> > information about the size of the large folio to be allocated, resulting in
> > the inability to allocate PMD-sized shmem, which in turn affects GPU performance.
> >
> > To fix this issue, add the 'end' information for shmem_read_folio_gfp() to help
> > allocate PMD-sized large folios. Additionally, use the maximum allocation chunk
> > (via mapping_max_folio_size()) to determine the size of the large folios to
> > allocate in the i915 driver.
>
> What is the magnitude of the performance change?
>
> > Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
> > Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
> > Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
>
> This isn't a regression fix, is it? acd7ccb284b8 adds a new feature
> and we have now found a flaw in it.
>
> Still, we could bend the rules a little bit and backport this, depends
> on how significant the runtime effect is.
* Re: [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver
2025-07-28 21:44 ` Andrew Morton
2025-07-28 21:59 ` Patryk Kowalczyk
@ 2025-07-28 22:08 ` Patryk Kowalczyk
1 sibling, 0 replies; 8+ messages in thread
From: Patryk Kowalczyk @ 2025-07-28 22:08 UTC (permalink / raw)
To: Andrew Morton
Cc: Baolin Wang, hughd, ville.syrjala, david, willy,
maarten.lankhorst, mripard, tzimmermann, airlied, simona,
jani.nikula, joonas.lahtinen, rodrigo.vivi, tursulin,
christian.koenig, ray.huang, matthew.auld, matthew.brost,
dri-devel, linux-kernel, linux-mm
Hi,
I apologize for the second email; the first one contained HTML content
that was not accepted by the group.
In my tests, the performance drop ranges from a few percent up to 13%
in Unigine Superposition under heavy memory usage on a Core Ultra 155H
CPU with the Xe 128 EU GPU. Other users have reported performance
impacts of up to 30% on certain workloads. Please find more details in
the regression reports:
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14645
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845
I believe the change should be backported to all active kernel
branches after version 6.12.
best regards,
Patryk
pon., 28 lip 2025 o 23:44 Andrew Morton <akpm@linux-foundation.org> napisał(a):
>
> On Mon, 28 Jul 2025 16:03:53 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:
>
> > After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
> > we extend the 'huge=' option to allow any sized large folios for tmpfs,
> > which means tmpfs will allow getting a highest order hint based on the size
> > of write() and fallocate() paths, and then will try each allowable large order.
> >
> > However, when the i915 driver allocates shmem memory, it doesn't provide hint
> > information about the size of the large folio to be allocated, resulting in
> > the inability to allocate PMD-sized shmem, which in turn affects GPU performance.
> >
> > To fix this issue, add the 'end' information for shmem_read_folio_gfp() to help
> > allocate PMD-sized large folios. Additionally, use the maximum allocation chunk
> > (via mapping_max_folio_size()) to determine the size of the large folios to
> > allocate in the i915 driver.
>
> What is the magnitude of the performance change?
>
> > Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
> > Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
> > Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
>
> This isn't a regression fix, is it? acd7ccb284b8 adds a new feature
> and we have now found a flaw in it.
>
> Still, we could bend the rules a little bit and backport this, depends
> on how significant the runtime effect is.
* Re: [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver
2025-07-28 21:59 ` Patryk Kowalczyk
@ 2025-07-28 22:16 ` Andrew Morton
0 siblings, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2025-07-28 22:16 UTC (permalink / raw)
To: Patryk Kowalczyk
Cc: Baolin Wang, hughd, ville.syrjala, david, willy,
maarten.lankhorst, mripard, tzimmermann, airlied, simona,
jani.nikula, joonas.lahtinen, rodrigo.vivi, tursulin,
christian.koenig, ray.huang, matthew.auld, matthew.brost,
dri-devel, linux-kernel, linux-mm
On Mon, 28 Jul 2025 23:59:22 +0200 Patryk Kowalczyk <patryk@kowalczyk.ws> wrote:
> In my tests, the performance drop ranges from a few percent up to 13% in
> Unigine Superposition
> under heavy memory usage on the CPU Core Ultra 155H with the Xe 128 EU GPU.
> Other users have reported performance impact up to 30% on certain workloads.
> Please find more in the regressions reports:
> https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14645
> https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13845
>
> I believe the change should be backported to all active kernel branches
> after version 6.12.
OK, thanks. I added this (important!) info to the changelog and I
added a cc:stable, requesting a backport into everything which has
acd7ccb284b8.
I'll place this in mm.git's mm-hotfixes-unstable branch with a plan to
upstream it sometime during the 6.17-rcX cycle.
* Re: [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver
2025-07-28 8:03 [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver Baolin Wang
2025-07-28 21:44 ` Andrew Morton
@ 2025-07-30 6:54 ` Hugh Dickins
2025-07-30 7:46 ` Baolin Wang
1 sibling, 1 reply; 8+ messages in thread
From: Hugh Dickins @ 2025-07-30 6:54 UTC (permalink / raw)
To: Baolin Wang
Cc: akpm, hughd, patryk, ville.syrjala, david, willy,
maarten.lankhorst, mripard, tzimmermann, airlied, simona,
jani.nikula, joonas.lahtinen, rodrigo.vivi, tursulin,
christian.koenig, ray.huang, matthew.auld, matthew.brost,
dri-devel, linux-kernel, linux-mm
On Mon, 28 Jul 2025, Baolin Wang wrote:
> After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
> we extend the 'huge=' option to allow any sized large folios for tmpfs,
> which means tmpfs will allow getting a highest order hint based on the size
> of write() and fallocate() paths, and then will try each allowable large order.
>
> However, when the i915 driver allocates shmem memory, it doesn't provide hint
> information about the size of the large folio to be allocated, resulting in
> the inability to allocate PMD-sized shmem, which in turn affects GPU performance.
>
> To fix this issue, add the 'end' information for shmem_read_folio_gfp() to help
> allocate PMD-sized large folios. Additionally, use the maximum allocation chunk
> (via mapping_max_folio_size()) to determine the size of the large folios to
> allocate in the i915 driver.
>
> Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
> Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
> drivers/gpu/drm/drm_gem.c | 2 +-
> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 7 ++++++-
> drivers/gpu/drm/ttm/ttm_backup.c | 2 +-
> include/linux/shmem_fs.h | 4 ++--
> mm/shmem.c | 7 ++++---
> 5 files changed, 14 insertions(+), 8 deletions(-)
I know I said "I shall not object to a temporary workaround to suit the
i915 driver", but really, I have to question this patch. Why should any
change be required at the drivers/gpu/drm end?
And in drivers/gpu/drm/{i915,v3d} I find they are using huge=within_size:
I had been complaining about the userspace regression in huge=always,
and thought it had been changed to behave like huge=within_size,
but apparently huge=within_size has itself regressed too.
Please explain why the below is not a better patch for i915 and v3d
(but still a temporary workaround, because the root of the within_size
regression must lie deeper, in the handling of write_end versus i_size).
Hugh
---
mm/shmem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 3a5a65b1f41a..c67dfc17a819 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -5928,8 +5928,8 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
struct folio *folio;
int error;
- error = shmem_get_folio_gfp(inode, index, 0, &folio, SGP_CACHE,
- gfp, NULL, NULL);
+ error = shmem_get_folio_gfp(inode, index, i_size_read(inode),
+ &folio, SGP_CACHE, gfp, NULL, NULL);
if (error)
return ERR_PTR(error);
* Re: [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver
2025-07-30 6:54 ` Hugh Dickins
@ 2025-07-30 7:46 ` Baolin Wang
2025-07-30 16:11 ` Patryk Kowalczyk
0 siblings, 1 reply; 8+ messages in thread
From: Baolin Wang @ 2025-07-30 7:46 UTC (permalink / raw)
To: Hugh Dickins
Cc: akpm, patryk, ville.syrjala, david, willy, maarten.lankhorst,
mripard, tzimmermann, airlied, simona, jani.nikula,
joonas.lahtinen, rodrigo.vivi, tursulin, christian.koenig,
ray.huang, matthew.auld, matthew.brost, dri-devel, linux-kernel,
linux-mm
On 2025/7/30 14:54, Hugh Dickins wrote:
> On Mon, 28 Jul 2025, Baolin Wang wrote:
>
>> After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
>> we extend the 'huge=' option to allow any sized large folios for tmpfs,
>> which means tmpfs will allow getting a highest order hint based on the size
>> of write() and fallocate() paths, and then will try each allowable large order.
>>
>> However, when the i915 driver allocates shmem memory, it doesn't provide hint
>> information about the size of the large folio to be allocated, resulting in
>> the inability to allocate PMD-sized shmem, which in turn affects GPU performance.
>>
>> To fix this issue, add the 'end' information for shmem_read_folio_gfp() to help
>> allocate PMD-sized large folios. Additionally, use the maximum allocation chunk
>> (via mapping_max_folio_size()) to determine the size of the large folios to
>> allocate in the i915 driver.
>>
>> Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
>> Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
>> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>> drivers/gpu/drm/drm_gem.c | 2 +-
>> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 7 ++++++-
>> drivers/gpu/drm/ttm/ttm_backup.c | 2 +-
>> include/linux/shmem_fs.h | 4 ++--
>> mm/shmem.c | 7 ++++---
>> 5 files changed, 14 insertions(+), 8 deletions(-)
>
> I know I said "I shall not object to a temporary workaround to suit the
> i915 driver", but really, I have to question this patch. Why should any
> change be required at the drivers/gpu/drm end?
>
> And in drivers/gpu/drm/{i915,v3d} I find they are using huge=within_size:
> I had been complaining about the userspace regression in huge=always,
> and thought it had been changed to behave like huge=within_size,
> but apparently huge=within_size has itself regressed too.
I'm preparing an RFC patch to discuss this.
> Please explain why the below is not a better patch for i915 and v3d
> (but still a temporary workaround, because the root of the within_size
> regression must lie deeper, in the handling of write_end versus i_size).
OK. This looks good to me. Patryk, could you try Hugh's simple patch?
Thanks.
> ---
> mm/shmem.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 3a5a65b1f41a..c67dfc17a819 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -5928,8 +5928,8 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
> struct folio *folio;
> int error;
>
> - error = shmem_get_folio_gfp(inode, index, 0, &folio, SGP_CACHE,
> - gfp, NULL, NULL);
> + error = shmem_get_folio_gfp(inode, index, i_size_read(inode),
> + &folio, SGP_CACHE, gfp, NULL, NULL);
> if (error)
> return ERR_PTR(error);
>
* Re: [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver
2025-07-30 7:46 ` Baolin Wang
@ 2025-07-30 16:11 ` Patryk Kowalczyk
0 siblings, 0 replies; 8+ messages in thread
From: Patryk Kowalczyk @ 2025-07-30 16:11 UTC (permalink / raw)
To: Baolin Wang
Cc: Hugh Dickins, akpm, ville.syrjala, david, willy,
maarten.lankhorst, mripard, tzimmermann, airlied, simona,
jani.nikula, joonas.lahtinen, rodrigo.vivi, tursulin,
christian.koenig, ray.huang, matthew.auld, matthew.brost,
dri-devel, linux-kernel, linux-mm
Hi,
This patch solves the performance issue very well.
best regards,
Patryk
śr., 30 lip 2025 o 09:46 Baolin Wang <baolin.wang@linux.alibaba.com> napisał(a):
>
>
>
> On 2025/7/30 14:54, Hugh Dickins wrote:
> > On Mon, 28 Jul 2025, Baolin Wang wrote:
> >
> >> After commit acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs"),
> >> we extend the 'huge=' option to allow any sized large folios for tmpfs,
> >> which means tmpfs will allow getting a highest order hint based on the size
> >> of write() and fallocate() paths, and then will try each allowable large order.
> >>
> >> However, when the i915 driver allocates shmem memory, it doesn't provide hint
> >> information about the size of the large folio to be allocated, resulting in
> >> the inability to allocate PMD-sized shmem, which in turn affects GPU performance.
> >>
> >> To fix this issue, add the 'end' information for shmem_read_folio_gfp() to help
> >> allocate PMD-sized large folios. Additionally, use the maximum allocation chunk
> >> (via mapping_max_folio_size()) to determine the size of the large folios to
> >> allocate in the i915 driver.
> >>
> >> Fixes: acd7ccb284b8 ("mm: shmem: add large folio support for tmpfs")
> >> Reported-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
> >> Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >> Tested-by: Patryk Kowalczyk <patryk@kowalczyk.ws>
> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >> ---
> >> drivers/gpu/drm/drm_gem.c | 2 +-
> >> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 7 ++++++-
> >> drivers/gpu/drm/ttm/ttm_backup.c | 2 +-
> >> include/linux/shmem_fs.h | 4 ++--
> >> mm/shmem.c | 7 ++++---
> >> 5 files changed, 14 insertions(+), 8 deletions(-)
> >
> > I know I said "I shall not object to a temporary workaround to suit the
> > i915 driver", but really, I have to question this patch. Why should any
> > change be required at the drivers/gpu/drm end?
> >
> > And in drivers/gpu/drm/{i915,v3d} I find they are using huge=within_size:
> > I had been complaining about the userspace regression in huge=always,
> > and thought it had been changed to behave like huge=within_size,
> > but apparently huge=within_size has itself regressed too.
>
> I'm preparing an RFC patch to discuss this.
>
> > Please explain why the below is not a better patch for i915 and v3d
> > (but still a temporary workaround, because the root of the within_size
> > regression must lie deeper, in the handling of write_end versus i_size).
>
> OK. This looks good to me. Patryk, could you try Hugh's simple patch?
> Thanks.
>
> > ---
> > mm/shmem.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 3a5a65b1f41a..c67dfc17a819 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -5928,8 +5928,8 @@ struct folio *shmem_read_folio_gfp(struct address_space *mapping,
> > struct folio *folio;
> > int error;
> >
> > - error = shmem_get_folio_gfp(inode, index, 0, &folio, SGP_CACHE,
> > - gfp, NULL, NULL);
> > + error = shmem_get_folio_gfp(inode, index, i_size_read(inode),
> > + &folio, SGP_CACHE, gfp, NULL, NULL);
> > if (error)
> > return ERR_PTR(error);
> >
>
Thread overview: 8+ messages
2025-07-28 8:03 [PATCH] mm: shmem: fix the shmem large folio allocation for the i915 driver Baolin Wang
2025-07-28 21:44 ` Andrew Morton
2025-07-28 21:59 ` Patryk Kowalczyk
2025-07-28 22:16 ` Andrew Morton
2025-07-28 22:08 ` Patryk Kowalczyk
2025-07-30 6:54 ` Hugh Dickins
2025-07-30 7:46 ` Baolin Wang
2025-07-30 16:11 ` Patryk Kowalczyk