* [PATCH 0/2] vmalloc: Introduce vmap_file()
@ 2025-01-31 0:18 Vishal Moola (Oracle)
2025-01-31 0:18 ` [PATCH 1/2] mm/vmalloc: " Vishal Moola (Oracle)
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Vishal Moola (Oracle) @ 2025-01-31 0:18 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, hch, urezki, intel-gfx, Vishal Moola (Oracle)
Currently, users have to call vmap() or vmap_pfn() to map pages to
kernel virtual space. vmap() requires the page references, and
vmap_pfn() requires page pfns. If we have a file but no page references,
we have to do extra work to map them.
Create a function, vmap_file(), to map a specified range of a given
file to kernel virtual space. Also convert a user that benefits from
vmap_file().
Vishal Moola (Oracle) (2):
mm/vmalloc: Introduce vmap_file()
drm: Use vmap_file() in shmem_pin_map()
drivers/gpu/drm/i915/gt/shmem_utils.c | 23 +------
include/linux/vmalloc.h | 2 +
mm/vmalloc.c | 97 +++++++++++++++++++++++++++
3 files changed, 102 insertions(+), 20 deletions(-)
--
2.47.1
* [PATCH 1/2] mm/vmalloc: Introduce vmap_file()
2025-01-31 0:18 [PATCH 0/2] vmalloc: Introduce vmap_file() Vishal Moola (Oracle)
@ 2025-01-31 0:18 ` Vishal Moola (Oracle)
2025-01-31 7:09 ` Christoph Hellwig
2025-01-31 0:18 ` [PATCH 2/2] drm: Use vmap_file() in shmem_pin_map() Vishal Moola (Oracle)
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Vishal Moola (Oracle) @ 2025-01-31 0:18 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, hch, urezki, intel-gfx, Vishal Moola (Oracle)
vmap_file() is effectively an in-kernel equivalent to calling mmap()
on a file. A user can pass in a file mapping, and vmap_file() will map
the specified portion of that file directly to kernel virtual space.
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
include/linux/vmalloc.h | 2 +
mm/vmalloc.c | 97 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 99 insertions(+)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 31e9ffd936e3..d5420985865f 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -192,6 +192,8 @@ extern void vfree_atomic(const void *addr);
extern void *vmap(struct page **pages, unsigned int count,
unsigned long flags, pgprot_t prot);
+void *vmap_file(struct address_space *mapping, loff_t start, loff_t end,
+ unsigned long flags, pgprot_t prot);
void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot);
extern void vunmap(const void *addr);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a6e7acebe9ad..4b1e31a8aad9 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3475,6 +3475,103 @@ void *vmap(struct page **pages, unsigned int count,
}
EXPORT_SYMBOL(vmap);
+/**
+ * vmap_file - map all folios in a file to virtually contiguous space.
+ * @mapping: The address space to map.
+ * @start: The starting byte.
+ * @end: The final byte to map.
+ * @flags: vm_area->flags.
+ * @prot: page protection for the mapping.
+ *
+ * Maps a file into contiguous kernel virtual space. The caller is expected
+ * to ensure that the folios caching the file are present and uptodate. The
+ * folios must remain so until the file is unmapped.
+ *
+ * If @start or @end are not PAGE_ALIGNED, vmap_file() will round
+ * @start down and @end up to encompass the entire range. The
+ * address returned is always PAGE_ALIGNED.
+ *
+ * Return: the address of the area or %NULL on failure.
+ */
+void *vmap_file(struct address_space *mapping, loff_t start, loff_t end,
+ unsigned long flags, pgprot_t prot)
+{
+ struct vm_struct *area;
+ struct folio *folio;
+ unsigned long addr;
+ pgoff_t first = start >> PAGE_SHIFT;
+ pgoff_t last = end >> PAGE_SHIFT;
+ XA_STATE(xas, &mapping->i_pages, first);
+
+ unsigned long size = (last - first + 1) << PAGE_SHIFT;
+
+ if (WARN_ON_ONCE(flags & VM_FLUSH_RESET_PERMS))
+ return NULL;
+
+ /*
+ * Your top guard is someone else's bottom guard. Not having a top
+ * guard compromises someone else's mappings too.
+ */
+ if (WARN_ON_ONCE(flags & VM_NO_GUARD))
+ flags &= ~VM_NO_GUARD;
+
+ area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+ if (!area)
+ return NULL;
+
+ addr = (unsigned long) area->addr;
+
+ rcu_read_lock();
+ xas_for_each(&xas, folio, last) {
+ int err;
+ bool pmd_bound;
+
+ if (xas_retry(&xas, folio))
+ continue;
+ if (!folio || xa_is_value(folio) ||
+ !folio_test_uptodate(folio))
+ goto out;
+
+ /* We need to check if this folio will cross the pmd boundary.
+ * If it does, we drop the rcu lock to allow for a new page
+ * table allocation.
+ */
+
+ pmd_bound = (addr == (unsigned long) area->addr) ||
+ (IS_ALIGNED(addr, PMD_SIZE)) ||
+ ((addr & PMD_MASK) !=
+ ((addr + folio_size(folio)) & PMD_MASK));
+
+ if (pmd_bound) {
+ xas_pause(&xas);
+ rcu_read_unlock();
+ }
+
+ err = vmap_range_noflush(addr, addr + folio_size(folio),
+ folio_pfn(folio) << PAGE_SHIFT, prot,
+ PAGE_SHIFT);
+
+ if (pmd_bound)
+ rcu_read_lock();
+
+ if (err) {
+ vunmap(area->addr);
+ area->addr = NULL;
+ goto out;
+ }
+
+ addr += folio_size(folio);
+ }
+
+out:
+ rcu_read_unlock();
+ flush_cache_vmap((unsigned long)area->addr,
+ (unsigned long)area->addr + size);
+
+ return area->addr;
+}
+EXPORT_SYMBOL(vmap_file);
+
#ifdef CONFIG_VMAP_PFN
struct vmap_pfn_data {
unsigned long *pfns;
--
2.47.1
* [PATCH 2/2] drm: Use vmap_file() in shmem_pin_map()
2025-01-31 0:18 [PATCH 0/2] vmalloc: Introduce vmap_file() Vishal Moola (Oracle)
2025-01-31 0:18 ` [PATCH 1/2] mm/vmalloc: " Vishal Moola (Oracle)
@ 2025-01-31 0:18 ` Vishal Moola (Oracle)
2025-01-31 0:48 ` [PATCH 0/2] vmalloc: Introduce vmap_file() Andrew Morton
2025-01-31 7:10 ` Christoph Hellwig
3 siblings, 0 replies; 9+ messages in thread
From: Vishal Moola (Oracle) @ 2025-01-31 0:18 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, hch, urezki, intel-gfx, Vishal Moola (Oracle)
We no longer need to allocate a new array of pages to map this file
to kernel virtual space. This simplifies shmem_pin_map(), and gets rid
of a user of VM_MAP_PUT_PAGES.
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
drivers/gpu/drm/i915/gt/shmem_utils.c | 23 +++--------------------
1 file changed, 3 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c
index bb696b29ee2c..79d930ed5229 100644
--- a/drivers/gpu/drm/i915/gt/shmem_utils.c
+++ b/drivers/gpu/drm/i915/gt/shmem_utils.c
@@ -57,32 +57,15 @@ struct file *shmem_create_from_object(struct drm_i915_gem_object *obj)
void *shmem_pin_map(struct file *file)
{
- struct page **pages;
- size_t n_pages, i;
void *vaddr;
- n_pages = file->f_mapping->host->i_size >> PAGE_SHIFT;
- pages = kvmalloc_array(n_pages, sizeof(*pages), GFP_KERNEL);
- if (!pages)
- return NULL;
-
- for (i = 0; i < n_pages; i++) {
- pages[i] = shmem_read_mapping_page_gfp(file->f_mapping, i,
- GFP_KERNEL);
- if (IS_ERR(pages[i]))
- goto err_page;
- }
+ vaddr = vmap_file(file->f_mapping, 0, file->f_mapping->host->i_size,
+ VM_MAP, PAGE_KERNEL);
- vaddr = vmap(pages, n_pages, VM_MAP_PUT_PAGES, PAGE_KERNEL);
if (!vaddr)
- goto err_page;
+ return NULL;
mapping_set_unevictable(file->f_mapping);
return vaddr;
-err_page:
- while (i--)
- put_page(pages[i]);
- kvfree(pages);
- return NULL;
}
void shmem_unpin_map(struct file *file, void *ptr)
--
2.47.1
* Re: [PATCH 0/2] vmalloc: Introduce vmap_file()
2025-01-31 0:18 [PATCH 0/2] vmalloc: Introduce vmap_file() Vishal Moola (Oracle)
2025-01-31 0:18 ` [PATCH 1/2] mm/vmalloc: " Vishal Moola (Oracle)
2025-01-31 0:18 ` [PATCH 2/2] drm: Use vmap_file() in shmem_pin_map() Vishal Moola (Oracle)
@ 2025-01-31 0:48 ` Andrew Morton
2025-02-03 18:53 ` Vishal Moola
2025-01-31 7:10 ` Christoph Hellwig
3 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2025-01-31 0:48 UTC (permalink / raw)
To: Vishal Moola (Oracle); +Cc: linux-mm, linux-kernel, hch, urezki, intel-gfx
On Thu, 30 Jan 2025 16:18:04 -0800 "Vishal Moola (Oracle)" <vishal.moola@gmail.com> wrote:
> Currently, users have to call vmap() or vmap_pfn() to map pages to
> kernel virtual space. vmap() requires the page references, and
> vmap_pfn() requires page pfns. If we have a file but no page references,
> we have to do extra work to map them.
>
> Create a function, vmap_file(), to map a specified range of a given
> file to kernel virtual space. Also convert a user that benefits from
> vmap_file().
>
Seems like a pretty specialized thing. Have you identified any other
potential users of vmap_file()? I couldn't see any.
If drm is likely to remain the only user of this, perhaps we should
leave the code down in drivers/gpu/drm for now?
Also, the amount of copy-n-pasting from vmap() into vmap_file() is
undesirable - code size, maintenance overhead, etc.
* Re: [PATCH 1/2] mm/vmalloc: Introduce vmap_file()
2025-01-31 0:18 ` [PATCH 1/2] mm/vmalloc: " Vishal Moola (Oracle)
@ 2025-01-31 7:09 ` Christoph Hellwig
2025-02-03 19:23 ` Vishal Moola
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2025-01-31 7:09 UTC (permalink / raw)
To: Vishal Moola (Oracle)
Cc: akpm, linux-mm, linux-kernel, hch, urezki, intel-gfx
On Thu, Jan 30, 2025 at 04:18:05PM -0800, Vishal Moola (Oracle) wrote:
> + rcu_read_lock();
> + xas_for_each(&xas, folio, last) {
This only maps folios currently in the page cache, which makes it
useless for everything except ramfs-style purely in-memory file systems.
I.e. for the shmem use case in the second patch it fails to swap in
swapped out tmpfs folios.
> +EXPORT_SYMBOL(vmap_file);
EXPORT_SYMBOL_GPL for any advanced vmalloc-layer functionality, please.
* Re: [PATCH 0/2] vmalloc: Introduce vmap_file()
2025-01-31 0:18 [PATCH 0/2] vmalloc: Introduce vmap_file() Vishal Moola (Oracle)
` (2 preceding siblings ...)
2025-01-31 0:48 ` [PATCH 0/2] vmalloc: Introduce vmap_file() Andrew Morton
@ 2025-01-31 7:10 ` Christoph Hellwig
3 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2025-01-31 7:10 UTC (permalink / raw)
To: Vishal Moola (Oracle)
Cc: akpm, linux-mm, linux-kernel, hch, urezki, intel-gfx
On Thu, Jan 30, 2025 at 04:18:04PM -0800, Vishal Moola (Oracle) wrote:
> Currently, users have to call vmap() or vmap_pfn() to map pages to
> kernel virtual space. vmap() requires the page references, and
> vmap_pfn() requires page pfns. If we have a file but no page references,
> we have to do extra work to map them.
>
> Create a function, vmap_file(), to map a specified range of a given
> file to kernel virtual space. Also convert a user that benefits from
> vmap_file().
As far as I can tell there is exactly one user that maps file pages
into vmalloc space. It's a pretty odd thing to do, so figuring out
a way to get rid of that might be a better use of time.
* Re: [PATCH 0/2] vmalloc: Introduce vmap_file()
2025-01-31 0:48 ` [PATCH 0/2] vmalloc: Introduce vmap_file() Andrew Morton
@ 2025-02-03 18:53 ` Vishal Moola
2025-04-08 14:04 ` Brendan Jackman
0 siblings, 1 reply; 9+ messages in thread
From: Vishal Moola @ 2025-02-03 18:53 UTC (permalink / raw)
To: Andrew Morton
Cc: linux-mm, linux-kernel, hch, urezki, intel-gfx, Matthew Wilcox
On Thu, Jan 30, 2025 at 4:48 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 30 Jan 2025 16:18:04 -0800 "Vishal Moola (Oracle)" <vishal.moola@gmail.com> wrote:
>
> > Currently, users have to call vmap() or vmap_pfn() to map pages to
> > kernel virtual space. vmap() requires the page references, and
> > vmap_pfn() requires page pfns. If we have a file but no page references,
> > we have to do extra work to map them.
> >
> > Create a function, vmap_file(), to map a specified range of a given
> > file to kernel virtual space. Also convert a user that benefits from
> > vmap_file().
> >
>
> Seems like a pretty specialized thing. Have you identified any other
> potential users of vmap_file()? I couldn't see any.
>
> If drm is likely to remain the only user of this, perhaps we should
> leave the code down in drivers/gpu/drm for now?
This function is generally useful for file-systems that use the pagecache.
I simply chose to highlight the most obvious user that benefits from it (and
so that the function is introduced with a user).
I haven't identified any other specific users of vmap_file() myself. I know
Matthew has some other ideas for it; I've cc-ed him so he can chime in.
>
> Also, the amount of copy-n-pasting from vmap() into vmap_file() is
> undesirable - code size, maintenance overhead, etc.
I wasn't particularly a fan of it either, but I couldn't find a more readable
way to do this (without reorganizing multiple other functions). Aside from
the initial flags checks, the rest of the function is slightly different from
vmap(), so calling existing functions won't suffice.
I considered passing more arguments through to vmap(), but I think that
would make the code more confusing, especially because the 2 functions
have some different usage prerequisites.
* Re: [PATCH 1/2] mm/vmalloc: Introduce vmap_file()
2025-01-31 7:09 ` Christoph Hellwig
@ 2025-02-03 19:23 ` Vishal Moola
0 siblings, 0 replies; 9+ messages in thread
From: Vishal Moola @ 2025-02-03 19:23 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: akpm, linux-mm, linux-kernel, urezki, intel-gfx
On Thu, Jan 30, 2025 at 11:09 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Jan 30, 2025 at 04:18:05PM -0800, Vishal Moola (Oracle) wrote:
> > + rcu_read_lock();
> > + xas_for_each(&xas, folio, last) {
>
> This only maps folios currently in the page cache, which makes it
> useless for everything except ramfs-style purely in-memory file systems.
> I.e. for the shmem use case in the second patch it fails to swap in
> swapped out tmpfs folios.
Ah, I see. I can drop that patch then. Its primary purpose was to provide a
user for vmap_file(). As you've pointed out, that won't work with tmpfs or
anon pages. I'll hold off on a v2 until there are better use cases for
vmap_file().
> > +EXPORT_SYMBOL(vmap_file);
>
> EXPORT_SYMBOL_GPL for any advanced vmalloc-layer functionality, please.
Ok, I'll keep that in mind in the future.
* Re: [PATCH 0/2] vmalloc: Introduce vmap_file()
2025-02-03 18:53 ` Vishal Moola
@ 2025-04-08 14:04 ` Brendan Jackman
0 siblings, 0 replies; 9+ messages in thread
From: Brendan Jackman @ 2025-04-08 14:04 UTC (permalink / raw)
To: Vishal Moola, Andrew Morton
Cc: linux-mm, linux-kernel, hch, urezki, intel-gfx, Matthew Wilcox
On Mon Feb 3, 2025 at 6:53 PM UTC, Vishal Moola wrote:
> On Thu, Jan 30, 2025 at 4:48 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> On Thu, 30 Jan 2025 16:18:04 -0800 "Vishal Moola (Oracle)" <vishal.moola@gmail.com> wrote:
>>
>> > Currently, users have to call vmap() or vmap_pfn() to map pages to
>> > kernel virtual space. vmap() requires the page references, and
>> > vmap_pfn() requires page pfns. If we have a file but no page references,
>> > we have to do extra work to map them.
>> >
>> > Create a function, vmap_file(), to map a specified range of a given
>> > file to kernel virtual space. Also convert a user that benefits from
>> > vmap_file().
>> >
>>
>> Seems like a pretty specialized thing. Have you identified any other
>> potential users of vmap_file()? I couldn't see any.
>>
>> If drm is likely to remain the only user of this, perhaps we should
>> leave the code down in drivers/gpu/drm for now?
>
> This function is generally useful for file-systems that use the pagecache.
> I simply chose to highlight the most obvious user that benefits from it (and
> so that the function is introduced with a user).
>
> I haven't identified any other specific users of vmap_file() myself. I know
> Matthew has some other ideas for it; I've cc-ed him so he can chime in.
Not much to add but just to confirm - yep, this seems like it might be
useful as a part of the solution to the page cache perf issue[1] with
ASI that I spoke about (briefly and chaotically) at the end of the
LSF/MM/BPF session[0] on ASI this year.
[0] https://lwn.net/Articles/1016013/
[1] https://lore.kernel.org/linux-mm/20250129144320.2675822-1-jackmanb@google.com/
But, for the moment this is all still pretty vague stuff, not at all
clear yet that this idea makes total sense. Hopefully I'll be able to
follow up in a few weeks after I've made some time to stare at/prototype
things.