linux-mm.kvack.org archive mirror
* [RFC PATCH v2 0/1] Introduce vmap_file()
@ 2025-03-28 21:13 Vishal Moola (Oracle)
  2025-03-28 21:13 ` [RFC PATCH v2 1/1] mm/vmalloc: " Vishal Moola (Oracle)
  2025-03-31  2:05 ` [RFC PATCH v2 0/1] " Huan Yang
  0 siblings, 2 replies; 11+ messages in thread
From: Vishal Moola (Oracle) @ 2025-03-28 21:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Brendan Jackman, Huan Yang, linux-kernel,
	Vishal Moola (Oracle)

Currently, users have to call vmap() or vmap_pfn() to map pages to
kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
without struct page). vmap() handles normal pages.

With large folios, we may want to map ranges that only span
part of a folio (e.g. mapping half of a 2MB folio).
vmap_file() will allow us to do so.
 
Create a function, vmap_file(), to map a specified range of a given
file to kernel virtual space. vmap_file() is an in-kernel equivalent
to mmap(), and can be useful for filesystems.
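
For example, a caller that has already made the folios for a byte range
present and uptodate might use it like this (a minimal, hypothetical
sketch; inode, pos, len and buf are placeholders):

	/* Map bytes [pos, pos + len) of an inode's page cache. The
	 * caller must keep the folios present and uptodate until the
	 * range is unmapped.
	 */
	void *kaddr = vmap_file(inode->i_mapping, pos, pos + len - 1,
				VM_MAP, PAGE_KERNEL);

	if (!kaddr)
		return -ENOMEM;

	/* The returned address is page-aligned, so the data for pos
	 * starts at kaddr + offset_in_page(pos).
	 */
	memcpy(buf, kaddr + offset_in_page(pos), len);

	vunmap(kaddr);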

---
v2:
  - Reword cover letter to provide a clearer overview of the current
  vmalloc APIs, and usefulness of vmap_file()
  - EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL()
  - Provide support to partially map file folios
  - Demote this to RFC while we look for users
--
I don't have a user for this function right now, but it will be
useful as users start converting to large folios. I'm just
putting it out here for anyone that may find a use for it.

This seems like the sensible way to implement it, but I'm open
to tweaking the function's semantics.

I've Cc-ed a couple people that mentioned they might be interested
in using it.

Vishal Moola (Oracle) (1):
  mm/vmalloc: Introduce vmap_file()

 include/linux/vmalloc.h |   2 +
 mm/vmalloc.c            | 113 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)

-- 
2.48.1




* [RFC PATCH v2 1/1] mm/vmalloc: Introduce vmap_file()
  2025-03-28 21:13 [RFC PATCH v2 0/1] Introduce vmap_file() Vishal Moola (Oracle)
@ 2025-03-28 21:13 ` Vishal Moola (Oracle)
  2025-03-31  2:05 ` [RFC PATCH v2 0/1] " Huan Yang
  1 sibling, 0 replies; 11+ messages in thread
From: Vishal Moola (Oracle) @ 2025-03-28 21:13 UTC (permalink / raw)
  To: linux-mm
  Cc: Andrew Morton, Brendan Jackman, Huan Yang, linux-kernel,
	Vishal Moola (Oracle)

vmap_file() is effectively an in-kernel equivalent to calling mmap()
on a file. A user can pass in a file mapping, and vmap_file() will map
the specified portion of that file directly to kernel virtual space.

Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
---
 include/linux/vmalloc.h |   2 +
 mm/vmalloc.c            | 113 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 115 insertions(+)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 31e9ffd936e3..d5420985865f 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -192,6 +192,8 @@ extern void vfree_atomic(const void *addr);
 
 extern void *vmap(struct page **pages, unsigned int count,
 			unsigned long flags, pgprot_t prot);
+void *vmap_file(struct address_space *mapping, loff_t start, loff_t end,
+			unsigned long flags, pgprot_t prot);
 void *vmap_pfn(unsigned long *pfns, unsigned int count, pgprot_t prot);
 extern void vunmap(const void *addr);
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3ed720a787ec..b94489032ab5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3475,6 +3475,119 @@ void *vmap(struct page **pages, unsigned int count,
 }
 EXPORT_SYMBOL(vmap);
 
+/**
+ * vmap_file - map a range of a file to virtually contiguous space.
+ * @mapping: The address space to map.
+ * @start: The starting byte.
+ * @end: The final byte to map.
+ * @flags: vm_area->flags.
+ * @prot: page protection for the mapping.
+ *
+ * Maps a file into contiguous kernel virtual space. The caller is expected
+ * to ensure that the folios caching the file are present and uptodate. The
+ * folios must remain so until the file is unmapped.
+ *
+ * If @start or @end are not PAGE_ALIGNED, vmap_file() will round
+ * @start down and @end up to encompass the desired pages. The
+ * address returned is always PAGE_ALIGNED.
+ *
+ * Return: the address of the area or %NULL on failure.
+ */
+void *vmap_file(struct address_space *mapping, loff_t start, loff_t end,
+		unsigned long flags, pgprot_t prot)
+{
+	struct vm_struct *area;
+	struct folio *folio;
+	unsigned long addr, end_addr;
+	const pgoff_t first = start >> PAGE_SHIFT;
+	const pgoff_t last = end >> PAGE_SHIFT;
+	XA_STATE(xas, &mapping->i_pages, first);
+
+	unsigned long size = (last - first + 1) << PAGE_SHIFT;
+
+	if (WARN_ON_ONCE(flags & VM_FLUSH_RESET_PERMS))
+		return NULL;
+
+	/*
+	 * Your top guard is someone else's bottom guard. Not having a top
+	 * guard compromises someone else's mappings too.
+	 */
+	if (WARN_ON_ONCE(flags & VM_NO_GUARD))
+		flags &= ~VM_NO_GUARD;
+
+	area = get_vm_area_caller(size, flags, __builtin_return_address(0));
+	if (!area)
+		return NULL;
+
+	addr = (unsigned long) area->addr;
+	end_addr = addr + size;
+
+	rcu_read_lock();
+	xas_for_each(&xas, folio, last) {
+		phys_addr_t map_start;
+		int map_size, err;
+		bool pmd_bound, is_first_map;
+
+		if (xas_retry(&xas, folio))
+			continue;
+		if (!folio || xa_is_value(folio) ||
+				!folio_test_uptodate(folio))
+			goto out;
+
+		is_first_map = (addr == (unsigned long) area->addr);
+		map_start = folio_pfn(folio) << PAGE_SHIFT;
+		map_size = folio_size(folio);
+
+		/* We can unconditionally calculate values for the first
+		 * folio. This lets us handle skipping pages in the first
+		 * folio without verifying addresses every iteration.
+		 */
+		if (is_first_map) {
+			map_size -= (first - folio->index) << PAGE_SHIFT;
+			map_start += (first - folio->index) << PAGE_SHIFT;
+		}
+
+		if (addr + map_size > end_addr)
+			map_size = end_addr - addr;
+
+		/* We need to check if this folio will cross the pmd boundary.
+		 * If it does, we drop the rcu lock to allow for a new page
+		 * table allocation.
+		 */
+
+		pmd_bound = is_first_map ||
+			(IS_ALIGNED(addr, PMD_SIZE)) ||
+			((addr & PMD_MASK) !=
+			((addr + map_size) & PMD_MASK));
+
+		if (pmd_bound) {
+			xas_pause(&xas);
+			rcu_read_unlock();
+		}
+
+		err = vmap_range_noflush(addr, addr + map_size,
+				map_start, prot, PAGE_SHIFT);
+
+		if (pmd_bound)
+			rcu_read_lock();
+
+		if (err) {
+			vunmap(area->addr);
+			area->addr = NULL;
+			goto out;
+		}
+
+		addr += map_size;
+	}
+
+out:
+	rcu_read_unlock();
+	flush_cache_vmap((unsigned long)area->addr, end_addr);
+
+	return area->addr;
+}
+EXPORT_SYMBOL_GPL(vmap_file);
+
 #ifdef CONFIG_VMAP_PFN
 struct vmap_pfn_data {
 	unsigned long	*pfns;
-- 
2.48.1




* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-03-28 21:13 [RFC PATCH v2 0/1] Introduce vmap_file() Vishal Moola (Oracle)
  2025-03-28 21:13 ` [RFC PATCH v2 1/1] mm/vmalloc: " Vishal Moola (Oracle)
@ 2025-03-31  2:05 ` Huan Yang
  2025-04-01  1:50   ` Vishal Moola (Oracle)
  1 sibling, 1 reply; 11+ messages in thread
From: Huan Yang @ 2025-03-31  2:05 UTC (permalink / raw)
  To: Vishal Moola (Oracle), linux-mm
  Cc: Andrew Morton, Brendan Jackman, linux-kernel

Hi Vishal,

在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
> Currently, users have to call vmap() or vmap_pfn() to map pages to
> kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
> without struct page). vmap() handles normal pages.
>
> With large folios, we may want to map ranges that only span
> part of a folio (i.e. mapping half of a 2Mb folio).
> vmap_file() will allow us to do so.

You mention vmap_file() can support mapping a range of a folio, but when
I look at the code I can't figure out how to use it that way. Maybe I
missed something? :)

Also, this API is still aimed at mapping files, so it may not be suitable
for the problem I mentioned in:

https://lore.kernel.org/lkml/20250312061513.1126496-1-link@vivo.com/

Thanks,
Huan Yang

>   
> Create a function, vmap_file(), to map a specified range of a given
> file to kernel virtual space. vmap_file() is an in-kernel equivalent
> to mmap(), and can be useful for filesystems.
>
> ---
> v2:
>    - Reword cover letter to provide a clearer overview of the current
>    vmalloc APIs, and usefulness of vmap_file()
>    - EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL()
>    - Provide support to partially map file folios
>    - Demote this to RFC while we look for users
> --
> I don't have a user for this function right now, but it will be
> useful as users start converting to using large folios. I'm just
> putting it out here for anyone that may find a use for it.
>
> This seems like the sensible way to implement it, but I'm open
> to tweaking the functions semantics.
>
> I've Cc-ed a couple people that mentioned they might be interested
> in using it.
>
> Vishal Moola (Oracle) (1):
>    mm/vmalloc: Introduce vmap_file()
>
>   include/linux/vmalloc.h |   2 +
>   mm/vmalloc.c            | 113 ++++++++++++++++++++++++++++++++++++++++
>   2 files changed, 115 insertions(+)
>



* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-03-31  2:05 ` [RFC PATCH v2 0/1] " Huan Yang
@ 2025-04-01  1:50   ` Vishal Moola (Oracle)
  2025-04-01  2:21     ` Huan Yang
  0 siblings, 1 reply; 11+ messages in thread
From: Vishal Moola (Oracle) @ 2025-04-01  1:50 UTC (permalink / raw)
  To: Huan Yang; +Cc: linux-mm, Andrew Morton, Brendan Jackman, linux-kernel

On Mon, Mar 31, 2025 at 10:05:53AM +0800, Huan Yang wrote:
> HI Vishal,
> 
> 在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
> > Currently, users have to call vmap() or vmap_pfn() to map pages to
> > kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
> > without struct page). vmap() handles normal pages.
> > 
> > With large folios, we may want to map ranges that only span
> > part of a folio (i.e. mapping half of a 2Mb folio).
> > vmap_file() will allow us to do so.
> 
> You mention vmap_file can support range folio vmap, but when I look code, I can't figure out
> 
> how to use, maybe I missed something? :)

I took a look at the udma-buf code. Rather than iterating through the
folios using pfns, you can calculate the corresponding file offsets 
(maybe you already have them?) to map the desired folios.

> And this API still aim to file vmap, Maybe not suitable for the problem I mentioned in:
> 
> https://lore.kernel.org/lkml/20250312061513.1126496-1-link@vivo.com/

I'm not sure which problem you're referring to, could you be more
specific?

> Thanks,
> Huan Yang
> 
> > Create a function, vmap_file(), to map a specified range of a given
> > file to kernel virtual space. vmap_file() is an in-kernel equivalent
> > to mmap(), and can be useful for filesystems.
> > 
> > ---
> > v2:
> >    - Reword cover letter to provide a clearer overview of the current
> >    vmalloc APIs, and usefulness of vmap_file()
> >    - EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL()
> >    - Provide support to partially map file folios
> >    - Demote this to RFC while we look for users
> > --
> > I don't have a user for this function right now, but it will be
> > useful as users start converting to using large folios. I'm just
> > putting it out here for anyone that may find a use for it.
> > 
> > This seems like the sensible way to implement it, but I'm open
> > to tweaking the functions semantics.
> > 
> > I've Cc-ed a couple people that mentioned they might be interested
> > in using it.
> > 
> > Vishal Moola (Oracle) (1):
> >    mm/vmalloc: Introduce vmap_file()
> > 
> >   include/linux/vmalloc.h |   2 +
> >   mm/vmalloc.c            | 113 ++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 115 insertions(+)
> > 



* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-04-01  1:50   ` Vishal Moola (Oracle)
@ 2025-04-01  2:21     ` Huan Yang
  2025-04-01  3:19       ` Vishal Moola (Oracle)
  0 siblings, 1 reply; 11+ messages in thread
From: Huan Yang @ 2025-04-01  2:21 UTC (permalink / raw)
  To: Vishal Moola (Oracle)
  Cc: linux-mm, Andrew Morton, Brendan Jackman, linux-kernel


在 2025/4/1 09:50, Vishal Moola (Oracle) 写道:
> On Mon, Mar 31, 2025 at 10:05:53AM +0800, Huan Yang wrote:
>> HI Vishal,
>>
>> 在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
>>> Currently, users have to call vmap() or vmap_pfn() to map pages to
>>> kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
>>> without struct page). vmap() handles normal pages.
>>>
>>> With large folios, we may want to map ranges that only span
>>> part of a folio (i.e. mapping half of a 2Mb folio).
>>> vmap_file() will allow us to do so.
>> You mention vmap_file can support range folio vmap, but when I look code, I can't figure out
>>
>> how to use, maybe I missed something? :)
> I took a look at the udma-buf code. Rather than iterating through the
> folios using pfns, you can calculate the corresponding file offsets
> (maybe you already have them?) to map the desired folios.

Currently udmabuf's folios are not simply file-based (even though the
memory comes from a memfd). The user can provide arbitrary ranges of a
memfd for udmabuf to use. For example:

We might get a 4M memfd that the user splits into [0, 2M), [1M, 2M),
[2M, 4M), so the 1M-2M range repeats.

These ranges are gathered by udmabuf_create_list(), and udmabuf then
records them as a folio array plus an offset array.

I think vmap_file() based on an address_space range can't help here.

>
>> And this API still aim to file vmap, Maybe not suitable for the problem I mentioned in:
>>
>> https://lore.kernel.org/lkml/20250312061513.1126496-1-link@vivo.com/
> I'm not sure which problem you're referring to, could you be more
> specific?

1. udmabuf's usage is not the same as a file vmap.

2. udmabuf can't use page structs if HVO (HugeTLB Vmemmap Optimization)
is enabled and in use.

It still needs a pfn-based vmap or a folio-offset-based range vmap.
(Or we simply reject vmap of HVO folios.) :)

>
>> Thanks,
>> Huan Yang
>>
>>> Create a function, vmap_file(), to map a specified range of a given
>>> file to kernel virtual space. vmap_file() is an in-kernel equivalent
>>> to mmap(), and can be useful for filesystems.
>>>
>>> ---
>>> v2:
>>>     - Reword cover letter to provide a clearer overview of the current
>>>     vmalloc APIs, and usefulness of vmap_file()
>>>     - EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL()
>>>     - Provide support to partially map file folios
>>>     - Demote this to RFC while we look for users
>>> --
>>> I don't have a user for this function right now, but it will be
>>> useful as users start converting to using large folios. I'm just
>>> putting it out here for anyone that may find a use for it.
>>>
>>> This seems like the sensible way to implement it, but I'm open
>>> to tweaking the functions semantics.
>>>
>>> I've Cc-ed a couple people that mentioned they might be interested
>>> in using it.
>>>
>>> Vishal Moola (Oracle) (1):
>>>     mm/vmalloc: Introduce vmap_file()
>>>
>>>    include/linux/vmalloc.h |   2 +
>>>    mm/vmalloc.c            | 113 ++++++++++++++++++++++++++++++++++++++++
>>>    2 files changed, 115 insertions(+)
>>>



* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-04-01  2:21     ` Huan Yang
@ 2025-04-01  3:19       ` Vishal Moola (Oracle)
  2025-04-01  6:08         ` Huan Yang
  0 siblings, 1 reply; 11+ messages in thread
From: Vishal Moola (Oracle) @ 2025-04-01  3:19 UTC (permalink / raw)
  To: Huan Yang; +Cc: linux-mm, Andrew Morton, Brendan Jackman, linux-kernel

On Tue, Apr 01, 2025 at 10:21:46AM +0800, Huan Yang wrote:
> 
> 在 2025/4/1 09:50, Vishal Moola (Oracle) 写道:
> > On Mon, Mar 31, 2025 at 10:05:53AM +0800, Huan Yang wrote:
> > > HI Vishal,
> > > 
> > > 在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
> > > > Currently, users have to call vmap() or vmap_pfn() to map pages to
> > > > kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
> > > > without struct page). vmap() handles normal pages.
> > > > 
> > > > With large folios, we may want to map ranges that only span
> > > > part of a folio (i.e. mapping half of a 2Mb folio).
> > > > vmap_file() will allow us to do so.
> > > You mention vmap_file can support range folio vmap, but when I look code, I can't figure out
> > > 
> > > how to use, maybe I missed something? :)
> > I took a look at the udma-buf code. Rather than iterating through the
> > folios using pfns, you can calculate the corresponding file offsets
> > (maybe you already have them?) to map the desired folios.
> 
> Currently udmabuf folio's not simple based on file(even each memory from memfd). User can provide
> 
> random range of memfd  to udmabuf to use. For example:
> 
> We get a memfd maybe 4M, user split it into [0, 2M), [1M, 2M), [2M, 4M), so you can see 1M-2M range repeat.
> 
> This range can gathered by udmabuf_create_list, then udmabuf use it. So, udmabuf record it by folio array+offset array.

I was thinking you could call vmap_file() on every sub-range and use
those addresses. It should work; we'd have to look at making the udmabuf
APIs support it.
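
Roughly something like this (a hypothetical sketch only; nr_ranges,
offsets[], sizes[], vaddrs[] and memfd are made-up names standing in for
whatever the create-list bookkeeping provides):

	/* One vmap_file() call per memfd sub-range. Each call returns
	 * its own virtually contiguous mapping; the sub-ranges are not
	 * merged into a single address range.
	 */
	for (i = 0; i < nr_ranges; i++) {
		loff_t start = offsets[i];
		loff_t end = offsets[i] + sizes[i] - 1;	/* final byte */

		vaddrs[i] = vmap_file(memfd->f_mapping, start, end,
				      VM_MAP, PAGE_KERNEL);
		if (!vaddrs[i])
			goto err_unmap;
	}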

> I think vmap_file based on address_space's range can't help.

I'm not familiar with the memfd/gup code yet, but I'm fairly confident
those memfds will have associated ->f_mappings that would suffice. They
are file descriptors after all.

> > 
> > > And this API still aim to file vmap, Maybe not suitable for the problem I mentioned in:
> > > 
> > > https://lore.kernel.org/lkml/20250312061513.1126496-1-link@vivo.com/
> > I'm not sure which problem you're referring to, could you be more
> > specific?
> 
> 1. udmabuf not same to file vmap usage
> 
> 2. udmabuf can't use page struct if HVO hugetlb enabled and use.

vmap_file() doesn't depend on tail page structs.

> It still need pfn based vmap or folio's offset based range vmap.(Or, just simple reject HVO folio use vmap) :)
> 
> > 
> > > Thanks,
> > > Huan Yang
> > > 
> > > > Create a function, vmap_file(), to map a specified range of a given
> > > > file to kernel virtual space. vmap_file() is an in-kernel equivalent
> > > > to mmap(), and can be useful for filesystems.
> > > > 
> > > > ---
> > > > v2:
> > > >     - Reword cover letter to provide a clearer overview of the current
> > > >     vmalloc APIs, and usefulness of vmap_file()
> > > >     - EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL()
> > > >     - Provide support to partially map file folios
> > > >     - Demote this to RFC while we look for users
> > > > --
> > > > I don't have a user for this function right now, but it will be
> > > > useful as users start converting to using large folios. I'm just
> > > > putting it out here for anyone that may find a use for it.
> > > > 
> > > > This seems like the sensible way to implement it, but I'm open
> > > > to tweaking the functions semantics.
> > > > 
> > > > I've Cc-ed a couple people that mentioned they might be interested
> > > > in using it.
> > > > 
> > > > Vishal Moola (Oracle) (1):
> > > >     mm/vmalloc: Introduce vmap_file()
> > > > 
> > > >    include/linux/vmalloc.h |   2 +
> > > >    mm/vmalloc.c            | 113 ++++++++++++++++++++++++++++++++++++++++
> > > >    2 files changed, 115 insertions(+)
> > > > 



* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-04-01  3:19       ` Vishal Moola (Oracle)
@ 2025-04-01  6:08         ` Huan Yang
  2025-04-01  9:47           ` Uladzislau Rezki
  2025-04-01 17:31           ` Vishal Moola (Oracle)
  0 siblings, 2 replies; 11+ messages in thread
From: Huan Yang @ 2025-04-01  6:08 UTC (permalink / raw)
  To: Vishal Moola (Oracle)
  Cc: linux-mm, Andrew Morton, Brendan Jackman, linux-kernel


在 2025/4/1 11:19, Vishal Moola (Oracle) 写道:
> On Tue, Apr 01, 2025 at 10:21:46AM +0800, Huan Yang wrote:
>> 在 2025/4/1 09:50, Vishal Moola (Oracle) 写道:
>>> On Mon, Mar 31, 2025 at 10:05:53AM +0800, Huan Yang wrote:
>>>> HI Vishal,
>>>>
>>>> 在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
>>>>> Currently, users have to call vmap() or vmap_pfn() to map pages to
>>>>> kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
>>>>> without struct page). vmap() handles normal pages.
>>>>>
>>>>> With large folios, we may want to map ranges that only span
>>>>> part of a folio (i.e. mapping half of a 2Mb folio).
>>>>> vmap_file() will allow us to do so.
>>>> You mention vmap_file can support range folio vmap, but when I look code, I can't figure out
>>>>
>>>> how to use, maybe I missed something? :)
>>> I took a look at the udma-buf code. Rather than iterating through the
>>> folios using pfns, you can calculate the corresponding file offsets
>>> (maybe you already have them?) to map the desired folios.
>> Currently udmabuf folio's not simple based on file(even each memory from memfd). User can provide
>>
>> random range of memfd  to udmabuf to use. For example:
>>
>> We get a memfd maybe 4M, user split it into [0, 2M), [1M, 2M), [2M, 4M), so you can see 1M-2M range repeat.
>>
>> This range can gathered by udmabuf_create_list, then udmabuf use it. So, udmabuf record it by folio array+offset array.
> I was thinking you could call vmap_file() on every sub-range and use
> those addresses. It should work, we'd have to look at making udmabuf api's
> support it.

Hmmm, how would we get a contiguous virtual address then? Or is there a
way to merge the addresses returned by each split vmap?

IMO, a user invoking vmap wants to map scattered memory into one
contiguous virtual address range, but with your suggestion I don't think
we can do that. :)

>
>> I think vmap_file based on address_space's range can't help.
> I'm not familiar with the memfd/gup code yet, but I'm fairly confident
> those memfds will have associated ->f_mappings that would suffice. They
> are file descriptors after all.
Agree with this.
>
>>>> And this API still aim to file vmap, Maybe not suitable for the problem I mentioned in:
>>>>
>>>> https://lore.kernel.org/lkml/20250312061513.1126496-1-link@vivo.com/
>>> I'm not sure which problem you're referring to, could you be more
>>> specific?
>> 1. udmabuf not same to file vmap usage
>>
>> 2. udmabuf can't use page struct if HVO hugetlb enabled and use.
> vmap_file() doesn't depend on tail page structs.
>
>> It still need pfn based vmap or folio's offset based range vmap.(Or, just simple reject HVO folio use vmap) :)
>>
>>>> Thanks,
>>>> Huan Yang
>>>>
>>>>> Create a function, vmap_file(), to map a specified range of a given
>>>>> file to kernel virtual space. vmap_file() is an in-kernel equivalent
>>>>> to mmap(), and can be useful for filesystems.
>>>>>
>>>>> ---
>>>>> v2:
>>>>>      - Reword cover letter to provide a clearer overview of the current
>>>>>      vmalloc APIs, and usefulness of vmap_file()
>>>>>      - EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL()
>>>>>      - Provide support to partially map file folios
>>>>>      - Demote this to RFC while we look for users
>>>>> --
>>>>> I don't have a user for this function right now, but it will be
>>>>> useful as users start converting to using large folios. I'm just
>>>>> putting it out here for anyone that may find a use for it.
>>>>>
>>>>> This seems like the sensible way to implement it, but I'm open
>>>>> to tweaking the functions semantics.
>>>>>
>>>>> I've Cc-ed a couple people that mentioned they might be interested
>>>>> in using it.
>>>>>
>>>>> Vishal Moola (Oracle) (1):
>>>>>      mm/vmalloc: Introduce vmap_file()
>>>>>
>>>>>     include/linux/vmalloc.h |   2 +
>>>>>     mm/vmalloc.c            | 113 ++++++++++++++++++++++++++++++++++++++++
>>>>>     2 files changed, 115 insertions(+)
>>>>>



* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-04-01  6:08         ` Huan Yang
@ 2025-04-01  9:47           ` Uladzislau Rezki
  2025-04-01 11:09             ` Huan Yang
  2025-04-01 17:31           ` Vishal Moola (Oracle)
  1 sibling, 1 reply; 11+ messages in thread
From: Uladzislau Rezki @ 2025-04-01  9:47 UTC (permalink / raw)
  To: Huan Yang, Vishal Moola (Oracle)
  Cc: Vishal Moola (Oracle),
	linux-mm, Andrew Morton, Brendan Jackman, linux-kernel

On Tue, Apr 01, 2025 at 02:08:53PM +0800, Huan Yang wrote:
> 
> 在 2025/4/1 11:19, Vishal Moola (Oracle) 写道:
> > On Tue, Apr 01, 2025 at 10:21:46AM +0800, Huan Yang wrote:
> > > 在 2025/4/1 09:50, Vishal Moola (Oracle) 写道:
> > > > On Mon, Mar 31, 2025 at 10:05:53AM +0800, Huan Yang wrote:
> > > > > HI Vishal,
> > > > > 
> > > > > 在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
> > > > > > Currently, users have to call vmap() or vmap_pfn() to map pages to
> > > > > > kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
> > > > > > without struct page). vmap() handles normal pages.
> > > > > > 
> > > > > > With large folios, we may want to map ranges that only span
> > > > > > part of a folio (i.e. mapping half of a 2Mb folio).
> > > > > > vmap_file() will allow us to do so.
> > > > > You mention vmap_file can support range folio vmap, but when I look code, I can't figure out
> > > > > 
> > > > > how to use, maybe I missed something? :)
> > > > I took a look at the udma-buf code. Rather than iterating through the
> > > > folios using pfns, you can calculate the corresponding file offsets
> > > > (maybe you already have them?) to map the desired folios.
> > > Currently udmabuf folio's not simple based on file(even each memory from memfd). User can provide
> > > 
> > > random range of memfd  to udmabuf to use. For example:
> > > 
> > > We get a memfd maybe 4M, user split it into [0, 2M), [1M, 2M), [2M, 4M), so you can see 1M-2M range repeat.
> > > 
> > > This range can gathered by udmabuf_create_list, then udmabuf use it. So, udmabuf record it by folio array+offset array.
> > I was thinking you could call vmap_file() on every sub-range and use
> > those addresses. It should work, we'd have to look at making udmabuf api's
> > support it.
> 
> Hmmm, how to get contigous virtual address? Or there are a way to merge each split vmap's return address?
> 
The patch in question maps a whole file to contiguous memory as far as I
can see, but I may be missing something. A partial-populate technique
requires getting an area first and then populating parts of it.

As I see it, we already have something similar:

<snip>
/**
 * vm_area_map_pages - map pages inside given sparse vm_area
 * @area: vm_area
 * @start: start address inside vm_area
 * @end: end address inside vm_area
 * @pages: pages to map (always PAGE_SIZE pages)
 */
int vm_area_map_pages(struct vm_struct *area, unsigned long start,
		      unsigned long end, struct page **pages)
{
...
<snip>

It is used by BPF.
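
Roughly, the pattern is (a sketch only, modelled on the BPF arena usage;
total_size, nr_chunks, offs[], lens[] and pages[] are made-up names):

	/* Reserve one big sparse area up front, then map each scattered
	 * chunk of pages into it, so the caller ends up with a single
	 * contiguous virtual range.
	 */
	struct vm_struct *area = get_vm_area(total_size, VM_SPARSE);

	if (!area)
		return -ENOMEM;

	for (i = 0; i < nr_chunks; i++) {
		unsigned long start = (unsigned long)area->addr + offs[i];

		err = vm_area_map_pages(area, start, start + lens[i],
					pages[i]);
		if (err)
			goto err_unmap;
	}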

--
Uladzislau Rezki



* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-04-01  9:47           ` Uladzislau Rezki
@ 2025-04-01 11:09             ` Huan Yang
  2025-04-01 16:43               ` Uladzislau Rezki
  0 siblings, 1 reply; 11+ messages in thread
From: Huan Yang @ 2025-04-01 11:09 UTC (permalink / raw)
  To: Uladzislau Rezki, Vishal Moola (Oracle)
  Cc: linux-mm, Andrew Morton, Brendan Jackman, linux-kernel


在 2025/4/1 17:47, Uladzislau Rezki 写道:
> On Tue, Apr 01, 2025 at 02:08:53PM +0800, Huan Yang wrote:
>> 在 2025/4/1 11:19, Vishal Moola (Oracle) 写道:
>>> On Tue, Apr 01, 2025 at 10:21:46AM +0800, Huan Yang wrote:
>>>> 在 2025/4/1 09:50, Vishal Moola (Oracle) 写道:
>>>>> On Mon, Mar 31, 2025 at 10:05:53AM +0800, Huan Yang wrote:
>>>>>> HI Vishal,
>>>>>>
>>>>>> 在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
>>>>>>> Currently, users have to call vmap() or vmap_pfn() to map pages to
>>>>>>> kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
>>>>>>> without struct page). vmap() handles normal pages.
>>>>>>>
>>>>>>> With large folios, we may want to map ranges that only span
>>>>>>> part of a folio (i.e. mapping half of a 2Mb folio).
>>>>>>> vmap_file() will allow us to do so.
>>>>>> You mention vmap_file can support range folio vmap, but when I look code, I can't figure out
>>>>>>
>>>>>> how to use, maybe I missed something? :)
>>>>> I took a look at the udma-buf code. Rather than iterating through the
>>>>> folios using pfns, you can calculate the corresponding file offsets
>>>>> (maybe you already have them?) to map the desired folios.
>>>> Currently udmabuf folio's not simple based on file(even each memory from memfd). User can provide
>>>>
>>>> random range of memfd  to udmabuf to use. For example:
>>>>
>>>> We get a memfd maybe 4M, user split it into [0, 2M), [1M, 2M), [2M, 4M), so you can see 1M-2M range repeat.
>>>>
>>>> This range can gathered by udmabuf_create_list, then udmabuf use it. So, udmabuf record it by folio array+offset array.
This is the part I mean. :)
>>> I was thinking you could call vmap_file() on every sub-range and use
>>> those addresses. It should work, we'd have to look at making udmabuf api's
>>> support it.
>> Hmmm, how to get contigous virtual address? Or there are a way to merge each split vmap's return address?
>>
> The patch in question maps whole file to continues memory as i see, but
> i can miss something. Partly populate technique requires to get an area
Hmm, maybe you missed the earlier discussion; I pointed to it above. :)
> and partly populate it.
>
> As i see we have something similar:
>
> <snip>
> /**
>   * vm_area_map_pages - map pages inside given sparse vm_area
>   * @area: vm_area
>   * @start: start address inside vm_area
>   * @end: end address inside vm_area
>   * @pages: pages to map (always PAGE_SIZE pages)
>   */
> int vm_area_map_pages(struct vm_struct *area, unsigned long start,
> 		      unsigned long end, struct page **pages)
> {
> ...
> <snip>
>
> it is used by the BPF.
>
> --
> Uladzislau Rezki



* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-04-01 11:09             ` Huan Yang
@ 2025-04-01 16:43               ` Uladzislau Rezki
  0 siblings, 0 replies; 11+ messages in thread
From: Uladzislau Rezki @ 2025-04-01 16:43 UTC (permalink / raw)
  To: Huan Yang
  Cc: Uladzislau Rezki, Vishal Moola (Oracle),
	linux-mm, Andrew Morton, Brendan Jackman, linux-kernel

On Tue, Apr 01, 2025 at 07:09:57PM +0800, Huan Yang wrote:
> 
> 在 2025/4/1 17:47, Uladzislau Rezki 写道:
> > On Tue, Apr 01, 2025 at 02:08:53PM +0800, Huan Yang wrote:
> > > 在 2025/4/1 11:19, Vishal Moola (Oracle) 写道:
> > > > On Tue, Apr 01, 2025 at 10:21:46AM +0800, Huan Yang wrote:
> > > > > 在 2025/4/1 09:50, Vishal Moola (Oracle) 写道:
> > > > > > On Mon, Mar 31, 2025 at 10:05:53AM +0800, Huan Yang wrote:
> > > > > > > HI Vishal,
> > > > > > > 
> > > > > > > 在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
> > > > > > > > Currently, users have to call vmap() or vmap_pfn() to map pages to
> > > > > > > > kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
> > > > > > > > without struct page). vmap() handles normal pages.
> > > > > > > > 
> > > > > > > > With large folios, we may want to map ranges that only span
> > > > > > > > part of a folio (i.e. mapping half of a 2Mb folio).
> > > > > > > > vmap_file() will allow us to do so.
> > > > > > > You mention vmap_file can support range folio vmap, but when I look code, I can't figure out
> > > > > > > 
> > > > > > > how to use, maybe I missed something? :)
> > > > > > I took a look at the udma-buf code. Rather than iterating through the
> > > > > > folios using pfns, you can calculate the corresponding file offsets
> > > > > > (maybe you already have them?) to map the desired folios.
> > > > > Currently udmabuf folio's not simple based on file(even each memory from memfd). User can provide
> > > > > 
> > > > > random range of memfd  to udmabuf to use. For example:
> > > > > 
> > > > > We get a memfd maybe 4M, user split it into [0, 2M), [1M, 2M), [2M, 4M), so you can see 1M-2M range repeat.
> > > > > 
> > > > > This range can gathered by udmabuf_create_list, then udmabuf use it. So, udmabuf record it by folio array+offset array.
> Here, :)
> > > > I was thinking you could call vmap_file() on every sub-range and use
> > > > those addresses. It should work, we'd have to look at making udmabuf api's
> > > > support it.
> > > Hmmm, how to get contigous virtual address? Or there are a way to merge each split vmap's return address?
> > > 
> > The patch in question maps whole file to continues memory as i see, but
> > i can miss something. Partly populate technique requires to get an area
> Hmm, maybe you missed ahead talk, I point above. :)
>
I pointed to how BPF does it; it might give you both some extra input.

--
Uladzislau Rezki



* Re: [RFC PATCH v2 0/1] Introduce vmap_file()
  2025-04-01  6:08         ` Huan Yang
  2025-04-01  9:47           ` Uladzislau Rezki
@ 2025-04-01 17:31           ` Vishal Moola (Oracle)
  1 sibling, 0 replies; 11+ messages in thread
From: Vishal Moola (Oracle) @ 2025-04-01 17:31 UTC (permalink / raw)
  To: Huan Yang; +Cc: linux-mm, Andrew Morton, Brendan Jackman, linux-kernel

On Tue, Apr 01, 2025 at 02:08:53PM +0800, Huan Yang wrote:
> 
> 在 2025/4/1 11:19, Vishal Moola (Oracle) 写道:
> > On Tue, Apr 01, 2025 at 10:21:46AM +0800, Huan Yang wrote:
> > > 在 2025/4/1 09:50, Vishal Moola (Oracle) 写道:
> > > > On Mon, Mar 31, 2025 at 10:05:53AM +0800, Huan Yang wrote:
> > > > > HI Vishal,
> > > > > 
> > > > > 在 2025/3/29 05:13, Vishal Moola (Oracle) 写道:
> > > > > > Currently, users have to call vmap() or vmap_pfn() to map pages to
> > > > > > kernel virtual space. vmap_pfn() is for special pages (i.e. pfns
> > > > > > without struct page). vmap() handles normal pages.
> > > > > > 
> > > > > > With large folios, we may want to map ranges that only span
> > > > > > part of a folio (i.e. mapping half of a 2Mb folio).
> > > > > > vmap_file() will allow us to do so.
> > > > > You mention vmap_file can support range folio vmap, but when I look code, I can't figure out
> > > > > 
> > > > > how to use, maybe I missed something? :)
> > > > I took a look at the udma-buf code. Rather than iterating through the
> > > > folios using pfns, you can calculate the corresponding file offsets
> > > > (maybe you already have them?) to map the desired folios.
> > > Currently udmabuf folio's not simple based on file(even each memory from memfd). User can provide
> > > 
> > > random range of memfd  to udmabuf to use. For example:
> > > 
> > > We get a memfd maybe 4M, user split it into [0, 2M), [1M, 2M), [2M, 4M), so you can see 1M-2M range repeat.
> > > 
> > > This range can gathered by udmabuf_create_list, then udmabuf use it. So, udmabuf record it by folio array+offset array.
> > I was thinking you could call vmap_file() on every sub-range and use
> > those addresses. It should work, we'd have to look at making udmabuf api's
> > support it.
> 
> Hmmm, how to get contigous virtual address? Or there are a way to merge each split vmap's return address?

I'm not sure; I'd have to take a look at that. As we move into a large
folio world, that might be a useful expansion of the APIs?

> IMO, user invoke vmap want to map each scatter memory into contigous virtual address, but as your suggestion,
> 
> I think can't to this. :)

We could discuss vmap_file() supporting a series of offsets to map
portions of a file; I think that's a reasonable ask for the general API.
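
As a rough, hypothetical shape (nothing settled, names made up):

	/* Hypothetical extension: map several byte ranges of one file
	 * into a single virtually contiguous area, in the order given.
	 */
	struct vmap_file_range {
		loff_t start;	/* first byte of the range */
		loff_t end;	/* final byte of the range */
	};

	void *vmap_file_ranges(struct address_space *mapping,
			       const struct vmap_file_range *ranges,
			       unsigned int nr_ranges,
			       unsigned long flags, pgprot_t prot);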

We could potentially do multiple files as well, but things start getting
really complex at that point so I'd like to avoid that.

The udmabuf code looks to be doing some buggy stuff, so I'd prefer we
look at fixing/reworking that before hacking in a 'generic' API just so
it can keep doing what it's doing.


