* [PATCH v3 bpf-next 0/3] mm: Cleanup and identify various users of kernel virtual address space
@ 2024-02-29 23:43 Alexei Starovoitov
2024-02-29 23:43 ` [PATCH v3 bpf-next 1/3] mm: Enforce VM_IOREMAP flag and range in ioremap_page_range Alexei Starovoitov
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Alexei Starovoitov @ 2024-02-29 23:43 UTC (permalink / raw)
To: bpf
Cc: daniel, andrii, torvalds, brho, hannes, lstoakes, akpm, urezki,
hch, boris.ostrovsky, sstabellini, jgross, linux-mm, xen-devel,
kernel-team
From: Alexei Starovoitov <ast@kernel.org>
v2 -> v3
- added Christoph's reviewed-by to patch 1
- cap commit log lines to 75 chars
- factored out common checks in patch 3 into helper
- made vm_area_unmap_pages() return void
There are various users of kernel virtual address space:
vmalloc, vmap, ioremap, xen.
- vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag
and these areas are treated differently by KASAN.
- the areas created by vmap() function should be tagged with VM_MAP
(as majority of the users do).
- ioremap areas are tagged with VM_IOREMAP and vm area start is aligned
to size of the area unlike vmalloc/vmap.
- there is also xen usage that is marked as VM_IOREMAP, but it doesn't
call ioremap_page_range() unlike all other VM_IOREMAP users.
To clean this up:
1. Enforce that ioremap_page_range() checks the range and VM_IOREMAP flag
2. Introduce VM_XEN flag to separate xen use cases from ioremap
In addition, BPF would like to reserve regions of kernel virtual address
space and populate them lazily, similar to the xen use cases.
For that reason, introduce VM_SPARSE flag and vm_area_[un]map_pages()
helpers to populate this sparse area.
In the end, /proc/vmallocinfo will show
"vmalloc"
"vmap"
"ioremap"
"xen"
"sparse"
categories for different kinds of address regions.
ioremap, xen, and sparse regions will read as zero pages when dumped through /proc/kcore
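As a rough sketch, the non-vmalloc users request their areas like this after
the series (flag names per the patches below; actual call sites differ in
detail):

	/* ioremap: start aligned to the area size, pages are later
	 * installed by ioremap_page_range() */
	area = get_vm_area(size, VM_IOREMAP);

	/* xen grant table / xenbus ring: PTEs are installed via
	 * apply_to_page_range(), never ioremap_page_range() */
	area = get_vm_area(size, VM_XEN);

	/* BPF-style sparse region: populated lazily, page by page,
	 * with vm_area_map_pages() */
	area = get_vm_area(size, VM_SPARSE);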
Alexei Starovoitov (3):
mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.
mm, xen: Separate xen use cases from ioremap.
mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().
arch/x86/xen/grant-table.c | 2 +-
drivers/xen/xenbus/xenbus_client.c | 2 +-
include/linux/vmalloc.h | 6 +++
mm/vmalloc.c | 75 +++++++++++++++++++++++++++++-
4 files changed, 81 insertions(+), 4 deletions(-)
--
2.34.1
* [PATCH v3 bpf-next 1/3] mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.
2024-02-29 23:43 [PATCH v3 bpf-next 0/3] mm: Cleanup and identify various users of kernel virtual address space Alexei Starovoitov
@ 2024-02-29 23:43 ` Alexei Starovoitov
2024-02-29 23:43 ` [PATCH v3 bpf-next 2/3] mm, xen: Separate xen use cases from ioremap Alexei Starovoitov
2024-02-29 23:43 ` [PATCH v3 bpf-next 3/3] mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages() Alexei Starovoitov
2 siblings, 0 replies; 4+ messages in thread
From: Alexei Starovoitov @ 2024-02-29 23:43 UTC (permalink / raw)
To: bpf
Cc: daniel, andrii, torvalds, brho, hannes, lstoakes, akpm, urezki,
hch, boris.ostrovsky, sstabellini, jgross, linux-mm, xen-devel,
kernel-team
From: Alexei Starovoitov <ast@kernel.org>
There are various users of the get_vm_area() + ioremap_page_range() APIs.
Enforce that the area passed to ioremap_page_range() was requested with
the VM_IOREMAP flag and that the range matches the created vm_area, to
avoid accidentally ioremap-ing into the wrong address range.
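For reference, a minimal sketch of the pattern the new checks expect from
callers (error handling elided; real callers such as the generic ioremap()
path differ in detail):

	struct vm_struct *area;
	int err;

	area = get_vm_area(size, VM_IOREMAP);
	if (!area)
		return -ENOMEM;

	/* the range must cover the vm_area exactly, excluding the guard page */
	err = ioremap_page_range((unsigned long)area->addr,
				 (unsigned long)area->addr + get_vm_area_size(area),
				 phys_addr, prot);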
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
mm/vmalloc.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d12a17fc0c17..f42f98a127d5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -307,8 +307,21 @@ static int vmap_range_noflush(unsigned long addr, unsigned long end,
int ioremap_page_range(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot)
{
+ struct vm_struct *area;
int err;
+ area = find_vm_area((void *)addr);
+ if (!area || !(area->flags & VM_IOREMAP)) {
+ WARN_ONCE(1, "vm_area at addr %lx is not marked as VM_IOREMAP\n", addr);
+ return -EINVAL;
+ }
+ if (addr != (unsigned long)area->addr ||
+ (void *)end != area->addr + get_vm_area_size(area)) {
+ WARN_ONCE(1, "ioremap request [%lx,%lx) doesn't match vm_area [%lx, %lx)\n",
+ addr, end, (long)area->addr,
+ (long)area->addr + get_vm_area_size(area));
+ return -ERANGE;
+ }
err = vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot),
ioremap_max_page_shift);
flush_cache_vmap(addr, end);
--
2.34.1
* [PATCH v3 bpf-next 2/3] mm, xen: Separate xen use cases from ioremap.
2024-02-29 23:43 [PATCH v3 bpf-next 0/3] mm: Cleanup and identify various users of kernel virtual address space Alexei Starovoitov
2024-02-29 23:43 ` [PATCH v3 bpf-next 1/3] mm: Enforce VM_IOREMAP flag and range in ioremap_page_range Alexei Starovoitov
@ 2024-02-29 23:43 ` Alexei Starovoitov
2024-02-29 23:43 ` [PATCH v3 bpf-next 3/3] mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages() Alexei Starovoitov
2 siblings, 0 replies; 4+ messages in thread
From: Alexei Starovoitov @ 2024-02-29 23:43 UTC (permalink / raw)
To: bpf
Cc: daniel, andrii, torvalds, brho, hannes, lstoakes, akpm, urezki,
hch, boris.ostrovsky, sstabellini, jgross, linux-mm, xen-devel,
kernel-team
From: Alexei Starovoitov <ast@kernel.org>
The xen grant table and xenbus ring are not ioremap'ed the way the
arch-specific code uses ioremap, so add a VM_XEN flag to separate these
use cases from VM_IOREMAP users. xen will not and should not call
ioremap_page_range() on that range. /proc/vmallocinfo will print such
regions as "xen" instead of "ioremap".
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
arch/x86/xen/grant-table.c | 2 +-
drivers/xen/xenbus/xenbus_client.c | 2 +-
include/linux/vmalloc.h | 1 +
mm/vmalloc.c | 7 +++++--
4 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 1e681bf62561..b816db0349c4 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -104,7 +104,7 @@ static int arch_gnttab_valloc(struct gnttab_vm_area *area, unsigned nr_frames)
area->ptes = kmalloc_array(nr_frames, sizeof(*area->ptes), GFP_KERNEL);
if (area->ptes == NULL)
return -ENOMEM;
- area->area = get_vm_area(PAGE_SIZE * nr_frames, VM_IOREMAP);
+ area->area = get_vm_area(PAGE_SIZE * nr_frames, VM_XEN);
if (!area->area)
goto out_free_ptes;
if (apply_to_page_range(&init_mm, (unsigned long)area->area->addr,
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c
index 32835b4b9bc5..b9c81a2d578b 100644
--- a/drivers/xen/xenbus/xenbus_client.c
+++ b/drivers/xen/xenbus/xenbus_client.c
@@ -758,7 +758,7 @@ static int xenbus_map_ring_pv(struct xenbus_device *dev,
bool leaked = false;
int err = -ENOMEM;
- area = get_vm_area(XEN_PAGE_SIZE * nr_grefs, VM_IOREMAP);
+ area = get_vm_area(XEN_PAGE_SIZE * nr_grefs, VM_XEN);
if (!area)
return -ENOMEM;
if (apply_to_page_range(&init_mm, (unsigned long)area->addr,
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c720be70c8dd..71075ece0ed2 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -28,6 +28,7 @@ struct iov_iter; /* in uio.h */
#define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */
#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
#define VM_ALLOW_HUGE_VMAP 0x00000400 /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
+#define VM_XEN 0x00000800 /* xen grant table and xenbus use cases */
#if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
!defined(CONFIG_KASAN_VMALLOC)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f42f98a127d5..d53ece3f38ee 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3822,9 +3822,9 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
if (flags & VMAP_RAM)
copied = vmap_ram_vread_iter(iter, addr, n, flags);
- else if (!(vm && (vm->flags & VM_IOREMAP)))
+ else if (!(vm && (vm->flags & (VM_IOREMAP | VM_XEN))))
copied = aligned_vread_iter(iter, addr, n);
- else /* IOREMAP area is treated as memory hole */
+ else /* IOREMAP | XEN area is treated as memory hole */
copied = zero_iter(iter, n);
addr += copied;
@@ -4415,6 +4415,9 @@ static int s_show(struct seq_file *m, void *p)
if (v->flags & VM_IOREMAP)
seq_puts(m, " ioremap");
+ if (v->flags & VM_XEN)
+ seq_puts(m, " xen");
+
if (v->flags & VM_ALLOC)
seq_puts(m, " vmalloc");
--
2.34.1
* [PATCH v3 bpf-next 3/3] mm: Introduce VM_SPARSE kind and vm_area_[un]map_pages().
2024-02-29 23:43 [PATCH v3 bpf-next 0/3] mm: Cleanup and identify various users of kernel virtual address space Alexei Starovoitov
2024-02-29 23:43 ` [PATCH v3 bpf-next 1/3] mm: Enforce VM_IOREMAP flag and range in ioremap_page_range Alexei Starovoitov
2024-02-29 23:43 ` [PATCH v3 bpf-next 2/3] mm, xen: Separate xen use cases from ioremap Alexei Starovoitov
@ 2024-02-29 23:43 ` Alexei Starovoitov
2 siblings, 0 replies; 4+ messages in thread
From: Alexei Starovoitov @ 2024-02-29 23:43 UTC (permalink / raw)
To: bpf
Cc: daniel, andrii, torvalds, brho, hannes, lstoakes, akpm, urezki,
hch, boris.ostrovsky, sstabellini, jgross, linux-mm, xen-devel,
kernel-team
From: Alexei Starovoitov <ast@kernel.org>
vmap/vmalloc APIs are used to map a set of pages into contiguous kernel
virtual space.
get_vm_area() with an appropriate flag is used to request an area of the
kernel address range. It's used for the vmalloc, vmap, ioremap, and xen
use cases.
- vmalloc use case dominates the usage. Such vm areas have VM_ALLOC flag.
- the areas created by vmap() function should be tagged with VM_MAP.
- ioremap areas are tagged with VM_IOREMAP.
- xen use cases are VM_XEN.
BPF would like to extend the vmap API to implement a lazily-populated,
sparse, yet contiguous kernel virtual space. Introduce VM_SPARSE flag
and vm_area_map_pages(area, start_addr, end_addr, pages) API to map a
set of pages within a given area.
It performs the same sanity checks as vmap() does.
It also checks that the area returned by get_vm_area() was created with
the VM_SPARSE flag, which identifies such areas in /proc/vmallocinfo
and makes them read as zero pages through /proc/kcore.
The next commits will introduce bpf_arena, a sparsely populated shared
memory region between a bpf program and a user space process. It will
map privately-managed pages into a sparse vm area with the following
steps:
// request virtual memory region during bpf prog verification
area = get_vm_area(area_size, VM_SPARSE);
// on demand
vm_area_map_pages(area, kaddr, kend, pages);
vm_area_unmap_pages(area, kaddr, kend);
// after bpf program is detached and unloaded
free_vm_area(area);
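Expanded slightly, a hypothetical on-demand populate step (building on the
sketch above) could look like this; the function name, gfp flags, and error
handling are illustrative and not part of this series:

static int arena_populate_one(struct vm_struct *area, unsigned long off)
{
	unsigned long kaddr = (unsigned long)area->addr + off;
	struct page *page;
	int err;

	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
	if (!page)
		return -ENOMEM;

	/* map exactly one page at a page-aligned offset inside the sparse area */
	err = vm_area_map_pages(area, kaddr, kaddr + PAGE_SIZE, &page);
	if (err)
		__free_page(page);
	return err;
}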
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/vmalloc.h | 5 ++++
mm/vmalloc.c | 59 +++++++++++++++++++++++++++++++++++++++--
2 files changed, 62 insertions(+), 2 deletions(-)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 71075ece0ed2..dfbcfb9f9a08 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -29,6 +29,7 @@ struct iov_iter; /* in uio.h */
#define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
#define VM_ALLOW_HUGE_VMAP 0x00000400 /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
#define VM_XEN 0x00000800 /* xen grant table and xenbus use cases */
+#define VM_SPARSE 0x00001000 /* sparse vm_area. not all pages are present. */
#if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \
!defined(CONFIG_KASAN_VMALLOC)
@@ -233,6 +234,10 @@ static inline bool is_vm_area_hugepages(const void *addr)
}
#ifdef CONFIG_MMU
+int vm_area_map_pages(struct vm_struct *area, unsigned long start,
+ unsigned long end, struct page **pages);
+void vm_area_unmap_pages(struct vm_struct *area, unsigned long start,
+ unsigned long end);
void vunmap_range(unsigned long addr, unsigned long end);
static inline void set_vm_flush_reset_perms(void *addr)
{
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d53ece3f38ee..dae98b1f78a8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -648,6 +648,58 @@ static int vmap_pages_range(unsigned long addr, unsigned long end,
return err;
}
+static int check_sparse_vm_area(struct vm_struct *area, unsigned long start,
+ unsigned long end)
+{
+ might_sleep();
+ if (WARN_ON_ONCE(area->flags & VM_FLUSH_RESET_PERMS))
+ return -EINVAL;
+ if (WARN_ON_ONCE(area->flags & VM_NO_GUARD))
+ return -EINVAL;
+ if (WARN_ON_ONCE(!(area->flags & VM_SPARSE)))
+ return -EINVAL;
+ if ((end - start) >> PAGE_SHIFT > totalram_pages())
+ return -E2BIG;
+ if (start < (unsigned long)area->addr ||
+ (void *)end > area->addr + get_vm_area_size(area))
+ return -ERANGE;
+ return 0;
+}
+
+/**
+ * vm_area_map_pages - map pages inside given sparse vm_area
+ * @area: vm_area
+ * @start: start address inside vm_area
+ * @end: end address inside vm_area
+ * @pages: pages to map (always PAGE_SIZE pages)
+ */
+int vm_area_map_pages(struct vm_struct *area, unsigned long start,
+ unsigned long end, struct page **pages)
+{
+ int err;
+
+ err = check_sparse_vm_area(area, start, end);
+ if (err)
+ return err;
+
+ return vmap_pages_range(start, end, PAGE_KERNEL, pages, PAGE_SHIFT);
+}
+
+/**
+ * vm_area_unmap_pages - unmap pages inside given sparse vm_area
+ * @area: vm_area
+ * @start: start address inside vm_area
+ * @end: end address inside vm_area
+ */
+void vm_area_unmap_pages(struct vm_struct *area, unsigned long start,
+ unsigned long end)
+{
+ if (check_sparse_vm_area(area, start, end))
+ return;
+
+ vunmap_range(start, end);
+}
+
int is_vmalloc_or_module_addr(const void *x)
{
/*
@@ -3822,9 +3874,9 @@ long vread_iter(struct iov_iter *iter, const char *addr, size_t count)
if (flags & VMAP_RAM)
copied = vmap_ram_vread_iter(iter, addr, n, flags);
- else if (!(vm && (vm->flags & (VM_IOREMAP | VM_XEN))))
+ else if (!(vm && (vm->flags & (VM_IOREMAP | VM_XEN | VM_SPARSE))))
copied = aligned_vread_iter(iter, addr, n);
- else /* IOREMAP | XEN area is treated as memory hole */
+ else /* IOREMAP | XEN | SPARSE area is treated as memory hole */
copied = zero_iter(iter, n);
addr += copied;
@@ -4418,6 +4470,9 @@ static int s_show(struct seq_file *m, void *p)
if (v->flags & VM_XEN)
seq_puts(m, " xen");
+ if (v->flags & VM_SPARSE)
+ seq_puts(m, " sparse");
+
if (v->flags & VM_ALLOC)
seq_puts(m, " vmalloc");
--
2.34.1