From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: bpf <bpf@vger.kernel.org>, Daniel Borkmann <daniel@iogearbox.net>,
	 Andrii Nakryiko <andrii@kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	 Barret Rhoden <brho@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Lorenzo Stoakes <lstoakes@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Uladzislau Rezki <urezki@gmail.com>,
	linux-mm <linux-mm@kvack.org>, Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next] mm: Introduce vm_area_[un]map_pages().
Date: Wed, 21 Feb 2024 11:05:09 -0800
Message-ID: <CAADnVQLrUpJizoVeYjFwSEPg1Eo=ACR-Gf5LWhD=c-KnnOTKuA@mail.gmail.com>
In-Reply-To: <ZdWPjmwi8D0n01HP@infradead.org>

On Tue, Feb 20, 2024 at 9:52 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Feb 20, 2024 at 11:26:13AM -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > The vmap() API is used to map a set of pages into contiguous kernel virtual space.
> >
> > BPF would like to extend the vmap API to implement a lazily-populated,
> > contiguous kernel virtual area whose size and start address are fixed early.
> >
> > The vmap API has functions to request and release areas of kernel address space:
> > get_vm_area() and free_vm_area().
>
> As said before, I really hate growing more get_vm_area and
> free_vm_area users outside the core vmalloc code.  We have a few of those,
> mostly due to ioremap (which is being consolidated) and executable code
> allocation (where there have been various attempts at consolidation,
> and hopefully one finally succeeds..).  So let's take a step back and
> think how we can do that without it.

There are also the xen grant tables that grab their range with get_vm_area()
but manage it on their own. That's not an ioremap case.
It looks to me like the vmalloc address range already hosts different
kinds of areas: vmalloc, vmap, ioremap, xen.

Maybe we can do:
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 7d112cc5f2a3..633c7b643daa 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -28,6 +28,7 @@ struct iov_iter;              /* in uio.h */
 #define VM_MAP_PUT_PAGES       0x00000200      /* put pages and free array in vfree */
 #define VM_ALLOW_HUGE_VMAP     0x00000400      /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
+#define VM_BPF                 0x00000800      /* bpf_arena pages */

+static inline struct vm_struct *get_bpf_vm_area(unsigned long size)
+{
+       return get_vm_area(size, VM_BPF);
+}

and enforce that flag in vm_area_[un]map_pages()?

vmallocinfo can display such areas or skip them, and things like
find_vm_area() can treat such an area differently (if that was the concern).
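
As a minimal sketch of what enforcing VM_BPF in vm_area_map_pages() could
look like (the count-based signature and the exact checks here are my
assumptions, not the patch):

int vm_area_map_pages(struct vm_struct *area, unsigned long addr,
		      unsigned int count, struct page **pages)
{
	unsigned long size = (unsigned long)count << PAGE_SHIFT;

	/* Only areas reserved with VM_BPF may be populated this way. */
	if (WARN_ON_ONCE(!(area->flags & VM_BPF)))
		return -EINVAL;

	/* The range must stay inside the reserved area. */
	if (addr < (unsigned long)area->addr ||
	    addr + size > (unsigned long)area->addr + get_vm_area_size(area))
		return -ERANGE;

	return vmap_pages_range(addr, addr + size, PAGE_KERNEL, pages,
				PAGE_SHIFT);
}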

> For the dynamically growing part do you need a special allocator or
> can we just go straight to the page allocator and implement this
> in common code?

It's a somewhat special allocator that uses a maple tree to manage
ranges within the 4G region and
alloc_pages_node(GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT)
to grab pages, with the extra dance:
        memcg = bpf_map_get_memcg(map);
        old_memcg = set_active_memcg(memcg);
to make sure memcg accounting is done the common way for all bpf maps.
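
Spelled out, the whole dance looks roughly like this (arena_alloc_page()
is a hypothetical name, not from the patch):

static struct page *arena_alloc_page(struct bpf_map *map, int node)
{
	struct mem_cgroup *memcg, *old_memcg;
	struct page *page;

	/* Charge the page to the map's memcg, as other bpf maps do. */
	memcg = bpf_map_get_memcg(map);
	old_memcg = set_active_memcg(memcg);

	page = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT, 0);

	set_active_memcg(old_memcg);
	mem_cgroup_put(memcg);
	return page;
}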

The tricky bpf-specific part is the computation of pgoff, since the
region is shared between user space and the bpf program: the lower
32 bits of a pointer have to be the same for user space and bpf.
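
For illustration only (the helper names are made up): if the user and
kernel views of the region are both 4GB-aligned, the lower 32 bits of a
user pointer and of its kernel alias are identical, so the page offset
falls out of either one:

/* Both bases 4GB-aligned, so the two views share the low 32 bits. */
static unsigned long arena_pgoff(unsigned long addr)
{
	/* page offset within the 4G region */
	return (addr & 0xffffffffUL) >> PAGE_SHIFT;
}

static unsigned long uaddr_to_kaddr(unsigned long kern_vm_start,
				    unsigned long uaddr)
{
	return kern_vm_start + (uaddr & 0xffffffffUL);
}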

Not much has changed in the patch since the earlier thread.
Either find it in your email or here:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=arena&id=364c9b5d233d775728ec2bf3b4168fa6909e58d1

Are you suggesting an API like:

struct vm_struct *area = get_sparse_vm_area(size);
vm_area_alloc_pages(struct vm_struct *area, ulong addr,
                    int page_cnt, int numa_id);

where vm_area_alloc_pages() allocates the pages and vmap_pages_range()s
them, with all of the code living in mm/vmalloc.c?

I can give it a shot.
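
For concreteness, a rough sketch of what such a helper could look like in
mm/vmalloc.c (error paths trimmed, range and guard checks omitted; this is
not the actual patch):

int vm_area_alloc_pages(struct vm_struct *area, unsigned long addr,
			int page_cnt, int numa_id)
{
	unsigned long end = addr + ((unsigned long)page_cnt << PAGE_SHIFT);
	struct page **pages;
	int i, err = 0;

	pages = kvcalloc(page_cnt, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	for (i = 0; i < page_cnt; i++) {
		pages[i] = alloc_pages_node(numa_id,
				GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT, 0);
		if (!pages[i]) {
			err = -ENOMEM;
			break;
		}
	}
	if (!err)
		/* vmap_pages_range() is static in mm/vmalloc.c, hence "all in one place" */
		err = vmap_pages_range(addr, end, PAGE_KERNEL, pages, PAGE_SHIFT);
	if (err)
		while (i-- > 0)
			__free_page(pages[i]);
	kvfree(pages);
	return err;
}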

The ugly part is that bpf_map_get_memcg() would need to be passed in somehow.

Another bpf-specific bit is the guard pages before and after the 4G range;
such a vm_area_alloc_pages() would need to skip them.
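
e.g. something along these lines (illustrative only; GUARD_SZ stands in for
the 64KB of guards, assumed to be split evenly before and after the 4G range):

	unsigned long lo = (unsigned long)area->addr + GUARD_SZ / 2;
	unsigned long hi = lo + SZ_4G;

	if (addr < lo || addr + size > hi)
		return -ERANGE;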

> > For the BPF use case, area_size will be 4GB plus 64KB of guard pages, and
> > area->addr is known and fixed at program verification time.
>
> How is this ever going to work on 32-bit platforms?

bpf_arena requires a 64-bit kernel and an MMU:

ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
obj-$(CONFIG_BPF_SYSCALL) += arena.o
endif

and special JIT support too.

With bpf_arena we can finally deprecate a bunch of things, like the
bloom filter bpf map, etc.

