From: Alexei Starovoitov
Date: Wed, 21 Feb 2024 11:05:09 -0800
Subject: Re: [PATCH bpf-next] mm: Introduce vm_area_[un]map_pages().
References: <20240220192613.8840-1-alexei.starovoitov@gmail.com>
To: Christoph Hellwig
Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Linus Torvalds, Barret Rhoden, Johannes Weiner, Lorenzo Stoakes, Andrew Morton, Uladzislau Rezki, linux-mm, Kernel Team

On Tue, Feb 20, 2024 at 9:52 PM Christoph Hellwig wrote:
>
> On Tue, Feb 20, 2024 at 11:26:13AM -0800, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov
> >
> > vmap() API is used to map a set of pages into contiguous kernel virtual space.
> >
> > BPF would like to extend the vmap API to implement a lazily-populated
> > contiguous kernel virtual space whose size and start address are fixed early.
> >
> > The vmap API has functions to request and release areas of kernel address space:
> > get_vm_area() and free_vm_area().
>
> As said before, I really hate growing more get_vm_area and
> free_vm_area users outside the core vmalloc code. We have a few of those,
> mostly due to ioremap (which is being consolidated) and executable code
> allocation (for which there have been various attempts at consolidation,
> and hopefully one finally succeeds...). So let's take a step back and
> think about how we can do this without it.

There are also xen grant tables that grab the range with get_vm_area()
but manage it on their own. That's not an ioremap case.
It looks to me like the vmalloc address range already has different kinds
of areas: vmalloc, vmap, ioremap, xen.

Maybe we can do:

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 7d112cc5f2a3..633c7b643daa 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -28,6 +28,7 @@ struct iov_iter; /* in uio.h */
 #define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */
 #define VM_ALLOW_HUGE_VMAP 0x00000400 /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */
+#define VM_BPF 0x00000800 /* bpf_arena pages */
+static inline struct vm_struct *get_bpf_vm_area(unsigned long size)
+{
+	return get_vm_area(size, VM_BPF);
+}

and enforce that flag in vm_area_[un]map_pages()?

vmallocinfo can display it or skip it.
Things like find_vm_area() can do something different with such an area
(if that was the concern).

> For the dynamically growing part do you need a special allocator or
> can we just go straight to the page allocator and implement this
> in common code?

It's a somewhat special allocator that uses a maple tree to manage
the range within the 4G region and
alloc_pages_node(GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT)
to grab pages.
On top of that, there is an extra dance:

memcg = bpf_map_get_memcg(map);
old_memcg = set_active_memcg(memcg);

to make sure memcg accounting is done the common way for all bpf maps.

The tricky bpf-specific part is the computation of pgoff,
since it's a memory region shared between user space and the bpf prog.
The lower 32 bits of the pointer have to be the same for user space and bpf.

Not much has changed in the patch since the earlier thread.
Either find it in your email or here:
https://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git/commit/?h=arena&id=364c9b5d233d775728ec2bf3b4168fa6909e58d1

Are you suggesting an API like:

struct vm_struct *area = get_sparse_vm_area(size);
vm_area_alloc_pages(struct vm_struct *area, ulong addr, int page_cnt,
		    int numa_id);

where vm_area_alloc_pages() would allocate pages and vmap_pages_range()
them, with all the code living in mm/vmalloc.c?

I can give it a shot.
The ugly part is that bpf_map_get_memcg() would need to be passed in somehow.

Another bpf-specific bit is the guard pages before and after the 4G range;
such a vm_area_alloc_pages() would need to skip them.

> > For BPF use case the area_size will be 4Gbyte plus 64Kbyte of guard pages and
> > area->addr known and fixed at the program verification time.
>
> How is this ever going to work on 32-bit platforms?

bpf_arena requires 64bit and mmu:

ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
obj-$(CONFIG_BPF_SYSCALL) += arena.o
endif

and special JIT support too.

With bpf_arena we can finally deprecate a bunch of things like the bloom filter
bpf map, etc.