From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F408FC48260 for ; Thu, 8 Feb 2024 18:29:49 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 80C226B007D; Thu, 8 Feb 2024 13:29:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7BC406B007E; Thu, 8 Feb 2024 13:29:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 683CE6B0081; Thu, 8 Feb 2024 13:29:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 506356B007D for ; Thu, 8 Feb 2024 13:29:49 -0500 (EST) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 0E23A1209B7 for ; Thu, 8 Feb 2024 18:29:49 +0000 (UTC) X-FDA: 81769475298.13.DAEE8D3 Received: from mail-pl1-f169.google.com (mail-pl1-f169.google.com [209.85.214.169]) by imf23.hostedemail.com (Postfix) with ESMTP id 4BB9E140002 for ; Thu, 8 Feb 2024 18:29:47 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=H4C3ZZjQ; spf=pass (imf23.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.214.169 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1707416987; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Yiy2726XY6zxZ3MywguUuT6LWPXhrHbitdbUp49Tq/4=; b=OwVGEO22l+I1VLg6Z73dUWF5l6+p4sy2m6wkgoaB1kmIez66teLroMdtlCyFYxaplx+z4q qolN1W/hJmVHORtN4FJpHrCF2+GAQOdtNJ0OCMCz9lPklZ4MtZGLo4HxI0YMH9wd2FPOu2 EzkBj5Z1upyE0m/ByypCagsNbF4wFAg= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=H4C3ZZjQ; spf=pass (imf23.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.214.169 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707416987; a=rsa-sha256; cv=none; b=JwS++rhHiTUhFGSxSdDmn3AXepVuv/LxWtT1ZulrehGT9SWj+MvH1wEnKYR57LxesQp1SH 6uo7UFqUcorp4mqcu7stx7qs5NkdhspmVsJKvVLQ+YISP2pSkAKKn3VeTGRYOTscghAC3C 3mUuhg2/s3bNMa4mR/eG6VjX8WuXk1k= Received: by mail-pl1-f169.google.com with SMTP id d9443c01a7336-1d7354ba334so707755ad.1 for ; Thu, 08 Feb 2024 10:29:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1707416986; x=1708021786; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Yiy2726XY6zxZ3MywguUuT6LWPXhrHbitdbUp49Tq/4=; b=H4C3ZZjQF8OwgPmModqI+qDFGYu/gs8xDUCtVVj65s2LjCymY5oYHYrpRVF8AEG3X6 q8395DHy+bLiNtCZdCarqo25j2Xmi9CO3ctYAh3kwC//4u3jIMIIaLEoCKmeGAygSC5E uxNo62Cm714WGCJbWCXJ0e5zufePEr4yDKmINkrspuE2Pq3e0WM0FNPNVCj+kLEracsB hLYWX4ZYBgrz6bznd+CYhqw4PNmFWml/e3Mhm3g4+MEEBbO8h1oK+mWfiM9Pxaodur8R Iec630xp6SHU+HwOCQPPdC1z04rWisnGaDS0nAlyRbFGLAlXsgJkxknnIERZ4QM1I9/L w84g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707416986; x=1708021786; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Yiy2726XY6zxZ3MywguUuT6LWPXhrHbitdbUp49Tq/4=; b=eyYOUMYkpomnuMjSfoLjdrjNtu2pMVFOJWYc/a1bypyIutUwhz2sf7VmsTT/PpMsyD 0GhGsf/9Jgs9GMwdIDfStePE7QGEt0j3itEr6FuIGovJHZT45ie0xbW5PgQLu8hYbP2l OF3ss1he4feO/ae4GJoXZX0eHJHWXM/yolD7DxeHRHXSwGdycHVHHZBxY6Uo88qCAfry Fvb7AwdYcDGt80zrz6edVjJHltGWD8knj1vVnz4wEylImMW1HOCz5YSIkhRq99G/CC5p g8TSjGs6Ea9cKjv4aT6Mlab+AJNFJZ5E+RykUScCV4L+F/Wp1a5yO027tg8ijCnDjzIM pUig== X-Gm-Message-State: AOJu0YwrnDn7oKAFd9esKwjuwPXsU6enUw7wagLO3p2Np/6Fv1jdqtPJ 1G0QqT8jEibd+mZP10TcnZAePQJtq62to44CMF6uwSx6af4dr+bqfRsgapv9yjmi77QJoE1PJq5 Y4ZfOdVp5cqwbHzRBdW1DINsLlh4= X-Google-Smtp-Source: AGHT+IGSxuSi3Hxfhlfc15XElQTc98pCE+ECL/SI36tLlqv4n4K+6Jzk4NXQC2Se99qz+oxjY+aI5NTOf4AKtMQsd/0= X-Received: by 2002:a17:90b:1012:b0:296:f417:4b79 with SMTP id gm18-20020a17090b101200b00296f4174b79mr99800pjb.11.1707416986028; Thu, 08 Feb 2024 10:29:46 -0800 (PST) MIME-Version: 1.0 References: <20240206220441.38311-1-alexei.starovoitov@gmail.com> <20240206220441.38311-12-alexei.starovoitov@gmail.com> In-Reply-To: From: Andrii Nakryiko Date: Thu, 8 Feb 2024 10:29:34 -0800 Message-ID: Subject: Re: [PATCH bpf-next 11/16] libbpf: Add support for bpf_arena. To: Alexei Starovoitov Cc: bpf , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Kumar Kartikeya Dwivedi , Eddy Z , Tejun Heo , Barret Rhoden , Johannes Weiner , linux-mm , Kernel Team Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4BB9E140002 X-Rspam-User: X-Stat-Signature: uy4dzhmko5y8xjh4tb6178987nw5ue3m X-Rspamd-Server: rspam01 X-HE-Tag: 1707416987-208044 X-HE-Meta: U2FsdGVkX1/r+EviaEiZMneDpB9PYbyDR0SkEBOCbMs2UMlHAH+nqhmoEWDqKm1RXLDVBipWNUXXZnA1niieEscrzQUsWbF5tQ+ZLLh+N5ResGh/iyQ9H/TTustrImYd3SDL1EyKVOZV2d/vAf0X/Oo2ZV9+tDkexOvBspzNH/KtZtl1yJkECnedWFSWftwHLRbS27XEp7Hmqla7f34ze2ZYzh4/NqJcAg8a9zyKE4yoHf9j0us7MmWUlR2HIaMPVYYZCcFrO1IbSWnmEaN3q0OsD/WzQEMNFU7Ww6WdcSBYCQ6adTbb7uCgt+1dnNMSVccYNbv/4SWqx8evpmA38JCECkwipSRP4bETMPSVz0j0hVXm8cFYdTu33hNj/go3FsnzKsE6VlhDC5IdErkkQZpEk9SJL0cZzEZNi/E2N2ae3iJkHDYEJQGojpM5nGc3JxqJMzDiPkm8fU6SPVGYGkqwujPgam9eQA1Ndp6UzUJYugUIB2CPZo8PCzZeDrH2D7JdRJ5scZAaoBDH3U/0DkJoslp4KNNIey0pMyfV8DdZmlb4IRK43Jlk7aESPI29qydfux8IyTIdmdRt4D81n7r+uIKK7s9Z6qVa0k6bi7VShA772dBJvIvbL1Zepwokd4XgX2g7l8KE8vY1+NBNlWKbVzFzuDexj+OKIMsf7IN6BkMBQcdGx08Rt92B3tYwNYh9PIDQm0e0wj2gzLHxqbSCVQyK/U5LfMzYAeEom4kdWpnvjmCOQhOn5r9HMeOX2N4se6OsTrfo2jb/AJsJ5zqLXMnMe3aLOxsw16RpFPSnHgLCUD6+Bn/DzauF6l04g4i89hac1O27qU5GP75YxHqTCGc69BuEEfRt9nadB8Fvq5NubxEg5co9h2BGbTc/JQYW0s4gBtZfaIv5rQfqXSRr/7hDvF5KRUA79U0tub6p1i2d01EZEz+yWbZdOe7tfPHnc0Ei2Et/40g0Cgx iHPq0WtL FKwpRgLfWodsh9kEpmtzir/hkIkE53aSbD/3Pwb2QEaLKCFShiXWEII5QJKu0nrVdzJG6ZeB7qwyOsYt/K7lbBXCJIhvT7KZHLGowGgDDgDBtWUubHUs7Aei7W//UWboGdnVi3deYVBVeVq/aA3KIME3AppZaLzCOdp2nO/wYwPRKE9HP6oWCId8LahJ+v5aRSfDvCJbs9rqm+O+2GiEGP/G9qqiDUEQ1xGytG5+MSUjp6NF6jOLyKelhOzKqAYzAp8TLyaGKbSWkLJEjpZSToIKJ4dTeoIZ9o6M8V3oAIWq7Gi7HC6YPkIrBFOcjXU8A+wfOsn87EGv9FoK/jwEzOrfe36HRNrf7G1PSIlYdYejs0lplibLOuTerB1vDVl8wx1f6uubd3FyA23olZYMOLgUyo85X74AuBiu7XbjdemzDb25leqGEpugss5zb4z6yEdW2KdM9L93IniOZ4MdhZbHlUFD+IGl+Ys6Oc55RqeAK8k8fEC9hxMsGBztIi0xz3imrOK+yUTbX5lQSLFR1LCer/mZtFZkEJoulD14An5OLDrVeor5q3+HLzS5t4i1gJAoE0taRwcTXa2U= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Feb 7, 2024 at 5:38=E2=80=AFPM Alexei Starovoitov wrote: > > On Wed, Feb 7, 2024 at 5:15=E2=80=AFPM Andrii Nakryiko > wrote: > > > > On Tue, Feb 6, 2024 at 2:05=E2=80=AFPM Alexei Starovoitov > > wrote: > > > > > > From: Alexei Starovoitov > > > > > > mmap() bpf_arena right after creation, since the kernel needs to > > > remember the address returned from mmap. This is user_vm_start. > > > LLVM will generate bpf_arena_cast_user() instructions where > > > necessary and JIT will add upper 32-bit of user_vm_start > > > to such pointers. > > > > > > Use traditional map->value_size * map->max_entries to calculate mmap = sz, > > > though it's not the best fit. > > > > We should probably make bpf_map_mmap_sz() aware of specific map type > > and do different calculations based on that. It makes sense to have > > round_up(PAGE_SIZE) for BPF map arena, and use just just value_size or > > max_entries to specify the size (fixing the other to be zero). > > I went with value_size =3D=3D key_size =3D=3D 8 in order to be able to ex= tend > it in the future and allow map_lookup/update/delete to do something > useful. Ex: lookup/delete can behave just like arena_alloc/free_pages. > > Are you proposing to force key/value_size to zero ? Yeah, I was thinking either (value_size=3D and max_entries=3D0) or (value_size=3D0 and max_entries=3D). The latter is what we do for BPF ringbuf, for example. What you are saying about lookup/update already seems different from any "normal" map anyways, so I'm not sure that's a good enough reason to have hard-coded 8 for value size. And it seems like in practice instead of doing lookup/update through syscall, the more natural way of working with arena is going to be mmap() anyways, so not even sure we need to implement the syscall side of lookup/update. But just as an extra aside, what you have in mind for lookup/update for the arena map can be generalized into "partial lookup/update" for any map where it makes sense. I.e., instead of expecting the user to read/update the entire value size, we can allow them to provide a subrange to read/update (i.e., offset+size combo to specify subrange within full map value range). This will work for the arena, but also for most other maps (if not all) that currently support LOOKUP/UPDATE. but specifically for bpf_map_mmap_sz(), regardless of what we decide we should still change it to be something like: switch (map_type) { case BPF_MAP_TYPE_ARRAY: return ; case BPF_MAP_TYPE_ARENA: /* round up to page size */ return round_up(, page_size); default: return 0; /* not supported */ } we can also add a still different case for RINGBUF, where it's (2 * max_entries). The general point is that mmapable size rules differ by map type, so we best express this explicitly in this helper. > That was my first attempt. > key_size can be zero, but syscall side of lookup/update expects > a non-zero value_size for all maps regardless of type. > We can modify bpf/syscall.c, of course, but it feels arena would be > too different of a map if generic map handling code would need > to be specialized. > > Then since value_size is > 0 then what sizes make sense? > When it's 8 it can be an indirection to anything. > key/value would be user pointers to other structs that > would be meaningful for an arena. > Right now it costs nothing to force both to 8 and pick any logic > when we decide what lookup/update should do. > > But then when value_size =3D=3D 8 than making max_entries to > mean the size of arena in bytes or pages.. starting to look odd > and different from all other maps. > > We could go with max_entries=3D=3D0 and value_size to mean the size of > arena in bytes, but it will prevent us from defining lookup/update > in the future, which doesn't feel right. > > Considering all this I went with map->value_size * map->max_entries choic= e. > Though it's not pretty. > > > > @@ -4908,6 +4910,22 @@ static int bpf_object__create_map(struct bpf_o= bject *obj, struct bpf_map *map, b > > > if (map->fd =3D=3D map_fd) > > > return 0; > > > > > > + if (def->type =3D=3D BPF_MAP_TYPE_ARENA) { > > > + size_t mmap_sz; > > > + > > > + mmap_sz =3D bpf_map_mmap_sz(def->value_size, def->max= _entries); > > > + map->mmaped =3D mmap((void *)map->map_extra, mmap_sz,= PROT_READ | PROT_WRITE, > > > + map->map_extra ? MAP_SHARED | MAP_= FIXED : MAP_SHARED, > > > + map_fd, 0); > > > + if (map->mmaped =3D=3D MAP_FAILED) { > > > + err =3D -errno; > > > + map->mmaped =3D NULL; > > > + pr_warn("map '%s': failed to mmap bpf_arena: = %d\n", > > > + bpf_map__name(map), err); > > > + return err; > > > > leaking map_fd here, you need to close(map_fd) before erroring out > > ahh. good catch.