From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B55D6C00140 for ; Tue, 2 Aug 2022 04:55:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E365B6B0071; Tue, 2 Aug 2022 00:55:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DE6226B0072; Tue, 2 Aug 2022 00:55:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C604D8E0001; Tue, 2 Aug 2022 00:55:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id B269C6B0071 for ; Tue, 2 Aug 2022 00:55:37 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 680FA1A0699 for ; Tue, 2 Aug 2022 04:55:37 +0000 (UTC) X-FDA: 79753439514.06.8D20667 Received: from mail-pj1-f42.google.com (mail-pj1-f42.google.com [209.85.216.42]) by imf16.hostedemail.com (Postfix) with ESMTP id E7ED3180076 for ; Tue, 2 Aug 2022 04:55:36 +0000 (UTC) Received: by mail-pj1-f42.google.com with SMTP id b10so12601900pjq.5 for ; Mon, 01 Aug 2022 21:55:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=CfmpytoP729KBrN0HCzeLOcgvNe2bvNCsG8itxfX62A=; b=HutUsTmKqv1x84jl4mEIEyI3ISfnr19onU2OnC3p4Wu9LqhA/aldHwYOntlFOHEJ/c erYKWaw+04Bh0o3tWH8mFSR1GeEtQHps708aVOr9jybaBNoSdelKef28JcAyHME7Wj9V a0igeAWQ2inL0sN9BUS6LjKtvuEQ8C69IVgpz+hyv2nkV06hcrKuZHDdWTTAVJBeRmwY mmdP8kc68jZAYgcfp5Lag/6UlcUN9hgeqWNujtpjisUtNBh9ee5+r/gKzZMmw9zyw7NB c/JV+uFCnblP2k3htlaOBx6Nhh+2xIR5zwVNDRIsdkodKzaW8JKeBRffYOWUTNZIPjAf k6ZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=CfmpytoP729KBrN0HCzeLOcgvNe2bvNCsG8itxfX62A=; b=p6XgcDmdLXr7k5du9C7FKxU2UW85w0PLp+kD1wvTk5yr4+v39Bb7vfLWeLFSQ5lUPe 6L8REKdjoVJ8CNQzZV6XySvHUzCRVmKrFCMW+PuUgQW51QqwhlWERDFXzsSSAqkpkaR0 82BGQebTut/9cjfIh8aQDtrAh/6NWGDRTGX6Y5dkD5vp7LOVc3n7lDcSH7yGE8ecXRUh 29QNO7SWSVH5KVqE0fylr2UePWbV3rNdx7czeJE+Cq83NZFlBWMpOTHsRqhRwwbqE8X7 7pqKSsmy+ePinX0N7NZiaQrT5hjMc2WrOe3f36bwkMlgm9G5MhCcJbCKzXPApa+UTXIi rMZg== X-Gm-Message-State: ACgBeo3w6tyP17g/DOHaSfA6PCtlxTxFxLFssH3eD1V1VK9p4qTT5Gi6 4wrdgfGdRzaUgxR8a81pwaA= X-Google-Smtp-Source: AA6agR5abzkookXYg+7B+7RpeEfKg6JXotSBK2J6TwDpU2O6h3YFjqgHl6wqe1OHdaTr3EOlS/w0Xw== X-Received: by 2002:a17:902:db06:b0:16e:e5fc:56d8 with SMTP id m6-20020a170902db0600b0016ee5fc56d8mr9232424plx.100.1659416135598; Mon, 01 Aug 2022 21:55:35 -0700 (PDT) Received: from macbook-pro-3.dhcp.thefacebook.com ([2620:10d:c090:400::5:f128]) by smtp.gmail.com with ESMTPSA id v10-20020a1709028d8a00b0016bfbd99f64sm10537070plo.118.2022.08.01.21.55.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Aug 2022 21:55:35 -0700 (PDT) Date: Mon, 1 Aug 2022 21:55:31 -0700 From: Alexei Starovoitov To: Yafang Shao Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, songmuchun@bytedance.com, akpm@linux-foundation.org, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org Subject: Re: [RFC PATCH bpf-next 15/15] bpf: Introduce selectable memcg for bpf map Message-ID: <20220802045531.6oi2pt3fyjhotmjo@macbook-pro-3.dhcp.thefacebook.com> References: <20220729152316.58205-1-laoar.shao@gmail.com> <20220729152316.58205-16-laoar.shao@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220729152316.58205-16-laoar.shao@gmail.com> ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1659416137; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=CfmpytoP729KBrN0HCzeLOcgvNe2bvNCsG8itxfX62A=; b=TVhmIqt5F1N2H1ciueFA1ROYXipk0+28Xw1VzbJk+crovbqhkaj4vxO9IpbC3YsWnln0By YBnJTEoV10vkgj9edxe8RddbCjThTUA4VgPU5ejyrUBJpLpxzsjBPrz68RNIbuV7I7IaPu 39Sn1c55DFgxPXPGAkhCUKDPg5j0w+w= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=HutUsTmK; spf=pass (imf16.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.42 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1659416137; a=rsa-sha256; cv=none; b=pBMsIcAd8cOmKXom4kXikwJj3eoGxUzvsZI3Wlmu9rcxm01hOrylKS87SnxK2M0xkFYD0r pfvDjxWZd/yzO4oYUPE1SwQH6dzD/HM9BGIZFd9pHXfR8ywezOiVxudfxXEulilTFpqtmI QHHPwzY/LjXxXqN1acINsZTYwkssTq4= Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=HutUsTmK; spf=pass (imf16.hostedemail.com: domain of alexei.starovoitov@gmail.com designates 209.85.216.42 as permitted sender) smtp.mailfrom=alexei.starovoitov@gmail.com; dmarc=pass (policy=none) header.from=gmail.com X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: E7ED3180076 X-Stat-Signature: ij3ri4d5cpnrzt93fr55etnyaboxqxub X-HE-Tag: 1659416136-777619 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Jul 29, 2022 at 03:23:16PM +0000, Yafang Shao wrote: > A new member memcg_fd is introduced into bpf attr of BPF_MAP_CREATE > command, which is the fd of an opened cgroup directory. In this cgroup, > the memory subsystem must be enabled. This value is valid only when > BPF_F_SELECTABLE_MEMCG is set in map_flags. Once the kernel get the > memory cgroup from this fd, it will set this memcg into bpf map, then > all the subsequent memory allocation of this map will be charge to the > memcg. > > The map creation paths in libbpf are also changed consequently. > > Currently it is only supported for cgroup2 directory. > > The usage of this new member as follows, > struct bpf_map_create_opts map_opts = { > .sz = sizeof(map_opts), > .map_flags = BPF_F_SELECTABLE_MEMCG, > }; > int memcg_fd, int map_fd; > int key, value; > > memcg_fd = open("/cgroup2", O_DIRECTORY); > if (memcg_fd < 0) { > perror("memcg dir open"); > return -1; > } > > map_opts.memcg_fd = memcg_fd; > map_fd = bpf_map_create(BPF_MAP_TYPE_HASH, "map_for_memcg", > sizeof(key), sizeof(value), > 1024, &map_opts); > if (map_fd <= 0) { > perror("map create"); > return -1; > } Overall the api extension makes sense. The flexibility of selecting memcg is useful. > Signed-off-by: Yafang Shao > --- > include/uapi/linux/bpf.h | 2 ++ > kernel/bpf/syscall.c | 47 ++++++++++++++++++++++++++-------- > tools/include/uapi/linux/bpf.h | 2 ++ > tools/lib/bpf/bpf.c | 1 + > tools/lib/bpf/bpf.h | 3 ++- > tools/lib/bpf/libbpf.c | 2 ++ > 6 files changed, 46 insertions(+), 11 deletions(-) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index d5fc1ea70b59..a6e02c8be924 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -1296,6 +1296,8 @@ union bpf_attr { > * struct stored as the > * map value > */ > + __s32 memcg_fd; /* selectable memcg */ > + __s32 :32; /* hole */ new fields cannot be inserted in the middle of uapi struct. > /* Any per-map-type extra fields > * > * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 6401cc417fa9..9900e2b87315 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -402,14 +402,30 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock) > } > > #ifdef CONFIG_MEMCG_KMEM > -static void bpf_map_save_memcg(struct bpf_map *map) > +static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr) > { > - /* Currently if a map is created by a process belonging to the root > - * memory cgroup, get_obj_cgroup_from_current() will return NULL. > - * So we have to check map->objcg for being NULL each time it's > - * being used. > - */ > - map->objcg = get_obj_cgroup_from_current(); > + struct obj_cgroup *objcg; > + struct cgroup *cgrp; > + > + if (attr->map_flags & BPF_F_SELECTABLE_MEMCG) { The flag is unnecessary. Just add memcg_fd to the end of attr and use != 0 as a condition that it should be used instead of get_obj_cgroup_from_current(). There are other parts of bpf uapi that have similar fd handling logic. > + cgrp = cgroup_get_from_fd(attr->memcg_fd); > + if (IS_ERR(cgrp)) > + return -EINVAL; > + > + objcg = get_obj_cgroup_from_cgroup(cgrp); > + if (IS_ERR(objcg)) > + return PTR_ERR(objcg); > + } else { > + /* Currently if a map is created by a process belonging to the root > + * memory cgroup, get_obj_cgroup_from_current() will return NULL. > + * So we have to check map->objcg for being NULL each time it's > + * being used. > + */ > + objcg = get_obj_cgroup_from_current(); > + } > + > + map->objcg = objcg; > + return 0; > } > > static void bpf_map_release_memcg(struct bpf_map *map) > @@ -485,8 +501,9 @@ void __percpu *bpf_map_alloc_percpu(const struct bpf_map *map, size_t size, > } > > #else > -static void bpf_map_save_memcg(struct bpf_map *map) > +static int bpf_map_save_memcg(struct bpf_map *map, union bpf_attr *attr) > { > + return 0; > } > > static void bpf_map_release_memcg(struct bpf_map *map) > @@ -530,13 +547,18 @@ void *bpf_map_container_alloc(union bpf_attr *attr, u64 size, int numa_node) High level uapi struct should not be passed into low level helper like this. Pls pass memcg_fd instead. > { > struct bpf_map *map; > void *container; > + int ret; > > container = __bpf_map_area_alloc(size, numa_node, false); > if (!container) > return ERR_PTR(-ENOMEM); > > map = (struct bpf_map *)container; > - bpf_map_save_memcg(map); > + ret = bpf_map_save_memcg(map, attr); > + if (ret) { > + bpf_map_area_free(container); > + return ERR_PTR(ret); > + } > > return container; > } > @@ -547,6 +569,7 @@ void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size, > struct bpf_map *map; > void *container; > void *ptr; > + int ret; > > /* kmalloc'ed memory can't be mmap'ed, use explicit vmalloc */ > ptr = __bpf_map_area_alloc(size, numa_node, true); > @@ -555,7 +578,11 @@ void *bpf_map_container_mmapable_alloc(union bpf_attr *attr, u64 size, > > container = ptr + align - offset; > map = (struct bpf_map *)container; > - bpf_map_save_memcg(map); > + ret = bpf_map_save_memcg(map, attr); > + if (ret) { > + bpf_map_area_free(ptr); > + return ERR_PTR(ret); > + } > > return ptr; > } > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index d5fc1ea70b59..a6e02c8be924 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -1296,6 +1296,8 @@ union bpf_attr { > * struct stored as the > * map value > */ > + __s32 memcg_fd; /* selectable memcg */ > + __s32 :32; /* hole */ > /* Any per-map-type extra fields > * > * BPF_MAP_TYPE_BLOOM_FILTER - the lowest 4 bits indicate the > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c > index 5eb0df90eb2b..662ce5808386 100644 > --- a/tools/lib/bpf/bpf.c > +++ b/tools/lib/bpf/bpf.c > @@ -199,6 +199,7 @@ int bpf_map_create(enum bpf_map_type map_type, > attr.map_extra = OPTS_GET(opts, map_extra, 0); > attr.numa_node = OPTS_GET(opts, numa_node, 0); > attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0); > + attr.memcg_fd = OPTS_GET(opts, memcg_fd, 0); > > fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz); > return libbpf_err_errno(fd); > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h > index 88a7cc4bd76f..481aad49422b 100644 > --- a/tools/lib/bpf/bpf.h > +++ b/tools/lib/bpf/bpf.h > @@ -51,8 +51,9 @@ struct bpf_map_create_opts { > > __u32 numa_node; > __u32 map_ifindex; > + __u32 memcg_fd; > }; > -#define bpf_map_create_opts__last_field map_ifindex > +#define bpf_map_create_opts__last_field memcg_fd > > LIBBPF_API int bpf_map_create(enum bpf_map_type map_type, > const char *map_name, > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index 50d41815f431..86916d550031 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -505,6 +505,7 @@ struct bpf_map { > bool pinned; > bool reused; > bool autocreate; > + __s32 memcg_fd; > __u64 map_extra; > }; > > @@ -4928,6 +4929,7 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b > create_attr.map_ifindex = map->map_ifindex; > create_attr.map_flags = def->map_flags; > create_attr.numa_node = map->numa_node; > + create_attr.memcg_fd = map->memcg_fd; > create_attr.map_extra = map->map_extra; > > if (bpf_map__is_struct_ops(map)) > -- > 2.17.1 >