From: Alexei Starovoitov
Date: Tue, 17 Nov 2020 17:11:00 -0800
Subject: Re: [PATCH bpf-next v6 06/34] bpf: prepare for memcg-based memory accounting for bpf maps
To: Roman Gushchin
Cc: Daniel Borkmann, bpf, Alexei Starovoitov, Network Development, Andrii Nakryiko, Andrew Morton, linux-mm, LKML, Kernel Team
In-Reply-To: <20201118010703.GC156448@carbon.DHCP.thefacebook.com>
References: <20201117034108.1186569-1-guro@fb.com>
 <20201117034108.1186569-7-guro@fb.com>
 <41eb5e5b-e651-4cb3-a6ea-9ff6b8aa41fb@iogearbox.net>
 <20201118004634.GA179309@carbon.dhcp.thefacebook.com>
 <20201118010703.GC156448@carbon.DHCP.thefacebook.com>

On Tue, Nov 17, 2020 at 5:07 PM Roman Gushchin wrote:
>
> On Tue, Nov 17, 2020 at 04:46:34PM -0800, Roman Gushchin wrote:
> > On Wed, Nov 18, 2020 at 01:06:17AM +0100, Daniel Borkmann wrote:
> > > On 11/17/20 4:40 AM, Roman Gushchin wrote:
> > > > In the absolute majority of cases if a process is making a kernel
> > > > allocation, it's memory cgroup is getting charged.
> > > >
> > > > Bpf maps can be updated from an interrupt context and in such
> > > > case there is no process which can be charged. It makes the memory
> > > > accounting of bpf maps non-trivial.
> > > >
> > > > Fortunately, after commit 4127c6504f25 ("mm: kmem: enable kernel
> > > > memcg accounting from interrupt contexts") and b87d8cefe43c
> > > > ("mm, memcg: rework remote charging API to support nesting")
> > > > it's finally possible.
> > > >
> > > > To do it, a pointer to the memory cgroup of the process which created
> > > > the map is saved, and this cgroup is getting charged for all
> > > > allocations made from an interrupt context.
> > > >
> > > > Allocations made from a process context will be accounted in a usual way.
> > > >
> > > > Signed-off-by: Roman Gushchin
> > > > Acked-by: Song Liu
> > > [...]
> > > > +#ifdef CONFIG_MEMCG_KMEM
> > > > +static __always_inline int __bpf_map_update_elem(struct bpf_map *map, void *key,
> > > > +						    void *value, u64 flags)
> > > > +{
> > > > +	struct mem_cgroup *old_memcg;
> > > > +	bool in_interrupt;
> > > > +	int ret;
> > > > +
> > > > +	/*
> > > > +	 * If update from an interrupt context results in a memory allocation,
> > > > +	 * the memory cgroup to charge can't be determined from the context
> > > > +	 * of the current task. Instead, we charge the memory cgroup, which
> > > > +	 * contained a process created the map.
> > > > +	 */
> > > > +	in_interrupt = in_interrupt();
> > > > +	if (in_interrupt)
> > > > +		old_memcg = set_active_memcg(map->memcg);
> > > > +
> > > > +	ret = map->ops->map_update_elem(map, key, value, flags);
> > > > +
> > > > +	if (in_interrupt)
> > > > +		set_active_memcg(old_memcg);
> > > > +
> > > > +	return ret;
> > >
> > > Hmm, this approach here won't work, see also commit 09772d92cd5a ("bpf: avoid
> > > retpoline for lookup/update/delete calls on maps") which removes the indirect
> > > call, so the __bpf_map_update_elem() and therefore the set_active_memcg() is
> > > not invoked for the vast majority of cases.
> >
> > I see. Well, the first option is to move these calls into map-specific update
> > functions, but the list is relatively long:
> >   nsim_map_update_elem()
> >   cgroup_storage_update_elem()
> >   htab_map_update_elem()
> >   htab_percpu_map_update_elem()
> >   dev_map_update_elem()
> >   dev_map_hash_update_elem()
> >   trie_update_elem()
> >   cpu_map_update_elem()
> >   bpf_pid_task_storage_update_elem()
> >   bpf_fd_inode_storage_update_elem()
> >   bpf_fd_sk_storage_update_elem()
> >   sock_map_update_elem()
> >   xsk_map_update_elem()
> >
> > Alternatively, we can set the active memcg for the whole duration of bpf
> > execution. It's simpler, but will add some overhead.
> > Maybe we can somehow mark programs calling into update helpers and skip
> > all others?
>
> Actually, this is problematic if a program updates several maps, because
> in theory they can belong to different cgroups.
> So it seems that the first option is the way to go. Do you agree?

Maybe, instead of the kmalloc_node() that is used by most of the map updates,
introduce bpf_map_kmalloc_node() that takes a map pointer as an argument,
and do the set_active_memcg() inside it?
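
To make the suggestion a bit more concrete, here is a rough userspace model of
such a helper. It is only a sketch under stated assumptions: the struct
layouts, the global active_memcg, mock_kmalloc_node() and the exact
bpf_map_kmalloc_node() signature are made up for illustration and are not
existing kernel code. set_active_memcg() here only imitates the save/restore
behaviour used in the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins for kernel types; every name here is illustrative. */
struct mem_cgroup {
	const char *name;
	size_t charged;			/* bytes charged to this cgroup */
};

struct bpf_map {
	struct mem_cgroup *memcg;	/* cgroup of the process that created the map */
};

/* Mock "active memcg" override; NULL means "charge the current task". */
static struct mem_cgroup *active_memcg;
static struct mem_cgroup current_task_memcg = { "current", 0 };

/* Install a new override and return the previous one so it can be restored. */
static struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old = active_memcg;

	active_memcg = memcg;
	return old;
}

/* Mock accounted allocator: charges the override if set, else the current task. */
static void *mock_kmalloc_node(size_t size, int node)
{
	struct mem_cgroup *target = active_memcg ? active_memcg : &current_task_memcg;
	void *ptr = malloc(size);

	(void)node;			/* the NUMA node is ignored in this model */
	if (ptr)
		target->charged += size;
	return ptr;
}

/* Hypothetical helper: allocate on behalf of a map and charge map->memcg. */
static void *bpf_map_kmalloc_node(const struct bpf_map *map, size_t size, int node)
{
	struct mem_cgroup *old_memcg;
	void *ptr;

	assert(map != NULL);
	old_memcg = set_active_memcg(map->memcg);
	ptr = mock_kmalloc_node(size, node);
	set_active_memcg(old_memcg);	/* restore the previous override */
	return ptr;
}
```

With the override scoped to a single allocation, a program that updates
several maps charges each allocation to the cgroup of the map it belongs to,
and the per-map update functions would only need to switch allocators.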