From: Alexei Starovoitov
Date: Tue, 17 Jan 2023 21:39:18 -0800
Subject: Re: [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo
To: Yafang Shao
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>, Vlastimil Babka,
 Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
 Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
 Hao Luo, Jiri Olsa, Tejun Heo, dennis@kernel.org, Chris Lameter,
 Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
 linux-mm, bpf
References: <20230112155326.26902-1-laoar.shao@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Jan 17, 2023 at 7:08 PM Yafang Shao wrote:
>
> On Wed, Jan 18, 2023 at 1:25 AM Alexei Starovoitov wrote:
> >
> > On Fri, Jan 13, 2023 at 3:53 AM Yafang Shao wrote:
> > >
> > > On Fri, Jan 13, 2023 at 5:05 AM Alexei Starovoitov wrote:
> > > >
> > > > On Thu, Jan 12, 2023 at 7:53 AM Yafang Shao wrote:
> > > > >
> > > > > Currently there's no way to get BPF memory usage; we can only
> > > > > estimate the usage via bpftool or memcg, neither of which is
> > > > > reliable.
> > > > >
> > > > > - bpftool
> > > > >   `bpftool {map,prog} show` can show us the memlock of each map
> > > > >   and prog, but the memlock differs from the real memory size.
> > > > > The memlock of a bpf object is approximately
> > > > > `round_up(key_size + value_size, 8) * max_entries`,
> > > > > so 1) it doesn't apply to non-preallocated bpf maps, which may
> > > > > increase or decrease their real memory size dynamically; 2) the
> > > > > element size of some bpf maps is not `key_size + value_size`; for
> > > > > example, the element size of htab is
> > > > > `sizeof(struct htab_elem) + round_up(key_size, 8) + round_up(value_size, 8)`.
> > > > > That said, the difference between these two values may be very
> > > > > large if key_size and value_size are small. For example, in my
> > > > > verification, the sizes of memlock and real memory of a
> > > > > preallocated hash map are:
> > > > >
> > > > > $ grep BPF /proc/meminfo
> > > > > BPF: 350 kB <<< the size of the preallocated memalloc pool
> > > > >
> > > > > (create hash map)
> > > > >
> > > > > $ bpftool map show
> > > > > 41549: hash name count_map flags 0x0
> > > > >   key 4B value 4B max_entries 1048576 memlock 8388608B
> > > > >
> > > > > $ grep BPF /proc/meminfo
> > > > > BPF: 82284 kB
> > > > >
> > > > > So the real memory size is $((82284 - 350)), which is 81934 kB,
> > > > > while the memlock is only 8192 kB.
> > > >
> > > > A hashmap with key 4b and value 4b looks artificial to me,
> > > > but since you're concerned with the accuracy of bpftool reporting,
> > > > please fix the estimation in bpf_map_memory_footprint().
> > >
> > > I thought bpf_map_memory_footprint() was deprecated, so I didn't try
> > > to fix it before.
> >
> > It's not deprecated. It's trying to be accurate.
> > See bpf_map_value_size().
> > In the past we had to be precise when we calculated the required memory
> > before we allocated, and that was causing ongoing maintenance issues.
> > Now bpf_map_memory_footprint() is an estimate for show_fdinfo.
> > It can be made more accurate for this map with corner-case key/value
> > sizes.
>
> Thanks for the clarification.
> > > > You're correct that:
> > > >
> > > > > size of some bpf map is not `key_size + value_size`, for example
> > > > > the element size of htab is
> > > > > `sizeof(struct htab_elem) + round_up(key_size, 8) + round_up(value_size, 8)`
> > > >
> > > > So just teach bpf_map_memory_footprint() to do this more accurately.
> > > > Add bucket size to it as well.
> > > > Make it even more accurate with prealloc vs not.
> > > > A much simpler change than adding run-time overhead to every
> > > > alloc/free on the bpf side.
> > >
> > > It seems that we'd better introduce ->memory_footprint for some
> > > specific bpf maps. I will think about it.
> >
> > No. Don't build a replica of what we had before.
> > Make the existing bpf_map_memory_footprint() more accurate.
>
> I just don't want to add many if-elses or switch-cases into
> bpf_map_memory_footprint(), because I think it is a little ugly.
> Introducing a new map op could make it clearer. For example:
>
> static unsigned long bpf_map_memory_footprint(const struct bpf_map *map)
> {
>         unsigned long size;
>
>         if (map->ops->map_mem_footprint)
>                 return map->ops->map_mem_footprint(map);
>
>         size = round_up(map->key_size + bpf_map_value_size(map), 8);
>         return round_up(map->max_entries * size, PAGE_SIZE);
> }

That is also ugly, because bpf_map_value_size() already has an if-stmt.
I prefer to keep all estimates in one place.
There is no need to be 100% accurate.
With a callback, devs will start thinking that this is somehow a
requirement to report precise memory.

> > > > The bpf side tracks all of its allocations. There is no need to do
> > > > that on the generic mm side.
> > > > Exposing an aggregated single number in /proc/meminfo also looks
> > > > wrong.
> > >
> > > Do you mean that we shouldn't expose it in /proc/meminfo?
> >
> > We should not, because it helps one particular use case only.
> > Somebody else might want map mem info per container,
> > then somebody would need it per user, etc.
> It seems we should show memcg info and user info in bpftool map show.

Show memcg info? What do you have in mind?

The user info is often useless. We're printing it in bpftool prog show,
and some folks suggested removing it because it always prints 'uid 0'.

Notice we use bpf iterators in both bpftool prog/map show to print
the process that created the map. That is much more useful than 'user id'.

In bpftool we can add a 'verbosity' flag and print more things.
There is also json output.

And, of course, nothing stops you from having your own prog/map stats
collectors.