From: Yafang Shao <laoar.shao@gmail.com>
Date: Sat, 28 Jan 2023 19:49:08 +0800
Subject: Re: [RFC PATCH bpf-next v2 00/11] mm, bpf: Add BPF into /proc/meminfo
To: Alexei Starovoitov, Andrew Morton, urezki@gmail.com, Christoph Hellwig
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>, Vlastimil Babka, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Tejun Heo, dennis@kernel.org, Chris Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, linux-mm, bpf
References: <20230112155326.26902-1-laoar.shao@gmail.com>
On Thu, Jan 26, 2023 at 1:45 PM Alexei Starovoitov wrote:
>
> On Tue, Jan 17, 2023 at 10:49 PM Yafang Shao wrote:
> >
> > > > I just don't want to add many if-elses or switch-cases into
> > > > bpf_map_memory_footprint(), because I think it is a little ugly.
> > > > Introducing a new map ops could make it more clear.
> > > > For example,
> > > >
> > > > static unsigned long bpf_map_memory_footprint(const struct bpf_map *map)
> > > > {
> > > >         unsigned long size;
> > > >
> > > >         if (map->ops->map_mem_footprint)
> > > >                 return map->ops->map_mem_footprint(map);
> > > >
> > > >         size = round_up(map->key_size + bpf_map_value_size(map), 8);
> > > >         return round_up(map->max_entries * size, PAGE_SIZE);
> > > > }
> > >
> > > It is also ugly, because bpf_map_value_size() already has if-stmt.
> > > I prefer to keep all estimates in one place.
> > > There is no need to be 100% accurate.
> >
> > Per my investigation, it can be almost accurate with little effort.
> > Take the htab for example,
> >
> > static unsigned long htab_mem_footprint(const struct bpf_map *map)
> > {
> >         struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
> >         unsigned long size = 0;
> >
> >         if (!htab_is_prealloc(htab)) {
> >                 size += htab_elements_size(htab);
> >         }
> >         size += kvsize(htab->elems);
> >         size += percpu_size(htab->extra_elems);
> >         size += kvsize(htab->buckets);
> >         size += bpf_mem_alloc_size(&htab->pcpu_ma);
> >         size += bpf_mem_alloc_size(&htab->ma);
> >         if (htab->use_percpu_counter)
> >                 size += percpu_size(htab->pcount.counters);
> >         size += percpu_size(htab->map_locked[i]) * HASHTAB_MAP_LOCK_COUNT;
> >         size += kvsize(htab);
> >         return size;
> > }
>
> Please don't.
> Above doesn't look maintainable.

It is similar to htab_map_free(): these are exactly the pointers that get freed in map_free(), so we only need to keep map_mem_footprint() in sync with map_free(). That should not be a maintenance problem.

> Look at kvsize(htab). Do you really care about hundred bytes?
> Just accept that there will be a small constant difference
> between what show_fdinfo reports and the real memory.

The point is that we have no clear idea how large that difference actually is.

> You cannot make it 100%.
> There is kfence that will allocate 4k though you asked kmalloc(8).

We already have ksize()[1], which covers the kfence case.

[1].
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/mm/slab_common.c#n1431

> > We just need to get the real memory size from the pointer instead of
> > calculating the size again.
> > For non-preallocated htab, it is a little trouble to get the element
> > size (not the unit_size), but it won't be a big deal.
>
> You'd have to convince mm folks that kvsize() is worth doing.
> I don't think it will be easy.

As mentioned above, we already have ksize(), so we only need to introduce vsize(). Per my understanding, we can simply use vm_struct->size to get the vmalloc size; see also patch #5 in this patchset[2].

Andrew, Uladzislau, Christoph, do you have any comments on this newly introduced vsize()[2]?

[2]. https://lore.kernel.org/bpf/20230112155326.26902-6-laoar.shao@gmail.com/

> > > With a callback devs will start thinking that this is somehow
> > > a requirement to report precise memory.
> > >
> > > > > > > bpf side tracks all of its allocation. There is no need to do that
> > > > > > > in generic mm side.
> > > > > > > Exposing an aggregated single number in /proc/meminfo also looks wrong.
> > > > > >
> > > > > > Do you mean that we shouldn't expose it in /proc/meminfo ?
> > > > >
> > > > > We should not because it helps one particular use case only.
> > > > > Somebody else might want map mem info per container,
> > > > > then somebody would need it per user, etc.
> > > >
> > > > It seems we should show memcg info and user info in bpftool map show.
> > >
> > > Show memcg info? What do you have in mind?
> >
> > Each bpf map is charged to a memcg. If we know which memcg a bpf map
> > belongs to, we can derive per-container map memory info.
> > Currently we can get the memcg info from the process which loads the map,
> > but that does not work for a pinned bpf map.
> > So it would be better if we could show it in bpftool map show.
>
> That sounds useful.
> Have you looked at bpf iterators and how bpftool is using
> them to figure out which process loaded a bpf prog and created a particular map?

Yes, I have looked at it, and I see what you mean. It seems we could introduce a memcg_iter or something similar to implement it.

-- 
Regards
Yafang