References: <20230202014158.19616-1-laoar.shao@gmail.com> <20230202014158.19616-8-laoar.shao@gmail.com> <63ddbc69ef50f_6bb1520813@john.notmuch>
From: Yafang Shao
Date: Wed, 8 Feb 2023 11:33:52 +0800
Subject: Re: [PATCH bpf-next 7/7] bpf: hashtab memory usage
To: Alexei Starovoitov
Cc: John Fastabend, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Tejun Heo, dennis@kernel.org, Chris Lameter, Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Vlastimil Babka, urezki@gmail.com, linux-mm, bpf

On Wed, Feb 8, 2023 at 9:56 AM Alexei Starovoitov wrote:
>
> On Sat, Feb 4, 2023 at 7:56 PM Yafang Shao wrote:
> >
> > On Sat, Feb 4, 2023 at 10:01 AM John Fastabend wrote:
> > >
> > > Yafang Shao wrote:
> > > > Get htab memory usage from the htab pointers we have allocated.
> > > > Some small pointers are ignored, as their sizes are quite small
> > > > compared with the total size.
> > > >
> > > > The results are as follows:
> > > > - before this change
> > > > 1: hash name count_map flags 0x0 <<<< prealloc
> > > > key 16B value 24B max_entries 1048576 memlock 41943040B
> > > > 2: hash name count_map flags 0x1 <<<< non prealloc, fully set
> > > > key 16B value 24B max_entries 1048576 memlock 41943040B
> > > > 3: hash name count_map flags 0x1 <<<< non prealloc, non set
> > > > key 16B value 24B max_entries 1048576 memlock 41943040B
> > > >
> > > > The memlock is always a fixed number, whether or not the map is
> > > > preallocated and however many elements are allocated.
> > > >
> > > > - after this change
> > > > 1: hash name count_map flags 0x0 <<<< prealloc
> > > > key 16B value 24B max_entries 1048576 memlock 109064464B
> > > > 2: hash name count_map flags 0x1 <<<< non prealloc, fully set
> > > > key 16B value 24B max_entries 1048576 memlock 117464320B
> > > > 3: hash name count_map flags 0x1 <<<< non prealloc, non set
> > > > key 16B value 24B max_entries 1048576 memlock 16797952B
> > > >
> > > > The memlock is now the memory the hashtab has actually allocated.
> > > >
> > > > At worst, the difference can be 10x, for example:
> > > > - before this change
> > > > 4: hash name count_map flags 0x0
> > > > key 4B value 4B max_entries 1048576 memlock 8388608B
> > > >
> > > > - after this change
> > > > 4: hash name count_map flags 0x0
> > > > key 4B value 4B max_entries 1048576 memlock 83898640B
> > > >
> > >
> > > This walks the entire map and buckets to get the size? Inside an
> > > RCU critical section as well :/ it seems.
> > >
> >
> > No, it doesn't walk the entire map and buckets; it just gets one
> > random element.
> > So it won't be a problem to do it inside an RCU critical section.
> >
> > >
> > > What am I missing? If you know how many elements are added (which
> > > you can track on map updates), how come we can't just calculate the
> > > memory size directly from this?
> > >
> >
> > It is less accurate and hard to understand. Take the non-preallocated
> > percpu hashtab for example. The size can be calculated as follows:
> > key_size = round_up(htab->map.key_size, 8);
> > value_size = round_up(htab->map.value_size, 8);
> > pcpu_meta_size = sizeof(struct llist_node) + sizeof(void *);
> > usage = (value_size * num_possible_cpus() +
> >          pcpu_meta_size + key_size) * max_entries;
> >
> > That is quite unfriendly to newbies, and may be error-prone.
>
> Please do that instead.

I can do it as you suggested, but it seems we shouldn't keep all the
estimates in one place, because ...

> map_mem_usage callback is a no go as I mentioned earlier.

...we have to introduce the map_mem_usage callback. Take lpm_trie for
example; its usage is:
usage = (sizeof(struct lpm_trie_node) + trie->data_size) * trie->n_entries;

I don't think we want to declare struct lpm_trie_node in
kernel/bpf/syscall.c. WDYT?

> > Furthermore, it is less accurate because there is underlying memory
> > allocation in the MM subsystem.
> > Now we can get a more accurate usage with little overhead. Why not do it?
>
> because htab_mem_usage() is not maintainable long term.
> 100% accuracy is a non-goal.

htab_mem_usage() gives us an option to continue to make it more accurate
with considerable overhead.
But I won't insist on it if you think we don't need to make it more accurate.

-- 
Regards
Yafang