From: Yafang Shao
Date: Wed, 8 Feb 2023 22:22:17 +0800
Subject: Re: [PATCH bpf-next 7/7] bpf: hashtab memory usage
To: Alexei Starovoitov
Cc: John Fastabend, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev,
 Hao Luo, Jiri Olsa, Tejun Heo, dennis@kernel.org, Chris Lameter,
 Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin,
 Hyeonggon Yoo <42.hyeyoo@gmail.com>, Vlastimil Babka, urezki@gmail.com,
 linux-mm, bpf
References: <20230202014158.19616-1-laoar.shao@gmail.com>
 <20230202014158.19616-8-laoar.shao@gmail.com>
 <63ddbc69ef50f_6bb1520813@john.notmuch>

On Wed, Feb 8, 2023 at 12:29 PM Alexei Starovoitov wrote:
>
> On Tue, Feb 7, 2023 at 7:34 PM Yafang Shao wrote:
> >
> > On Wed, Feb 8, 2023 at 9:56 AM Alexei Starovoitov wrote:
> > >
> > > On Sat, Feb 4, 2023 at 7:56 PM Yafang Shao wrote:
> > > >
> > > > On Sat, Feb 4, 2023 at 10:01 AM John Fastabend wrote:
> > > > >
> > > > > Yafang Shao wrote:
> > > > > > Get htab memory usage from the htab pointers we have allocated. Some
> > > > > > small pointers are ignored as their size are quite small compared with
> > > > > > the total size.
> > > > > >
> > > > > > The result as follows,
> > > > > > - before this change
> > > > > > 1: hash  name count_map  flags 0x0  <<<< prealloc
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > > 2: hash  name count_map  flags 0x1  <<<< non prealloc, fully set
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > > 3: hash  name count_map  flags 0x1  <<<< non prealloc, non set
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > >
> > > > > > The memlock is always a fixed number whatever it is preallocated or
> > > > > > not, and whatever the allocated elements number is.
> > > > > >
> > > > > > - after this change
> > > > > > 1: hash  name count_map  flags 0x0  <<<< prealloc
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 109064464B
> > > > > > 2: hash  name count_map  flags 0x1  <<<< non prealloc, fully set
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 117464320B
> > > > > > 3: hash  name count_map  flags 0x1  <<<< non prealloc, non set
> > > > > >         key 16B  value 24B  max_entries 1048576  memlock 16797952B
> > > > > >
> > > > > > The memlock now is hashtab actually allocated.
> > > > > >
> > > > > > At worst, the difference can be 10x, for example,
> > > > > > - before this change
> > > > > > 4: hash  name count_map  flags 0x0
> > > > > >         key 4B  value 4B  max_entries 1048576  memlock 8388608B
> > > > > >
> > > > > > - after this change
> > > > > > 4: hash  name count_map  flags 0x0
> > > > > >         key 4B  value 4B  max_entries 1048576  memlock 83898640B
> > > > > >
> > > > >
> > > > > This walks the entire map and buckets to get the size? Inside a
> > > > > rcu critical section as well :/ it seems.
> > > > >
> > > >
> > > > No, it doesn't walk the entire map and buckets, but just gets one
> > > > random element.
> > > > So it won't be a problem to do it inside a rcu critical section.
> > > >
> > > > > What am I missing, if you know how many elements are added (which
> > > > > you can track on map updates) how come we can't just calculate the
> > > > > memory size directly from this?
> > > > >
> > > >
> > > > It is less accurate and hard to understand. Take non-preallocated
> > > > percpu hashtab for example,
> > > > The size can be calculated as follows,
> > > >     key_size = round_up(htab->map.key_size, 8);
> > > >     value_size = round_up(htab->map.value_size, 8);
> > > >     pcpu_meta_size = sizeof(struct llist_node) + sizeof(void *);
> > > >     usage = (value_size * num_possible_cpus() +
> > > >              pcpu_meta_size + key_size) * max_entries;
> > > >
> > > > That is quite unfriendly to the newbies, and may be error-prone.
> > >
> > > Please do that instead.
> >
> > I can do it as you suggested, but it seems we shouldn't keep all
> > estimates in one place. Because ...
> >
> > > map_mem_usage callback is a no go as I mentioned earlier.
> >
> > ...we have to introduce the map_mem_usage callback. Take the lpm_trie
> > for example, its usage is
> > usage = (sizeof(struct lpm_trie_node) + trie->data_size) * trie->n_entries;
>
> sizeof(struct lpm_trie_node) + trie->data_size + trie->map.value_size.
>

Thanks for correcting it.

> and it won't count the inner nodes, but _it is ok_.
>
> > I don't think we want to declare struct lpm_trie_node in kernel/bpf/syscall.c.
> > WDYT ?
>
> Good point. Fine. Let's go with callback, but please keep it
> to a single function without loops like htab_non_prealloc_elems_size()
> and htab_prealloc_elems_size().
>
> Also please implement it for all maps.

Sure, I will do it.

> Doing it just for hash and arguing that every byte of accuracy matters
> while not addressing lpm and other maps doesn't give credibility
> to the accuracy argument.

-- 
Regards
Yafang
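
For reference, the estimate quoted above for a non-preallocated percpu hashtab can be written out as a small userspace C sketch. This is not kernel code and not the implementation from the patch: pcpu_htab_usage_estimate() and round_up_pow2() are hypothetical names, ncpus stands in for num_possible_cpus(), and struct llist_node is assumed to be a single pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint64_t round_up_pow2(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper (not a kernel function): estimated usage of a
 * non-preallocated percpu hashtab, following the formula in the thread:
 *
 *   (value_size * ncpus + pcpu_meta_size + key_size) * max_entries
 *
 * with key_size and value_size rounded up to 8 bytes.
 */
static uint64_t pcpu_htab_usage_estimate(uint32_t key_size,
					 uint32_t value_size,
					 uint32_t ncpus,
					 uint64_t max_entries)
{
	uint64_t k = round_up_pow2(key_size, 8);
	uint64_t v = round_up_pow2(value_size, 8);
	/* Assumption: sizeof(struct llist_node) == sizeof(void *). */
	uint64_t pcpu_meta_size = sizeof(void *) + sizeof(void *);

	return (v * ncpus + pcpu_meta_size + k) * max_entries;
}
```

Calling pcpu_htab_usage_estimate(16, 24, ncpus, 1048576) gives the estimate for the count_map example for a given CPU count. Note that it scales with max_entries, not with the number of elements actually inserted.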