From: Alexei Starovoitov
Date: Tue, 7 Feb 2023 20:29:44 -0800
Subject: Re: [PATCH bpf-next 7/7] bpf: hashtab memory usage
To: Yafang Shao
Cc: John Fastabend, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Tejun Heo, dennis@kernel.org, Chris Lameter, Andrew Morton, Pekka Enberg, David Rientjes, Joonsoo Kim, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Vlastimil Babka, urezki@gmail.com, linux-mm, bpf
References: <20230202014158.19616-1-laoar.shao@gmail.com> <20230202014158.19616-8-laoar.shao@gmail.com> <63ddbc69ef50f_6bb1520813@john.notmuch>

On Tue, Feb 7, 2023 at 7:34 PM Yafang Shao wrote:
>
> On Wed, Feb 8, 2023 at 9:56 AM Alexei Starovoitov wrote:
> >
> > On Sat, Feb 4, 2023 at 7:56 PM Yafang Shao wrote:
> > >
> > > On Sat, Feb 4, 2023 at 10:01 AM John Fastabend wrote:
> > > >
> > > > Yafang Shao wrote:
> > > > > Get htab memory usage from the htab pointers we have
> > > > > allocated. Some
> > > > > small pointers are ignored as their size are quite small compared with
> > > > > the total size.
> > > > >
> > > > > The result as follows,
> > > > > - before this change
> > > > > 1: hash  name count_map  flags 0x0  <<<< prealloc
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > 2: hash  name count_map  flags 0x1  <<<< non prealloc, fully set
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > > 3: hash  name count_map  flags 0x1  <<<< non prealloc, non set
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 41943040B
> > > > >
> > > > > The memlock is always a fixed number whatever it is preallocated or
> > > > > not, and whatever the allocated elements number is.
> > > > >
> > > > > - after this change
> > > > > 1: hash  name count_map  flags 0x0  <<<< prealloc
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 109064464B
> > > > > 2: hash  name count_map  flags 0x1  <<<< non prealloc, fully set
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 117464320B
> > > > > 3: hash  name count_map  flags 0x1  <<<< non prealloc, non set
> > > > >         key 16B  value 24B  max_entries 1048576  memlock 16797952B
> > > > >
> > > > > The memlock now is hashtab actually allocated.
> > > > >
> > > > > At worst, the difference can be 10x, for example,
> > > > > - before this change
> > > > > 4: hash  name count_map  flags 0x0
> > > > >         key 4B  value 4B  max_entries 1048576  memlock 8388608B
> > > > >
> > > > > - after this change
> > > > > 4: hash  name count_map  flags 0x0
> > > > >         key 4B  value 4B  max_entries 1048576  memlock 83898640B
> > > > >
> > > >
> > > > This walks the entire map and buckets to get the size? Inside a
> > > > rcu critical section as well :/ it seems.
> > > >
> > >
> > > No, it doesn't walk the entire map and buckets, but just gets one
> > > random element.
> > > So it won't be a problem to do it inside a rcu critical section.
> > >
> > > > What am I missing, if you know how many elements are added (which
> > > > you can track on map updates) how come we can't just calculate the
> > > > memory size directly from this?
> > > >
> > >
> > > It is less accurate and hard to understand. Take non-preallocated
> > > percpu hashtab for example,
> > > The size can be calculated as follows,
> > >     key_size = round_up(htab->map.key_size, 8);
> > >     value_size = round_up(htab->map.value_size, 8);
> > >     pcpu_meta_size = sizeof(struct llist_node) + sizeof(void *);
> > >     usage = (value_size * num_possible_cpus() +
> > >              pcpu_meta_size + key_size) * max_entries;
> > >
> > > That is quite unfriendly to the newbies, and may be error-prone.
> >
> > Please do that instead.
>
> I can do it as you suggested, but it seems we shouldn't keep all
> estimates in one place. Because ...
>
> > map_mem_usage callback is a no go as I mentioned earlier.
>
> ...we have to introduce the map_mem_usage callback. Take the lpm_trie
> for example, its usage is
> usage = (sizeof(struct lpm_trie_node) + trie->data_size) * trie->n_entries;

The per-node size should be
sizeof(struct lpm_trie_node) + trie->data_size + trie->map.value_size,
and it won't count the inner nodes, but _it is ok_.

> I don't think we want to declare struct lpm_trie_node in kernel/bpf/syscall.c.
> WDYT ?

Good point. Fine. Let's go with the callback, but please keep it
to a single function, without loops like those in
htab_non_prealloc_elems_size() and htab_prealloc_elems_size().

Also please implement it for all maps.
Doing it just for hash and arguing that every byte of accuracy matters
while not addressing lpm and other maps doesn't give credibility
to the accuracy argument.
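
For illustration only, the two loop-free estimates discussed in this thread could be sketched in plain userspace C roughly as below. This is not the code from the patch. The helper names (round_up8, percpu_htab_usage_estimate, lpm_trie_usage_estimate) are made up for the example, and the kernel-side values num_possible_cpus(), sizeof(struct llist_node) + sizeof(void *) and sizeof(struct lpm_trie_node) are passed in as ordinary parameters, since the sketch does not include kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's round_up(x, 8). */
static uint64_t round_up8(uint64_t x)
{
	return (x + 7) & ~(uint64_t)7;
}

/*
 * Hypothetical helper: estimate for a non-preallocated percpu hash map,
 * following the formula quoted in the thread.
 * Assumptions: ncpus stands in for num_possible_cpus(), and meta_size for
 * sizeof(struct llist_node) + sizeof(void *).
 */
static uint64_t percpu_htab_usage_estimate(uint64_t key_size,
					   uint64_t value_size,
					   uint64_t ncpus,
					   uint64_t meta_size,
					   uint64_t max_entries)
{
	assert(ncpus > 0);
	return (round_up8(value_size) * ncpus + meta_size +
		round_up8(key_size)) * max_entries;
}

/*
 * Hypothetical helper: lpm_trie estimate using the per-node size given in
 * the reply (node header + data + value). Inner nodes are not counted.
 * Assumption: node_size stands in for sizeof(struct lpm_trie_node).
 */
static uint64_t lpm_trie_usage_estimate(uint64_t node_size,
					uint64_t data_size,
					uint64_t value_size,
					uint64_t n_entries)
{
	return (node_size + data_size + value_size) * n_entries;
}
```

With 16-byte keys, 24-byte values, 4 possible CPUs, an assumed 16 bytes of per-element metadata and 1048576 entries, the percpu estimate is (24 * 4 + 16 + 16) * 1048576 = 134217728 bytes. Each helper is a single arithmetic expression, so neither needs a loop over elements or CPUs.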