From: Pasha Tatashin
Date: Wed, 13 Mar 2024 18:40:03 -0400
Subject: Re: [PATCH v9 1/1] mm: report per-page metadata information
To: Sourav Panda
Cc: corbet@lwn.net, gregkh@linuxfoundation.org, rafael@kernel.org,
 akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev,
 rppt@kernel.org, david@redhat.com, rdunlap@infradead.org,
 chenlinxuan@uniontech.com, yang.yang29@zte.com.cn, tomas.mudrunka@gmail.com,
 bhelgaas@google.com, ivan@cloudflare.com, yosryahmed@google.com,
 hannes@cmpxchg.org, shakeelb@google.com, kirill.shutemov@linux.intel.com,
 wangkefeng.wang@huawei.com, adobriyan@gmail.com, vbabka@suse.cz,
 Liam.Howlett@oracle.com, surenb@google.com, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org,
 willy@infradead.org, weixugc@google.com
References: <20240220214558.3377482-1-souravpanda@google.com>
 <20240220214558.3377482-2-souravpanda@google.com>
In-Reply-To: <20240220214558.3377482-2-souravpanda@google.com>

On Tue, Feb 20, 2024 at 4:46 PM Sourav Panda wrote:
>
> Adds two
new per-node fields, namely nr_memmap and nr_memmap_boot,
> to /sys/devices/system/node/nodeN/vmstat and a global Memmap field
> to /proc/meminfo. This information can be used by users to see how
> much memory is being used by per-page metadata, which can vary
> depending on build configuration, machine architecture, and system
> use.
>
> Per-page metadata is the memory that Linux needs in order to
> manage memory at the page granularity. The majority of such memory is
> used by "struct page" and "page_ext" data structures. In contrast to
> most other memory consumption statistics, per-page metadata might not
> be included in MemTotal. For example, MemTotal does not include memblock
> allocations but includes buddy allocations. In this patch, exported
> field nr_memmap in /sys/devices/system/node/nodeN/vmstat would
> exclusively track buddy allocations while nr_memmap_boot would
> exclusively track memblock allocations. Furthermore, Memmap in
> /proc/meminfo would exclusively track buddy allocations, allowing it to
> be compared against MemTotal.
>
> This memory depends on build configurations, machine architectures, and
> the way the system is used:
>
> Build configuration may include extra fields in "struct page",
> and enable / disable "page_ext".
> Machine architecture defines base page sizes. For example, 4K on x86,
> 8K on SPARC, 64K on ARM64 (optionally), etc. The per-page metadata
> overhead is smaller on machines with larger page sizes.
> System use can change per-page overhead by using vmemmap
> optimizations with hugetlb pages, and emulated pmem devdax pages.
> Also, boot parameters can determine whether page_ext needs
> to be allocated. This memory can be part of MemTotal or be outside
> MemTotal depending on whether the memory was hot-plugged, booted with,
> or hugetlb memory was returned to the system.
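
To make the page-size point above concrete, here is a rough back-of-the-envelope sketch. It assumes a 64-byte "struct page", which is common on 64-bit builds but is not stated in the patch, and it ignores page_ext entirely. The function names are made up for illustration.

```python
# Assumption: 64-byte struct page (typical 64-bit build); the patch does not
# state a size, and page_ext would add to this.
STRUCT_PAGE_BYTES = 64


def memmap_overhead_percent(page_size_bytes, struct_page_bytes=STRUCT_PAGE_BYTES):
    """Return struct page overhead as a percentage of the memory it describes."""
    return 100.0 * struct_page_bytes / page_size_bytes


def vmemmap_pages_per_hugepage(hugepage_bytes, page_size_bytes=4096,
                               struct_page_bytes=STRUCT_PAGE_BYTES):
    """Return how many base pages of struct pages describe one huge page."""
    base_pages = hugepage_bytes // page_size_bytes
    return (base_pages * struct_page_bytes) // page_size_bytes


if __name__ == "__main__":
    # Base page sizes named in the commit message.
    for label, size in (("4K", 4096), ("8K", 8192), ("64K", 65536)):
        print(f"{label}: {memmap_overhead_percent(size):.4f}% of memory")
    print("vmemmap pages per 2MB hugepage:",
          vmemmap_pages_per_hugepage(2 * 1024 * 1024))
```

Under these assumptions a 4K base page costs about 1.56% of memory in struct pages, and a 2MB hugepage is described by 8 base pages of struct pages, which bounds how much a vmemmap optimization can give back.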
>
> Utility for userspace:
>
> Application Optimization: Depending on the kernel version and command
> line options, the kernel would relinquish a different number of pages
> (that contain struct pages) when a hugetlb page is reserved (e.g., 0, 6
> or 7 for a 2MB hugepage). The userspace application would want to know
> the exact savings achieved through page metadata deallocation without
> dealing with the intricacies of the kernel.
>
> Observability: Struct page overhead can only be calculated on paper at
> boot time (e.g., 1.5% of machine capacity). After boot, once hugepages are
> reserved or memory is hotplugged, the computation becomes complex.
> Per-page metrics will help explain part of the system memory overhead,
> which shall help guide memory optimizations and memory cgroup sizing.
>
> Debugging: Tracking the changes or absolute value in struct pages can
> help detect anomalies as they can be correlated with other metrics in
> the machine (e.g., MemTotal, number of huge pages, etc).
>
> page_ext overheads: Some kernel features such as page_owner and
> page_table_check that use page_ext can be optionally enabled via kernel
> parameters. Having the total per-page metadata information helps users
> precisely measure impact.
>
> Suggested-by: Pasha Tatashin
> Signed-off-by: Sourav Panda

Reviewed-by: Pasha Tatashin
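
For the observability use case, a userspace consumer might look roughly like the sketch below. It assumes the Memmap field appears in /proc/meminfo under that name and in kB like its neighbours, as proposed in this version of the patch; the final field name and units could differ. The helper names and the sample text are hypothetical, and the sample numbers are synthetic, not measurements.

```python
def parse_kb_fields(meminfo_text):
    """Parse 'Name:   12345 kB' style lines into a {name: value} dict."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def memmap_share_percent(meminfo_text):
    """Return Memmap as a percentage of MemTotal, or None if either is absent."""
    fields = parse_kb_fields(meminfo_text)
    if "Memmap" not in fields or not fields.get("MemTotal"):
        return None  # kernel without the proposed field, or unusable MemTotal
    return 100.0 * fields["Memmap"] / fields["MemTotal"]


def read_memmap_share(path="/proc/meminfo"):
    """Read the given meminfo file and return the Memmap share, if available."""
    with open(path) as f:
        return memmap_share_percent(f.read())


# Synthetic sample for illustration only; these values are not measured.
SAMPLE = "MemTotal:       16384000 kB\nMemmap:           256000 kB\n"
```

Calling memmap_share_percent(SAMPLE) gives the share for the synthetic sample, and read_memmap_share() returns None on a kernel that does not export the field, so callers can tell "not supported" apart from "zero".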