From: Wei Xu
Date: Thu, 2 Nov 2023 21:27:44 -0700
Subject: Re: [PATCH v5 1/1] mm: report per-page metadata information
To: Pasha Tatashin
Cc: David Hildenbrand, Sourav Panda, corbet@lwn.net,
 gregkh@linuxfoundation.org, rafael@kernel.org, akpm@linux-foundation.org,
 mike.kravetz@oracle.com, muchun.song@linux.dev, rppt@kernel.org,
 rdunlap@infradead.org, chenlinxuan@uniontech.com, yang.yang29@zte.com.cn,
 tomas.mudrunka@gmail.com, bhelgaas@google.com, ivan@cloudflare.com,
 yosryahmed@google.com, hannes@cmpxchg.org, shakeelb@google.com,
 kirill.shutemov@linux.intel.com, wangkefeng.wang@huawei.com,
 adobriyan@gmail.com, vbabka@suse.cz, Liam.Howlett@oracle.com,
 surenb@google.com, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-mm@kvack.org, willy@infradead.org, Greg Thelen
References: <20231101230816.1459373-1-souravpanda@google.com>
 <20231101230816.1459373-2-souravpanda@google.com>
 <1e99ff39-b1cf-48b8-8b6d-ba5391e00db5@redhat.com>
 <025ef794-91a9-4f0c-9eb6-b0a4856fa10a@redhat.com>
 <99113dee-6d4d-4494-9eda-62b1faafdbae@redhat.com>

On Thu, Nov 2, 2023 at 6:07 PM Pasha Tatashin wrote:
>
> On Thu, Nov 2, 2023 at 4:22 PM Wei Xu wrote:
> >
> > On Thu, Nov 2, 2023 at 11:34 AM Pasha Tatashin wrote:
> > >
> > > > > > I could have sworn that I pointed that out in a previous version and
> > > > > > requested to document that special case in the patch description. :)
> > > > >
> > > > > Sounds, good we will document that parts of per-page may not be part
> > > > > of MemTotal.
> > > >
> > > > But this still doesn't answer how we can use the new PageMetadata
> > > > field to help break down the runtime kernel overhead within MemUsed
> > > > (MemTotal - MemFree).
> > >
> > > I am not sure it matters to the end users: they look at PageMetadata
> > > with or without Page Owner, page_table_check, HugeTLB and it shows
> > > exactly how much per-page overhead changed. Where the kernel allocated
> > > that memory is not that important to the end user as long as that
> > > memory became available to them.
> > >
> > > In addition, it is still possible to estimate the actual memblock part
> > > of Per-page metadata by looking at /proc/zoneinfo:
> > >
> > > Memblock reserved per-page metadata: "present_pages - managed_pages"
> >
> > This assumes that all reserved memblocks are per-page metadata. As I
>
> Right after boot, when all Per-page metadata is still from memblocks,
> we could determine what part of the zone reserved memory is not
> per-page, and use it later in our calculations.
>
> > mentioned earlier, it is not a robust approach.
> > > If there is something big that we will allocate in that range, we
> > > should probably also export it in some form.
> > >
> > > If this field does not fit in /proc/meminfo due to not fully being
> > > part of MemTotal, we could just keep it under nodeN/, as a separate
> > > file, as suggested by Greg.
> > >
> > > However, I think it is useful enough to have an easy system wide view
> > > for Per-page metadata.
> >
> > It is fine to have this as a separate, informational sysfs file under
> > nodeN/, outside of meminfo. I just don't think as in the current
> > implementation (where PageMetadata is a mixture of buddy and memblock
> > allocations), it can help with the use case that motivates this
> > change, i.e. to improve the breakdown of the kernel overhead.
> > > > > > > are allocated), so what would be the best way to export page metadata
> > > > > > > without redefining MemTotal? Keep the new field in /proc/meminfo but
> > > > > > > be ok that it is not part of MemTotal or do two counters? If we do two
> > > > > > > counters, we will still need to keep one that is a buddy allocator in
> > > > > > > /proc/meminfo and the other one somewhere outside?
> > > > > >
> > > >
> > > > I think the simplest thing to do now is to only report the buddy
> > > > allocations of per-page metadata in meminfo. The meaning of the new
> > >
> > > This will cause PageMetadata to be 0 on 99% of the systems, and
> > > essentially become useless to the vast majority of users.
> >
> > I don't think it is a major issue. There are other fields (e.g. Zswap)
> > in meminfo that remain 0 when the feature is not used.
>
> Since we are going to use two independent interfaces
> /proc/meminfo/PageMetadata and nodeN/page_metadata (in a separate file
> as requested by Greg) How about if in /proc/meminfo we provide only
> the buddy allocator part, and in nodeN/page_metadata we provide the
> total per-page overhead in the given node that include memblock
> reserves, and buddy allocator memory?

What we want is a system-wide breakdown of kernel memory usage. Having
the new PageMetadata counter in /proc/meminfo report only
buddy-allocated per-page metadata works for this use case.

> Pasha
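
For reference, the /proc/zoneinfo estimate quoted above ("present_pages - managed_pages") could be sketched in userspace roughly as follows. This is only an illustration and not part of the patch: the function name and the sample numbers are made up, and it assumes the per-zone "present" and "managed" fields appear in /proc/zoneinfo as shown. It also inherits the limitation raised in the thread, since it counts all reserved pages in a zone and not just per-page metadata.

```python
import re

# Made-up sample in the general shape of /proc/zoneinfo (values are not real).
SAMPLE = """\
Node 0, zone      DMA
  pages free     3840
        spanned  4095
        present  3998
        managed  3840
Node 0, zone   Normal
  pages free     100000
        spanned  524288
        present  524288
        managed  507904
"""


def reserved_pages_per_zone(zoneinfo_text):
    """Return {(node, zone): present - managed} for each zone in the text.

    Hypothetical helper: the difference is every page the zone has but the
    buddy allocator does not manage, which is only an upper bound on the
    memblock-reserved per-page metadata.
    """
    counts = {}
    current = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)", line)
        if header:
            current = (int(header.group(1)), header.group(2))
            counts[current] = {}
            continue
        field = re.match(r"\s+(present|managed)\s+(\d+)\s*$", line)
        if field and current is not None:
            counts[current][field.group(1)] = int(field.group(2))

    return {
        zone: c["present"] - c["managed"]
        for zone, c in counts.items()
        if "present" in c and "managed" in c
    }


print(reserved_pages_per_zone(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/zoneinfo; the result is in pages, so it would still need to be multiplied by the page size to compare against /proc/meminfo.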