From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5BD80D3DEAB for ; Fri, 18 Oct 2024 18:39:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DE2666B00B0; Fri, 18 Oct 2024 14:39:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D91A66B00B1; Fri, 18 Oct 2024 14:39:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C319C6B00B3; Fri, 18 Oct 2024 14:39:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id A42086B00B0 for ; Fri, 18 Oct 2024 14:39:05 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 3362E1208D0 for ; Fri, 18 Oct 2024 18:38:54 +0000 (UTC) X-FDA: 82687584714.12.1676021 Received: from mail-wr1-f44.google.com (mail-wr1-f44.google.com [209.85.221.44]) by imf18.hostedemail.com (Postfix) with ESMTP id 9E4391C0006 for ; Fri, 18 Oct 2024 18:38:58 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=QEdRGQtt; spf=pass (imf18.hostedemail.com: domain of joshua.hahnjy@gmail.com designates 209.85.221.44 as permitted sender) smtp.mailfrom=joshua.hahnjy@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1729276548; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=uXj2vblEs/WlhkDdzuqJptgAkmHLF67IfskPKZMxe1c=; b=wI2Urt2wNv9kbrbD2JNKnaA/JRI6kiVKVlTpancbK/xa6kyMVNO5jcHyh7h2a504+i+COz uU1Qrn8f8Q7MIbZMrmZE/Z2eGqjYtXduw9e3BtVXwVWednAIscEv1GjWSF/X+QAbtxmX4g oCMvxWTqhQGC44LgiN0jmU1eZVpr9Y8= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=QEdRGQtt; spf=pass (imf18.hostedemail.com: domain of joshua.hahnjy@gmail.com designates 209.85.221.44 as permitted sender) smtp.mailfrom=joshua.hahnjy@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1729276548; a=rsa-sha256; cv=none; b=8b0UPqVrdLYzMMYxPXk3YAvVs82jle0ZXIq2bnK48wQzMhGcXc66rViJy/6K+PQQDsVbCR 1JzVwsUEJY289BRZx2ZIL5DLdhzgoloaNSCLMFb6YO4kUm2iAEwoVncLkcEYKfFJuzqFcC ADBH53yi8uSl/loQFvOa+rnSEg09Cr0= Received: by mail-wr1-f44.google.com with SMTP id ffacd0b85a97d-37d462c91a9so1694562f8f.2 for ; Fri, 18 Oct 2024 11:39:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1729276742; x=1729881542; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=uXj2vblEs/WlhkDdzuqJptgAkmHLF67IfskPKZMxe1c=; b=QEdRGQttd+VNAFv80FC8/u/NNOD4eQrBHbok8OV8qTLWD9Oyce95ACS+FLSBFPS0lv tvQ5kwhYROsIo6I2DM6R/OtOMNWrufZiJkLjcBRVihDjT86yocKd4IhBUaQfSu+Fbwax VvOi2hqrT3hje9Qpi62f8lMPggKprqQR9urb/NgEew5m8RRQ2BAxQNaiyAXxmbyeVXTh bI6UYdI7F8CSvzKpcX9GdZw9oFjr4s5e7h1lqWOF5AntxGKJ+vOpDlBnIoY1KSvQO9iN FiMtqCzpWvjwm/Sr7sLeIYWTnTbJLWIqxm3lbKJFDeEl/Z5gDPpnXBXzftOJ9QegzT0P Pm4g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1729276742; x=1729881542; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=uXj2vblEs/WlhkDdzuqJptgAkmHLF67IfskPKZMxe1c=; b=YbRzaIC/+JwtI+qmdC95RReb8ol9Gyp247FO/Xb8fRD6HQCid8SVJSsNeDedKzgqxp RCkt/+cHHFrJg7HdGdsC02kH/HhqelWvl3JjtZwDP+cIDupxsZuqJMenPLAlzsT/YUfr Vf7ablVIL4hdhITSNl2kUlBxHtvqwghXNOdxJxrtonuufYExUtsnqc3jykfvE/JJ0EdE KNRuCxeZuQJE9Q93ejelU8AsW84ELZlMwkQJsk/MgeHlaGGz8pK+60kwpHbAzI0FssZt UopcUepr6Ng3IDmkvLjZ+Aq7jmOfM4R5ouAGLb+SsxrEpkC8/39ZBvQyQ02pWxkGPl0N r1IA== X-Forwarded-Encrypted: i=1; AJvYcCUt2RcK9PNP8R5SEt67ahX9EWUnprNh1w6IOjou6ajVBMPcPspW9wnbtHWK6Lhmrsd4uAnu9EofQA==@kvack.org X-Gm-Message-State: AOJu0Yxg7Hhv4YC42oFBvycl0yNcgJCBawLj+1/vLsevEiEXUa0uOfj9 qDDxc0Lhv64n+2IvsTxAIZthT/zi/4rrWCB5M0ug40foaE9Eow+ThBsjqeg9/77brCl6NFbBYQ+ SH7niB6fdqAnzHKdDGvlyhfsJhdY= X-Google-Smtp-Source: AGHT+IHiniw7KI+UuesyYI7EYwpugwCS8A1Mtubu3dtGbo4lqnm5fBpFsbLFNoaq9R957MjbjAdI4P+f2ngpcZpnkpc= X-Received: by 2002:a5d:4983:0:b0:37d:33a3:de1c with SMTP id ffacd0b85a97d-37eb486ad19mr2477676f8f.38.1729276741686; Fri, 18 Oct 2024 11:39:01 -0700 (PDT) MIME-Version: 1.0 References: <20241017160438.3893293-1-joshua.hahnjy@gmail.com> <20241018123122.GB71939@cmpxchg.org> In-Reply-To: From: Joshua Hahn Date: Fri, 18 Oct 2024 14:38:48 -0400 Message-ID: Subject: Re: [PATCH 0/1] memcg/hugetlb: Adding hugeTLB counters to memory controller To: Shakeel Butt Cc: Michal Hocko , Johannes Weiner , nphamcs@gmail.com, roman.gushchin@linux.dev, muchun.song@linux.dev, akpm@linux-foundation.org, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, lnyng@meta.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 9E4391C0006 X-Stat-Signature: fdw7414xdixsrqzfut7p3zdf4pssmoy8 X-Rspam-User: X-HE-Tag: 1729276738-633993 X-HE-Meta: U2FsdGVkX1/xEghi2ziBpGrv50ULCww0YmqIAQisz5fXEoDktfh9QG8hPUsQDEI3zKYIpL0j9Zw+vkZ2bKE7gC2nIjogolfA2kGU+8TnqJKRYLKRR66pp5HVeYlnkqMCl151KqjkMmxXUfShIef2ji4K6ooNi56lIVVfbDUHzE2DurW43LtIaNMnu6kjm6ndRe6MjJcK4WO6YhtoAwuG5R/3eyXJF7hOL0pa45CggBHW3Ucfq0Yla2xD10qltdF/JGiBdX8utTuZ0j1cGeuXXJ6eTUJp0jA30o61jbPVlZIbhSBGTwrDsV7Lb0T5fFJ90/4f3womiNbNxXRYtdy7XqM1kiuTJMgzkj+FoA6Dl/eDFxQOlMRSgzWdGRBIeIeJxBKBdZn9ljMSv1Tsu4aEQvTU3S/iyQrWfRG65i4lAM3heWqB1pMyn+BwMvtkkW8NjARsBbmlrIuHDYhSKlRRSkgVpdAhsiM0PEz1LCyK4/x92fpCln5xs5pW5rPW+8pl2gOAOHA/Tt2HwqsM/4jsCnaspfS4tU0WzG1a0p3qtfTaADoZ72nsrBLha1RympaFyrBBGGhy66BoBdgvyWStXSC3s6PmV/QNX96Y5Qv5Tdw5R7yZf3IpUhCv4/CpN0V9A4tgY2v2QCQy6mzSYaDfYGDfKDmfEyZWDncsdwhqYCNn+kVo/c7E7f2CeDxJhoFrW/UvY0QZnC6ZmaKNSt6H0OZZ/xcxio5UEUH8vj1g/7INuxjR9fAhI2N2R3htrAma98cSRE9RWip2pt+yQZSi5c0N6pUD/wQYxNndWqQ5H6Pfnnwvs6wZJVX1tMPz+BMy2F5bVBUnF5QD1fe39l2EtgQ2ruyKeD9QwnlkM+yW4T2r0CcdME4vsv6n9CGjKkU16ELdFzyMuYFF/Q1T7Gbe7+95uWHCXyJyqVtc03gkYIvOhTKZHfOKlb+Z95Rc5r0FbwHYwKkwzH0+TiebZK4 rV+G1cR3 VVMZAH6/b4RK1Caz+vsvfNN5rkDnpDJw1GsglAxb98N0d1kKiiAI1mnCp+M9C0kl0/tx8u63uLtHjYRE59/wFL4sogQ+EDZU7lxYiyI8uoxsocyUrLS4cVd8jcXN85wroIQU1xWT2R1PxH+H0c/PAUQ229VVpfqezVtnOf0jv74H9Xj7m7qTlScd8srP2X/50hO/elIaYPcT6iNJJb8j+R3W73VICM9SZk7BI/EzGnjwGaMlbRMcP/W7rE6dsnB1oc1Z//jWvs7/WIl0FPHB7KlKSvjsR7gAPhsM2e+xifhrkCOBQmESLPb/BrB5SzXuE0i9uoqAMNl/f7512akWtPzGoLVWSb3+gBw1H X-Bogosity: Ham, tests=bogofilter, spamicity=0.001035, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Fri, Oct 18, 2024 at 2:11=E2=80=AFPM Shakeel Butt wrote: > On Fri, Oct 18, 2024 at 03:42:13PM GMT, Michal Hocko wrote: > > On Fri 18-10-24 08:31:22, Johannes Weiner wrote: > > > On Fri, Oct 18, 2024 at 12:12:00PM +0200, Michal Hocko wrote: > > > > On Thu 17-10-24 09:04:37, Joshua Hahn wrote: > > > > > HugeTLB usage is a metric that can provide utility for monitors h= oping > > > > > to get more insight into the memory usage patterns in cgroups. It= also > > > > > helps identify if large folios are being distributed efficiently = across > > > > > workloads, so that tasks that can take most advantage of reduced = TLB > > > > > misses are prioritized. > > > > > > > > This seems really confusing because memcg controller is not respons= ible > > > > for the hugetlb memory. Could you be more specific why enabling hug= etlb > > > > controller is not really desirable when the actual per-group tracki= ng is > > > > needed? > > > > > > However, we now have potentially a sizable portion of memory in > > > memory.current that doesn't show up in memory.stat. Joshua's patch > > > addresses that, so userspace can understand its memory footprint. > > > > > > I hope that makes sense. > > > > and it would be great to have an explanation why the lack of tracking > > has proven problematic. Also the above doesn't really explain why those > > who care cannot really enabled hugetlb controller to gain the > > consumption information. > > Let me give my take on this. The reason is the ease and convenience to > see what is happening when I see unexpectedly large memory.current > value. Logically I would look at memory.stat to make sense of it. > Without this I have to remember that the user might have hugetlb memcg > accounting option enabled and they are use hugetlb cgroup to find the > answer. If they have not enabled hugetlb cgroup then I am in dark. > > > > > Also what happens if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled. > > Should we report potentially misleading data? > > I think with what Johannes has suggested (to use lruvec_stat_mod_folio), > the metric will only get updated when hugetlb memcg accounting is > enabled and zero otherwise. Hi Michal, Johannes, and Shakeel, Thank you all for taking the time to review my patch. I was writing my response when Shakeel responded, and I think it includes an important point. I am sending out this message in the hopes that I can gather insight on what direction would make most sense for everyone. Michal -- You brought up several points in your response, so I'll do my best to answer them below. 1. Why is the lack of tracking hugeTLB problematic? The biggest problem that I see is that if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTI= NG is enabled, then there is a discrepancy between what is reported in memory.stat and memory.current, as Johannes explained in his response above= . As Shakeel expanded as well, it is just convenient to have the value explicitly there, so users don't have to go through and remember where hugeTLB pages might be used and where they might not be used. Aside from consistency between the two files, we can see benefits in observability. There are many reasons userspace might be intersted in understanding the hugeTLB footprint of cgroups. To name a few, system administrators might want to verify that hugeTLB usage is distributed as expected across tasks: i.e. memory-intensive tasks are using more hugeTLB pages than tasks that don't consume a lot of memory, or is seen to fault frequently. Note that this is separate from wanting to inspect the distribution for limiting purposes (in that case, it makes sense to use the controller) 2. Why can't you enable the hugeTLB controller, if tracking is so important= ? By turning on the hugeTLB controller, we gain all of the observability that I mentioned above; users can just check the respective hugetlb files. However, the discrepancy between memory.stat and memory.current is still there. When I check memory.current, I expect to be able to explain the usag= e by looking at memory.stat and trying to understand the breakdown, not by go= ing into the various hugetlb controller files to check how/if the memory is accounted for. But even if we are okay with this, I think it might be overkill to enable the hugeTLB controller for the convenience of being able to inspect the hugeTLB usage for cgroups. This is especially true in workloads where we can predict what usage patterns will be like, and we do not need to enfo= rce specific limits on hugeTLB usage. 3. What if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled? This is a great point. The way the patch is currently implemented, it should still do the accounting to memory.stat, even if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is disabled. This would give us the rev= erse problem where hugeTLB usage that is reported in the statistics are no longe= r accounted for in current... I think it makes sense to show hugeTLB statistics in memory.stat only if hugeTLB is accounted for in memory.current as well (i.e. check if CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING is enabled before doing the accounting, or move the accounting from hugeTLB alloc/free --> hugeTLB charge/uncharge, which should only happen if hugeTLBs are accounted for in memory.current). What do you think? Joshua