Date: Thu, 28 Sep 2023 08:52:37 -0400
From: Johannes Weiner
To: Roman Gushchin
Cc: Michal Hocko, Nhat Pham, akpm@linux-foundation.org, riel@surriel.com,
 shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org,
 lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com,
 yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH 0/2] hugetlb memcg accounting
Message-ID: <20230928125237.GA407389@cmpxchg.org>
References: <20230926194949.2637078-1-nphamcs@gmail.com>
 <20230927184738.GC365513@cmpxchg.org>

On Wed, Sep 27, 2023 at 02:37:47PM -0700, Roman Gushchin wrote:
> On Wed, Sep 27, 2023 at 02:47:38PM -0400, Johannes Weiner wrote:
> > On Wed, Sep 27, 2023 at 01:21:20PM +0200, Michal Hocko wrote:
> > > On Tue 26-09-23 12:49:47, Nhat Pham wrote:
> > > > Currently, hugetlb memory usage is not accounted for in the memory
> > > > controller, which could lead to memory overprotection for cgroups with
> > > > hugetlb-backed memory. This has been observed in our production system.
> > > >
> > > > This patch series rectifies this issue by charging the memcg when the
> > > > hugetlb folio is allocated, and uncharging when the folio is freed. In
> > > > addition, a new selftest is added to demonstrate and verify this new
> > > > behavior.
> > >
> > > The primary reason why hugetlb is living outside of memcg (and the core
> > > MM as well) is that it doesn't really fit the whole scheme. In several
> > > aspects. First and the foremost it is an independently managed resource
> > > with its own pool management, use and lifetime.
> >
> > Honestly, the simpler explanation is that few people have used hugetlb
> > in regular, containerized non-HPC workloads.
> >
> > Hugetlb has historically been much more special, and it retains a
> > specialness that warrants e.g. the hugetlb cgroup container. But it
> > has also made strides with hugetlb_cma, migratability, madvise support
> > etc. that allows much more on-demand use. It's no longer the case that
> > you just put a static pool of memory aside during boot and only a few
> > blessed applications are using it.
> >
> > For example, we're using hugetlb_cma very broadly with generic
> > containers. The CMA region is fully usable by movable non-huge stuff
> > until huge pages are allocated in it. With the hugetlb controller you
> > can define a maximum number of hugetlb pages that can be used per
> > container. But what if that container isn't using any? Why shouldn't
> > it be allowed to use its overall memory allowance for anon and cache
> > instead?
>
> Cool, I remember proposing hugetlb memcg stats several years ago and if
> I remember correctly at that time you were opposing it based on the idea
> that huge pages are not a part of the overall memcg flow: they are not
> a subject for memory pressure, can't be evicted, etc. And thp's were seen
> as a long-term replacement. Even though all of the above is true, hugetlb
> has its niche and I don't think thp's will realistically replace it any
> time soon.

Yeah, Michal's arguments very much reminded me of my stance then. I
stand corrected.

I'm still hopeful that we can make 2M work transparently. I'd expect
1G to remain in the hugetlb domain for some time to come, but even
those are mostly dynamic now with your hugetlb_cma feature!

> So I'm glad to see this effort (and very supportive) on making hugetlb
> more convenient and transparent for an end user.

Thanks!