Date: Tue, 3 Oct 2023 08:54:44 -0400
From: Johannes Weiner
To: Nhat Pham
Cc: akpm@linux-foundation.org, riel@surriel.com, mhocko@kernel.org,
    roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
    tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
    mike.kravetz@oracle.com, yosryahmed@google.com, fvdl@google.com,
    linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
    cgroups@vger.kernel.org
Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in memory controller
Message-ID: <20231003125444.GB17012@cmpxchg.org>
References: <20231003001828.2554080-1-nphamcs@gmail.com>
    <20231003001828.2554080-3-nphamcs@gmail.com>
In-Reply-To: <20231003001828.2554080-3-nphamcs@gmail.com>

On Mon, Oct 02, 2023 at 05:18:27PM -0700, Nhat Pham wrote:
> Currently, hugetlb memory usage is not accounted for in the memory
> controller, which could lead to memory overprotection for cgroups with
> hugetlb-backed memory. This has been observed in our production system.
>
> For instance, here is one of our use cases: suppose there are two 32G
> containers. The machine is booted with hugetlb_cma=6G, and each
> container may or may not use up to 3 gigantic pages, depending on the
> workload within it. The rest is anon, cache, slab, etc. We can set the
> hugetlb cgroup limit of each cgroup to 3G to enforce hugetlb fairness.
> But it is very difficult to configure memory.max to keep overall
> consumption, including anon, cache, slab, etc., fair.
>
> What we have had to resort to is to constantly poll hugetlb usage and
> readjust memory.max. A similar procedure is applied to other memory
> limits (e.g., memory.low). However, this is rather cumbersome and buggy.
> Furthermore, when there is a delay in memory limits correction (e.g.,
> when hugetlb usage changes between consecutive runs of the userspace
> agent), the system could be in an over/underprotected state.
>
> This patch rectifies this issue by charging the memcg when the hugetlb
> folio is utilized, and uncharging when the folio is freed (analogous to
> the hugetlb controller). Note that we do not charge when the folio is
> allocated to the hugetlb pool, because at this point it is not owned by
> any memcg.
>
> Some caveats to consider:
> * This feature is only available on cgroup v2.
> * There is no hugetlb pool management involved in the memory
>   controller. As stated above, hugetlb folios are only charged towards
>   the memory controller when they are used. Host overcommit management
>   has to consider this when configuring hard limits.
> * Failure to charge towards the memcg results in SIGBUS. This could
>   happen even if the hugetlb pool still has pages (but the cgroup
>   limit is hit and the reclaim attempt fails).
> * When this feature is enabled, hugetlb pages contribute to memory
>   reclaim protection. Tuning of the memory.low and memory.min limits
>   must take hugetlb memory into account.
> * Hugetlb pages utilized while this option is not selected will not
>   be tracked by the memory controller (even if cgroup v2 is remounted
>   later on).
>
> Signed-off-by: Nhat Pham

Acked-by: Johannes Weiner
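The quoted changelog describes a charge/uncharge lifecycle. Folios sitting in the hugetlb pool belong to no memcg. A folio is charged when it is faulted in, uncharged when it is freed, and a failed charge means SIGBUS even while the pool still has pages. The toy model below is only a sketch of that lifecycle, not the kernel code. All names (ToyMemcg, ToyHugetlbPool, Sigbus, PoolEmpty) are hypothetical. It also assumes 1G gigantic pages and treats reclaim as always failing, so hitting memory.max fails the charge immediately.

```python
from dataclasses import dataclass

GIB = 1 << 30
GIGANTIC_PAGE = 1 * GIB  # assumption: 1G gigantic pages (as on x86-64)


class Sigbus(Exception):
    """Hypothetical stand-in for the SIGBUS delivered on a failed memcg charge."""


class PoolEmpty(Exception):
    """Hypothetical: no free folio in the pool (outside this sketch's scope)."""


@dataclass
class ToyMemcg:
    """Hypothetical, heavily simplified memory cgroup usage counter."""
    name: str
    max_bytes: int  # plays the role of memory.max
    usage: int = 0

    def try_charge(self, nbytes: int) -> None:
        # Assumption: reclaim always fails, so exceeding the limit fails the charge.
        if self.usage + nbytes > self.max_bytes:
            raise Sigbus(f"{self.name}: memcg limit hit")
        self.usage += nbytes

    def uncharge(self, nbytes: int) -> None:
        self.usage -= nbytes


class ToyHugetlbPool:
    """Hypothetical pool; creating or filling it charges no memcg."""

    def __init__(self, nr_pages: int) -> None:
        self.free = nr_pages

    def fault(self, memcg: ToyMemcg) -> None:
        # Charge only when a folio leaves the pool to back a mapping.
        if self.free == 0:
            raise PoolEmpty("no free hugetlb folio")
        memcg.try_charge(GIGANTIC_PAGE)  # may raise Sigbus; pool left untouched
        self.free -= 1

    def release(self, memcg: ToyMemcg) -> None:
        # Uncharge when the folio is freed back to the pool.
        memcg.uncharge(GIGANTIC_PAGE)
        self.free += 1


if __name__ == "__main__":
    pool = ToyHugetlbPool(nr_pages=6)      # hugetlb_cma=6G -> six 1G folios
    a = ToyMemcg("A", max_bytes=32 * GIB)  # one of the two 32G containers
    a.try_charge(30 * GIB)                 # anon, cache, slab already charged
    pool.fault(a)
    pool.fault(a)
    try:
        pool.fault(a)                      # would push A to 33G
    except Sigbus as err:
        print("SIGBUS:", err, "| free pool pages:", pool.free)
        # prints "SIGBUS: A: memcg limit hit | free pool pages: 4"
```

In this model the pool never touches a counter. Only fault and release do, which mirrors the changelog's point that pool folios are not owned by any memcg. The run also shows the third caveat: the charge fails while four folios remain free in the pool.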