From: Nhat Pham
Date: Tue, 26 Sep 2023 16:31:06 -0700
Subject: Re: [PATCH 0/2] hugetlb memcg accounting
To: Frank van der Linden
Cc: akpm@linux-foundation.org, riel@surriel.com, hannes@cmpxchg.org,
 mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
 muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com,
 shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org
References: <20230926194949.2637078-1-nphamcs@gmail.com>

On Tue, Sep 26, 2023 at 1:50 PM Frank van der Linden wrote:
>
> On Tue, Sep 26, 2023 at 12:49 PM Nhat Pham wrote:
> >
> > Currently, hugetlb memory usage is not accounted for in the memory
> > controller, which could lead to memory overprotection for cgroups with
> > hugetlb-backed memory. This has been observed in our production system.
> >
> > This patch series rectifies this issue by charging the memcg when the
> > hugetlb folio is allocated, and uncharging when the folio is freed. In
> > addition, a new selftest is added to demonstrate and verify this new
> > behavior.
> >
> > Nhat Pham (2):
> >   hugetlb: memcg: account hugetlb-backed memory in memory controller
> >   selftests: add a selftest to verify hugetlb usage in memcg
> >
> >  MAINTAINERS                                   |   2 +
> >  fs/hugetlbfs/inode.c                          |   2 +-
> >  include/linux/hugetlb.h                       |   6 +-
> >  include/linux/memcontrol.h                    |   8 +
> >  mm/hugetlb.c                                  |  23 +-
> >  mm/memcontrol.c                               |  40 ++++
> >  tools/testing/selftests/cgroup/.gitignore     |   1 +
> >  tools/testing/selftests/cgroup/Makefile       |   2 +
> >  .../selftests/cgroup/test_hugetlb_memcg.c     | 222 ++++++++++++++++++
> >  9 files changed, 297 insertions(+), 9 deletions(-)
> >  create mode 100644 tools/testing/selftests/cgroup/test_hugetlb_memcg.c
> >
> > --
> > 2.34.1
> >
>
> We've had this behavior at Google for a long time, and we're actually
> getting rid of it. hugetlb pages are a precious resource that should
> be accounted for separately. They are not just any memory, they are
> physically contiguous memory, charging them the same as any other
> region of the same size ended up not making sense, especially not for
> larger hugetlb page sizes.

I agree hugetlb is a special kind of resource. But as Johannes
pointed out, it is still a form of memory. Semantically, its usage
should be modulated by the memory controller.

We do have the HugeTLB controller for hugetlb-specific
restriction, and where appropriate we definitely should take
advantage of it. But it does not fix the hole we have in memory
usage reporting, as well as (over)protection and reclaim dynamics.
Hence the need for the userspace hack (as Johannes described):
manually adding/subtracting HugeTLB usage where applicable.
This is not only inelegant, but also cumbersome and buggy.
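
For illustration only, here is a minimal sketch of what such manual
userspace bookkeeping could look like. It is not the tooling used in
production. It assumes a cgroup v2 hierarchy where memory.current does
not include hugetlb pages (i.e. without this series) and where the
HugeTLB controller exposes hugetlb.<size>.current files. The function
names are hypothetical.

```python
from pathlib import Path


def read_counter(path):
    """Return the integer byte count stored in a single-value cgroup file."""
    return int(Path(path).read_text().strip())


def combined_usage(memory_current, hugetlb_current_by_size):
    """Add per-size HugeTLB usage (bytes) on top of memory.current (bytes)."""
    return memory_current + sum(hugetlb_current_by_size.values())


def cgroup_combined_usage(cgroup_dir):
    """Compute memory.current plus all hugetlb.<size>.current for one cgroup.

    Reservation counters (hugetlb.<size>.rsvd.current) are skipped so that
    reserved-but-unfaulted pages are not counted as usage.
    """
    cgroup = Path(cgroup_dir)
    memory_current = read_counter(cgroup / "memory.current")
    hugetlb = {
        p.name: read_counter(p)
        for p in cgroup.glob("hugetlb.*.current")
        if ".rsvd." not in p.name
    }
    return combined_usage(memory_current, hugetlb)
```

Every consumer of memory usage has to remember to apply this kind of
correction, and has to keep the two reads consistent with each other,
which is where the cumbersome and buggy part comes from.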
>
> Additionally, if this behavior is changed just like that, there will
> be quite a few workloads that will break badly because they'll hit
> their limits immediately - imagine a container that uses 1G hugetlb
> pages to back something large (a database, a VM), and 'plain' memory
> for control processes.
>
> What do your workloads do? Is it not possible for you to account for
> hugetlb pages separately? Sure, it can be annoying to have to deal
> with 2 separate totals that you need to take into account, but again,
> hugetlb pages are a resource that is best dealt with separately.
>

Johannes beat me to it - he described our use case, and what we have
hacked together to temporarily get around the issue.

A knob/flag to turn on/off this behavior sounds good to me.

> - Frank

Thanks for the comments, Frank!