From: Nhat Pham
Date: Wed, 27 Sep 2023 16:33:59 -0700
Subject: Re: [PATCH 0/2] hugetlb memcg accounting
To: Michal Hocko
Cc: akpm@linux-foundation.org, riel@surriel.com, hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org, mike.kravetz@oracle.com, yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
References: <20230926194949.2637078-1-nphamcs@gmail.com>

On Wed, Sep 27, 2023 at 4:21 AM Michal Hocko wrote:
>
> On Tue 26-09-23 12:49:47, Nhat Pham wrote:
> > Currently, hugetlb memory usage is not accounted for in the memory
> > controller, which could lead to memory overprotection for cgroups with
> > hugetlb-backed memory. This has been observed in our production system.
> >
> > This patch series rectifies this issue by charging the memcg when the
> > hugetlb folio is allocated, and uncharging when the folio is freed. In
> > addition, a new selftest is added to demonstrate and verify this new
> > behavior.
>
> The primary reason why hugetlb is living outside of memcg (and the core
> MM as well) is that it doesn't really fit the whole scheme. In several
> aspects. First and the foremost it is an independently managed resource
> with its own pool management, use and lifetime.
>
> There is no notion of memory reclaim and this makes a huge difference
> for the pool that might consume considerable amount of memory. While
> this is the case for many kernel allocations as well they usually do not
> consume considerable portions of the accounted memory. This makes it
> really tricky to handle limit enforcement gracefully.
>
> Another important aspect comes from the lifetime semantics when a proper
> reservations accounting and managing needs to handle mmap time rather
> than usual allocation path. While pages are allocated they do not
> belong to anybody and only later at the #PF time (or read for the fs
> backed mapping) the ownership is established. That makes it really hard
> to manage memory as whole under the memcg anyway as a large part of
> that pool sits without an ownership yet it cannot be used for any other
> purpose.
>
> These and more reasons were behind the earlier decision to have a
> dedicated hugetlb controller.

While I believe all of these points are true, I do not think they are reasons against having memcg accounting. As everyone has pointed out, memcg accounting by itself cannot handle every situation; it is not a fix-all. Other mechanisms, such as the HugeTLB controller, could be the better solution in those cases, and hugetlb memcg accounting is definitely not an attempt to infringe upon those control domains.
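The charging scheme described in the cover letter (charge the memcg when a hugetlb folio is allocated, uncharge it when the folio is freed) can be illustrated with a toy userspace model. This is a hedged sketch in Python, not kernel code: `ToyMemcg`, `toy_alloc_folio()` and `toy_free_folio()` are hypothetical names, 4 KiB base pages and 2 MiB hugetlb folios are assumed, and no reclaim of any kind is modeled, so a charge that would exceed the limit simply fails. That simplification loosely reflects Michal's point that hugetlb folios themselves cannot be reclaimed.

```python
from dataclasses import dataclass

# Assumption: 4 KiB base pages and 2 MiB hugetlb folios.
BASE_PAGE_SIZE = 4096
HPAGE_SIZE = 2 * 1024 * 1024
HPAGE_NR_PAGES = HPAGE_SIZE // BASE_PAGE_SIZE  # 512 base pages per folio


@dataclass
class ToyMemcg:
    """Hypothetical stand-in for a memory cgroup; sizes are in base pages."""
    limit: int      # plays the role of memory.max
    usage: int = 0  # base pages currently charged

    def try_charge(self, nr_pages: int) -> bool:
        # Simplification: no reclaim of any kind is modeled, so a charge
        # that would exceed the limit simply fails.
        if self.usage + nr_pages > self.limit:
            return False
        self.usage += nr_pages
        return True

    def uncharge(self, nr_pages: int) -> None:
        self.usage -= nr_pages


def toy_alloc_folio(memcg: ToyMemcg) -> bool:
    """Allocate a hugetlb folio, charging the cgroup at that point."""
    return memcg.try_charge(HPAGE_NR_PAGES)


def toy_free_folio(memcg: ToyMemcg) -> None:
    """Free the folio and drop its charge."""
    memcg.uncharge(HPAGE_NR_PAGES)


if __name__ == "__main__":
    cg = ToyMemcg(limit=2 * HPAGE_NR_PAGES)  # room for two folios
    print([toy_alloc_folio(cg) for _ in range(3)], cg.usage)  # [True, True, False] 1024
    toy_free_folio(cg)
    print(cg.usage)  # 512
```

With a limit worth two folios, the third charge fails and usage stays at 1024 base pages until a folio is freed, at which point the charge drops back to 512.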

However, memcg accounting is still necessary for certain kinds of memory limit enforcement to work cleanly and properly, such as in the use cases we have (as Johannes has beautifully described). It would also help administrators simplify their control workflow considerably, provided we do not surprise them with this change; a new mount option to opt in should help with the transition.

>
> Also I will also Nack involving hugetlb pages being accounted by
> default. This would break any setups which mix normal and hugetlb memory
> with memcg limits applied.

Got it! I'll introduce an opt-in mechanism in the next version. That was an oversight on my part.

> --
> Michal Hocko
> SUSE Labs
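A minimal sketch of why an opt-in matters for the mixed setups Michal describes, written as a toy Python model rather than kernel code. The `hugetlb_accounting` flag is a hypothetical stand-in for the proposed mount option (its actual name is not given in this thread), `ToyMemcg` and `run_mixed_workload()` are hypothetical, and the sizes are arbitrary base-page counts chosen for illustration.

```python
from dataclasses import dataclass

# Assumption: 2 MiB hugetlb folios made of 4 KiB base pages.
HPAGE_NR_PAGES = 512


@dataclass
class ToyMemcg:
    """Hypothetical toy cgroup; all sizes are in base pages."""
    limit: int
    usage: int = 0
    # Hypothetical stand-in for the proposed opt-in mount option.
    hugetlb_accounting: bool = False

    def try_charge(self, nr_pages: int) -> bool:
        # No reclaim is modeled: exceeding the limit fails the charge.
        if self.usage + nr_pages > self.limit:
            return False
        self.usage += nr_pages
        return True

    def charge_hugetlb_folio(self) -> bool:
        # Opt-in disabled: hugetlb folios are not charged to the memcg at
        # all, which preserves the existing behavior.
        if not self.hugetlb_accounting:
            return True
        return self.try_charge(HPAGE_NR_PAGES)


def run_mixed_workload(opt_in: bool) -> bool:
    """A cgroup whose limit was sized for its normal memory only."""
    cg = ToyMemcg(limit=1000, usage=800, hugetlb_accounting=opt_in)
    return cg.charge_hugetlb_folio()


if __name__ == "__main__":
    print(run_mixed_workload(opt_in=False))  # True: the existing setup keeps working
    print(run_mixed_workload(opt_in=True))   # False: 800 + 512 > 1000
```

Under this toy model, enabling accounting by default would make the failing outcome the default for every such setup, which is the breakage Michal objects to; keeping it behind an opt-in leaves existing configurations unchanged.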