Date: Mon, 2 Oct 2023 10:50:26 -0400
From: Johannes Weiner
To: Michal Hocko
Cc: Nhat Pham, akpm@linux-foundation.org, riel@surriel.com,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
 mike.kravetz@oracle.com, yosryahmed@google.com, linux-mm@kvack.org,
 kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org
Subject: Re: [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory
 in memory controller
Message-ID: <20231002145026.GB4414@cmpxchg.org>
References: <20230928005723.1709119-1-nphamcs@gmail.com>
 <20230928005723.1709119-2-nphamcs@gmail.com>

On Mon, Oct 02, 2023 at 03:43:19PM +0200, Michal Hocko wrote:
> On Wed 27-09-23 17:57:22, Nhat Pham wrote:
> > Currently, hugetlb memory usage is not accounted for in the memory
> > controller, which could lead to memory overprotection for cgroups with
> > hugetlb-backed memory. This has been observed in our production system.
> >
> > This patch rectifies this issue by charging the memcg when the hugetlb
> > folio is allocated, and uncharging when the folio is freed (analogous to
> > the hugetlb controller).
>
> This changelog is missing a lot of information. Both about the usecase
> (we do not want to fish that out from archives in the future) and the
> actual implementation and the reasoning behind that.
>
> AFAICS you have decided to charge on the hugetlb use rather than hugetlb
> allocation to the pool. I suspect the underlying reasoning is that pool
> pages do not belong to anybody. This is a deliberate decision and it
> should be documented as such.
>
> It is also very important to describe subtle behavior properties that
> might be rather unintuitive to users. Most notably
> - there is no hugetlb pool management involved in the memcg
>   controller. One has to use hugetlb controller for that purpose.
>   Also the pre allocated pool as such doesn't belong to anybody so the
>   memcg host overcommit management has to consider it when configuring
>   hard limits.

+1

> - memcg limit reclaim doesn't assist hugetlb pages allocation when
>   hugetlb overcommit is configured (i.e. pages are not consumed from the
>   pool) which means that the page allocation might disrupt workloads
>   from other memcgs.
> - failure to charge a hugetlb page results in SIGBUS rather
>   than memcg oom killer. That could be the case even if the
>   hugetlb pool still has pages available and there is
>   reclaimable memory in the memcg.

Are these actually true? AFAICS, regardless of whether the page comes
from the pool or the buddy allocator, the memcg code will go through
the regular charge path, attempt reclaim, and OOM if that fails.
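
To make the ordering described above concrete, here is a toy userspace
model of it. This is not kernel code: every name in it (toy_memcg,
toy_charge, toy_reclaim, the enums) is hypothetical, and it only
illustrates the claim as stated (charge, reclaim on failure, OOM as the
last resort, independent of where the page came from). It does not
demonstrate what the patch or the kernel actually does.

```c
#include <assert.h>

/* Toy model only: none of these names are kernel APIs. */
enum page_source { FROM_POOL, FROM_BUDDY };
enum charge_result { CHARGE_OK, CHARGE_OK_AFTER_RECLAIM, CHARGE_OOM };

struct toy_memcg {
	long usage;       /* pages currently charged */
	long limit;       /* hard limit, in pages */
	long reclaimable; /* charged pages that reclaim could free */
};

/* Free up to nr reclaimable pages; return how many were freed. */
static long toy_reclaim(struct toy_memcg *cg, long nr)
{
	long freed = nr < cg->reclaimable ? nr : cg->reclaimable;

	cg->reclaimable -= freed;
	cg->usage -= freed;
	return freed;
}

/*
 * Charge nr pages. The page source is accepted but deliberately
 * ignored: in this model the charge path is the same for both.
 */
static enum charge_result toy_charge(struct toy_memcg *cg, long nr,
				     enum page_source src)
{
	(void)src;
	assert(nr > 0);

	/* Fast path: the charge fits under the limit. */
	if (cg->usage + nr <= cg->limit) {
		cg->usage += nr;
		return CHARGE_OK;
	}

	/* Over the limit: try to reclaim exactly the excess. */
	toy_reclaim(cg, cg->usage + nr - cg->limit);
	if (cg->usage + nr <= cg->limit) {
		cg->usage += nr;
		return CHARGE_OK_AFTER_RECLAIM;
	}

	/* Reclaim did not free enough: the model reports OOM. */
	return CHARGE_OOM;
}
```

In this model a caller would get the same result for FROM_POOL and
FROM_BUDDY given the same memcg state; whether the real charge path
behaves that way for hugetlb is exactly the question being asked.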