From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 2 Oct 2023 16:58:21 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Mike Kravetz, Nhat Pham, akpm@linux-foundation.org, riel@surriel.com,
    roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
    tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
    yosryahmed@google.com, linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org
Subject: Re: [PATCH 0/2] hugetlb memcg accounting
References: <20230926194949.2637078-1-nphamcs@gmail.com>
    <20230927184738.GC365513@cmpxchg.org> <20231001232730.GA11194@monkey>
    <20231002144250.GA4414@cmpxchg.org>
In-Reply-To: <20231002144250.GA4414@cmpxchg.org>

On Mon 02-10-23 10:42:50, Johannes Weiner wrote:
> On Sun, Oct 01, 2023 at 04:27:30PM -0700, Mike Kravetz wrote:
> > On 09/27/23 14:47, Johannes Weiner wrote:
> > > On Wed, Sep 27, 2023 at 01:21:20PM +0200, Michal Hocko wrote:
> > > > On Tue 26-09-23 12:49:47, Nhat Pham wrote:
> > >
> > > So that if you use 80% hugetlb, the other memory is forced to stay in
> > > the remaining 20%, or it OOMs; and that if you don't use hugetlb, the
> > > group is still allowed to use the full 100% of its host memory
> > > allowance, without requiring some outside agent continuously
> > > monitoring and adjusting the
> > > container limits.
> >
> > Jumping in late here as I was traveling last week. In addition, I want
> > to state my limited cgroup knowledge up front.
> >
> > I was thinking of your scenario above a little differently. Suppose a
> > group is up and running at almost 100% memory usage. However, the majority
> > of that memory is reclaimable. Now, someone wants to allocate a 2M hugetlb
> > page. There is not 2MB free, but we could easily reclaim 2MB to make room
> > for the hugetlb page. I may be missing something, but I do not see how that
> > is going to happen. It seems like we would really want that behavior.
>
> But that is actually what it does, no?
>
> alloc_hugetlb_folio
>   mem_cgroup_hugetlb_charge_folio
>     charge_memcg
>       try_charge
>         !page_counter_try_charge ?
>           !try_to_free_mem_cgroup_pages ?
>             mem_cgroup_oom
>
> So it does reclaim when the hugetlb hits the cgroup limit. And if that
> fails to make room, it OOMs the cgroup.
>
> Or maybe I'm missing something?

I believe that Mike alludes to what I have pointed out in another email:
http://lkml.kernel.org/r/ZRrI90KcRBwVZn/r@dhcp22.suse.cz
and to a situation where the hugetlb request results in an actual hugetlb
allocation rather than consumption from the pre-allocated pool. In that
case memcg is not involved in the allocation because the charge happens
only after the allocation has happened. That, btw., means that this
request could disrupt a different memcg even if the current one is at
the limit or it could be reclaimed instead. Also, there is no OOM at
that stage, as hugetlb pages are costly requests and we do not invoke
the oom killer for them.
-- 
Michal Hocko
SUSE Labs
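
To make the ordering issue above concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: the Memcg and Host classes, the event strings, and the "walk the groups in list order" reclaim policy are all assumptions made up for illustration. Only try_charge borrows its name from the quoted call chain, and it is heavily simplified. The sketch shows a request that falls through to a fresh allocation reclaiming from another group before the requester's own limit is even consulted.

```python
from dataclasses import dataclass, field

HPAGE_MB = 2  # one hugetlb page in this toy model (the 2M page from the thread)


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup; all sizes are in MB."""
    name: str
    limit: int
    usage: int
    reclaimable: int  # part of usage that reclaim is able to free

    def reclaim(self, want: int) -> int:
        """Free up to `want` MB of this group's reclaimable memory."""
        freed = min(want, self.reclaimable)
        self.reclaimable -= freed
        self.usage -= freed
        return freed


def try_charge(cg: Memcg, size: int) -> bool:
    """Charge `size` MB to `cg`, reclaiming from `cg` itself at the limit."""
    over = cg.usage + size - cg.limit
    if over > 0 and cg.reclaim(over) < over:
        return False  # stands in for the mem_cgroup_oom step of the chain
    cg.usage += size
    return True


@dataclass
class Host:
    """Hypothetical host: free memory, a hugetlb pool, and all cgroups."""
    free: int  # MB of free host memory
    pool: int  # pre-allocated hugetlb pages
    groups: list = field(default_factory=list)

    def alloc_hugetlb(self, cg: Memcg) -> list:
        """Return the ordered events for one hugetlb page request by `cg`."""
        events = []
        if self.pool > 0:
            self.pool -= 1
            events.append("from_pool")
        else:
            # Fresh allocation: host-wide reclaim runs before any memcg
            # charge, so it may hit groups other than the requester.
            # (Assumed policy: walk the groups in list order.)
            for victim in self.groups:
                if self.free >= HPAGE_MB:
                    break
                freed = victim.reclaim(HPAGE_MB - self.free)
                if freed:
                    self.free += freed
                    events.append(f"reclaimed_from:{victim.name}")
            if self.free < HPAGE_MB:
                # Costly request: fail the allocation, no oom killer.
                events.append("alloc_failed_no_oom_kill")
                return events
            self.free -= HPAGE_MB
            events.append("fresh_alloc")
        # Only now is the page charged to the requesting cgroup.
        events.append("charged" if try_charge(cg, HPAGE_MB) else "memcg_oom")
        return events


# The requester sits at its limit with plenty of reclaimable memory, yet
# the fresh allocation takes its 2MB from the other group first.
other = Memcg("other", limit=100, usage=50, reclaimable=50)
requester = Memcg("requester", limit=100, usage=100, reclaimable=60)
host = Host(free=0, pool=0, groups=[other, requester])
events = host.alloc_hugetlb(requester)
print(events, other.usage, requester.usage)
```

With a non-empty pool (pool=1) the same request never enters the host-wide reclaim branch and goes straight to the charge, which is the case the quoted call chain describes. The model deliberately ignores that memcg reclaim also returns memory to the host.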