Date: Tue, 3 Oct 2023 11:17:45 +0200
From: Michal Hocko
To: Johannes Weiner
Cc: Nhat Pham, akpm@linux-foundation.org, riel@surriel.com,
	roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
	tj@kernel.org, lizefan.x@bytedance.com, shuah@kernel.org,
	mike.kravetz@oracle.com, yosryahmed@google.com, linux-mm@kvack.org,
	kernel-team@meta.com, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org
Subject: Re: [PATCH v2 1/2] hugetlb: memcg: account hugetlb-backed memory in memory controller
References: <20230928005723.1709119-1-nphamcs@gmail.com>
	<20230928005723.1709119-2-nphamcs@gmail.com>
	<20231002145026.GB4414@cmpxchg.org>
	<20231002152555.GA5054@cmpxchg.org>
In-Reply-To: <20231002152555.GA5054@cmpxchg.org>

On Mon 02-10-23 11:25:55, Johannes Weiner wrote:
> On Mon, Oct 02, 2023 at 05:08:34PM +0200, Michal Hocko wrote:
> > On Mon 02-10-23 10:50:26, Johannes Weiner wrote:
> > > On Mon, Oct 02, 2023 at 03:43:19PM +0200, Michal Hocko wrote:
> > > > On Wed 27-09-23 17:57:22, Nhat Pham wrote:
> > [...]
> > > > - memcg limit reclaim doesn't assist hugetlb pages allocation when
> > > >   hugetlb overcommit is configured (i.e. pages are not consumed from the
> > > >   pool) which means that the page allocation might disrupt workloads
> > > >   from other memcgs.
> > > > - failure to charge a hugetlb page results in SIGBUS rather
> > > >   than memcg oom killer. That could be the case even if the
> > > >   hugetlb pool still has pages available and there is
> > > >   reclaimable memory in the memcg.
> > >
> > > Are these actually true? AFAICS, regardless of whether the page comes
> > > from the pool or the buddy allocator, the memcg code will go through
> > > the regular charge path, attempt reclaim, and OOM if that fails.
> >
> > OK, I should have been more explicit. Let me expand. Charges are
> > accounted only _after_ the actual allocation is done. So the actual
> > allocation is not constrained by the memcg context. It might reclaim
> > from the memcg at that time but the disruption could have already
> > happened. Not really any different from regular memory allocation
> > attempt but much more visible with GB pages and one could reasonably
> > expect that memcg should stop such a GB allocation if the local reclaim
> > would be hopeless to free up enough from its own consumption.
> >
> > Makes more sense?
>
> Yes, that makes sense.
>
> This should be fairly easy to address by having hugetlb do the split
> transaction that charge_memcg() does in one go, similar to what we do
> for the hugetlb controller as well. IOW,
>
> alloc_hugetlb_folio()
> {
> 	if (mem_cgroup_hugetlb_try_charge())
> 		return ERR_PTR(-ENOMEM);
>
> 	folio = dequeue();
> 	if (!folio) {
> 		folio = alloc_buddy();
> 		if (!folio)
> 			goto uncharge;
> 	}
>
> 	mem_cgroup_hugetlb_commit_charge();
> }

Yes, this makes sense. I still suspect we will need better charge
reclaim tuning for GB pages, as those are just too huge, and a simple
MAX_RECLAIM_RETRIES * GB worth of reclaim targets might be overly
aggressive.
-- 
Michal Hocko
SUSE Labs
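
For readers following the thread, the charge-before-allocate ordering
discussed above can be illustrated with a small userspace model. This is
only a sketch under stated assumptions, not kernel code: every name in it
(toy_memcg, toy_pool, toy_try_charge, toy_cancel_charge,
toy_alloc_hugetlb) is hypothetical, the "memcg" is a plain page counter
with a limit, and "reclaim" is modelled as subtracting from a reclaimable
count. What it tries to show is that when the charge is attempted first,
a hopeless request fails before the pool or the buddy allocator is
touched, and a failed allocation hands the reservation back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg: a page counter with a hard limit. */
struct toy_memcg {
	size_t usage;		/* pages currently charged */
	size_t limit;		/* hard limit in pages */
	size_t reclaimable;	/* charged pages that toy reclaim could free */
};

/* Hypothetical stand-in for the hugetlb pool plus buddy fallback. */
struct toy_pool {
	size_t free_pool;	/* preallocated huge pages left in the pool */
	size_t free_buddy;	/* huge pages the buddy allocator could provide */
};

enum toy_result { TOY_OK, TOY_CHARGE_FAILED, TOY_ALLOC_FAILED };

/* Phase 1: reserve nr pages, reclaiming only from this memcg if needed. */
static bool toy_try_charge(struct toy_memcg *cg, size_t nr)
{
	assert(cg->usage <= cg->limit && cg->reclaimable <= cg->usage);

	size_t room = cg->limit - cg->usage;
	if (room < nr) {
		size_t need = nr - room;
		if (need > cg->reclaimable)
			return false;	/* local reclaim would be hopeless */
		cg->reclaimable -= need;
		cg->usage -= need;
	}
	cg->usage += nr;
	return true;
}

/* Undo a reservation made by toy_try_charge(). */
static void toy_cancel_charge(struct toy_memcg *cg, size_t nr)
{
	assert(cg->usage >= nr);
	cg->usage -= nr;
}

/* Charge first, then take a page from the pool or fall back to buddy. */
static enum toy_result toy_alloc_hugetlb(struct toy_memcg *cg,
					 struct toy_pool *pool, size_t nr)
{
	if (!toy_try_charge(cg, nr))
		return TOY_CHARGE_FAILED;	/* allocator never touched */

	if (pool->free_pool > 0) {
		pool->free_pool--;
	} else if (pool->free_buddy > 0) {
		pool->free_buddy--;
	} else {
		toy_cancel_charge(cg, nr);
		return TOY_ALLOC_FAILED;
	}

	/* Phase 2 (commit): in this toy the reservation simply stays. */
	return TOY_OK;
}
```

The model deliberately leaves out the retry tuning question raised in
the reply: toy_try_charge() makes a single reclaim attempt, so it says
nothing about how many GB-sized reclaim rounds would be reasonable.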