Message-ID: <7a52f404-175b-5f64-10ec-4b757b1cc3ed@suse.cz>
Date: Wed, 19 Apr 2023 12:56:04 +0200
From: Vlastimil Babka
To: Johannes Weiner, "Kirill A. Shutemov"
Cc: linux-mm@kvack.org, Kaiyang Zhao, Mel Gorman, David Rientjes,
 linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC PATCH 00/26] mm: reliable huge page allocator
In-Reply-To: <20230419020814.GA272256@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>
 <20230418235402.lq7mxrrre2kl6vsf@box.shutemov.name>
 <20230419020814.GA272256@cmpxchg.org>

On 4/19/23 04:08, Johannes Weiner wrote:
> Hi
> Kirill, thanks for taking a look so quickly.
>
> On Wed, Apr 19, 2023 at 02:54:02AM +0300, Kirill A. Shutemov wrote:
>> On Tue, Apr 18, 2023 at 03:12:47PM -0400, Johannes Weiner wrote:
>> > This series proposes to make THP allocations reliable by enforcing
>> > pageblock hygiene, and aligning the allocator, reclaim and compaction
>> > on the pageblock as the base unit for managing free memory. All orders
>> > up to and including the pageblock are made first-class requests that
>> > (outside of OOM situations) are expected to succeed without
>> > exceptional investment by the allocating thread.
>> >
>> > A neutral pageblock type is introduced, MIGRATE_FREE. The first
>> > allocation to be placed into such a block claims it exclusively for
>> > the allocation's migratetype. Fallbacks from a different type are no
>> > longer allowed, and the block is "kept open" for more allocations of
>> > the same type to ensure tight grouping. A pageblock becomes neutral
>> > again only once all its pages have been freed.
>>
>> Sounds like this will cause earlier OOM, no?
>>
>> I guess with 2M pageblock on 64G server it shouldn't matter much. But how
>> about smaller machines?
>
> Yes, it's a tradeoff.
>
> It's not really possible to reduce external fragmentation and increase
> contiguity, without also increasing the risk of internal fragmentation
> to some extent. The tradeoff is slightly less but overall faster memory.
>
> A 2M block size *seems* reasonable for most current setups. It's
> actually still somewhat on the lower side, if you consider that we had
> 4k blocks when memory was a few megabytes. (4k pages for 4M RAM is the
> same ratio as 2M pages for 2G RAM. My phone has 8G and my desktop 32G.
> 64G is unusually small for a datacenter server.)
>
> I wouldn't be opposed to sticking this behind a separate config option
> if there are setups that WOULD want to keep the current best-effort
> compaction without the block hygiene.
> But obviously, from a maintenance POV life would be much easier if we
> didn't have to.

As much as tunables are frowned upon in general, this could make sense to
me even as a runtime tunable (maybe with defaults based on how large the
system is), because a datacenter server and a phone are, after all, not the
same thing. But of course it would be preferable to find out that it works
reasonably well even for the smaller systems. For example, we already
completely disable mobility grouping if there's too little RAM for it to
make sense, which is a somewhat similar (but not identical) decision.

> FWIW, I have been doing tests in an environment constrained to 2G and
> haven't had any issues with premature OOMs. But I'm happy to test
> other situations and workloads that might be of interest to people.
>
>> > Reclaim and compaction are changed from partial block reclaim to
>> > producing whole neutral page blocks.
>>
>> How does it affect allocation latencies? I see direct compact stall grew
>> substantially. Hm?
>
> Good question.
>
> There are 260 more compact stalls but also 1,734 more successful THP
> allocations. And 1,433 fewer allocation stalls. There seems to be much
> less direct work performed per successful allocation.

Yeah, if a workload uses THP madvise to indicate that it prefers
compaction stalls over base page fallbacks, and compaction is more
successful, then compaction won't defer further attempts, so there will be
more stalls as a result. What we should rather watch out for are the
latencies of allocations that don't prefer the stalls, but might now be
forced to clean up new MIGRATE_FREE pageblocks for their order-0
allocations, which previously would have just fallen back, etc.

> But of course, that's not the whole story. Let me trace the actual
> latencies.
>
> Thanks for your thoughts!
> Johannes
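The MIGRATE_FREE pageblock lifecycle from the quoted cover letter can be modelled in a few lines of C. This is a hypothetical, heavily simplified sketch, not code from the series: the names `struct pageblock`, `pb_alloc()`, `pb_free()`, `pick_block()` and the `MT_*` constants are invented for illustration, every allocation is a single base page, and buddy orders, zones and locking are ignored. It shows the rules stated in the thread: the first allocation claims a neutral block for its migratetype, allocations of a different type may not fall back into it, a claimed block is "kept open" for further allocations of the same type, and it becomes neutral again only when its last page is freed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical migratetypes; MT_FREE stands in for the neutral MIGRATE_FREE. */
enum mt { MT_FREE, MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/* Simplified pageblock: not the kernel's representation. */
struct pageblock {
	enum mt type;          /* MT_FREE while the block is neutral */
	unsigned int used;     /* base pages currently allocated from the block */
	unsigned int nr_pages; /* block size in base pages, e.g. 512 for 2M/4k */
};

/* Place one page of type @want into @pb; returns false if not allowed. */
static bool pb_alloc(struct pageblock *pb, enum mt want)
{
	assert(want != MT_FREE);
	if (pb->used == pb->nr_pages)
		return false;           /* block is full */
	if (pb->type == MT_FREE)
		pb->type = want;        /* first allocation claims the block */
	else if (pb->type != want)
		return false;           /* no fallback from a different type */
	pb->used++;
	return true;
}

/* Free one page from @pb; the block turns neutral when it empties. */
static void pb_free(struct pageblock *pb)
{
	assert(pb->used > 0);
	if (--pb->used == 0)
		pb->type = MT_FREE;
}

/*
 * Choose a block for @want: prefer an already claimed, non-full block of
 * the same type (keeps grouping tight), else claim a neutral block.
 * Returns the block index, or -1 if neither exists.
 */
static int pick_block(const struct pageblock *blocks, int n, enum mt want)
{
	for (int i = 0; i < n; i++)
		if (blocks[i].type == want && blocks[i].used < blocks[i].nr_pages)
			return i;
	for (int i = 0; i < n; i++)
		if (blocks[i].type == MT_FREE)
			return i;
	return -1;
}
```

In this model, a -1 from `pick_block()` is the point where, per the cover letter, reclaim and compaction would have to produce a whole neutral block instead of permitting a cross-type fallback. That is also where the extra cost for order-0 allocations raised in the reply above would appear.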