Message-ID: <4831e8e7-ebe6-436d-8c7f-a5940083a508@suse.cz>
Date: Wed, 26 Feb 2025 14:07:27 +0100
Subject: Re: [PATCH] Revert "mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone"
To: Michal Hocko, Gabriel Krisman Bertazi
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, Mel Gorman, Baoquan He
References: <20250226032258.234099-1-krisman@suse.de>
From: Vlastimil Babka

On 2/26/25 7:54 AM, Michal Hocko wrote:
> On Tue 25-02-25 22:22:58, Gabriel Krisman Bertazi wrote:
>> Commit 96a5c186efff ("mm/page_alloc.c: don't show protection in zone's
>> ->lowmem_reserve[] for empty zone") removes the protection of lower
>> zones from allocations targeting memory-less high zones. This had an
>> unintended impact on the pattern of reclaims because it makes the
>> high-zone-targeted allocation more likely to succeed in lower zones,
>> which adds pressure to said zones. I.e, the following corresponding
>> checks in zone_watermark_ok/zone_watermark_fast are less likely to
>> trigger:
>>
>>         if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
>>                 return false;
>>
>> As a result, we are observing an increase in reclaim and kswapd scans,
>> due to the increased pressure. This was initially observed as increased
>> latency in filesystem operations when benchmarking with fio on a machine
>> with some memory-less zones, but it has since been associated with
>> increased contention in locks related to memory reclaim. By reverting
>> this patch, the original performance was recovered on that machine.
>
> I think it would be nice to show the memory layout on that machine (is
> there any movable or device zone)?
>
> Exact reclaim patterns are really hard to predict and it is little bit
> surprising the said patch has caused an increased kswapd activity
> because I would expect that there will be more reclaim with the lowmem
> reserves in place. But it is quite possible that the higher zone memory
> pressure is just tipping over and increase the lowmem pressure enough
> that it shows up.

My theory is that the commit caused a difference between kernel and
userspace allocations, with bad consequences. A kernel allocation will
have highest_zoneidx = NORMAL and thus observe the lowmem_reserve for
ZONE_DMA32 unchanged. A userspace allocation will have highest_zoneidx =
MOVABLE and thus will see zero lowmem_reserve and will allocate from
ZONE_DMA32 (or even ZONE_DMA) when previously it wouldn't. Then a kernel
allocation might happen to wake up kswapd, which will see DMA/DMA32
below the watermark (with NORMAL highest_zoneidx) and try to reclaim
them back above the watermarks. Since the LRU lists are per-node and not
per-zone anymore, it will spend a lot of effort to find pages from
DMA/DMA32 to reclaim.

> In any case 96a5c186efff seems incorrect because it assumes that the
> protection has anything to do with how higher zone is populated while
> the protection fundamentaly protects lower zone from higher zones
> allocation. Those allocations are independent on the actual memory in
> that zone.
>
>> The original commit was introduced as a clarification of the
>> /proc/zoneinfo output, so it doesn't seem there are usecases depending
>> on it, making the revert a simple solution.
>>
>> Cc: Michal Hocko
>> Cc: Mel Gorman
>> Cc: Vlastimil Babka
>> Cc: Baoquan He
>> Fixes: 96a5c186efff ("mm/page_alloc.c: don't show protection in zone's ->lowmem_reserve[] for empty zone")
>> Signed-off-by: Gabriel Krisman Bertazi
>
> Acked-by: Michal Hocko
> Thanks!
>
>> ---
>>  mm/page_alloc.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 579789600a3c..fe986e6de7a0 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5849,11 +5849,10 @@ static void setup_per_zone_lowmem_reserve(void)
>>  
>>  			for (j = i + 1; j < MAX_NR_ZONES; j++) {
>>  				struct zone *upper_zone = &pgdat->node_zones[j];
>> -				bool empty = !zone_managed_pages(upper_zone);
>>  
>>  				managed_pages += zone_managed_pages(upper_zone);
>>  
>> -				if (clear || empty)
>> +				if (clear)
>>  					zone->lowmem_reserve[j] = 0;
>>  				else
>>  					zone->lowmem_reserve[j] = managed_pages / ratio;
>> -- 
>> 2.47.0
>
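
The kernel vs. userspace asymmetry described in the theory above can be
illustrated with a small userspace model. This is only a sketch, not kernel
code: struct toy_zone, toy_setup_lowmem_reserve(), toy_watermark_ok() and the
skip_empty flag are hypothetical names, and any page counts or ratios fed to
it are made up for illustration. The loop follows the hunk quoted above, and
the check is the one quoted in the changelog.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the zone indices on an x86-64-like layout. */
enum { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, NR_ZONES };

/* Toy zone: only the fields this model needs. */
struct toy_zone {
	unsigned long managed_pages;
	unsigned long lowmem_reserve[NR_ZONES];
};

/*
 * Modelled on the loop in setup_per_zone_lowmem_reserve().
 * skip_empty == true  models 96a5c186efff (empty upper zone => reserve 0).
 * skip_empty == false models the behaviour restored by the revert.
 */
static void toy_setup_lowmem_reserve(struct toy_zone *zones,
				     const int *ratio, bool skip_empty)
{
	assert(zones && ratio);

	for (int i = 0; i < NR_ZONES - 1; i++) {
		bool clear = !ratio[i] || !zones[i].managed_pages;
		unsigned long managed_pages = 0;

		for (int j = i + 1; j < NR_ZONES; j++) {
			bool empty = !zones[j].managed_pages;

			managed_pages += zones[j].managed_pages;

			if (clear || (skip_empty && empty))
				zones[i].lowmem_reserve[j] = 0;
			else
				zones[i].lowmem_reserve[j] =
					managed_pages / ratio[i];
		}
	}
}

/* The check quoted in the changelog, stripped of everything else. */
static bool toy_watermark_ok(const struct toy_zone *z,
			     unsigned long free_pages, unsigned long min,
			     int highest_zoneidx)
{
	if (free_pages <= min + z->lowmem_reserve[highest_zoneidx])
		return false;
	return true;
}
```

As an illustrative run, assume ZONE_NORMAL has 4000000 managed pages,
ZONE_MOVABLE is empty, and the ratio for ZONE_DMA32 is 256. Then
lowmem_reserve[ZONE_NORMAL] of ZONE_DMA32 is 15625 in both variants, while
lowmem_reserve[ZONE_MOVABLE] is 0 with skip_empty and 15625 without it. With
10000 free pages and min = 2000 in ZONE_DMA32, a NORMAL-targeted (kernel)
allocation fails the check in both variants, but a MOVABLE-targeted
(userspace) allocation passes it only when skip_empty is set, which is the
asymmetry suspected of pushing userspace allocations into DMA32.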