Date: Mon, 2 Feb 2026 14:11:10 +0100
From: Michal Hocko
To: Akinobu Mita
Cc: Joshua Hahn, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, akpm@linux-foundation.org, axelrasmussen@google.com,
    yuanchu@google.com, weixugc@google.com, hannes@cmpxchg.org,
    david@kernel.org, zhengqi.arch@bytedance.com, shakeel.butt@linux.dev,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
    rppt@kernel.org, surenb@google.com, ziy@nvidia.com,
    matthew.brost@intel.com, rakie.kim@sk.com, byungchul@sk.com,
    gourry@gourry.net, ying.huang@linux.alibaba.com, apopple@nvidia.com,
    bingjiao@google.com, jonathan.cameron@huawei.com,
    pratyush.brahma@oss.qualcomm.com
Subject: Re: [PATCH v4 3/3] mm/vmscan: don't demote if there is not enough free memory in the lower memory tier
References: <20260127220003.3993576-1-joshua.hahnjy@gmail.com>

On Thu 29-01-26 09:40:17, Akinobu Mita wrote:
> On Wed, Jan 28, 2026 at 7:00 Joshua Hahn wrote:
> >
> > > > > Therefore, it appears that the behavior of get_swappiness() is important
> > > > > in this issue.
> > > >
> > > > This is quite mysterious.
> > > >
> > > > Especially because get_swappiness() is an MGLRU exclusive function, I find
> > > > it quite strange that the issue you mention above occurs regardless of whether
> > > > MGLRU is enabled or disabled. With MGLRU disabled, did you see the same hangs
> > > > as before? Were these hangs similarly fixed by modifying the callsite in
> > > > get_swappiness?
> > >
> > > Good point.
> > > When MGLRU is disabled, changing only the behavior of can_demote()
> > > called by get_swappiness() did not solve the problem.
> > >
> > > Instead, the problem was avoided by changing only the behavior of
> > > can_demote() called by can_reclaim_anon_page(), without changing the
> > > behavior of can_demote() called from other places.
> > >
> > > > On a separate note, I feel a bit uncomfortable for making this the default
> > > > setting, regardless of whether there is swap space or not. Just as it is
> > > > easy to create a degenerate scenario where all memory is unreclaimable
> > > > and the system starts going into (wasteful) reclaim on the lower tiers,
> > > > it is equally easy to create a scenario where all memory is very easily
> > > > reclaimable (say, clean pagecache) and we OOM without making any attempt to
> > > > free up memory on the lower tiers.
> > > >
> > > > Reality is likely somewhere in between. And from my perspective, as long as
> > > > we have some amount of easily reclaimable memory, I don't think immediately
> > > > OOMing will be helpful for the system (and even if none of the memory is
> > > > easily reclaimable, we should still try doing something before killing).
> > > >
> > > > > > > The reason for this issue is that memory allocations do not directly
> > > > > > > trigger the oom-killer, assuming that if the target node has an underlying
> > > > > > > memory tier, it can always be reclaimed by demotion.
> > > >
> > > > This patch enforces that the opposite of this assumption is true; that even
> > > > if a target node has an underlying memory tier, it can never be reclaimed by
> > > > demotion.
> > > >
> > > > Certainly for systems with swap and some compression methods (z{ram, swap}),
> > > > this new enforcement could be harmful to the system. What do you think?
> > >
> > > Thank you for the detailed explanation.
> > >
> > > I understand the concern regarding the current patch, which only
> > > checks the free memory of the demotion target node.
> > > I will explore a solution.
> >
> > Hello Akinobu, I hope you had a great weekend!
> >
> > I noticed something that I thought was worth flagging. It seems like the
> > primary addition of this patch, which is to check for zone_watermark_ok
> > across the zones, is already a part of should_reclaim_retry():
> >
> >     /*
> >      * Keep reclaiming pages while there is a chance this will lead
> >      * somewhere. If none of the target zones can satisfy our allocation
> >      * request even if all reclaimable pages are considered then we are
> >      * screwed and have to go OOM.
> >      */
> >     for_each_zone_zonelist_nodemask(zone, z, ac->zonelist,
> >                     ac->highest_zoneidx, ac->nodemask) {
> >
> >         [...snip...]
> >
> >         /*
> >          * Would the allocation succeed if we reclaimed all
> >          * reclaimable pages?
> >          */
> >         wmark = __zone_watermark_ok(zone, order, min_wmark,
> >                 ac->highest_zoneidx, alloc_flags, available);
> >
> >         if (wmark) {
> >             ret = true;
> >             break;
> >         }
> >     }
> >
> > ... which is called in __alloc_pages_slowpath. I wonder why we don't already
> > hit this. It seems to do the same thing your patch is doing?
>
> I checked the number of calls and the time spent for several functions
> called by __alloc_pages_slowpath(), and found that time is spent in
> __alloc_pages_direct_reclaim() before reaching the first should_reclaim_retry().
>
> After a few minutes have passed and the debug code that automatically
> resets numa_demotion_enabled to false is executed, it appears that
> __alloc_pages_direct_reclaim() immediately exits.

First of all, is this MGLRU or traditional reclaim? Or both?

Then another thing I've noticed only now. There seems to be a layering
discrepancy (for traditional LRU reclaim): get_scan_count, which controls
the to-be-reclaimed LRUs, always relies on can_reclaim_anon_pages, while
further down the reclaim path shrink_folio_list tries to be more clever
and avoids demotion if it turns out to be inefficient.

I wouldn't be surprised if get_scan_count predominantly (or even
exclusively) scanned anon LRUs only while increasing the reclaim priority
(so essentially just checked all anon pages on the LRU list) before
concluding that it makes no sense. This can take quite some time, and in
the worst case you could be recycling a couple of page cache pages
remaining on the list to make small but sufficient progress to loop
around.

So I think the first step is to make the demotion behavior consistent. If
demotion fails then it would probably make sense to set sc->no_demotion
so that get_scan_count can learn from the reclaim feedback that anonymous
pages are not a good reclaim target in this situation. But the whole
reclaim path needs a careful review, I am afraid.
-- 
Michal Hocko
SUSE Labs
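
The feedback idea in the last paragraph above can be sketched as a toy userspace model. This is not kernel code and not a proposed patch: every toy_* type and function is hypothetical and exists only for illustration. The only things borrowed from mm/vmscan.c are the sc->no_demotion flag name and the starting reclaim priority of 12. The model assumes that a demotion attempt which moves zero pages is treated as "demotion failed", and that the anon-eligibility check then stops selecting anon LRUs unless swap is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct scan_control; only two fields modelled. */
struct toy_scan_control {
	bool no_demotion;	/* set once demotion is known to be futile */
	int priority;		/* starts at 12, lower means more pressure */
};

/* Hypothetical node description: top-tier node plus its demotion target. */
struct toy_node {
	bool has_lower_tier;		/* a demotion target exists */
	unsigned long lower_free;	/* free pages on the demotion target */
	bool has_swap;			/* anon can be swapped out instead */
};

/* Models the question the scan-count step asks: is anon worth scanning? */
static bool toy_can_reclaim_anon(const struct toy_node *n,
				 const struct toy_scan_control *sc)
{
	if (n->has_swap)
		return true;
	return n->has_lower_tier && !sc->no_demotion;
}

/*
 * Models a demotion attempt in the shrink path. Returns the number of
 * pages moved and records a failure in sc->no_demotion as feedback.
 */
static unsigned long toy_demote(struct toy_node *n,
				struct toy_scan_control *sc,
				unsigned long nr)
{
	unsigned long done;

	assert(n != NULL && sc != NULL);
	if (!n->has_lower_tier || sc->no_demotion)
		return 0;

	done = nr < n->lower_free ? nr : n->lower_free;
	n->lower_free -= done;
	if (done == 0)
		sc->no_demotion = true;	/* the feedback step */
	return done;
}

/* Counts how many priority levels still pick anon LRUs before giving up. */
static int toy_anon_scan_rounds(struct toy_node *n,
				struct toy_scan_control *sc,
				unsigned long nr_per_round)
{
	int rounds = 0;

	for (sc->priority = 12; sc->priority >= 0; sc->priority--) {
		if (!toy_can_reclaim_anon(n, sc))
			break;
		rounds++;
		toy_demote(n, sc, nr_per_round);
	}
	return rounds;
}
```

In this model, a node whose lower tier is already full and which has no swap stops scanning anon after a single round, instead of walking all 13 priority levels. Without the feedback step (never setting no_demotion), the same node would be rescanned at every priority, which is the slow path described above.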