Date: Wed, 17 Jul 2024 14:06:46 +0200
From: Michal Hocko
To: "Kirill A. Shutemov"
Cc: Andrew Morton, "Borislav Petkov (AMD)", Mel Gorman, Vlastimil Babka,
	Tom Lendacky, Mike Rapoport, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Jianxiong Gao, stable@vger.kernel.org
Subject: Re: [PATCH] mm: Fix endless reclaim on machines with unaccepted memory.
References: <20240716130013.1997325-1-kirill.shutemov@linux.intel.com>

On Wed 17-07-24 14:55:08, Kirill A.
Shutemov wrote:
> On Wed, Jul 17, 2024 at 09:19:12AM +0200, Michal Hocko wrote:
> > On Tue 16-07-24 16:00:13, Kirill A. Shutemov wrote:
> > > Unaccepted memory is considered unusable free memory, which is not
> > > counted as free on the zone watermark check. This causes
> > > get_page_from_freelist() to accept more memory to hit the high
> > > watermark, but it creates problems in the reclaim path.
> > >
> > > The reclaim path encounters a failed zone watermark check and attempts
> > > to reclaim memory. This is usually successful, but if there is little or
> > > no reclaimable memory, it can result in endless reclaim with little to
> > > no progress. This can occur early in the boot process, just after start
> > > of the init process when the only reclaimable memory is the page cache
> > > of the init executable and its libraries.
> >
> > How does this happen when try_to_accept_memory is the first thing to do
> > when wmark check fails in the allocation path?
>
> Good question.
>
> I've lost access to the test setup and cannot check it directly right now.
>
> Reading the code Looks like __alloc_pages_bulk() bypasses
> get_page_from_freelist() where we usually accept more pages and goes
> directly to __rmqueue_pcplist() -> rmqueue_bulk() -> __rmqueue().
>
> Will look more into it when I have access to the test setup.
>
> > Could you describe what was the initial configuration of the system? How
> > much of the unaccepted memory was there to trigger this?
>
> This is large TDX guest VM: 176 vCPUs and ~800GiB of memory.
>
> One thing that I noticed that the problem is only triggered when LRU_GEN
> enabled. But I failed to identify why.
>
> The system hang (or have very little progress) shortly after systemd
> starts.

Please try to investigate this further. The patch as is looks rather
questionable to me TBH. Spilling unaccepted memory into the reclaim
seems like something we should avoid if possible as this is something
the page allocator should care about IMHO.

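To make the failure mode discussed above concrete, here is a minimal
userspace sketch. It is not kernel code: struct toy_zone and every toy_*
function are hypothetical names invented for illustration, and the page
counts are made up. It only models the two statements from the thread:
unaccepted memory is not counted as free by the watermark check, and
reclaim makes no progress when almost nothing is reclaimable.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel structures. */
struct toy_zone {
	unsigned long free_pages;        /* accepted and free */
	unsigned long unaccepted_pages;  /* free, but not yet accepted */
	unsigned long reclaimable_pages; /* e.g. page cache */
	unsigned long high_wmark;
};

/* Watermark check: unaccepted memory is not counted as free. */
bool toy_wmark_ok(const struct toy_zone *z)
{
	return z->free_pages >= z->high_wmark;
}

/* Accept up to 'chunk' pages; returns true if any page was accepted. */
bool toy_accept(struct toy_zone *z, unsigned long chunk)
{
	unsigned long n = z->unaccepted_pages < chunk ? z->unaccepted_pages : chunk;

	z->unaccepted_pages -= n;
	z->free_pages += n;
	return n > 0;
}

/* Reclaim up to 'batch' pages; returns the number actually reclaimed. */
unsigned long toy_reclaim(struct toy_zone *z, unsigned long batch)
{
	unsigned long n = z->reclaimable_pages < batch ? z->reclaimable_pages : batch;

	z->reclaimable_pages -= n;
	z->free_pages += n;
	return n;
}

/*
 * Reclaim without ever accepting memory. The loop is capped at
 * max_rounds so the sketch terminates; returns the rounds used.
 */
unsigned int toy_reclaim_only(struct toy_zone *z, unsigned int max_rounds)
{
	unsigned int rounds = 0;

	while (!toy_wmark_ok(z) && rounds < max_rounds) {
		toy_reclaim(z, 32);
		rounds++;
	}
	return rounds;
}

/*
 * Accept first, fall back to reclaim only when nothing is left to
 * accept. Returns the rounds used.
 */
unsigned int toy_accept_first(struct toy_zone *z, unsigned int max_rounds)
{
	unsigned int rounds = 0;

	while (!toy_wmark_ok(z) && rounds < max_rounds) {
		if (!toy_accept(z, 512))
			toy_reclaim(z, 32);
		rounds++;
	}
	return rounds;
}
```

With a zone of 100 free, 10000 unaccepted and 16 reclaimable pages and a
high watermark of 1000, toy_reclaim_only() burns through all of its
rounds and still fails the check, while toy_accept_first() passes it
after two rounds. Which layer should do the accepting is exactly the
open question in this thread; the sketch takes no position on that.
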
> > > To address this issue, teach shrink_node() and shrink_zones() to accept
> > > memory before attempting to reclaim.
> > >
> > > Signed-off-by: Kirill A. Shutemov
> > > Reported-by: Jianxiong Gao
> > > Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
> > > Cc: stable@vger.kernel.org # v6.5+
> > [...]
> > >  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> > >  {
> > >  	unsigned long nr_reclaimed, nr_scanned, nr_node_reclaimed;
> > >  	struct lruvec *target_lruvec;
> > >  	bool reclaimable = false;
> > >
> > > +	/* Try to accept memory before going for reclaim */
> > > +	if (node_try_to_accept_memory(pgdat, sc)) {
> > > +		if (!should_continue_reclaim(pgdat, 0, sc))
> > > +			return;
> > > +	}
> > > +
> >
> > This would need an exemption from the memcg reclaim.
>
> Hm. Could you elaborate why?

Because memcg reclaim doesn't look for memory but rather frees charges
to reclaim for the new use so unaccepted memory is not really relevant
as it couldn't have been charged.
-- 
Michal Hocko
SUSE Labs
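
The memcg exemption asked for above can be illustrated with a small
userspace sketch. It is not the patch and not kernel code: every toy_*
name is hypothetical and chosen only to echo the quoted hunk. The
assumption it encodes is the one stated in the reply: accepting memory
adds free pages to a node but releases no charge, so it cannot help a
reclaim pass that targets a memcg.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not kernel structures. */
struct toy_memcg {
	unsigned long charged; /* pages charged to the group */
	unsigned long limit;
};

struct toy_scan_control {
	struct toy_memcg *target_memcg; /* NULL means global reclaim */
};

struct toy_node {
	unsigned long free_pages;
	unsigned long unaccepted_pages;
};

/* True when reclaim was triggered on behalf of a memcg. */
bool toy_is_memcg_reclaim(const struct toy_scan_control *sc)
{
	return sc->target_memcg != NULL;
}

/*
 * Accept up to 'chunk' pages on the node, but only for global reclaim.
 * Returns true if any page was accepted.
 */
bool toy_node_try_accept(struct toy_node *n,
			 const struct toy_scan_control *sc,
			 unsigned long chunk)
{
	unsigned long nr;

	/*
	 * Exemption: unaccepted memory was never charged, so accepting
	 * it frees no charge for the target group.
	 */
	if (toy_is_memcg_reclaim(sc))
		return false;

	nr = n->unaccepted_pages < chunk ? n->unaccepted_pages : chunk;
	n->unaccepted_pages -= nr;
	n->free_pages += nr;
	return nr > 0;
}
```

In this model a scan control with a target memcg leaves the node
untouched, a global one moves pages from unaccepted to free, and in
neither case does the group's charged count change.
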