Date: Wed, 17 Jul 2024 14:55:08 +0300
From: "Kirill A. Shutemov"
To: Michal Hocko
Cc: Andrew Morton, "Borislav Petkov (AMD)", Mel Gorman, Vlastimil Babka,
    Tom Lendacky, Mike Rapoport, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Jianxiong Gao, stable@vger.kernel.org
Subject: Re: [PATCH] mm: Fix endless reclaim on machines with unaccepted memory.
References: <20240716130013.1997325-1-kirill.shutemov@linux.intel.com>

On Wed, Jul 17, 2024 at 09:19:12AM +0200, Michal Hocko wrote:
> On Tue 16-07-24 16:00:13, Kirill A. Shutemov wrote:
> > Unaccepted memory is considered unusable free memory, which is not
> > counted as free on the zone watermark check. This causes
> > get_page_from_freelist() to accept more memory to hit the high
> > watermark, but it creates problems in the reclaim path.
> >
> > The reclaim path encounters a failed zone watermark check and attempts
> > to reclaim memory. This is usually successful, but if there is little or
> > no reclaimable memory, it can result in endless reclaim with little to
> > no progress. This can occur early in the boot process, just after start
> > of the init process when the only reclaimable memory is the page cache
> > of the init executable and its libraries.
>
> How does this happen when try_to_accept_memory is the first thing to do
> when wmark check fails in the allocation path?

Good question.

I've lost access to the test setup and cannot check it directly right now.

Reading the code, it looks like __alloc_pages_bulk() bypasses
get_page_from_freelist(), where we usually accept more pages, and goes
directly to __rmqueue_pcplist() -> rmqueue_bulk() -> __rmqueue().

Will look more into it when I have access to the test setup.

> Could you describe what was the initial configuration of the system? How
> much of the unaccepted memory was there to trigger this?

This is a large TDX guest VM: 176 vCPUs and ~800GiB of memory.
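
To make the suspected bulk-allocation scenario above a bit more concrete,
here is a rough user-space toy model. It is not kernel code: every toy_*
name, the struct layout and the chunk size are made up for illustration.
It only shows the general idea that a path which takes pages without
accepting memory can push the accepted free count below the watermark
while plenty of unaccepted memory is still available.

```c
#include <stdbool.h>

/* Toy zone; hypothetical fields, not the kernel's struct zone. */
struct toy_zone {
	long free_pages;       /* accepted, usable free pages */
	long unaccepted_pages; /* free but not accepted; ignored by the check */
	long high_wmark;       /* illustrative high watermark */
};

/* Watermark check that counts only accepted free pages. */
static bool toy_wmark_ok(const struct toy_zone *z)
{
	return z->free_pages >= z->high_wmark;
}

/* Accept up to 'chunk' unaccepted pages; false if nothing is left. */
static bool toy_accept_chunk(struct toy_zone *z, long chunk)
{
	if (z->unaccepted_pages == 0)
		return false;
	if (chunk > z->unaccepted_pages)
		chunk = z->unaccepted_pages;
	z->unaccepted_pages -= chunk;
	z->free_pages += chunk;
	return true;
}

/* Path A: accept memory whenever the watermark check fails. */
static bool toy_alloc_with_accept(struct toy_zone *z)
{
	while (!toy_wmark_ok(z)) {
		if (!toy_accept_chunk(z, 1024))
			return false; /* a real allocator would try reclaim */
	}
	z->free_pages--;
	return true;
}

/* Path B: take a page straight off the free list, never accepting. */
static bool toy_alloc_no_accept(struct toy_zone *z)
{
	if (z->free_pages == 0)
		return false;
	z->free_pages--;
	return true;
}
```

With free_pages = 100, high_wmark = 64 and a large unaccepted_pages, 50
calls to toy_alloc_no_accept() leave the toy watermark check failing even
though unaccepted memory is untouched; one toy_alloc_with_accept() call
then accepts a chunk and the check passes again.
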

One thing I noticed is that the problem is only triggered when LRU_GEN is
enabled, but I failed to identify why.

The system hangs (or makes very little progress) shortly after systemd starts.

> > To address this issue, teach shrink_node() and shrink_zones() to accept
> > memory before attempting to reclaim.
> >
> > Signed-off-by: Kirill A. Shutemov
> > Reported-by: Jianxiong Gao
> > Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
> > Cc: stable@vger.kernel.org # v6.5+
> [...]
> >  static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
> >  {
> >  	unsigned long nr_reclaimed, nr_scanned, nr_node_reclaimed;
> >  	struct lruvec *target_lruvec;
> >  	bool reclaimable = false;
> >
> > +	/* Try to accept memory before going for reclaim */
> > +	if (node_try_to_accept_memory(pgdat, sc)) {
> > +		if (!should_continue_reclaim(pgdat, 0, sc))
> > +			return;
> > +	}
> > +
>
> This would need an exemption from the memcg reclaim.

Hm. Could you elaborate why?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov