E=Sophos;i="6.09,230,1716274800"; d="scan'208";a="44791311" Received: from orviesa005.jf.intel.com ([10.64.159.145]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Jul 2024 06:53:10 -0700 X-CSE-ConnectionGUID: LNctsrrjTcmE1nFGbnMeWA== X-CSE-MsgGUID: MZMeXLF1SF6STUXCCbCX9Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.09,230,1716274800"; d="scan'208";a="57085378" Received: from black.fi.intel.com ([10.237.72.28]) by orviesa005.jf.intel.com with ESMTP; 23 Jul 2024 06:53:07 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 8C119178; Tue, 23 Jul 2024 16:53:05 +0300 (EEST) Date: Tue, 23 Jul 2024 16:53:05 +0300 From: "Kirill A. Shutemov" To: Michal Hocko Cc: Vlastimil Babka , Andrew Morton , "Borislav Petkov (AMD)" , Mel Gorman , Tom Lendacky , Mike Rapoport , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jianxiong Gao , stable@vger.kernel.org Subject: Re: [PATCH] mm: Fix endless reclaim on machines with unaccepted memory. Message-ID: References: <20240716130013.1997325-1-kirill.shutemov@linux.intel.com> <564ff8e4-42c9-4a00-8799-eaa1bef9c338@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: 5D29D40002 X-Stat-Signature: z49ne3wqf9qq7ghjnu73ibyktmuqxq84 X-Rspam-User: X-HE-Tag: 1721742791-858841 X-HE-Meta: U2FsdGVkX18QG90YYf3jk24hHASn8aAHp+h/867KwELKdxKWI77gt3FfNF7M8WQAzX1iXuIeE+K+kH7nHixp/7/877lhOeMKfVgwk9IHnCNSZ3vQlN2W39/3Kr1kmspTakTjEd9FDdoDewjp/7s0PubudzG4RFdpgt679Fz57IeQpI8L4ite85HjkCsOz24J3s4jkGVUq1i+jkePsc3+h7wrsmkXBQrajT4ioVO+rNlAcBSHtQZ6IQPTS5PWB/mAQw1da0T3J2YvaADk8HhLUTkAT8DIEu13QJl7UiJdyQly2zrWoILh89Yrk6pagsZlOluZ34QhPIMLQ2A40iBndn6Dn0kQEjodh2IdY+cjZh8M6+mq+Qr87f6nJfw512lVJQrkjeIKr0rmiAENw6GEfWS9fwOUV3jfptt+4HQ/n7nNinpU3qJtwxGF4um+AizxjZ0+aOjG9Oz55hLyeoU+RCq1Ucrdnq8l+2VkBO3x2p/X+Y88HvMGqpJnkoXyPugwuPgEj+p74QN0+RY3H5mJ/wn2szLUGXOyLC4427Rf7/hPPdbR4hzhsETWVK6S5bDv/lIxinzmHLqsnU+OyyfYW/UPS3f5m1FBGxDs86ZNhzyfuy7c6tCoejAOQCEp7xSKDzl7b0CyOF0slQJgJ0A/DjSIodZaM/fZ+42gTC6j9Q4Ihn5N2MyxJOi+a12VAm/QkPypzuCNAwvfUcHod9yRC+oH7wi2hXO/gVh4PVS3zSXmPAqsCuGXia26WWgzky1PThXxSAtGRCgjNBYZ6zeWUJH7NNVK8JHx6mz22j4oKKzjqi4y0gu/1yVVN5tY4IG1QOpC97LS13DbmCL5sCiYcOxfNbjippgqy0EtdemS93Nupt+ysZk6uvQauOj2DePeV0z8CGgAmxaP/ZRPuZPJGeMBF/413//ll49kFwwaMj8rl7tNqduT2zfT6ow2oCcx1uMCqFZzf/eKQPmb0/F t0XJFE00 KcrOvx1ndYr5juLJkY6CaDdserRiF8JaMEwgpgZiXou8I6pS7zj4cd/DL3Y+4jvwzl1NwcJ3gvlFOT6ysw7M5paUb26yBnxMRPCPZYWYSmfbwAShZeeO7VSN8r6c79duN9SCTpkNC/Lz9Y4813Vkp6uWcE7gk8ondBunj X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jul 23, 2024 at 01:55:13PM +0200, Michal Hocko wrote: > On Tue 23-07-24 12:49:41, Kirill A. Shutemov wrote: > > On Tue, Jul 23, 2024 at 09:30:27AM +0200, Vlastimil Babka wrote: > [...] > > > Although just removing the lazy accept mode would be much more appealing > > > solution than this :) > > > > :P > > > > Not really an option for big VMs. It might add many minutes to boot time. > > Well a huge part of that can be done in the background so the boot > doesn't really have to wait for all of it. If we really have to start > playing whack-a-mole to plug all the potential ways to trigger reclaim > imbalance I think it is fair to re-evaluate how much lazy should the > initialization really be. One other option I see is to treat unaccepted memory as free, so watermarks would not fail if we have unaccepted memory. 
There would be no spinning in kswapd in this case. Only get_page_from_freelist()
and __alloc_pages_bulk() are aware of unaccepted memory.

The quick patch below shows the idea. I am not sure how it would affect
__isolate_free_page() callers. IIUC, they expect to see pages on the free
lists, but might not find them there in this scenario because the pages are
not accepted yet. I need to look closer at this.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c11b7cde81ef..5e0bdfbe2f1f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -667,6 +667,7 @@ enum zone_watermarks {
 #define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + z->watermark_boost)
 #define low_wmark_pages(z) (z->_watermark[WMARK_LOW] + z->watermark_boost)
 #define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost)
+#define promo_wmark_pages(z) (z->_watermark[WMARK_PROMO] + z->watermark_boost)
 #define wmark_pages(z, i) (z->_watermark[i] + z->watermark_boost)
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d336..254bfe29eaf1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -304,7 +304,7 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 static bool page_contains_unaccepted(struct page *page, unsigned int order);
 static void accept_page(struct page *page, unsigned int order);
-static bool try_to_accept_memory(struct zone *zone, unsigned int order);
+static bool cond_accept_memory(struct zone *zone, unsigned int order);
 static inline bool has_unaccepted_memory(void);
 static bool __free_unaccepted(struct page *page);
 
@@ -2947,9 +2947,6 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
 	if (!(alloc_flags & ALLOC_CMA))
 		unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
 #endif
-#ifdef CONFIG_UNACCEPTED_MEMORY
-	unusable_free += zone_page_state(z, NR_UNACCEPTED);
-#endif
 
 	return unusable_free;
 }
@@ -3243,6 +3240,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 			}
 		}
 
+		cond_accept_memory(zone, order);
+
 		/*
 		 * Detect whether the number of free pages is below high
 		 * watermark.  If so, we will decrease pcp->high and free
@@ -3268,10 +3267,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 				       gfp_mask)) {
 			int ret;
 
-			if (has_unaccepted_memory()) {
-				if (try_to_accept_memory(zone, order))
-					goto try_this_zone;
-			}
+			if (cond_accept_memory(zone, order))
+				goto try_this_zone;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 			/*
@@ -3325,10 +3322,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 
 			return page;
 		} else {
-			if (has_unaccepted_memory()) {
-				if (try_to_accept_memory(zone, order))
-					goto try_this_zone;
-			}
+			if (cond_accept_memory(zone, order))
+				goto try_this_zone;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 			/* Try again if zone has deferred pages */
@@ -4456,12 +4451,25 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 			goto failed;
 		}
 
+		cond_accept_memory(zone, 0);
+retry_this_zone:
 		mark = wmark_pages(zone, alloc_flags & ALLOC_WMARK_MASK) + nr_pages;
 		if (zone_watermark_fast(zone, 0,  mark,
 				zonelist_zone_idx(ac.preferred_zoneref),
 				alloc_flags, gfp)) {
 			break;
 		}
+
+		if (cond_accept_memory(zone, 0))
+			goto retry_this_zone;
+
+#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+		/* Try again if zone has deferred pages */
+		if (deferred_pages_enabled()) {
+			if (_deferred_grow_zone(zone, 0))
+				goto retry_this_zone;
+		}
+#endif
 	}
 
 	/*
@@ -6833,9 +6841,6 @@ static bool try_to_accept_memory_one(struct zone *zone)
 	struct page *page;
 	bool last;
 
-	if (list_empty(&zone->unaccepted_pages))
-		return false;
-
 	spin_lock_irqsave(&zone->lock, flags);
 	page = list_first_entry_or_null(&zone->unaccepted_pages,
 					struct page, lru);
@@ -6861,23 +6866,29 @@ static bool try_to_accept_memory_one(struct zone *zone)
 	return true;
 }
 
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
 {
 	long to_accept;
 	int ret = false;
 
-	/* How much to accept to get to high watermark? */
-	to_accept = high_wmark_pages(zone) -
-		    (zone_page_state(zone, NR_FREE_PAGES) -
-		    __zone_watermark_unusable_free(zone, order, 0));
+	if (!has_unaccepted_memory())
+		return false;
 
-	/* Accept at least one page */
-	do {
+	if (list_empty(&zone->unaccepted_pages))
+		return false;
+
+	/* How much to accept to get to high watermark? */
+	to_accept = promo_wmark_pages(zone) -
+		    (zone_page_state(zone, NR_FREE_PAGES) -
+		    __zone_watermark_unusable_free(zone, order, 0) -
+		    zone_page_state(zone, NR_UNACCEPTED));
+
+	while (to_accept > 0) {
 		if (!try_to_accept_memory_one(zone))
 			break;
 		ret = true;
 		to_accept -= MAX_ORDER_NR_PAGES;
-	} while (to_accept > 0);
+	}
 
 	return ret;
 }
@@ -6920,7 +6931,7 @@ static void accept_page(struct page *page, unsigned int order)
 {
 }
 
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
 {
 	return false;
 }

-- 
 Kiryl Shutsemau / Kirill A. Shutemov
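As a rough sketch of the accounting idea above, here is a toy userspace model
(made-up struct and function names, not kernel code): unaccepted pages count
toward the watermark check, and memory gets accepted on demand until the
deficit against the watermark is covered.

/*
 * Toy model only: it mirrors the idea, not the kernel implementation.
 * All names and numbers here are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_zone {
	long free_pages;	/* accepted pages sitting on free lists */
	long unaccepted_pages;	/* pages still backed by unaccepted memory */
	long watermark;		/* pages needed to pass the watermark check */
};

/* Unaccepted memory counts as free, so the watermark check does not fail. */
static bool toy_watermark_ok(const struct toy_zone *z)
{
	return z->free_pages + z->unaccepted_pages >= z->watermark;
}

/* Accept just enough memory to cover the deficit against the watermark. */
static bool toy_cond_accept(struct toy_zone *z, long chunk)
{
	long to_accept = z->watermark - z->free_pages;
	bool accepted = false;

	while (to_accept > 0 && z->unaccepted_pages > 0) {
		long n = chunk < z->unaccepted_pages ? chunk : z->unaccepted_pages;

		z->unaccepted_pages -= n;
		z->free_pages += n;
		to_accept -= n;
		accepted = true;
	}

	return accepted;
}

int main(void)
{
	struct toy_zone z = { .free_pages = 128, .unaccepted_pages = 4096, .watermark = 512 };

	/* Passes even though only 128 pages are accepted. */
	printf("watermark_ok=%d\n", toy_watermark_ok(&z));

	/* The allocator path accepts on demand instead of waking kswapd. */
	toy_cond_accept(&z, 512);
	printf("free=%ld unaccepted=%ld\n", z.free_pages, z.unaccepted_pages);

	return 0;
}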