From: "Huang, Ying"
To: Johannes Weiner
Cc: linux-mm@kvack.org, Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC PATCH 20/26] mm: vmscan: use compaction_suitable() check in kswapd
References: <20230418191313.268131-1-hannes@cmpxchg.org> <20230418191313.268131-21-hannes@cmpxchg.org> <87a5ywfyeb.fsf@yhuang6-desk2.ccr.corp.intel.com> <20230425142641.GA17132@cmpxchg.org> <87o7nbe8gg.fsf@yhuang6-desk2.ccr.corp.intel.com> <20230426152206.GA30064@cmpxchg.org>
Date: Thu, 27 Apr 2023 13:41:38 +0800
In-Reply-To: <20230426152206.GA30064@cmpxchg.org> (Johannes Weiner's message of "Wed, 26 Apr 2023 11:22:06 -0400")
Message-ID: <87cz3peval.fsf@yhuang6-desk2.ccr.corp.intel.com>

Johannes Weiner writes:

> On Wed, Apr 26, 2023 at 09:30:23AM +0800, Huang, Ying wrote:
>> Johannes Weiner writes:
>>
>> > On Tue, Apr 25, 2023 at 11:12:28AM +0800, Huang, Ying wrote:
>> >> Johannes Weiner writes:
>> >>
>> >> > Kswapd currently bails on higher-order allocations with an open-coded
>> >> > check for whether it's reclaimed the compaction gap.
>> >> >
>> >> > compaction_suitable() is the customary interface to coordinate reclaim
>> >> > with compaction.
>> >> >
>> >> > Signed-off-by: Johannes Weiner
>> >> > ---
>> >> >  mm/vmscan.c | 67 ++++++++++++++++++-----------------------------------
>> >> >  1 file changed, 23 insertions(+), 44 deletions(-)
>> >> >
>> >> > diff --git a/mm/vmscan.c b/mm/vmscan.c
>> >> > index ee8c8ca2e7b5..723705b9e4d9 100644
>> >> > --- a/mm/vmscan.c
>> >> > +++ b/mm/vmscan.c
>> >> > @@ -6872,12 +6872,18 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
>> >> >  		if (!managed_zone(zone))
>> >> >  			continue;
>> >> >  
>> >> > +		/* Allocation can succeed in any zone, done */
>> >> >  		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
>> >> >  			mark = wmark_pages(zone, WMARK_PROMO);
>> >> >  		else
>> >> >  			mark = high_wmark_pages(zone);
>> >> >  		if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
>> >> >  			return true;
>> >> > +
>> >> > +		/* Allocation can't succeed, but enough order-0 to compact */
>> >> > +		if (compaction_suitable(zone, order,
>> >> > +					highest_zoneidx) == COMPACT_CONTINUE)
>> >> > +			return true;
>> >>
>> >> Should we check the following first?
>> >>
>> >> 	order > 0 && zone_watermark_ok_safe(zone, 0, mark, highest_zoneidx)
>> >
>> > That's what compaction_suitable() does. It checks whether there are
>> > enough migration targets for compaction (COMPACT_CONTINUE) or whether
>> > reclaim needs to do some more work (COMPACT_SKIPPED).
>>
>> Yes. And I found that the watermark used in compaction_suitable() is
>> low_wmark_pages() or min_wmark_pages(), which doesn't match the
>> watermark here. Or did I miss something?
>
> Ahh, you're right, kswapd will bail prematurely. Compaction cannot
> reliably meet the high watermark with a low watermark scratch space.
>
> I'll add the order check before the suitable test, for clarity, and so
> that order-0 requests don't check the same thing twice.
>
> For the watermark, I'd make it an arg to compaction_suitable() and use
> whatever the reclaimer targets (high for kswapd, min for direct).
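A minimal user-space sketch of the per-zone check being discussed, to make the premature bail-out concrete. Everything here is a hypothetical simplification, not kernel code: struct toy_zone, its largest_free_order field and all toy_* helpers are invented for illustration; lowmem reserves, CMA and per-order free lists are ignored; and compaction_suitable() is reduced to an order-0 watermark check against the watermark plus compact_gap() (2 << order). It contrasts the check as posted (built-in min/low scratch watermark) with the proposed one (order-0 short-circuit plus a caller-supplied watermark).

```c
#include <assert.h>
#include <stdbool.h>

/* PAGE_ALLOC_COSTLY_ORDER is 3 in mainline; mirrored here. */
#define TOY_COSTLY_ORDER 3

/* Hypothetical, heavily simplified zone: only what the check needs. */
struct toy_zone {
	unsigned long free_pages;	/* free order-0 pages */
	int largest_free_order;		/* assumed: largest free block order */
	unsigned long wmark_min, wmark_low, wmark_high;
};

/* Same value as compact_gap(): 2 << order pages of scratch space. */
static unsigned long toy_compact_gap(int order)
{
	return 2UL << order;
}

/*
 * Stand-in for zone_watermark_ok_safe(): ignores lowmem reserves, CMA
 * and per-order free counts, keeping only "free minus the request must
 * stay above mark" plus "a block of this order exists".
 */
static bool toy_watermark_ok(const struct toy_zone *z, int order,
			     unsigned long mark)
{
	if (z->free_pages <= mark + (1UL << order) - 1)
		return false;
	return order == 0 || z->largest_free_order >= order;
}

/* As posted: compaction_suitable() picks min or low on its own. */
static bool toy_suitable_builtin(const struct toy_zone *z, int order)
{
	unsigned long mark = order > TOY_COSTLY_ORDER ?
			     z->wmark_low : z->wmark_min;

	return toy_watermark_ok(z, 0, mark + toy_compact_gap(order));
}

/* Proposed: the reclaimer passes the watermark it is working towards. */
static bool toy_suitable(const struct toy_zone *z, int order,
			 unsigned long mark)
{
	return toy_watermark_ok(z, 0, mark + toy_compact_gap(order));
}

/* Per-zone test from pgdat_balanced() as posted in the patch. */
static bool toy_zone_balanced_posted(const struct toy_zone *z, int order,
				     unsigned long mark)
{
	if (toy_watermark_ok(z, order, mark))
		return true;
	return toy_suitable_builtin(z, order);
}

/* Same test with the order check and the caller's watermark. */
static bool toy_zone_balanced(const struct toy_zone *z, int order,
			      unsigned long mark)
{
	assert(order >= 0);
	if (toy_watermark_ok(z, order, mark))
		return true;	/* allocation can succeed, done */
	if (order == 0)
		return false;	/* nothing for compaction to do */
	return toy_suitable(z, order, mark);	/* enough order-0 to compact */
}
```

With 900 free pages, min/low/high at 600/800/1000 and an order-3 request, the posted check calls the zone balanced (616 pages of scratch space suffice), while passing kswapd's high watermark requires 1016, so kswapd keeps reclaiming instead of bailing early.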
>
> However, there is a minor snag. compaction_suitable() currently has
> its own smarts regarding the watermark:
>
> 	/*
> 	 * Watermarks for order-0 must be met for compaction to be able to
> 	 * isolate free pages for migration targets. This means that the
> 	 * watermark and alloc_flags have to match, or be more pessimistic than
> 	 * the check in __isolate_free_page(). We don't use the direct
> 	 * compactor's alloc_flags, as they are not relevant for freepage
> 	 * isolation. We however do use the direct compactor's highest_zoneidx
> 	 * to skip over zones where lowmem reserves would prevent allocation
> 	 * even if compaction succeeds.
> 	 * For costly orders, we require low watermark instead of min for
> 	 * compaction to proceed to increase its chances.
> 	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
> 	 * suitable migration targets
> 	 */
> 	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
> 				low_wmark_pages(zone) : min_wmark_pages(zone);
>
> Historically it has always checked low instead of min. Then Vlastimil
> changed it to min for non-costly orders here:
>
> commit 8348faf91f56371d4bada6fc5915e19580a15ffe
> Author: Vlastimil Babka
> Date:   Fri Oct 7 16:58:00 2016 -0700
>
>     mm, compaction: require only min watermarks for non-costly orders
>
>     The __compaction_suitable() function checks the low watermark plus a
>     compact_gap() gap to decide if there's enough free memory to perform
>     compaction. Then __isolate_free_page uses low watermark check to decide
>     if particular free page can be isolated. In the latter case, using low
>     watermark is needlessly pessimistic, as the free page isolations are
>     only temporary. For __compaction_suitable() the higher watermark makes
>     sense for high-order allocations where more freepages increase the
>     chance of success, and we can typically fail with some order-0 fallback
>     when the system is struggling to reach that watermark. But for
>     low-order allocation, forming the page should not be that hard. So
>     using low watermark here might just prevent compaction from even trying,
>     and eventually lead to OOM killer even if we are above min watermarks.
>
>     So after this patch, we use min watermark for non-costly orders in
>     __compaction_suitable(), and for all orders in __isolate_free_page().
>
> Lowering to min wasn't an issue for non-costly, but AFAICS there was
> no explicit testing for whether min would work for costly orders too.
>
> I'd propose trying it with min even for costly and see what happens.
>
> If it does regress, a better place to boost scratch space for costly
> orders might be compact_gap(), so I'd move it there.
>
> Does that sound reasonable?

Sounds good to me, Thanks!

Best Regards,
Huang, Ying
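For comparison, a small sketch of the order-0 scratch space compaction_suitable() would demand under the three policies in this exchange: today's built-in rule, the proposed caller watermark for every order, and the fallback of boosting costly orders inside compact_gap(). Assumptions, all labeled: TOY_COSTLY_ORDER mirrors PAGE_ALLOC_COSTLY_ORDER (3 in mainline), compact_gap() is taken as 2 << order, struct toy_wmarks and the scratch_* helpers are hypothetical, and the factor of two in toy_compact_gap_boosted() is a placeholder, since the thread does not say how large such a boost would be.

```c
#include <assert.h>

/* PAGE_ALLOC_COSTLY_ORDER is 3 in mainline; mirrored here. */
#define TOY_COSTLY_ORDER 3U

/* Hypothetical container for one zone's watermarks, in pages. */
struct toy_wmarks {
	unsigned long min, low, high;
};

/* Same value as compact_gap(). */
static unsigned long toy_compact_gap(unsigned int order)
{
	return 2UL << order;
}

/* Today: low for costly orders, min otherwise (commit 8348faf91f56). */
static unsigned long scratch_today(const struct toy_wmarks *w,
				   unsigned int order)
{
	assert(w->min <= w->low && w->low <= w->high);
	return (order > TOY_COSTLY_ORDER ? w->low : w->min) +
	       toy_compact_gap(order);
}

/*
 * Proposal: the caller's watermark for every order
 * (min for direct compaction, high for kswapd).
 */
static unsigned long scratch_proposed(unsigned long mark, unsigned int order)
{
	return mark + toy_compact_gap(order);
}

/*
 * Fallback if costly orders regress: grow the gap instead of the
 * watermark. The factor of two is a hypothetical placeholder.
 */
static unsigned long toy_compact_gap_boosted(unsigned int order)
{
	unsigned long gap = toy_compact_gap(order);

	return order > TOY_COSTLY_ORDER ? 2 * gap : gap;
}

static unsigned long scratch_boosted(unsigned long mark, unsigned int order)
{
	return mark + toy_compact_gap_boosted(order);
}
```

For an order-9 request with min/low/high at 600/800/1000, today's rule needs 1824 free pages, the min-only proposal needs 1624 for direct compaction and 2024 when kswapd passes its high watermark, and the placeholder boosted gap would need 2648. Non-costly orders come out the same under all three.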