Date: Wed, 26 Apr 2023 11:22:06 -0400
From: Johannes Weiner
To: "Huang, Ying"
Cc: linux-mm@kvack.org, Kaiyang Zhao, Mel Gorman, Vlastimil Babka,
	David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC PATCH 20/26] mm: vmscan: use compaction_suitable() check in kswapd
Message-ID: <20230426152206.GA30064@cmpxchg.org>
References: <20230418191313.268131-1-hannes@cmpxchg.org>
	<20230418191313.268131-21-hannes@cmpxchg.org>
	<87a5ywfyeb.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<20230425142641.GA17132@cmpxchg.org>
	<87o7nbe8gg.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87o7nbe8gg.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Wed, Apr 26, 2023 at 09:30:23AM +0800, Huang, Ying wrote:
> Johannes Weiner writes:
>
> > On Tue, Apr 25, 2023 at 11:12:28AM +0800, Huang, Ying wrote:
> >> Johannes Weiner writes:
> >>
> >> > Kswapd currently bails on higher-order allocations with an open-coded
> >> > check for whether it's reclaimed the compaction gap.
> >> >
> >> > compaction_suitable() is the customary interface to coordinate reclaim
> >> > with compaction.
> >> >
> >> > Signed-off-by: Johannes Weiner
> >> > ---
> >> >  mm/vmscan.c | 67 ++++++++++++++++++-----------------------------------
> >> >  1 file changed, 23 insertions(+), 44 deletions(-)
> >> >
> >> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> > index ee8c8ca2e7b5..723705b9e4d9 100644
> >> > --- a/mm/vmscan.c
> >> > +++ b/mm/vmscan.c
> >> > @@ -6872,12 +6872,18 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
> >> >  		if (!managed_zone(zone))
> >> >  			continue;
> >> >
> >> > +		/* Allocation can succeed in any zone, done */
> >> >  		if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
> >> >  			mark = wmark_pages(zone, WMARK_PROMO);
> >> >  		else
> >> >  			mark = high_wmark_pages(zone);
> >> >  		if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
> >> >  			return true;
> >> > +
> >> > +		/* Allocation can't succeed, but enough order-0 to compact */
> >> > +		if (compaction_suitable(zone, order,
> >> > +					highest_zoneidx) == COMPACT_CONTINUE)
> >> > +			return true;
> >>
> >> Should we check the following first?
> >>
> >> 	order > 0 && zone_watermark_ok_safe(zone, 0, mark, highest_zoneidx)
> >
> > That's what compaction_suitable() does. It checks whether there are
> > enough migration targets for compaction (COMPACT_CONTINUE) or whether
> > reclaim needs to do some more work (COMPACT_SKIPPED).
>
> Yes.  And I found that the watermark used in compaction_suitable() is
> low_wmark_pages() or min_wmark_pages(), which doesn't match the
> watermark here.  Or did I miss something?

Ahh, you're right, kswapd will bail prematurely. Compaction cannot
reliably meet the high watermark with a low watermark scratch space.

I'll add the order check before the suitable test, for clarity, and so
that order-0 requests don't check the same thing twice.
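
To make the ordering concrete, here is a rough userspace sketch of the
per-zone check with the order test placed first. It is not kernel code:
struct zone_model, its fields and all *_model helpers are hypothetical
stand-ins, and the watermark logic is heavily simplified (no lowmem
reserves, no alloc_flags).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct zone; only what the sketch needs. */
struct zone_model {
	unsigned long free_pages;	/* free order-0 pages */
	unsigned long high_wmark;	/* what kswapd targets */
	unsigned long compact_wmark;	/* what the suitability test checks */
	bool has_free_block;		/* a free block of the requested order exists */
	int suitable_calls;		/* counts calls to the suitability test */
};

/*
 * Simplified watermark test: free pages must exceed the mark, and a
 * higher-order request also needs a free block of that order.
 */
static bool watermark_ok_model(const struct zone_model *z, unsigned int order,
			       unsigned long mark)
{
	if (z->free_pages <= mark)
		return false;
	return order == 0 || z->has_free_block;
}

/*
 * Stand-in for compaction_suitable(): enough order-0 pages above its own
 * mark plus a gap of twice the request size.
 */
static bool compaction_suitable_model(struct zone_model *z, unsigned int order)
{
	z->suitable_calls++;
	return watermark_ok_model(z, 0, z->compact_wmark + (2UL << order));
}

/* Per-zone part of the balance check, with the order test first. */
static bool zone_balanced_model(struct zone_model *z, unsigned int order)
{
	/* Allocation can succeed, done */
	if (watermark_ok_model(z, order, z->high_wmark))
		return true;

	/* Order-0 already failed an order-0 check above; don't repeat it */
	if (order == 0)
		return false;

	/* Allocation can't succeed, but enough order-0 pages to compact */
	return compaction_suitable_model(z, order);
}
```

In this model, a zone with 50 free pages, a high mark of 100 and a
compaction mark of 20 is reported as balanced for order-3, even though
it sits below the high watermark. That is the premature bail described
above. An order-0 request never reaches the suitability test.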

For the watermark, I'd make it an arg to compaction_suitable() and use
whatever the reclaimer targets (high for kswapd, min for direct).

However, there is a minor snag. compaction_suitable() currently has
its own smarts regarding the watermark:

	/*
	 * Watermarks for order-0 must be met for compaction to be able to
	 * isolate free pages for migration targets. This means that the
	 * watermark and alloc_flags have to match, or be more pessimistic than
	 * the check in __isolate_free_page(). We don't use the direct
	 * compactor's alloc_flags, as they are not relevant for freepage
	 * isolation. We however do use the direct compactor's highest_zoneidx
	 * to skip over zones where lowmem reserves would prevent allocation
	 * even if compaction succeeds.
	 * For costly orders, we require low watermark instead of min for
	 * compaction to proceed to increase its chances.
	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
	 * suitable migration targets
	 */
	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
				low_wmark_pages(zone) : min_wmark_pages(zone);

Historically it has always checked low instead of min. Then Vlastimil
changed it to min for non-costly orders here:

commit 8348faf91f56371d4bada6fc5915e19580a15ffe
Author: Vlastimil Babka
Date:   Fri Oct 7 16:58:00 2016 -0700

    mm, compaction: require only min watermarks for non-costly orders

    The __compaction_suitable() function checks the low watermark plus a
    compact_gap() gap to decide if there's enough free memory to perform
    compaction.  Then __isolate_free_page uses low watermark check to decide
    if particular free page can be isolated.  In the latter case, using low
    watermark is needlessly pessimistic, as the free page isolations are
    only temporary.  For __compaction_suitable() the higher watermark makes
    sense for high-order allocations where more freepages increase the
    chance of success, and we can typically fail with some order-0 fallback
    when the system is struggling to reach that watermark.  But for
    low-order allocation, forming the page should not be that hard.  So
    using low watermark here might just prevent compaction from even trying,
    and eventually lead to OOM killer even if we are above min watermarks.

    So after this patch, we use min watermark for non-costly orders in
    __compaction_suitable(), and for all orders in __isolate_free_page().

Lowering to min wasn't an issue for non-costly, but AFAICS there was
no explicit testing for whether min would work for costly orders too.

I'd propose trying it with min even for costly and see what happens.

If it does regress, a better place to boost scratch space for costly
orders might be compact_gap(), so I'd move it there.

Does that sound reasonable?
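
As a rough illustration of that proposal, the following userspace sketch
passes the watermark in from the caller and keeps any extra headroom for
costly orders inside the gap helper. It is a simplified model, not a
patch: the *_model names, the costly_boost parameter and its size are
assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER, which is 3. */
#define COSTLY_ORDER_MODEL 3

/*
 * Model of compact_gap(): twice the request size, plus a hypothetical
 * extra allowance for costly orders. The boost amount is a made-up knob;
 * how large it would need to be is left open.
 */
static unsigned long compact_gap_model(unsigned int order,
				       unsigned long costly_boost)
{
	unsigned long gap = 2UL << order;

	if (order > COSTLY_ORDER_MODEL)
		gap += costly_boost;
	return gap;
}

/*
 * Model of compaction_suitable() with the watermark supplied by the
 * caller: high for kswapd, min for direct reclaim.
 */
static bool compaction_suitable_model(unsigned long free_pages,
				      unsigned int order,
				      unsigned long watermark,
				      unsigned long costly_boost)
{
	return free_pages > watermark + compact_gap_model(order, costly_boost);
}
```

With 1000 free pages and an order-4 request, a caller targeting a
watermark of 900 passes with no boost (900 + 32) and fails with a boost
of 100 (900 + 132), while a caller targeting 100 passes either way. The
boost never affects orders at or below the costly threshold.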