From: "Huang, Ying" <ying.huang@intel.com>
To: Johannes Weiner
Cc: linux-mm@kvack.org, Kaiyang Zhao, Mel Gorman, Vlastimil Babka, David Rientjes, linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [RFC PATCH 20/26] mm: vmscan: use compaction_suitable() check in kswapd
References: <20230418191313.268131-1-hannes@cmpxchg.org> <20230418191313.268131-21-hannes@cmpxchg.org>
Date: Tue, 25 Apr 2023 11:12:28 +0800
In-Reply-To: <20230418191313.268131-21-hannes@cmpxchg.org> (Johannes Weiner's message of "Tue, 18 Apr 2023 15:13:07 -0400")
Message-ID: <87a5ywfyeb.fsf@yhuang6-desk2.ccr.corp.intel.com>

Johannes Weiner writes:

> Kswapd currently bails on higher-order allocations with an open-coded
> check for whether it's reclaimed the compaction gap.
>
> compaction_suitable() is the customary interface to coordinate reclaim
> with compaction.
>
> Signed-off-by: Johannes Weiner
> ---
>  mm/vmscan.c | 67 ++++++++++++++++++-----------------------------------
>  1 file changed, 23 insertions(+), 44 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ee8c8ca2e7b5..723705b9e4d9 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6872,12 +6872,18 @@ static bool pgdat_balanced(pg_data_t *pgdat, int order, int highest_zoneidx)
>                 if (!managed_zone(zone))
>                         continue;
>
> +               /* Allocation can succeed in any zone, done */
>                 if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
>                         mark = wmark_pages(zone, WMARK_PROMO);
>                 else
>                         mark = high_wmark_pages(zone);
>                 if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
>                         return true;
> +
> +               /* Allocation can't succeed, but enough order-0 to compact */
> +               if (compaction_suitable(zone, order,
> +                                       highest_zoneidx) == COMPACT_CONTINUE)
> +                       return true;

Should we check the following first?

  order > 0 && zone_watermark_ok_safe(zone, 0, mark, highest_zoneidx)

(A rough sketch of what I mean follows after the quoted patch.)

Best Regards,
Huang, Ying

>         }
>
>         /*
> @@ -6968,16 +6974,6 @@ static bool kswapd_shrink_node(pg_data_t *pgdat,
>          */
>         shrink_node(pgdat, sc);
>
> -       /*
> -        * Fragmentation may mean that the system cannot be rebalanced for
> -        * high-order allocations. If twice the allocation size has been
> -        * reclaimed then recheck watermarks only at order-0 to prevent
> -        * excessive reclaim. Assume that a process requested a high-order
> -        * can direct reclaim/compact.
> -        */
> -       if (sc->order && sc->nr_reclaimed >= compact_gap(sc->order))
> -               sc->order = 0;
> -
>         return sc->nr_scanned >= sc->nr_to_reclaim;
>  }
>
> @@ -7018,15 +7014,13 @@ clear_reclaim_active(pg_data_t *pgdat, int highest_zoneidx)
>   * that are eligible for use by the caller until at least one zone is
>   * balanced.
>   *
> - * Returns the order kswapd finished reclaiming at.
> - *
>   * kswapd scans the zones in the highmem->normal->dma direction. It skips
>   * zones which have free_pages > high_wmark_pages(zone), but once a zone is
>   * found to have free_pages <= high_wmark_pages(zone), any page in that zone
>   * or lower is eligible for reclaim until at least one usable zone is
>   * balanced.
>   */
> -static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
> +static void balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
>  {
>         int i;
>         unsigned long nr_soft_reclaimed;
> @@ -7226,14 +7220,6 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int highest_zoneidx)
>         __fs_reclaim_release(_THIS_IP_);
>         psi_memstall_leave(&pflags);
>         set_task_reclaim_state(current, NULL);
> -
> -       /*
> -        * Return the order kswapd stopped reclaiming at as
> -        * prepare_kswapd_sleep() takes it into account. If another caller
> -        * entered the allocator slow path while kswapd was awake, order will
> -        * remain at the higher level.
> -        */
> -       return sc.order;
>  }
>
>  /*
> @@ -7251,7 +7237,7 @@ static enum zone_type kswapd_highest_zoneidx(pg_data_t *pgdat,
>         return curr_idx == MAX_NR_ZONES ? prev_highest_zoneidx : curr_idx;
>  }
>
> -static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_order,
> +static void kswapd_try_to_sleep(pg_data_t *pgdat, int order,
>                                 unsigned int highest_zoneidx)
>  {
>         long remaining = 0;
> @@ -7269,7 +7255,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>          * eligible zone balanced that it's also unlikely that compaction will
>          * succeed.
>          */
> -       if (prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> +       if (prepare_kswapd_sleep(pgdat, order, highest_zoneidx)) {
>                 /*
>                  * Compaction records what page blocks it recently failed to
>                  * isolate pages from and skips them in the future scanning.
> @@ -7282,7 +7268,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>                  * We have freed the memory, now we should compact it to make
>                  * allocation of the requested order possible.
>                  */
> -               wakeup_kcompactd(pgdat, alloc_order, highest_zoneidx);
> +               wakeup_kcompactd(pgdat, order, highest_zoneidx);
>
>                 remaining = schedule_timeout(HZ/10);
>
> @@ -7296,8 +7282,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>                                         kswapd_highest_zoneidx(pgdat,
>                                                         highest_zoneidx));
>
> -                       if (READ_ONCE(pgdat->kswapd_order) < reclaim_order)
> -                               WRITE_ONCE(pgdat->kswapd_order, reclaim_order);
> +                       if (READ_ONCE(pgdat->kswapd_order) < order)
> +                               WRITE_ONCE(pgdat->kswapd_order, order);
>                 }
>
>                 finish_wait(&pgdat->kswapd_wait, &wait);
> @@ -7308,8 +7294,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>          * After a short sleep, check if it was a premature sleep. If not, then
>          * go fully to sleep until explicitly woken up.
>          */
> -       if (!remaining &&
> -           prepare_kswapd_sleep(pgdat, reclaim_order, highest_zoneidx)) {
> +       if (!remaining && prepare_kswapd_sleep(pgdat, order, highest_zoneidx)) {
>                 trace_mm_vmscan_kswapd_sleep(pgdat->node_id);
>
>                 /*
> @@ -7350,8 +7335,7 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int alloc_order, int reclaim_o
>   */
>  static int kswapd(void *p)
>  {
> -       unsigned int alloc_order, reclaim_order;
> -       unsigned int highest_zoneidx = MAX_NR_ZONES - 1;
> +       unsigned int order, highest_zoneidx;
>         pg_data_t *pgdat = (pg_data_t *)p;
>         struct task_struct *tsk = current;
>         const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
> @@ -7374,22 +7358,20 @@ static int kswapd(void *p)
>         tsk->flags |= PF_MEMALLOC | PF_KSWAPD;
>         set_freezable();
>
> -       WRITE_ONCE(pgdat->kswapd_order, 0);
> +       order = 0;
> +       highest_zoneidx = MAX_NR_ZONES - 1;
> +       WRITE_ONCE(pgdat->kswapd_order, order);
>         WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES);
> +
>         atomic_set(&pgdat->nr_writeback_throttled, 0);
> +
>         for ( ; ; ) {
>                 bool ret;
>
> -               alloc_order = reclaim_order = READ_ONCE(pgdat->kswapd_order);
> -               highest_zoneidx = kswapd_highest_zoneidx(pgdat,
> -                                                       highest_zoneidx);
> -
> -kswapd_try_sleep:
> -               kswapd_try_to_sleep(pgdat, alloc_order, reclaim_order,
> -                                       highest_zoneidx);
> +               kswapd_try_to_sleep(pgdat, order, highest_zoneidx);
>
>                 /* Read the new order and highest_zoneidx */
> -               alloc_order = READ_ONCE(pgdat->kswapd_order);
> +               order = READ_ONCE(pgdat->kswapd_order);
>                 highest_zoneidx = kswapd_highest_zoneidx(pgdat,
>                                                         highest_zoneidx);
>                 WRITE_ONCE(pgdat->kswapd_order, 0);
> @@ -7415,11 +7397,8 @@ static int kswapd(void *p)
>                  * request (alloc_order).
>                  */
>                 trace_mm_vmscan_kswapd_wake(pgdat->node_id, highest_zoneidx,
> -                                               alloc_order);
> -               reclaim_order = balance_pgdat(pgdat, alloc_order,
> -                                               highest_zoneidx);
> -               if (reclaim_order < alloc_order)
> -                       goto kswapd_try_sleep;
> +                                               order);
> +               balance_pgdat(pgdat, order, highest_zoneidx);
>         }
>
>         tsk->flags &= ~(PF_MEMALLOC | PF_KSWAPD);
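
To make the suggestion above concrete, here is a rough, untested sketch
of the ordering I have in mind for the zone loop in pgdat_balanced()
(illustrative only, not a tested patch):

        /* Allocation can succeed in any zone, done */
        if (zone_watermark_ok_safe(zone, order, mark, highest_zoneidx))
                return true;

        /*
         * Sketch only: for order > 0, fall back to the compaction check
         * only if the zone at least meets the order-0 version of the
         * chosen watermark; otherwise keep reclaiming.
         */
        if (order > 0 &&
            zone_watermark_ok_safe(zone, 0, mark, highest_zoneidx) &&
            compaction_suitable(zone, order,
                                highest_zoneidx) == COMPACT_CONTINUE)
                return true;

That way the compaction_suitable() path would only be taken for
order > 0 wakeups and for zones that already pass the order-0 check
against the same mark.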