From: "Huang, Ying"
To: Khalid Aziz
Cc: David Hildenbrand, akpm@linux-foundation.org, willy@infradead.org, steven.sistare@oracle.com, mgorman@techsingularity.net, khalid@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm, compaction: Skip all non-migratable pages during scan
References: <20230517161555.84776-1-khalid.aziz@oracle.com> <87sfbubg3j.fsf@yhuang6-desk2.ccr.corp.intel.com> <5f43a17c-94a0-4bff-b451-e5014de36ed9@redhat.com> <875y8k3o60.fsf@yhuang6-desk2.ccr.corp.intel.com> <5ba66714-b4a9-4fd3-1149-4d67b26c370a@oracle.com>
Date: Tue, 23 May 2023 09:23:25 +0800
In-Reply-To: <5ba66714-b4a9-4fd3-1149-4d67b26c370a@oracle.com> (Khalid Aziz's message of "Mon, 22 May 2023 09:12:38 -0600")
Message-ID: <87bkibyh6a.fsf@yhuang6-desk2.ccr.corp.intel.com>

Khalid Aziz writes:

> On 5/21/23 23:55, Huang, Ying wrote:
>> David Hildenbrand writes:
>>
>>> On 18.05.23 03:09, Huang, Ying wrote:
>>>> David Hildenbrand writes:
>>>>
>>>>> On 17.05.23 18:15, Khalid Aziz wrote:
>>>>>> Pages pinned in memory through extra refcounts can not be migrated.
>>>>>> Currently as isolate_migratepages_block() scans pages for
>>>>>> compaction, it skips any pinned anonymous pages. All non-migratable
>>>>>> pages should be skipped and not just the anonymous pinned pages.
>>>>>> This patch adds a check for extra refcounts on a page to determine
>>>>>> if the page can be migrated. This was seen as a real issue on a
>>>>>> customer workload where a large number of pages were pinned by vfio
>>>>>> on the host and any attempts to allocate hugepages resulted in
>>>>>> significant amount of cpu time spent in either direct compaction or
>>>>>> in kcompactd scanning vfio pinned pages over and over again that can
>>>>>> not be migrated.
>>>>>
>>>>> How will this change affect alloc_contig_range(), such as used for CMA
>>>>> allocations or virtio-mem? alloc_contig_range() ends up calling
>>>>> isolate_migratepages_range() -> isolate_migratepages_block().
>>>> IIUC, cc->alloc_contig can be used to distinguish contiguous allocation
>>>> and compaction. And, from the original commit which introduced
>>>> anonymous pages skipping (commit 119d6d59dcc0 ("mm, compaction: avoid
>>>> isolating pinned pages ")) and this patch, large number of migration
>>>> failure during compaction causes real issue too. So, I suggest to use
>>>> cc->alloc_contig here.
>>>
>>> Agreed. I further wonder if we want to special-case the !alloc_contig
>>> case also for MIGRATE_CMA and ZONE_MOVABLE, where we cannot have
>>> longterm page pinnings (e.g., vfio pinned pages).
>> This makes sense. The skipping is more accurate in this way.
>>
>
>
> Something like this?
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index f04c00981172..014e21d3d7e9 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1025,7 +1025,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		 * lru_lock and isolating it unnecessarily
>  		 */
>  		mapping = page_mapping(page);
> -		if (page_has_extra_refs(page))
> +		if (!cc->alloc_contig && page_has_extra_refs(page))
>  			goto isolate_fail_put;
>
>  		/*

As suggested by David above, you can also check the current zone type
(ZONE_MOVABLE) and the pageblock migrate type (MIGRATE_CMA). Pages
there will not be pinned long term, so compaction should still try to
migrate them.

Best Regards,
Huang, Ying
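
To make the combined condition concrete, here is a minimal user-space sketch of the skip decision discussed above. It is not kernel code: the model_* types and model_skip_page() are hypothetical stand-ins for cc->alloc_contig, the page's zone, the pageblock migrate type and page_has_extra_refs(), and the order of the checks is only an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the kernel's own types. */
enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_MOVABLE };
enum model_migratetype {
	MODEL_MIGRATE_MOVABLE,
	MODEL_MIGRATE_UNMOVABLE,
	MODEL_MIGRATE_CMA,
};

struct model_page {
	enum model_zone zone;              /* zone the page lives in */
	enum model_migratetype block_type; /* migrate type of its pageblock */
	bool has_extra_refs;               /* models page_has_extra_refs() */
};

/*
 * Return true if the migration scanner should skip the page rather than
 * isolate it. Pages are only skipped for plain compaction, and never in
 * ZONE_MOVABLE or MIGRATE_CMA pageblocks, where long-term pins are not
 * expected and an extra reference is likely to be transient.
 */
static bool model_skip_page(bool alloc_contig, const struct model_page *page)
{
	if (alloc_contig)
		return false;	/* contiguous allocation must try every page */
	if (page->zone == MODEL_ZONE_MOVABLE)
		return false;	/* no long-term pins expected here */
	if (page->block_type == MODEL_MIGRATE_CMA)
		return false;	/* same reasoning for CMA pageblocks */
	return page->has_extra_refs;
}
```

With this model, a pinned page in a normal zone is skipped only during compaction, while the same page in ZONE_MOVABLE or in a MIGRATE_CMA pageblock is still tried.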