From: "Huang, Ying"
To: Khalid Aziz
Cc: akpm@linux-foundation.org, willy@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Steve Sistare, Khalid Aziz, Mel Gorman
Subject: Re: [PATCH v2] mm, compaction: Skip all non-migratable pages during scan
References: <20230512190000.103737-1-khalid.aziz@oracle.com>
Date: Mon, 15 May 2023 16:47:17 +0800
In-Reply-To: <20230512190000.103737-1-khalid.aziz@oracle.com> (Khalid Aziz's message of "Fri, 12 May 2023 13:00:00 -0600")
Message-ID: <871qji7zhm.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Khalid,

Cc'ed Mel.

Khalid Aziz writes:

> Pages pinned in memory through extra refcounts can not be migrated.
> Currently as isolate_migratepages_block() scans pages for
> compaction, it skips any pinned anonymous pages. All non-migratable
> pages should be skipped and not just the anonymous pinned pages.
> This patch adds a check for extra refcounts on a page to determine
> if the page can be migrated. This was seen as a real issue on a
> customer workload where a large number of pages were pinned by vfio
> on the host and any attempts to allocate hugepages resulted in
> significant amount of cpu time spent in either direct compaction or
> in kcompatd scanning vfio pinned pages over and over again that can

s/kcompatd/kcompactd/

> not be migrated.

With the patch below, did the kcompactd cycles disappear?

> Signed-off-by: Khalid Aziz
> Suggested-by: Steve Sistare
> Cc: Khalid Aziz
> ---
> v2:
> 	- Update comments in the code (Suggested by Andrew)
> 	- Use PagePrivate() instead of page_has_private() (Suggested
> 	  by Matthew)
> 	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
> 	- Use page_ref_count() (Suggested by Matthew)
> 	- Rename is_pinned_page() to reflect its function more
> 	  accurately (Suggested by Matthew)
>
>  mm/compaction.c | 36 ++++++++++++++++++++++++++++++++----
>  1 file changed, 32 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5a9501e0ae01..837f20df2bbb 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -764,6 +764,34 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	return too_many;
>  }
>
> +/*
> + * Check if this base page should be skipped from isolation because
> + * it has extra refcounts that will prevent it from being migrated.
> + * This function is called for regular pages only, and not
> + * for THP or hugetlbfs pages. This code is inspired by similar code
> + * in migrate_vma_check_page(), can_split_folio() and
> + * folio_migrate_mapping()
> + */
> +static inline bool page_has_extrarefs(struct page *page,

Would page_has_extra_refs() be a better name?

> +					struct address_space *mapping)
> +{
> +	unsigned long extra_refs;
> +
> +	/* anonymous page can have extra ref from swap cache */
> +	if (mapping)
> +		extra_refs = 1 + PagePrivate(page);
> +	else
> +		extra_refs = PageSwapCache(page) ? 1 : 0;

IIUC, mapping != NULL if PageSwapCache(page) is true.  Please check the
implementation of page_mapping().  And even if mapping == NULL,
extra_refs should be 1, because we have already elevated the page
refcount in isolate_migratepages_block() before checking whether the
page is pinned.  IIUC, this is the original behavior.  Alternatively,
we can add "- 1" to the check that follows.
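
To make the refcount arithmetic concrete, here is a minimal userspace
sketch.  It is not kernel code: struct page_model and both helper
functions are hypothetical stand-ins, and the refcount values are
assumed for illustration.  It models one reading of the '"- 1"'
alternative, where the reference taken by the scanner is subtracted in
the comparison for every page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the page state the check reads. */
struct page_model {
	int refcount;        /* models page_ref_count(), incl. the scanner's reference */
	int mapcount;        /* models page_mapcount() */
	bool has_mapping;    /* models page_mapping(page) != NULL */
	bool has_private;    /* models PagePrivate(page) */
	bool in_swap_cache;  /* models PageSwapCache(page) */
};

/* The check as written in the quoted v2 hunk. */
static bool v2_has_extra_refs(const struct page_model *p)
{
	int extra_refs;

	if (p->has_mapping)
		extra_refs = 1 + (p->has_private ? 1 : 0);
	else
		extra_refs = p->in_swap_cache ? 1 : 0;

	return (p->refcount - extra_refs) > p->mapcount;
}

/* Same check with "- 1" for the reference the scanner already holds. */
static bool adjusted_has_extra_refs(const struct page_model *p)
{
	int extra_refs = p->has_mapping ? 1 + (p->has_private ? 1 : 0) : 0;

	return (p->refcount - extra_refs - 1) > p->mapcount;
}

/* Illustrative cases; the refcounts are assumed values, not measurements. */
static void model_self_check(void)
{
	/* Anonymous page, mapped once, plus the scanner's reference. */
	struct page_model anon = { .refcount = 2, .mapcount = 1 };
	/* Same page with one additional pin. */
	struct page_model pinned = { .refcount = 3, .mapcount = 1 };

	assert(v2_has_extra_refs(&anon));         /* v2 check skips an unpinned page */
	assert(!adjusted_has_extra_refs(&anon));  /* adjusted check keeps it */
	assert(adjusted_has_extra_refs(&pinned)); /* pinned page is still skipped */
}
```

In this model the v2 check reports an ordinary mapped anonymous page as
having extra references, because the scanner's own reference is counted
as a pin, while the adjusted check only fires once an additional
reference is present.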

> +
> +	/*
> +	 * This is an admittedly racy check but good enough to determine
> +	 * if a page is pinned and can not be migrated
> +	 */
> +	if ((page_ref_count(page) - extra_refs) > page_mapcount(page))
> +		return true;
> +	return false;
> +}
> +
>  /**
>   * isolate_migratepages_block() - isolate all migrate-able pages within
>   *				  a single pageblock
> @@ -992,12 +1020,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			goto isolate_fail;
>
>  		/*
> -		 * Migration will fail if an anonymous page is pinned in memory,
> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -		 * admittedly racy check.
> +		 * Migration will fail if a page has extra refcounts
> +		 * preventing it from migrating, so avoid taking
> +		 * lru_lock and isolating it unnecessarily
>  		 */
>  		mapping = page_mapping(page);
> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> +		if (page_has_extrarefs(page, mapping))
>  			goto isolate_fail_put;
>
>  		/*

Best Regards,
Huang, Ying