From: "Huang, Ying"
To: Zi Yan
Cc: Andrew Morton, Yang Shi, Baolin Wang, Oscar Salvador,
 Matthew Wilcox, Bharata B Rao, Alistair Popple, haoxin
Subject: Re: [PATCH 8/8] migrate_pages: batch flushing TLB
References: <20221227002859.27740-1-ying.huang@intel.com>
 <20221227002859.27740-9-ying.huang@intel.com>
Date: Wed, 04 Jan 2023 09:41:14 +0800
In-Reply-To: (Zi Yan's message of "Tue, 03 Jan 2023 14:19:23 -0500")
Message-ID: <87o7rfm5z9.fsf@yhuang6-desk2.ccr.corp.intel.com>

Zi Yan writes:

> On 26 Dec 2022, at 19:28, Huang Ying wrote:
>
>> TLB flushing can cost quite a few CPU cycles during folio migration
>> in some situations, for example, when migrating a folio of a process
>> with multiple active threads that run on multiple CPUs.  After
>> batching the _unmap and _move in migrate_pages(), the TLB flushing
>> can be batched easily with the existing TLB flush batching mechanism.
>> This patch implements that.
>>
>> We use the following test case to test the patch.
>>
>> On a 2-socket Intel server:
>>
>> - Run the pmbench memory accessing benchmark.
>>
>> - Run `migratepages` to migrate pages of pmbench between node 0 and
>>   node 1 back and forth.
>>
>> With the patch, TLB flushing IPIs are reduced by 99.1% during the
>> test, and the number of pages migrated successfully per second
>> increases by 291.7%.
>>
>> Signed-off-by: "Huang, Ying"
>> Cc: Zi Yan
>> Cc: Yang Shi
>> Cc: Baolin Wang
>> Cc: Oscar Salvador
>> Cc: Matthew Wilcox
>> Cc: Bharata B Rao
>> Cc: Alistair Popple
>> Cc: haoxin
>> ---
>>  mm/migrate.c |  4 +++-
>>  mm/rmap.c    | 20 +++++++++++++++++---
>>  2 files changed, 20 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 70a40b8fee1f..d7413164e748 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1215,7 +1215,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>>  		/* Establish migration ptes */
>>  		VM_BUG_ON_FOLIO(folio_test_anon(src) &&
>>  			       !folio_test_ksm(src) && !anon_vma, src);
>> -		try_to_migrate(src, 0);
>> +		try_to_migrate(src, TTU_BATCH_FLUSH);
>>  		page_was_mapped = 1;
>>  	}
>>
>> @@ -1732,6 +1732,8 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>>  	stats->nr_thp_failed += thp_retry;
>>  	stats->nr_failed_pages += nr_retry_pages;
>>  move:
>> +	try_to_unmap_flush();
>> +
>>  	retry = 1;
>>  	for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>>  		retry = 0;
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index b616870a09be..2e125f3e462e 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1976,7 +1976,21 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>>  		} else {
>>  			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>>  			/* Nuke the page table entry. */
>> -			pteval = ptep_clear_flush(vma, address, pvmw.pte);
>> +			if (should_defer_flush(mm, flags)) {
>> +				/*
>> +				 * We clear the PTE but do not flush so potentially
>> +				 * a remote CPU could still be writing to the folio.
>> +				 * If the entry was previously clean then the
>> +				 * architecture must guarantee that a clear->dirty
>> +				 * transition on a cached TLB entry is written through
>> +				 * and traps if the PTE is unmapped.
>> +				 */
>> +				pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>> +
>> +				set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>> +			} else {
>> +				pteval = ptep_clear_flush(vma, address, pvmw.pte);
>> +			}
>>  		}
>>
>
> This is only for PTE mapped pages, right? We also need something similar
> in set_pmd_migration_entry() in mm/huge_memory.c for PMD-mapped THPs.
> Oh, since you limit NR_MAX_BATCHED_MIGRATION to HPAGE_PMD_NR and count
> nr_pages with folio_nr_pages(), THPs will only be migrated one by one.
> This is not obvious from the cover letter.
>
> Are you planning to support batched THP migration? If not, it might be
> better to update the cover letter to be explicit about it and add comments
> in migrate_pages(). It would be nice to also note that we need to
> increase NR_MAX_BATCHED_MIGRATION beyond HPAGE_PMD_NR and make similar
> changes in set_pmd_migration_entry() to get batched THP migration support.

For now, I have no plan to support batching THP migration, because the
per-page overhead of a THP TLB shootdown is only 1/512 of that of a 4KB
normal page: one shootdown for a 2MB THP covers what would otherwise be
512 base-page entries.  I will add some words about this in the patch
description.

Best Regards,
Huang, Ying

>>  		/* Set the dirty flag on the folio now the pte is gone. */
>> @@ -2148,10 +2162,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
>>
>>  	/*
>>  	 * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
>> -	 * TTU_SPLIT_HUGE_PMD and TTU_SYNC flags.
>> +	 * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
>>  	 */
>>  	if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
>> -					TTU_SYNC)))
>> +					TTU_SYNC | TTU_BATCH_FLUSH)))
>>  		return;
>>
>>  	if (folio_is_zone_device(folio) &&
>> --
>> 2.35.1
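
P.S. The cover letter's test drives migration with pmbench plus the
migratepages(8) tool.  As a rough way to exercise the same kernel path by
hand, the sketch below (illustrative only, not part of the series) bounces
a small buffer between node 0 and node 1 with move_pages(2), which goes
through the migrate_pages() code that this patch batches.  It assumes a
machine with at least two NUMA nodes and libnuma headers installed; the
file name, page count, and round count are arbitrary.

/*
 * bounce.c - illustrative only, not part of the patch.
 * Build with: gcc -o bounce bounce.c -lnuma
 */
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NPAGES 16

int main(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *pages[NPAGES];
	int nodes[NPAGES], status[NPAGES];
	char *buf;
	int round, i;

	/* Page-aligned buffer, touched so every page is actually mapped. */
	if (posix_memalign((void **)&buf, page_size, NPAGES * page_size))
		return 1;
	memset(buf, 1, NPAGES * page_size);

	for (i = 0; i < NPAGES; i++)
		pages[i] = buf + (long)i * page_size;

	/* Migrate the pages back and forth between node 0 and node 1. */
	for (round = 0; round < 4; round++) {
		long rc;
		int target = round % 2;

		for (i = 0; i < NPAGES; i++)
			nodes[i] = target;

		/* pid 0 means "the calling process". */
		rc = move_pages(0, NPAGES, pages, nodes, status, MPOL_MF_MOVE);
		if (rc < 0)
			perror("move_pages");
		else
			printf("round %d: first page now on node %d\n",
			       round, status[0]);
	}

	free(buf);
	return 0;
}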