From: "Huang, Ying" <ying.huang@intel.com>
To: Zi Yan <ziy@nvidia.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>,
<linux-kernel@vger.kernel.org>, Yang Shi <shy828301@gmail.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Oscar Salvador <osalvador@suse.de>,
"Matthew Wilcox" <willy@infradead.org>,
Bharata B Rao <bharata@amd.com>,
"Alistair Popple" <apopple@nvidia.com>,
haoxin <xhao@linux.alibaba.com>
Subject: Re: [PATCH 8/8] migrate_pages: batch flushing TLB
Date: Wed, 04 Jan 2023 09:41:14 +0800 [thread overview]
Message-ID: <87o7rfm5z9.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <D8F02562-AA7E-4868-BE58-F1144728A352@nvidia.com> (Zi Yan's message of "Tue, 03 Jan 2023 14:19:23 -0500")
Zi Yan <ziy@nvidia.com> writes:
> On 26 Dec 2022, at 19:28, Huang Ying wrote:
>
>> The TLB flushing will cost quite some CPU cycles during the folio
>> migration in some situations. For example, when migrate a folio of a
>> process with multiple active threads that run on multiple CPUs. After
>> batching the _unmap and _move in migrate_pages(), the TLB flushing can
>> be batched easily with the existing TLB flush batching mechanism.
>> This patch implements that.
>>
>> We use the following test case to test the patch.
>>
>> On a 2-socket Intel server,
>>
>> - Run pmbench memory accessing benchmark
>>
>> - Run `migratepages` to migrate pages of pmbench between node 0 and
>> node 1 back and forth.
>>
>> With the patch, the TLB flushing IPI reduces 99.1% during the test and
>> the number of pages migrated successfully per second increases 291.7%.
>>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Zi Yan <ziy@nvidia.com>
>> Cc: Yang Shi <shy828301@gmail.com>
>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Bharata B Rao <bharata@amd.com>
>> Cc: Alistair Popple <apopple@nvidia.com>
>> Cc: haoxin <xhao@linux.alibaba.com>
>> ---
>> mm/migrate.c | 4 +++-
>> mm/rmap.c | 20 +++++++++++++++++---
>> 2 files changed, 20 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 70a40b8fee1f..d7413164e748 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1215,7 +1215,7 @@ static int migrate_folio_unmap(new_page_t get_new_page, free_page_t put_new_page
>> /* Establish migration ptes */
>> VM_BUG_ON_FOLIO(folio_test_anon(src) &&
>> !folio_test_ksm(src) && !anon_vma, src);
>> - try_to_migrate(src, 0);
>> + try_to_migrate(src, TTU_BATCH_FLUSH);
>> page_was_mapped = 1;
>> }
>>
>> @@ -1732,6 +1732,8 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>> stats->nr_thp_failed += thp_retry;
>> stats->nr_failed_pages += nr_retry_pages;
>> move:
>> + try_to_unmap_flush();
>> +
>> retry = 1;
>> for (pass = 0; pass < 10 && (retry || large_retry); pass++) {
>> retry = 0;
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index b616870a09be..2e125f3e462e 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1976,7 +1976,21 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>> } else {
>> flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>> /* Nuke the page table entry. */
>> - pteval = ptep_clear_flush(vma, address, pvmw.pte);
>> + if (should_defer_flush(mm, flags)) {
>> + /*
>> + * We clear the PTE but do not flush so potentially
>> + * a remote CPU could still be writing to the folio.
>> + * If the entry was previously clean then the
>> + * architecture must guarantee that a clear->dirty
>> + * transition on a cached TLB entry is written through
>> + * and traps if the PTE is unmapped.
>> + */
>> + pteval = ptep_get_and_clear(mm, address, pvmw.pte);
>> +
>> + set_tlb_ubc_flush_pending(mm, pte_dirty(pteval));
>> + } else {
>> + pteval = ptep_clear_flush(vma, address, pvmw.pte);
>> + }
>> }
>>
>
> This is only for PTE mapped pages, right? We also need something similar
> in set_pmd_migration_entry() in mm/huge_memory.c for PMD-mapped THPs.
> Oh, since you limit NR_MAX_BATCHED_MIGRATION to HPAGE_PMD_NR and count
> nr_pages with folio_nr_pages(), THPs will only be migrated one by one.
> This is not obvious from the cover letter.
>
> Are you planning to support batched THP migration? If not, it might be
> better to update cover letter to be explicit about it and add comments
> in migrate_pages(). It would be nice to also note that we need to
> increase NR_MAX_BATCHED_MIGRATION beyond HPAGE_PMD_NR and make similar
> changes in set_pmd_migration_entry() to get batched THP migration support.
For now, I have no plan to support batching THP migration. Because the
overhead of THP TLB shootdown is only 1/512 of that of the 4KB normal
page. I will add some words in patch description for that.
Best Regards,
Huang, Ying
>> /* Set the dirty flag on the folio now the pte is gone. */
>> @@ -2148,10 +2162,10 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags)
>>
>> /*
>> * Migration always ignores mlock and only supports TTU_RMAP_LOCKED and
>> - * TTU_SPLIT_HUGE_PMD and TTU_SYNC flags.
>> + * TTU_SPLIT_HUGE_PMD, TTU_SYNC, and TTU_BATCH_FLUSH flags.
>> */
>> if (WARN_ON_ONCE(flags & ~(TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD |
>> - TTU_SYNC)))
>> + TTU_SYNC | TTU_BATCH_FLUSH)))
>> return;
>>
>> if (folio_is_zone_device(folio) &&
>> --
>> 2.35.1
prev parent reply other threads:[~2023-01-04 1:42 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-27 0:28 [PATCH 0/8] migrate_pages(): batch TLB flushing Huang Ying
2022-12-27 0:28 ` [PATCH 1/8] migrate_pages: organize stats with struct migrate_pages_stats Huang Ying
2023-01-03 18:06 ` Zi Yan
2023-01-05 3:02 ` Alistair Popple
2023-01-05 5:53 ` Huang, Ying
2023-01-05 6:50 ` Alistair Popple
2023-01-05 7:06 ` Huang, Ying
2022-12-27 0:28 ` [PATCH 2/8] migrate_pages: separate hugetlb folios migration Huang Ying
2022-12-28 23:17 ` Andrew Morton
2023-01-02 23:53 ` Huang, Ying
2023-01-05 4:13 ` Alistair Popple
2023-01-05 5:51 ` Huang, Ying
2023-01-05 6:43 ` Alistair Popple
2023-01-05 7:31 ` Huang, Ying
2023-01-05 7:39 ` Alistair Popple
2023-01-09 7:23 ` Huang, Ying
2023-01-10 1:37 ` Alistair Popple
2022-12-27 0:28 ` [PATCH 3/8] migrate_pages: restrict number of pages to migrate in batch Huang Ying
2023-01-03 18:40 ` Zi Yan
2023-01-04 0:24 ` Huang, Ying
2022-12-27 0:28 ` [PATCH 4/8] migrate_pages: split unmap_and_move() to _unmap() and _move() Huang Ying
2023-01-03 18:55 ` Zi Yan
2023-01-05 18:26 ` Nathan Chancellor
2023-01-05 18:57 ` Kees Cook
2023-01-08 23:33 ` Huang, Ying
2022-12-27 0:28 ` [PATCH 5/8] migrate_pages: batch _unmap and _move Huang Ying
2022-12-28 23:22 ` Andrew Morton
2023-01-02 23:29 ` Huang, Ying
2023-01-03 19:01 ` Zi Yan
2023-01-04 0:34 ` Huang, Ying
2022-12-27 0:28 ` [PATCH 6/8] migrate_pages: move migrate_folio_done() and migrate_folio_unmap() Huang Ying
2023-01-03 19:02 ` Zi Yan
2023-01-04 1:26 ` Huang, Ying
2022-12-27 0:28 ` [PATCH 7/8] migrate_pages: share more code between _unmap and _move Huang Ying
2023-01-04 7:12 ` Alistair Popple
2023-01-06 4:15 ` Huang, Ying
2022-12-27 0:28 ` [PATCH 8/8] migrate_pages: batch flushing TLB Huang Ying
2023-01-03 19:19 ` Zi Yan
2023-01-04 1:41 ` Huang, Ying [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87o7rfm5z9.fsf@yhuang6-desk2.ccr.corp.intel.com \
--to=ying.huang@intel.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=bharata@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=osalvador@suse.de \
--cc=shy828301@gmail.com \
--cc=willy@infradead.org \
--cc=xhao@linux.alibaba.com \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox