From: Zi Yan <zi.yan@cs.rutgers.edu>
To: Mel Gorman <mgorman@suse.de>
Cc: David Nellans <dnellans@nvidia.com>,
Anshuman Khandual <khandual@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
mhocko@suse.com, vbabka@suse.cz, minchan@kernel.org,
aneesh.kumar@linux.vnet.ibm.com, bsingharora@gmail.com,
srikar@linux.vnet.ibm.com, haren@linux.vnet.ibm.com,
jglisse@redhat.com, dave.hansen@intel.com,
dan.j.williams@intel.com,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH 0/6] Enable parallel page migration
Date: Thu, 9 Mar 2017 17:46:16 -0600
Message-ID: <58C1E948.9020306@cs.rutgers.edu>
In-Reply-To: <20170309221522.hwk4wyaqx2jonru6@suse.de>
Hi Mel,
Thanks for pointing out the problems in this patchset.
It was my intern project at NVIDIA last summer. I only used
micro-benchmarks to demonstrate the large memory bandwidth utilization gap
between base page migration and THP migration, and between serialized
and parallel page migration.
Here are cross-socket serialized page migration results from calling
the move_pages() syscall:
On x86_64, an Intel two-socket E5-2640v3 box:
single 4KB base page migration takes 62.47 us, using 0.06 GB/s BW,
single 2MB THP migration takes 658.54 us, using 2.97 GB/s BW,
512 4KB base page migrations take 1987.38 us, using 0.98 GB/s BW.
On ppc64, a two-socket Power8 box:
single 64KB base page migration takes 49.3 us, using 1.24 GB/s BW,
single 16MB THP migration takes 2202.17 us, using 7.10 GB/s BW,
256 64KB base page migrations take 2543.65 us, using 6.14 GB/s BW.
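(The BW numbers are just the page size divided by the copy time, counting
each byte once: e.g. 2MB / 658.54 us ~= 2.97 GB/s and 16MB / 2202.17 us
~= 7.10 GB/s, with GB = 2^30 bytes.)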
THP migration is not slow at all when compared to a group of equivalent
base page migrations.
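For reference, the numbers above come from timing the move_pages() syscall
around a single migration. A minimal userspace sketch of such a timing
loop (not my exact benchmark code; mapping setup, THP allocation, and
error handling are omitted, and migrate_and_time is an illustrative name):

#include <numaif.h>     /* move_pages(), MPOL_MF_MOVE; link with -lnuma */
#include <stdio.h>
#include <time.h>

/* Time one migration of a single (already touched) page to dst_node
 * and return the elapsed time in microseconds. */
static double migrate_and_time(void *addr, int dst_node)
{
        void *pages[1] = { addr };
        int nodes[1]   = { dst_node };
        int status[1];
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (move_pages(0 /* current process */, 1, pages, nodes,
                       status, MPOL_MF_MOVE))
                perror("move_pages");
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return (t1.tv_sec - t0.tv_sec) * 1e6 +
               (t1.tv_nsec - t0.tv_nsec) / 1e3;
}

BW is then the migrated size divided by the reported time.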
For 1-thread vs 8-thread THP migration:
On x86_64,
1-thread 2MB THP migration takes 658.54 us, using 2.97 GB/s BW,
8-thread 2MB THP migration takes 227.76 us, using 8.58 GB/s BW.
On ppc64,
1-thread 16MB THP migration takes 2202.17 us, using 7.10 GB/s BW,
8-thread 16MB THP migration takes 1223.87 us, using 12.77 GB/s BW.
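(In other words, 8 copy threads buy about a 2.9x speedup on x86_64 and
about 1.8x on ppc64.)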
This big increase in BW utilization is the motivation for pushing this
patchset.
>
> So the key potential issue here in my mind is that THP migration is too slow
> in some cases. What I object to is improving that using a high priority
> workqueue that potentially starves other CPUs and pollutes their cache
> which is generally very expensive.
I might not completely agree with this. Using a high-priority workqueue
guarantees that the page migration work is done ASAP; otherwise, if the
data copy threads have to wait, we completely lose the speedup brought
by parallel page migration.
I understand your concern about the CPU utilization impact. Checking
CPU utilization and only using idle CPUs could potentially avoid this
problem.
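To make the mechanism concrete, here is a minimal sketch of the parallel
copy idea (illustrative only, not the patchset's actual code: struct
copy_chunk, copy_chunk_fn, and copy_huge_page_mt are hypothetical names):

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/workqueue.h>

/* One work item per copy thread; dst/src point into the (already
 * mapped) destination and source huge pages. */
struct copy_chunk {
        struct work_struct work;
        char *dst, *src;
        unsigned long len;
};

static void copy_chunk_fn(struct work_struct *work)
{
        struct copy_chunk *c = container_of(work, struct copy_chunk, work);

        memcpy(c->dst, c->src, c->len);
}

/* Queue all chunks on the high-priority system workqueue, then wait:
 * the migration must not proceed until the whole copy is complete,
 * which is why a low-priority queue would forfeit the speedup. */
static void copy_huge_page_mt(struct copy_chunk *chunk, int nthreads)
{
        int i;

        for (i = 0; i < nthreads; i++) {
                INIT_WORK(&chunk[i].work, copy_chunk_fn);
                queue_work(system_highpri_wq, &chunk[i].work);
        }
        for (i = 0; i < nthreads; i++)
                flush_work(&chunk[i].work);
}

Restricting the copy to idle CPUs, as suggested above, would mean
replacing queue_work() with queue_work_on() plus an idleness check.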
>
> Let's look at the core of what copy_huge_page does in mm/migrate.c, which
> is the function that gets parallelised by the series in question. For
> a !HIGHMEM system, it's woefully inefficient. Historically, it was an
> implementation that would work generically, which was fine but maybe not
> for future systems. It was also fine back when hugetlbfs was the only huge
> page implementation and COW operations were incredibly rare due to the
> risk that they could terminate the process with prejudice.
>
> The function takes a huge page, splits it into PAGE_SIZE chunks, kmap_atomics
> the source and destination for each PAGE_SIZE chunk and copies it. The
> parallelised version does one kmap and copies it in chunks, assuming the
> THP is fully mapped and accessible. Fundamentally, this is broken in the
> generic sense, as the kmap is not guaranteed to map the whole page,
> but it happens to work on !highmem systems. What is more important to
> note is that the multiple preempt and pagefault enables and disables
> happen on a per-page basis, 512 times (for THP on x86-64 at least),
> all of which are expensive operations depending on the kernel config, and
> I suspect that the parallelisation is actually masking that stupid overhead.
You are right about kmap; making this patchset depend on !HIGHMEM would
avoid the problem. It might not make sense to kmap potentially 512
base pages just to migrate one THP on a highmem system.
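For readers following along, the loop in question is roughly the
following (a sketch from memory, which may not match the tree exactly;
copy_highpage() does a kmap_atomic/kunmap_atomic pair per base page):

#include <linux/highmem.h>
#include <linux/huge_mm.h>
#include <linux/sched.h>

static void copy_huge_page(struct page *dst, struct page *src)
{
        int i, nr_pages = hpage_nr_pages(src);  /* 512 for a 2MB THP */

        for (i = 0; i < nr_pages; i++) {
                cond_resched();
                copy_highpage(dst + i, src + i); /* kmap pair inside */
        }
}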
>
> At the very least, I would have expected an initial attempt of one patch that
> optimised for !highmem systems to ignore kmap, simply disable preempt (if
> that is even necessary, I didn't check) and copy a pinned physical->physical
> page as a single copy without looping on a PAGE_SIZE basis and see how
> much that gained. Do it initially for THP only and worry about gigantic
> pages when or if that is a problem.
I can try this out to show how much improvement we can obtain over the
existing THP migration, whose performance is shown in the data above.
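Concretely, the optimisation to measure looks something like this (a
sketch assuming !HIGHMEM, so page_address() is valid across the whole
compound page; copy_huge_page_flat is a made-up name, and I have not
checked whether preemption needs to be disabled around the copy):

#include <linux/mm.h>
#include <linux/string.h>

/* Replaces the per-page copy_highpage() loop quoted above: on !HIGHMEM
 * the compound page is linearly mapped, so the whole THP can be copied
 * in one flat memcpy with no per-page kmap_atomic round trips. */
static void copy_huge_page_flat(struct page *dst, struct page *src,
                                int nr_pages)
{
        memcpy(page_address(dst), page_address(src),
               (unsigned long)nr_pages * PAGE_SIZE);
}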
>
> That would be patch 1 of a series. Maybe that'll be enough, maybe not, but
> I feel it's important to optimise the serialised case as much as possible
> before considering parallelisation to highlight and justify why it's
> necessary[1]. If nothing else, what if two CPUs both parallelise a migration
> at the same time and end up preempting each other? Between that and the
> workqueue setup, it's potentially much slower than an optimised serial copy.
>
> It would be tempting to experiment but the test case was not even included
> with the series (maybe it's somewhere else)[2]. While it's obvious how
> such a test case could be constructed, it feels unnecessary to construct
> it when it should be in the changelog.
Do you mean performing multiple parallel page migrations at the same
time and showing all the page migration times?
--
Best Regards,
Yan Zi