From: Zi Yan <zi.yan@cs.rutgers.edu>
To: Mel Gorman <mgorman@suse.de>
Cc: David Nellans <dnellans@nvidia.com>,
Anshuman Khandual <khandual@linux.vnet.ibm.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
mhocko@suse.com, vbabka@suse.cz, minchan@kernel.org,
aneesh.kumar@linux.vnet.ibm.com, bsingharora@gmail.com,
srikar@linux.vnet.ibm.com, haren@linux.vnet.ibm.com,
jglisse@redhat.com, dave.hansen@intel.com,
dan.j.williams@intel.com,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Subject: Re: [PATCH 0/6] Enable parallel page migration
Date: Thu, 9 Mar 2017 17:46:16 -0600
Message-ID: <58C1E948.9020306@cs.rutgers.edu>
In-Reply-To: <20170309221522.hwk4wyaqx2jonru6@suse.de>
Hi Mel,
Thanks for pointing out the problems in this patchset.
It was my intern project at NVIDIA last summer. I only used
micro-benchmarks to demonstrate the large memory bandwidth utilization gap
between base page migration and THP migration, and between serialized
and parallel page migration.
Here are cross-socket serialized page migration results from calling
the move_pages() syscall:
On x86_64, an Intel two-socket E5-2640v3 box:
single 4KB base page migration takes 62.47 us, using 0.06 GB/s BW,
single 2MB THP migration takes 658.54 us, using 2.97 GB/s BW,
512 4KB base page migrations take 1987.38 us, using 0.98 GB/s BW.
On ppc64, a two-socket Power8 box:
single 64KB base page migration takes 49.3 us, using 1.24 GB/s BW,
single 16MB THP migration takes 2202.17 us, using 7.10 GB/s BW,
256 64KB base page migrations take 2543.65 us, using 6.14 GB/s BW.
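(The BW numbers are just the page size divided by the copy time, counting
each byte once: e.g. 2MB / 658.54 us ~= 2.97 GB/s and 16MB / 2202.17 us
~= 7.10 GB/s, with GB = 2^30 bytes.)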
THP migration is not slow at all when compared to a group of equivalent
base page migrations.
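For reference, the numbers above come from timing the move_pages() syscall
around a single migration. A minimal userspace sketch of such a timing
loop (not my exact benchmark code; mapping setup, THP allocation, and
error handling are omitted, and migrate_and_time is an illustrative name):

#include <numaif.h>     /* move_pages(), MPOL_MF_MOVE; link with -lnuma */
#include <stdio.h>
#include <time.h>

/* Time one migration of a single (already touched) page to dst_node
 * and return the elapsed time in microseconds. */
static double migrate_and_time(void *addr, int dst_node)
{
        void *pages[1] = { addr };
        int nodes[1]   = { dst_node };
        int status[1];
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        if (move_pages(0 /* current process */, 1, pages, nodes,
                       status, MPOL_MF_MOVE))
                perror("move_pages");
        clock_gettime(CLOCK_MONOTONIC, &t1);

        return (t1.tv_sec - t0.tv_sec) * 1e6 +
               (t1.tv_nsec - t0.tv_nsec) / 1e3;
}

BW is then the migrated size divided by the reported time.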
For 1-thread vs 8-thread THP migration:
On x86_64,
1-thread 2MB THP migration takes 658.54 us, using 2.97 GB/s BW,
8-thread 2MB THP migration takes 227.76 us, using 8.58 GB/s BW.
On ppc64,
1-thread 16MB THP migration takes 2202.17 us, using 7.10 GB/s BW,
8-thread 16MB THP migration takes 1223.87 us, using 12.77 GB/s BW.
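(In other words, 8 copy threads buy about a 2.9x speedup on x86_64 and
about 1.8x on ppc64.)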
This big increase in BW utilization is the motivation for pushing this
patchset.
>
> So the key potential issue here in my mind is that THP migration is too slow
> in some cases. What I object to is improving that using a high priority
> workqueue that potentially starves other CPUs and pollutes their cache
> which is generally very expensive.
I might not completely agree with this. Using a high-priority workqueue
guarantees that the page migration work is done ASAP; otherwise, if the
data copy threads have to wait, we completely lose the speedup brought
by parallel page migration.
I understand your concern about the CPU utilization impact. Checking
CPU utilization and only using idle CPUs could potentially avoid this
problem.
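To make the mechanism concrete, here is a minimal sketch of the parallel
copy idea (illustrative only, not the patchset's actual code: struct
copy_chunk, copy_chunk_fn, and copy_huge_page_mt are hypothetical names):

#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/workqueue.h>

/* One work item per copy thread; dst/src point into the (already
 * mapped) destination and source huge pages. */
struct copy_chunk {
        struct work_struct work;
        char *dst, *src;
        unsigned long len;
};

static void copy_chunk_fn(struct work_struct *work)
{
        struct copy_chunk *c = container_of(work, struct copy_chunk, work);

        memcpy(c->dst, c->src, c->len);
}

/* Queue all chunks on the high-priority system workqueue, then wait:
 * the migration must not proceed until the whole copy is complete,
 * which is why a low-priority queue would forfeit the speedup. */
static void copy_huge_page_mt(struct copy_chunk *chunk, int nthreads)
{
        int i;

        for (i = 0; i < nthreads; i++) {
                INIT_WORK(&chunk[i].work, copy_chunk_fn);
                queue_work(system_highpri_wq, &chunk[i].work);
        }
        for (i = 0; i < nthreads; i++)
                flush_work(&chunk[i].work);
}

Restricting the copy to idle CPUs, as suggested above, would mean
replacing queue_work() with queue_work_on() plus an idleness check.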
>
> Let's look at the core of what copy_huge_page does in mm/migrate.c, which
> is the function that gets parallelised by the series in question. For
> a !HIGHMEM system, it's woefully inefficient. Historically, it was an
> implementation that would work generically, which was fine but maybe not
> for future systems. It was also fine back when hugetlbfs was the only huge
> page implementation and COW operations were incredibly rare due to the
> risk that they could terminate the process with prejudice.
>
> The function takes a huge page, splits it into PAGE_SIZE chunks, kmap_atomics
> the source and destination for each PAGE_SIZE chunk and copies it. The
> parallelised version does one kmap and copies it in chunks, assuming the
> THP is fully mapped and accessible. Fundamentally, this is broken in the
> generic sense, as the kmap is not guaranteed to map the whole page,
> but it happens to work on !highmem systems. What is more important to
> note is that the multiple preempt and pagefault enables and disables
> happen on a per-page basis, 512 times (for THP on x86-64 at least),
> all of which are expensive operations depending on the kernel config, and
> I suspect that the parallelisation is actually masking that stupid overhead.
You are right about kmap; making this patchset depend on !HIGHMEM would
avoid the problem. It might not make sense to kmap potentially 512
base pages just to migrate one THP on a highmem system.
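For readers following along, the loop in question is roughly the
following (a sketch from memory, which may not match the tree exactly;
copy_highpage() does a kmap_atomic/kunmap_atomic pair per base page):

#include <linux/highmem.h>
#include <linux/huge_mm.h>
#include <linux/sched.h>

static void copy_huge_page(struct page *dst, struct page *src)
{
        int i, nr_pages = hpage_nr_pages(src);  /* 512 for a 2MB THP */

        for (i = 0; i < nr_pages; i++) {
                cond_resched();
                copy_highpage(dst + i, src + i); /* kmap pair inside */
        }
}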
>
> At the very least, I would have expected an initial attempt of one patch that
> optimised for !highmem systems to ignore kmap, simply disable preempt (if
> that is even necessary, I didn't check) and copy a pinned physical->physical
> page as a single copy without looping on a PAGE_SIZE basis and see how
> much that gained. Do it initially for THP only and worry about gigantic
> pages when or if that is a problem.
I can try this out to show how much improvement we can obtain over the
existing THP migration, whose performance is shown in the data above.
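Concretely, the optimisation to measure looks something like this (a
sketch assuming !HIGHMEM, so page_address() is valid across the whole
compound page; copy_huge_page_flat is a made-up name, and I have not
checked whether preemption needs to be disabled around the copy):

#include <linux/mm.h>
#include <linux/string.h>

/* Replaces the per-page copy_highpage() loop quoted above: on !HIGHMEM
 * the compound page is linearly mapped, so the whole THP can be copied
 * in one flat memcpy with no per-page kmap_atomic round trips. */
static void copy_huge_page_flat(struct page *dst, struct page *src,
                                int nr_pages)
{
        memcpy(page_address(dst), page_address(src),
               (unsigned long)nr_pages * PAGE_SIZE);
}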
>
> That would be patch 1 of a series. Maybe that'll be enough, maybe not, but
> I feel it's important to optimise the serialised case as much as possible
> before considering parallelisation to highlight and justify why it's
> necessary[1]. If nothing else, what if two CPUs both parallelise a migration
> at the same time and end up preempting each other? Between that and the
> workqueue setup, it's potentially much slower than an optimised serial copy.
>
> It would be tempting to experiment but the test case was not even included
> with the series (maybe it's somewhere else)[2]. While it's obvious how
> such a test case could be constructed, it feels unnecessary to construct
> it when it should be in the changelog.
Do you mean performing multiple parallel page migrations at the same
time and showing all the page migration times?
--
Best Regards,
Yan Zi