From: Pete Wyckoff <pw@osc.edu>
To: Christoph Lameter <clameter@sgi.com>
Cc: Christian Bell <christian.bell@qlogic.com>,
    Rik van Riel <riel@redhat.com>,
    Andrea Arcangeli <andrea@qumranet.com>,
    a.p.zijlstra@chello.nl, izike@qumranet.com,
    Roland Dreier <rdreier@cisco.com>,
    steiner@sgi.com, linux-kernel@vger.kernel.org, avi@qumranet.com,
    linux-mm@kvack.org, daniel.blueman@quadrics.com,
    Robin Holt <holt@sgi.com>,
    general@lists.openfabrics.org,
    Andrew Morton <akpm@linux-foundation.org>,
    kvm-devel@lists.sourceforge.net
Subject: Re: [ofa-general] Re: Demand paging for memory regions
Date: Wed, 13 Feb 2008 18:23:08 -0500
Message-ID: <20080213232308.GB7597@osc.edu>
In-Reply-To: <20080213040905.GQ29340@mv.qlogic.com>
christian.bell@qlogic.com wrote on Tue, 12 Feb 2008 20:09 -0800:
> One other area that has not been brought up yet (I think) is the
> applicability of notifiers in letting users know when pinned memory
> is reclaimed by the kernel. This is useful when a lower-level
> library employs lazy deregistration strategies on memory regions that
> are subsequently released to the kernel via the application's use of
> munmap or sbrk. Ohio Supercomputing Center has work in this area but
> a generalized approach in the kernel would certainly be welcome.
The whole need for memory registration is a giant pain. There is no
motivating application need for it---it is simply a hack around
virtual memory and the lack of full VM support in current hardware.
There are real hardware issues that interact poorly with virtual
memory, as discussed previously in this thread.
The way a messaging cycle goes in IB is:
    register buf
    post send from buf
    wait for completion
    deregister buf
Userspace software libraries tend to hide this behind a single
call:
    MPI_Send(buf, ...)
Now if you actually do the reg/dereg every time, things are very
slow. So userspace library writers came up with the idea of caching
registrations:
    if buf is not registered:
        register buf
    post send from buf
    wait for completion
The second time that the app happens to do a send from the same
buffer, it proceeds much faster. Temporal locality applies here, and
this caching is generally worth it. Some libraries have schemes to
limit the size of the registration cache too.
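As an illustration of the caching idea above, here is a minimal Python sketch of a size-limited registration cache. It is not taken from any real MPI or verbs library: the class name, the `register`/`deregister` callables, and the LRU eviction policy are all assumptions made for the example, and the "registration" is simulated rather than pinning any memory.

```python
from collections import OrderedDict


class RegistrationCache:
    """Toy LRU cache mapping (addr, length) to a registration handle."""

    def __init__(self, max_entries, register, deregister):
        self.max_entries = max_entries
        self.register = register      # hypothetical: pins memory, returns a handle
        self.deregister = deregister  # hypothetical: unpins memory behind a handle
        self.entries = OrderedDict()

    def lookup(self, addr, length):
        key = (addr, length)
        if key in self.entries:
            # Fast path: buffer already registered, mark it most recently used.
            self.entries.move_to_end(key)
            return self.entries[key]
        # Slow path: register once, then remember the handle.
        handle = self.register(addr, length)
        self.entries[key] = handle
        if len(self.entries) > self.max_entries:
            # Bound the cache by evicting the least recently used entry.
            _, victim = self.entries.popitem(last=False)
            self.deregister(victim)
        return handle


# Simulated registration calls that only record what happened.
calls = []


def fake_register(addr, length):
    calls.append(("reg", addr, length))
    return len(calls)


def fake_deregister(handle):
    calls.append(("dereg", handle))


reg_cache = RegistrationCache(2, fake_register, fake_deregister)
h1 = reg_cache.lookup(0x1000, 4096)  # first send: registers
h2 = reg_cache.lookup(0x1000, 4096)  # second send: cache hit
```

The second `lookup` returns the cached handle without calling `fake_register` again, which is where the speedup comes from. Note that nothing here tells the cache when the application unmaps a buffer.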
But there are plenty of ways to hurt yourself with such a scheme.
The first is a huge pool of unused but registered memory, since the
library doesn't know the app's access patterns or the VM pressure
level in the kernel.
There are plenty of subtle ways that this breaks too. If the
registered buf is removed from the address space via munmap() or
sbrk() or other ways, the mapping and registration are gone, but the
library has no way of knowing that the app just did this. Sure the
physical page is still there and pinned, but the app cannot get at
it. Later if new address space arrives at the same virtual address
but a different physical page, the library will mistakenly think it
already has it registered properly, and data is transferred from
this old now-unmapped physical page.
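The stale-registration hazard described above can be reproduced in a toy model. The following Python sketch is only a simulation under stated assumptions: `AddressSpace` and `NaiveRegCache` are hypothetical classes, physical pages are plain integers, and "registering" just records which physical page backed the virtual address at that moment.

```python
class AddressSpace:
    """Toy process address space: virtual address -> physical page number."""

    def __init__(self):
        self.page_table = {}
        self.next_phys = 100  # arbitrary starting page number

    def mmap(self, vaddr):
        # Every new mapping gets a fresh physical page.
        self.page_table[vaddr] = self.next_phys
        self.next_phys += 1
        return self.page_table[vaddr]

    def munmap(self, vaddr):
        del self.page_table[vaddr]


class NaiveRegCache:
    """Caches vaddr -> pinned physical page and is never told about munmap."""

    def __init__(self, aspace):
        self.aspace = aspace
        self.pinned = {}

    def page_for_send(self, vaddr):
        if vaddr not in self.pinned:
            # "Register": remember the physical page currently behind vaddr.
            self.pinned[vaddr] = self.aspace.page_table[vaddr]
        return self.pinned[vaddr]


aspace = AddressSpace()
naive = NaiveRegCache(aspace)

old_phys = aspace.mmap(0x7000)
first = naive.page_for_send(0x7000)   # registers against old_phys

aspace.munmap(0x7000)                 # app releases the buffer
new_phys = aspace.mmap(0x7000)        # same virtual address, new physical page

stale = naive.page_for_send(0x7000)   # cache hit on the old registration
```

After the remap, `stale` still names the old physical page while the page table points at the new one, so a send would read data the application can no longer see.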
The whole situation is rather ridiculous, but we are quite stuck
with it for current generation IB and iWarp hardware. If we can't
have the kernel interact with the device directly, we could at least
manage state in these multiple userspace registration caches. The
VM could ask for certain (or any) pages to be released, and the
library would respond if they are indeed not in use by the device.
The app itself does not know about pinned regions, and the library
is aware of exactly which regions are potentially in use.
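One way to picture the request/response exchange proposed above is the following Python sketch. No such kernel interface is defined in this message, so `release_request` is a hypothetical callback, and the in-flight counters stand in for whatever bookkeeping a real library keeps about sends posted to the device.

```python
class TrackedRegCache:
    """Toy registration cache that knows which regions the device may be using."""

    def __init__(self):
        self.in_flight = {}  # addr -> number of posted but incomplete sends

    def register(self, addr):
        self.in_flight.setdefault(addr, 0)

    def post_send(self, addr):
        self.in_flight[addr] += 1

    def complete(self, addr):
        self.in_flight[addr] -= 1

    def release_request(self, addrs=None):
        """Hypothetical callback: the VM asks for these regions (or any
        region, when addrs is None) to be unpinned. Only idle regions
        are released; the list of released addresses is returned."""
        candidates = list(self.in_flight) if addrs is None else addrs
        released = []
        for addr in candidates:
            if self.in_flight.get(addr) == 0:
                del self.in_flight[addr]  # deregister/unpin would happen here
                released.append(addr)
        return released


tracked = TrackedRegCache()
tracked.register(0xA000)
tracked.register(0xB000)
tracked.post_send(0xB000)              # 0xB000 is busy on the device

freed_any = tracked.release_request()  # "release anything you can"
tracked.complete(0xB000)
freed_specific = tracked.release_request([0xB000])
```

The first request frees only the idle region; the busy one is released once its send completes and the VM asks again.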
Since the great majority of userspace messaging over IB goes through
middleware like MPI or PGAS languages, and they all have the same
approach to registration caching, this approach could fix the
problem for a big segment of use cases.
More text on the registration caching problem is here:
http://www.osc.edu/~pw/papers/wyckoff-memreg-ccgrid05.pdf
with an approach using vm_ops open and close operations in a kernel
module here:
http://www.osc.edu/~pw/dreg/
There is a place for VM notifiers in RDMA messaging, but not in
talking to devices, at least not the current set. If you can define
a reasonable userspace interface for VM notifiers, libraries can
manage registration caches more efficiently, letting the kernel
unmap pinned pages as it likes.
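To make the library side of such an interface concrete, here is a minimal Python sketch of a cache that drops entries when told an address range went away. The callback name `on_range_unmapped` and its arguments are assumptions for illustration only; this message does not specify what a userspace notifier interface would look like.

```python
class NotifiedRegCache:
    """Toy registration cache that invalidates entries on unmap notifications."""

    def __init__(self):
        self.regions = {}      # (start, length) -> simulated handle
        self.next_handle = 1

    def lookup(self, start, length):
        key = (start, length)
        if key not in self.regions:
            # Simulated registration: hand out a fresh handle.
            self.regions[key] = self.next_handle
            self.next_handle += 1
        return self.regions[key]

    def on_range_unmapped(self, start, length):
        """Hypothetical notifier callback: drop every cached region that
        overlaps [start, start + length) and return the dropped keys."""
        end = start + length
        stale = [k for k in self.regions
                 if k[0] < end and start < k[0] + k[1]]
        for k in stale:
            del self.regions[k]  # deregister/unpin would happen here
        return stale


notified = NotifiedRegCache()
a = notified.lookup(0x10000, 8192)
b = notified.lookup(0x20000, 4096)

# The app unmaps one page inside the first region.
dropped = notified.on_range_unmapped(0x11000, 4096)

# A later send from the same virtual address registers afresh.
c = notified.lookup(0x10000, 8192)
```

Because the overlapping entry is gone before the address can be reused, the next send takes the slow path and picks up the new mapping instead of the old pinned page.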
-- Pete