From: Kanoj Sarcar <kanojsarcar@yahoo.com>
To: Christoph Lameter <clameter@sgi.com>
Cc: Christian Bell <christian.bell@qlogic.com>,
Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
Rik van Riel <riel@redhat.com>,
Andrea Arcangeli <andrea@qumranet.com>,
a.p.zijlstra@chello.nl, izike@qumranet.com,
Roland Dreier <rdreier@cisco.com>,
steiner@sgi.com, linux-kernel@vger.kernel.org, avi@qumranet.com,
linux-mm@kvack.org, daniel.blueman@quadrics.com,
Robin Holt <holt@sgi.com>,
general@lists.openfabrics.org,
Andrew Morton <akpm@linux-foundation.org>,
kvm-devel@lists.sourceforge.net
Subject: Re: [ofa-general] Re: Demand paging for memory regions
Date: Wed, 13 Feb 2008 15:43:17 -0800 (PST) [thread overview]
Message-ID: <866658.37093.qm@web32510.mail.mud.yahoo.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0802131452410.22542@schroedinger.engr.sgi.com>
--- Christoph Lameter <clameter@sgi.com> wrote:
> On Wed, 13 Feb 2008, Kanoj Sarcar wrote:
>
> > It seems that the need is to solve potential
> memory
> > shortage and overcommit issues by being able to
> > reclaim pages pinned by rdma driver/hardware. Is
> my
> > understanding correct?
>
> Correct.
>
> > If I do understand correctly, then why is rdma
> page
> > pinning any different than eg mlock pinning? I
> imagine
> > Oracle pins lots of memory (using mlock), how come
> > they do not run into vm overcommit issues?
>
> Mlocked pages are not pinned. They are movable by
> f.e. page migration and
> will be potentially be moved by future memory defrag
> approaches. Currently
> we have the same issues with mlocked pages as with
> pinned pages. There is
> work in progress to put mlocked pages onto a
> different lru so that reclaim
> exempts these pages and more work on limiting the
> percentage of memory
> that can be mlocked.
>
> > Are we up against some kind of breaking c-o-w
> issue
> > here that is different between mlock and rdma
> pinning?
>
> Not that I know.
>
> > Asked another way, why should effort be spent on a
> > notifier scheme, and rather not on fixing any
> memory
> > accounting problems and unifying how pin pages are
> > accounted for that get pinned via mlock() or rdma
> > drivers?
>
> There are efforts underway to account for and limit
> mlocked pages as
> described above. Page pinning the way it is done by
> Infiniband through
> increasing the page refcount is treated by the VM as
> a temporary
> condition not as a permanent pin. The VM will
> continually try to reclaim
> these pages thinking that the temporary usage of the
> page must cease
> soon. This is why the use of large amounts of pinned
> pages can lead to
> livelock situations.
Oh ok, yes, I did see the discussion on this; sorry I
missed it. I do see what notifiers bring to the table
now (without endorsing it :-)).
An orthogonal question is this: is IB/rdma the only
"culprit" that elevates page refcounts? Are there no
other subsystems which do a similar thing?
The example I am thinking about is rawio (Oracle's
mlock'ed SHM regions are handed to rawio, isn't it?).
My understanding of how rawio works in Linux is quite
dated though ...
Kanoj
>
> If we want to have pinning behavior then we could
> mark pinned pages
> specially so that the VM will not continually try to
> evict these pages. We
> could manage them similar to mlocked pages but just
> not allow page
> migration, memory unplug and defrag to occur on
> pinned memory. All of
> theses would have to fail. With the notifier scheme
> the device driver
> could be told to get rid of the pinned memory. This
> would make these 3
> techniques work despite having an RDMA memory
> section.
>
> > Startup benefits are well understood with the
> notifier
> > scheme (ie, not all pages need to be faulted in at
> > memory region creation time), specially when most
> of
> > the memory region is not accessed at all. I would
> > imagine most of HPC does not work this way though.
>
> No for optimal performance you would want to
> prefault all pages like
> it is now. The notifier scheme would only become
> relevant in memory
> shortage situations.
>
> > Then again, as rdma hardware is applied
> (increasingly?) towards apps
> > with short lived connections, the notifier scheme
> will help with startup
> > times.
>
> The main use of the notifier scheme is for stability
> and reliability. The
> "pinned" pages become unpinnable on request by the
> VM. So the VM can work
> itself out of memory shortage situations in
> cooperation with the
> RDMA logic instead of simply failing.
>
> --
> To unsubscribe, send a message with 'unsubscribe
> linux-mm' in
> the body to majordomo@kvack.org. For more info on
> Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org">
> email@kvack.org </a>
>
____________________________________________________________________________________
Looking for last minute shopping deals?
Find them fast with Yahoo! Search. http://tools.search.yahoo.com/newsearch/category.php?category=shopping
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-02-13 23:43 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-08 22:06 [patch 0/6] MMU Notifiers V6 Christoph Lameter
2008-02-08 22:06 ` [patch 1/6] mmu_notifier: Core code Christoph Lameter
2008-02-08 22:06 ` [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges Christoph Lameter
2008-02-08 22:06 ` [patch 3/6] mmu_notifier: invalidate_page callbacks Christoph Lameter
2008-02-08 22:06 ` [patch 4/6] mmu_notifier: Skeleton driver for a simple mmu_notifier Christoph Lameter
2008-02-08 22:06 ` [patch 5/6] mmu_notifier: Support for drivers with revers maps (f.e. for XPmem) Christoph Lameter
2008-02-08 22:06 ` [patch 6/6] mmu_rmap_notifier: Skeleton for complex driver that uses its own rmaps Christoph Lameter
2008-02-08 22:23 ` [patch 0/6] MMU Notifiers V6 Andrew Morton
2008-02-08 23:32 ` Christoph Lameter
2008-02-08 23:36 ` Robin Holt
2008-02-08 23:41 ` Christoph Lameter
2008-02-08 23:43 ` Robin Holt
2008-02-08 23:56 ` Andrew Morton
2008-02-09 0:05 ` Christoph Lameter
2008-02-09 0:12 ` [ofa-general] " Roland Dreier
2008-02-09 0:16 ` Christoph Lameter
2008-02-09 0:21 ` [ofa-general] trying to get of all lists R S
2008-02-09 0:22 ` [ofa-general] Re: [patch 0/6] MMU Notifiers V6 Roland Dreier
2008-02-09 0:36 ` Christoph Lameter
2008-02-09 1:24 ` Andrea Arcangeli
2008-02-09 1:27 ` Christoph Lameter
2008-02-09 1:56 ` Andrea Arcangeli
2008-02-09 2:16 ` Christoph Lameter
2008-02-09 12:55 ` Rik van Riel
2008-02-09 21:46 ` Christoph Lameter
2008-02-11 22:40 ` Demand paging for memory regions (was Re: MMU Notifiers V6) Roland Dreier
2008-02-12 22:01 ` Steve Wise
2008-02-12 22:10 ` Christoph Lameter
2008-02-12 22:41 ` [ofa-general] Re: Demand paging for memory regions Roland Dreier
2008-02-12 23:14 ` Felix Marti
2008-02-13 0:57 ` Christoph Lameter
2008-02-14 15:09 ` Steve Wise
2008-02-14 15:53 ` Robin Holt
2008-02-14 16:23 ` Steve Wise
2008-02-14 17:48 ` Caitlin Bestler
2008-02-14 20:47 ` David Singleton
2008-02-15 9:55 ` Robin Holt
2008-02-14 19:39 ` Christoph Lameter
2008-02-14 20:17 ` Caitlin Bestler
2008-02-14 20:20 ` Christoph Lameter
2008-02-14 22:43 ` Caitlin Bestler
2008-02-14 22:48 ` Christoph Lameter
2008-02-15 1:26 ` Caitlin Bestler
2008-02-15 2:37 ` Christoph Lameter
2008-02-15 18:09 ` Caitlin Bestler
2008-02-15 18:45 ` Christoph Lameter
2008-02-15 18:53 ` Caitlin Bestler
2008-02-15 20:02 ` Christoph Lameter
2008-02-15 20:14 ` Caitlin Bestler
2008-02-15 22:50 ` Christoph Lameter
2008-02-15 23:50 ` Caitlin Bestler
2008-02-12 23:23 ` Jason Gunthorpe
2008-02-13 1:01 ` Christoph Lameter
2008-02-13 1:26 ` Jason Gunthorpe
2008-02-13 1:45 ` Steve Wise
2008-02-13 2:35 ` Christoph Lameter
2008-02-13 3:25 ` Jason Gunthorpe
2008-02-13 3:56 ` Patrick Geoffray
2008-02-13 4:26 ` Jason Gunthorpe
2008-02-13 4:47 ` Patrick Geoffray
2008-02-13 18:51 ` Christoph Lameter
2008-02-13 19:51 ` Jason Gunthorpe
2008-02-13 20:36 ` Christoph Lameter
2008-02-13 4:09 ` Christian Bell
2008-02-13 19:00 ` Christoph Lameter
2008-02-13 19:46 ` Christian Bell
2008-02-13 20:32 ` Christoph Lameter
2008-02-13 22:44 ` Kanoj Sarcar
2008-02-13 23:02 ` Christoph Lameter
2008-02-13 23:43 ` Kanoj Sarcar [this message]
2008-02-13 23:48 ` Jesse Barnes
2008-02-14 0:56 ` [ofa-general] " Andrea Arcangeli
2008-02-14 19:35 ` Christoph Lameter
2008-02-13 23:23 ` Pete Wyckoff
2008-02-14 0:01 ` Jason Gunthorpe
2008-02-27 22:11 ` Christoph Lameter
2008-02-13 1:55 ` Christian Bell
2008-02-13 2:19 ` Christoph Lameter
2008-02-13 0:56 ` Christoph Lameter
2008-02-13 12:11 ` Christoph Raisch
2008-02-13 19:02 ` Christoph Lameter
2008-02-09 0:12 ` [patch 0/6] MMU Notifiers V6 Andrew Morton
2008-02-09 0:18 ` Christoph Lameter
2008-02-13 14:31 ` Jack Steiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=866658.37093.qm@web32510.mail.mud.yahoo.com \
--to=kanojsarcar@yahoo.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=andrea@qumranet.com \
--cc=avi@qumranet.com \
--cc=christian.bell@qlogic.com \
--cc=clameter@sgi.com \
--cc=daniel.blueman@quadrics.com \
--cc=general@lists.openfabrics.org \
--cc=holt@sgi.com \
--cc=izike@qumranet.com \
--cc=jgunthorpe@obsidianresearch.com \
--cc=kvm-devel@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rdreier@cisco.com \
--cc=riel@redhat.com \
--cc=steiner@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox