linux-mm.kvack.org archive mirror
From: Andrea Arcangeli <andrea@qumranet.com>
To: Kanoj Sarcar <kanojsarcar@yahoo.com>
Cc: Christoph Lameter <clameter@sgi.com>,
	Christian Bell <christian.bell@qlogic.com>,
	Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
	Rik van Riel <riel@redhat.com>,
	a.p.zijlstra@chello.nl, izike@qumranet.com,
	Roland Dreier <rdreier@cisco.com>,
	steiner@sgi.com, linux-kernel@vger.kernel.org, avi@qumranet.com,
	linux-mm@kvack.org, daniel.blueman@quadrics.com,
	Robin Holt <holt@sgi.com>,
	general@lists.openfabrics.org,
	Andrew Morton <akpm@linux-foundation.org>,
	kvm-devel@lists.sourceforge.net
Subject: Re: [ofa-general] Re: Demand paging for memory regions
Date: Thu, 14 Feb 2008 01:56:54 +0100
Message-ID: <20080214005653.GE14146@v2.random>
In-Reply-To: <866658.37093.qm@web32510.mail.mud.yahoo.com>

Hi Kanoj,

On Wed, Feb 13, 2008 at 03:43:17PM -0800, Kanoj Sarcar wrote:
> Oh ok, yes, I did see the discussion on this; sorry I
> missed it. I do see what notifiers bring to the table
> now (without endorsing it :-)).

I'm not really sure livelocks are the big issue here.

I'm running N 1G VMs on a 1G ram system, with (N-1)G swapped
out. Combining this with auto-ballooning, rss limiting, and ksm ram
sharing provides really advanced and low-level virtualization VM
capabilities to the linux kernel, while at the same time guaranteeing
no oom failures as long as the guest pages are lower than ram+swap
(just slower runtime if too many pages are unshared or if the balloons
are deflated etc.).

Swapping the virtual machine in the host may be more efficient than
having the guest swap over a virtual paravirt swap storage, for
example. As more management features are added, admins will gain more
experience in handling those new features and they'll find what's best
for them. mmu notifiers and real reliable swapping are the enablers for
those more advanced VM features.

oom livelocks wouldn't happen anyway with KVM as long as the maximal
number of guest physical pages is lower than RAM.

> An orthogonal question is this: is IB/rdma the only
> "culprit" that elevates page refcounts? Are there no
> other subsystems which do a similar thing?
> 
> The example I am thinking about is rawio (Oracle's
> mlock'ed SHM regions are handed to rawio, isn't it?).
> My understanding of how rawio works in Linux is quite
> dated though ...

rawio in-flight I/O should be limited. As long as each task can't pin
more than X ram, and the ram is released when the task is oom killed,
and the first get_user_pages/alloc_pages/slab_alloc that returns
-ENOMEM takes an oom fail path that returns failure to userland,
everything is ok.
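
A minimal sketch of the accounting described above, written as a toy
userspace model in Python rather than kernel code. The class
PinAccounting, its methods and the limit_pages parameter are all
hypothetical names for illustration; only the idea (a per-task cap,
-ENOMEM on the failing request, release on oom kill) comes from the
text.

```python
import errno


class PinAccounting:
    """Toy model of per-task pinned-page accounting (not kernel code)."""

    def __init__(self, limit_pages):
        # Hypothetical per-task cap, the "X ram" from the text, in pages.
        self.limit_pages = limit_pages
        self.pinned = {}  # task id -> pages currently pinned

    def pin(self, task, npages):
        """Pin npages for task, or fail with -ENOMEM without pinning anything."""
        current = self.pinned.get(task, 0)
        if current + npages > self.limit_pages:
            # Fail path: report the failure to the caller ("userland")
            # instead of blocking or retrying forever.
            return -errno.ENOMEM
        self.pinned[task] = current + npages
        return 0

    def release_task(self, task):
        """Drop every pin held by task, as when it is oom killed."""
        return self.pinned.pop(task, 0)


acct = PinAccounting(limit_pages=4)
print(acct.pin("task-a", 3))         # within the cap
print(acct.pin("task-a", 2))         # would exceed the cap, so it fails
print(acct.release_task("task-a"))   # pages given back after the kill
```

The point of the model is only that a failed pin leaves the existing
count untouched and that killing the task returns everything it held.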

Even with IB, a deadlock could only happen if IB allowed unlimited
memory to be pinned down by unprivileged users.

If IB is insecure and DoSable without mmu notifiers, then I'm not sure
how enabling swapping of the IB memory could be enough to fix the
DoS. Keep in mind that even tmpfs can't be safe if it allows all
ram+swap to be allocated in a tmpfs file (even though tmpfs file
storage includes swap and not only ram). Pinning the whole ram+swap
with tmpfs livelocks the same way as pinning the whole ram with ramfs.
So if you add mmu notifier support to IB, you only need to RDMA an area
as large as ram+swap to livelock again as before... no difference at
all.
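
The argument above is plain arithmetic, and a toy model makes it
explicit. This is only a sketch: point_of_no_return and livelocks are
hypothetical helper names, sizes are in arbitrary page units, and the
model ignores kernel reserves and everything else a real system keeps
in memory.

```python
def point_of_no_return(ram_pages, swap_pages, swappable):
    """Capacity at which unfreeable memory exhausts the system in this toy model.

    Pinned memory can only live in ram; swappable memory (tmpfs, or RDMA
    regions with mmu notifiers) can also spill to swap.
    """
    return ram_pages + swap_pages if swappable else ram_pages


def livelocks(used_pages, ram_pages, swap_pages, swappable):
    """True once the region is as large as the point of no return."""
    return used_pages >= point_of_no_return(ram_pages, swap_pages, swappable)


RAM, SWAP = 100, 50

# A 120-page region livelocks if pinned, but not if it can be swapped.
print(livelocks(120, RAM, SWAP, swappable=False))
print(livelocks(120, RAM, SWAP, swappable=True))

# Growing the region to ram+swap brings the livelock back.
print(livelocks(150, RAM, SWAP, swappable=True))
```

Making the region swappable moves the threshold; it does not remove it.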

I don't think livelocks have anything to do with mmu notifiers (other
than deferring the livelock to the "swap+ram" point of no return
instead of the current "ram" point of no return). Livelocks have to be
solved the usual way: handling alloc_pages/get_user_pages/slab
allocation failures with a fail path that returns to userland and
allows the ram to be released if the task was selected for
oom-killage.

The real benefit of the mmu notifiers for IB would be to allow the
rdma region to be larger than RAM without triggering the oom
killer (or without triggering a livelock if it's DoSable, but then the
livelock would need fixing to be converted into a regular oom-kill by
some other means not related to the mmu notifier; it's really an
orthogonal problem).

So suppose you have an MPI simulation that requires a 10G array and
you've only 1G of ram, then you can rdma over 10G as if you had 10G
of ram. Things will perform ok only if there's some huge locality in
the computations. For virtualization it's orders of magnitude more
useful than for computer clusters, but certain simulations really do
swap, so I don't exclude that certain RDMA apps will also need this
(dunno about IB).
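
To illustrate why locality matters when the array is ten times larger
than ram, here is a small sketch that counts page faults under a plain
LRU policy. It is a toy model, not how the kernel reclaims memory: the
function count_faults, the 1000-page array and the 100-page resident
set are assumptions chosen only to keep the 10:1 ratio from the
example.

```python
from collections import OrderedDict


def count_faults(accesses, resident_pages):
    """Count page faults for an access trace under LRU with a fixed resident set."""
    lru = OrderedDict()
    faults = 0
    for page in accesses:
        if page in lru:
            lru.move_to_end(page)        # hit: mark as most recently used
        else:
            faults += 1                  # miss: page must be brought in
            lru[page] = True
            if len(lru) > resident_pages:
                lru.popitem(last=False)  # evict the least recently used page
    return faults


ARRAY_PAGES = 1000   # stands in for the 10G array
RAM_PAGES = 100      # stands in for the 1G of ram
REPEATS = 50

# High locality: finish all work on one page before moving to the next.
local = [p for p in range(ARRAY_PAGES) for _ in range(REPEATS)]
# No locality: sweep the whole array over and over.
sweeping = [p for _ in range(REPEATS) for p in range(ARRAY_PAGES)]

print(count_faults(local, RAM_PAGES))
print(count_faults(sweeping, RAM_PAGES))
```

Both traces make the same number of accesses, but the local one faults
once per page while the sweeping one faults on every access, which is
the difference between running ok and swapping constantly.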


