Re: [RFC PATCH 0/4] (Take 2): transcendent memory ("tmem") for Linux


From: Anthony Liguori <anthony@codemonkey.ws>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Rik van Riel <riel@redhat.com>,
	linux-kernel@vger.kernel.org, npiggin@suse.de, akpm@osdl.org,
	jeremy@goop.org, xen-devel@lists.xensource.com,
	tmem-devel@oss.oracle.com, alan@lxorguk.ukuu.org.uk,
	linux-mm@kvack.org, kurt.hackel@oracle.com,
	Rusty Russell <rusty@rustcorp.com.au>,
	dave.mccracken@oracle.com, Marcelo Tosatti <mtosatti@redhat.com>,
	sunil.mushran@oracle.com, Avi Kivity <avi@redhat.com>,
	Schwidefsky <schwidefsky@de.ibm.com>,
	chris.mason@oracle.com, Balbir Singh <balbir@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 0/4] (Take 2): transcendent memory ("tmem") for Linux
Date: Thu, 09 Jul 2009 18:33:15 -0500
Message-ID: <4A567E3B.90609@codemonkey.ws>
In-Reply-To: <7cb22078-f200-45e3-a265-10cce2ae8224@default>

Dan Magenheimer wrote:
> But this means that either the content of that page must have been
> preserved somewhere or the discard fault handler has sufficient
> information to go back and get the content from the source (e.g.
> the filesystem).  Or am I misunderstanding?
>   

As Rik said, it's the latter.

> With tmem, the equivalent of the "failure to access a discarded page"
> is inline and synchronous, so if the tmem access "fails", the
> normal code immediately executes.
>   

Yup.  This is the main difference AFAICT.  It's really just API 
semantics within Linux.

You could clearly use the volatile state of CMM2 to implement tmem as an
API in Linux.  The get/put functions would set a flag so that if the
discard handler was invoked while that operation was in progress, the
operation could safely fail.  That's why I claimed tmem is a subset of CMM2.
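
To make the flag idea concrete, here is a rough user-space sketch.  It is
not taken from the tmem patches or from CMM2: struct vpage, host_discard(),
discard_fault(), touch() and the sketch_tmem_* functions are all
hypothetical names, and the "fault" is modeled as an ordinary function call
rather than a real trap.  It only shows how a flag set around get/put could
turn a discard fault into an inline, synchronous failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical model of one guest page kept in a CMM2-style volatile state. */
struct vpage {
	bool discarded;                        /* host has reclaimed the contents */
	unsigned char data[SKETCH_PAGE_SIZE];
};

/* Set while a tmem-style get/put is touching a volatile page. */
static bool in_tmem_op;
/* Set by the discard handler when it fires during such an operation. */
static bool tmem_op_failed;

/* Stand-in for the host discarding a volatile page behind the guest's back. */
static void host_discard(struct vpage *p)
{
	p->discarded = true;
}

/*
 * Stand-in for the discard fault handler.  Returns true if the faulting
 * access may proceed, false if the caller must give up on the page.
 */
static bool discard_fault(struct vpage *p)
{
	if (in_tmem_op) {
		/* tmem semantics: do not recover the data, just fail the op. */
		tmem_op_failed = true;
		return false;
	}
	/* The ordinary CMM2 path would refetch from the source (e.g. the
	 * filesystem); that is modeled here as a zero fill. */
	memset(p->data, 0, sizeof(p->data));
	p->discarded = false;
	return true;
}

/* Models the check made on every access to a volatile page. */
static bool touch(struct vpage *p)
{
	return p->discarded ? discard_fault(p) : true;
}

/* Copy a page out of the slot; returns -1 if the page was discarded. */
static int sketch_tmem_get(struct vpage *slot, void *dst)
{
	assert(slot && dst);
	in_tmem_op = true;
	tmem_op_failed = false;
	if (touch(slot))
		memcpy(dst, slot->data, sizeof(slot->data));
	in_tmem_op = false;
	return tmem_op_failed ? -1 : 0;
}

/* Copy a page into the slot; returns -1 if the page was discarded. */
static int sketch_tmem_put(struct vpage *slot, const void *src)
{
	assert(slot && src);
	in_tmem_op = true;
	tmem_op_failed = false;
	if (touch(slot))
		memcpy(slot->data, src, sizeof(slot->data));
	in_tmem_op = false;
	return tmem_op_failed ? -1 : 0;
}
```

In this model a caller that sees -1 from sketch_tmem_get() simply falls
back to its normal path (for example, reading the page from disk), which
is the inline failure behavior described above.  How a discarded slot
gets re-established is left out of the sketch.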

> I suppose changing Linux to utilize the two tmem services
> as described above is a semantic change.  But to me it
> seems no more of a semantic change than requiring a new
> special page fault handler because a page of memory might
> disappear behind the OS's back.
>
> But IMHO this is a corollary of the fundamental difference.  CMM2's
> is more the "VMware" approach which is that OS's should never have
> to be modified to run in a virtual environment.  (Oh, but maybe
> modified just slightly to make the hypervisor a little less
> clueless about the OS's resource utilization.)

While I always enjoy a good holy war, I'd like to avoid one here because 
I want to stay on the topic at hand.

If there was one change to tmem that would make it more palatable, for 
me it would be changing the way pools are "allocated".  Instead of 
getting an opaque handle from the hypervisor, I would force the guest to
allocate its own memory and to tell the hypervisor that it's a tmem
pool.  You could then introduce semantics about whether the guest was
allowed to directly manipulate the memory as long as it was in the
pool.  It would be required to access the memory via get/put functions
that, under Xen, would end up being a hypercall and a copy.  Presumably
you would do some tricks with ballooning to allocate empty memory in Xen 
and then use those addresses as tmem pools.  On KVM, we could do 
something more clever.
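
As a rough illustration of that interface, the sketch below models a pool
backed by guest-allocated memory.  Everything in it is an assumption made
for illustration: struct sketch_pool, sketch_pool_register(), pool_put(),
pool_get() and hyp_evict() are hypothetical names, not part of the tmem
patches, and the "hypercall" is just a plain function that copies a page.
The hypervisor side is reduced to a per-slot "present" bit that it may
clear at any time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical pool: the memory is the guest's, the bookkeeping is the
 * hypervisor's. */
struct sketch_pool {
	unsigned char *base;   /* guest-allocated backing memory */
	bool *present;         /* hypervisor view: slot holds valid data */
	size_t npages;
};

/* Guest tells the "hypervisor" that guest_mem is now a tmem pool. */
static int sketch_pool_register(struct sketch_pool *pool, void *guest_mem,
				size_t npages)
{
	assert(pool && guest_mem);
	pool->present = calloc(npages, sizeof(*pool->present));
	if (!pool->present)
		return -1;
	pool->base = guest_mem;
	pool->npages = npages;
	return 0;
}

/* Drops the bookkeeping; the backing memory stays owned by the caller. */
static void sketch_pool_unregister(struct sketch_pool *pool)
{
	free(pool->present);
	pool->present = NULL;
	pool->base = NULL;
	pool->npages = 0;
}

/* Stand-in for a put "hypercall": copy one page into the pool. */
static int pool_put(struct sketch_pool *pool, size_t index, const void *src)
{
	if (index >= pool->npages)
		return -1;
	memcpy(pool->base + index * SKETCH_PAGE_SIZE, src, SKETCH_PAGE_SIZE);
	pool->present[index] = true;
	return 0;
}

/* Stand-in for a get "hypercall": copy one page out, or fail if the
 * slot was never filled or has been evicted. */
static int pool_get(struct sketch_pool *pool, size_t index, void *dst)
{
	if (index >= pool->npages || !pool->present[index])
		return -1;
	memcpy(dst, pool->base + index * SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);
	return 0;
}

/* Hypervisor reclaims a slot; a later get on it fails. */
static void hyp_evict(struct sketch_pool *pool, size_t index)
{
	if (index < pool->npages)
		pool->present[index] = false;
}
```

Because the pool's pages come from the guest's own allocation in this
model, they remain visible to whatever already tracks guest memory; the
guest just agrees to go through pool_put()/pool_get() while the range is
registered.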

The big advantage of keeping the tmem pool part of the normal set of 
guest memory is that you don't introduce new challenges with respect to 
memory accounting.  Whether or not tmem is directly accessible from the 
guest, it is another memory resource.  I'm certain that you'll want to 
do accounting of how much tmem is being consumed by each guest, and I 
strongly suspect that you'll want to do tmem accounting on a per-process 
basis.  I also suspect that doing tmem limiting for things like cgroups 
would be desirable.

That all points to making tmem normal memory so that all that 
infrastructure can be reused.  I'm not sure how well this maps to Xen 
guests, but it works out fine when the VMM is capable of presenting 
memory to the guest without actually allocating it (via overcommit).

Regards,

Anthony Liguori


