From: Dan Magenheimer <dan.magenheimer@oracle.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-kernel@vger.kernel.org, xen-devel@lists.xensource.com,
	npiggin@suse.de, chris.mason@oracle.com, kurt.hackel@oracle.com,
	dave.mccracken@oracle.com, Avi Kivity <avi@redhat.com>,
	jeremy@goop.org, Rik van Riel <riel@redhat.com>,
	alan@lxorguk.ukuu.org.uk, Rusty Russell <rusty@rustcorp.com.au>,
	akpm@osdl.org, Marcelo Tosatti <mtosatti@redhat.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	tmem-devel@oss.oracle.com, sunil.mushran@oracle.com,
	linux-mm@kvack.org, Himanshu Raj <rhim@microsoft.com>
Subject: RE: [RFC] transcendent memory for Linux
Date: Mon, 22 Jun 2009 13:41:19 -0700 (PDT)
Message-ID: <636843ec-b290-4ea9-b629-1d364f3b1112@default>
In-Reply-To: <20090622132702.6638d841@skybase>

> > Tmem has some similarity to IBM's Collaborative Memory Management,
> > but creates more of a partnership between the kernel and the
> > "privileged entity" and is not very invasive.  Tmem may be
> > applicable for KVM and containers; there is some disagreement on
> > the extent of its value. Tmem is highly complementary to ballooning
> > (aka page granularity hot plug) and memory deduplication (aka
> > transparent content-based page sharing) but still has value
> > when neither are present.

Hi Martin --

Thanks much for taking the time to reply!

> The basic idea seems to be that you reduce the amount of memory
> available to the guest and as a compensation give the guest some
> tmem, no?

That's mostly right.  Tmem's primary role is to help
with guests that have had their available memory reduced
(via ballooning or hotplug or some future mechanism).
However, tmem also makes otherwise unused-by-the-hypervisor
("fallow") memory available to a guest, essentially expanding
a guest kernel's page cache if no other guest is using the
RAM anyway.

And "as a compensation GIVE the guest some tmem" is misleading,
because tmem (at least ephemeral tmem) is never "given"
to a guest.  A better word might be "loaned" or "rented".
The guest gets to use some tmem for a while but if it
doesn't use it effectively, the memory is "repossessed"
(or the guest is "evicted" from using that memory)
transparently so that it can be used more effectively
elsewhere.
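
The "loaned, not given" semantics can be sketched in a few lines
of Python.  This is a toy model, not the Xen implementation: the
class EphemeralPool, its method names, and the oldest-first
repossession order are all assumptions made for illustration.
The point it shows is that a get may miss at any time, so the
guest must always be able to refetch the data from elsewhere.

```python
from collections import OrderedDict


class EphemeralPool:
    """Toy model of an ephemeral tmem pool (hypothetical API)."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # key -> bytes, oldest entry first

    def put(self, key, data):
        # Copy the data in; the guest keeps no mapping into the pool.
        self.pages.pop(key, None)
        self.pages[key] = bytes(data)
        while len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # repossess the oldest page

    def get(self, key):
        # A miss (None) is always legal for ephemeral data.
        return self.pages.get(key)

    def reclaim(self, n):
        # The hypervisor takes back up to n pages for use elsewhere.
        for _ in range(min(n, len(self.pages))):
            self.pages.popitem(last=False)
```

A guest using such a pool never learns when a page was
repossessed; it only sees that a later get returns nothing.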

> If that is the case then the effect of tmem is somewhat
> comparable to the volatile page cache pages.

There is definitely some similarity in that both are providing
useful information to the hypervisor.  In CMM's case, the
guest is passively providing info; in tmem's case it is
actively providing info and making use of the info within
the kernel, not just in the hypervisor, which is why I described it
as "more of a partnership".

> The big advantage of this approach is its simplicity, but there
> are down sides as well:
> 1) You need to copy the data between the tmem pool and the page
> cache. At least temporarily there are two copies of the same
> page around. That increases the total amount of used memory.

Certainly this is theoretically true, but I think the increase
is small and transient.  The kernel only puts the page into
precache when it has decided to use that page for another
purpose (due to memory pressure).  Until it actually
"reprovisions" the page, the data is briefly duplicated.

On the other hand, copying eliminates the need for fancy
games with virtual mappings and TLB entries.  Copying appears
to be getting much faster on recent CPUs; I'm not sure
if this is also true of TLB operations.
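
A minimal Python sketch of the copy-based precache flow described
above, under stated assumptions: plain dicts stand in for the
disk and for the tmem pool, and GuestPageCache with its methods
is a hypothetical name, not kernel code.  The page is copied into
precache at eviction time, so two copies exist only until the
guest drops its own; a later read tries precache before the disk.

```python
class GuestPageCache:
    """Toy guest page cache backed by a precache (hypothetical)."""

    def __init__(self, disk, precache):
        self.disk = disk          # dict: block -> bytes (storage)
        self.precache = precache  # dict: block -> bytes (tmem stand-in)
        self.cache = {}           # pages currently held by the guest
        self.disk_reads = 0

    def evict(self, block):
        # Copy first: the data is briefly duplicated here ...
        self.precache[block] = bytes(self.cache[block])
        # ... until the guest reuses its own page for something else.
        del self.cache[block]

    def read(self, block):
        if block in self.cache:
            return self.cache[block]
        data = self.precache.get(block)  # may miss: data is ephemeral
        if data is None:
            self.disk_reads += 1
            data = self.disk[block]
        self.cache[block] = data
        return data
```

If the hypervisor has dropped the page in the meantime, the read
simply falls through to the disk, exactly as it would without
tmem.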

> 2) The guest has a smaller memory size. Either the memory is
> large enough for the working set size in which case tmem is
> ineffective...

Yes, if the kernel has memory to "waste" (e.g. never refaults and
never swaps), tmem is ineffective.  The goal of tmem is to optimize
memory usage across an environment where there is contention
among multiple users (guests) for a limited resource (RAM).
If your environment always has enough RAM for every guest
and there's never any contention, you don't want tmem... but
I'd assert you've wasted money in your data center by buying
too much RAM!

> or the working set does not fit which increases
> the memory pressure and the cpu cycles spent in the mm code.

True, this is where preswap is useful.  Without tmem/preswap,
"does not fit" means swap-to-disk or refaulting is required.
Preswap alleviates the memory pressure by using tmem to
essentially swap to "magic memory" and precache reduces the
need for refaulting.
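
The preswap idea can be sketched the same way.  The Python below
is an illustration only: the Preswap class, the fixed page
capacity, and the rule that a put is refused when the pool is
full are assumptions, not a description of the patch.  A swap-out
first tries tmem and goes to the swap disk only when that fails.

```python
class Preswap:
    """Toy swap path that tries tmem before disk (hypothetical)."""

    def __init__(self, tmem_capacity):
        self.tmem_capacity = tmem_capacity
        self.tmem = {}   # slot -> bytes, stand-in for "magic memory"
        self.disk = {}   # slot -> bytes, stand-in for the swap device
        self.disk_writes = 0

    def swap_out(self, slot, data):
        # Assumed policy: accept the put while there is room.
        if slot in self.tmem or len(self.tmem) < self.tmem_capacity:
            self.tmem[slot] = bytes(data)
            self.disk.pop(slot, None)  # drop any stale disk copy
            return "tmem"
        self.disk[slot] = bytes(data)
        self.disk_writes += 1
        return "disk"

    def swap_in(self, slot):
        # Unlike precache, a page that was accepted must be found.
        if slot in self.tmem:
            return self.tmem.pop(slot)
        return self.disk.pop(slot)
```

Every swap-out that lands in tmem is one disk write avoided, and
the matching swap-in avoids a disk read.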

> 3) There is an additional turning knob, the size of the tmem pool
> for the guest. I see the need for a clever algorithm to determine
> the size for the different tmem pools.

Yes, some policy in the hypervisor is still required, essentially
a "memory scheduler".  The working implementation (in Xen)
uses FIFO, but modified by admin-configurable "weight" values
to allow QoS and avoid DoS. 
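
One way a weight could modify FIFO is sketched below in Python.
This is a guess at the general shape, not the algorithm used in
Xen: the function pick_victim and the "most over its weighted
share" rule are assumptions made for illustration.  The guest
holding the most pages beyond its weighted share loses its oldest
page, so a guest flooding tmem mostly evicts its own data.

```python
from collections import deque


def pick_victim(pools, weights):
    """Evict one page and return (guest, key), or None if empty.

    pools:   guest -> deque of page keys, oldest at the left
    weights: guest -> positive number set by the admin
    """
    total_pages = sum(len(q) for q in pools.values())
    if total_pages == 0:
        return None
    total_weight = sum(weights[g] for g in pools)

    def overuse(guest):
        # Pages held beyond this guest's weighted share.
        share = total_pages * weights[guest] / total_weight
        return len(pools[guest]) - share

    candidates = [g for g in pools if pools[g]]
    victim = max(candidates, key=overuse)
    return victim, pools[victim].popleft()  # FIFO within the guest
```

With equal weights this degenerates to taking from the largest
user; raising a guest's weight lets it hold more pages before it
becomes the preferred victim.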

> Overall I would say its worthwhile to investigate the performance
> impacts of the approach.

Thanks.  I'd appreciate any thoughts or experience you have
in this area (onlist or offlist) as I don't think there are
any adequate benchmarks that aren't either myopic for a complex
environment or contrived (and thus misleading) to prove an
isolated point.

I would also guess that tmem is more beneficial on recent
multi-core processors, and more costly on older chips.

Thanks again,
Dan


