From: Nitin Gupta <ngupta@vflare.org>
To: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Nick Piggin <npiggin@suse.de>,
Andrew Morton <akpm@linux-foundation.org>,
jeremy@goop.org, xen-devel@lists.xensource.com,
tmem-devel@oss.oracle.com, Rusty Russell <rusty@rustcorp.com.au>,
Rik van Riel <riel@redhat.com>,
dave.mccracken@oracle.com, Rusty@rcsinet15.oracle.com,
sunil.mushran@oracle.com, Avi Kivity <avi@redhat.com>,
Schwidefsky <schwidefsky@de.ibm.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
Alan Cox <alan@lxorguk.ukuu.org.uk>,
chris.mason@oracle.com, Pavel Machek <pavel@ucw.cz>,
linux-mm <linux-mm@kvack.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: Tmem [PATCH 0/5] (Take 3): Transcendent memory
Date: Mon, 21 Dec 2009 19:16:41 +0530
Message-ID: <4B2F7C41.9020106@vflare.org>
Hi Dan,
(I'm not sure whether the gmane.org interface sends mail to everyone on the CC
list, so I am sending this again. Sorry if you get a duplicate.)
Dan Magenheimer <dan.magenheimer <at> oracle.com> writes:
>
> Tmem [PATCH 0/5] (Take 3): Transcendent memory
> Transcendent memory
<snip>
>
> Normal memory is directly addressable by the kernel, of a known
> normally-fixed size, synchronously accessible, and persistent (though
> not across a reboot).
>
> What if there was a class of memory that is of unknown and dynamically
> variable size, is addressable only indirectly by the kernel, can be
> configured either as persistent or as "ephemeral" (meaning it will be
> around for awhile, but might disappear without warning), and is still
> fast enough to be synchronously accessible?
>
I really like the idea of allocating cache memory directly from the hypervisor.
This is much more flexible than assigning a fixed amount of memory to each guest.
>
> "Frontswap" is so named because it can be thought of as the opposite of
> a "backing store". Frontswap IS persistent, but for various reasons may not
> always be available for use, again due to factors that may not be visible to
> the kernel. (But, briefly, if the kernel is being "good" and has shared its
> resources nicely, then it will be able to use frontswap, else it will not.)
> Once a page is put, a get on the page will always succeed. So when the
> kernel finds itself in a situation where it needs to swap out a page, it
> first attempts to use frontswap. If the put works, a disk write and
> (usually) a disk read are avoided. If it doesn't, the page is written
> to swap as usual. Unlike cleancache, whether a page is stored in frontswap
> vs swap is recorded in kernel data structures, so when a page needs to
> be fetched, the kernel does a get if it is in frontswap and reads from
> swap if it is not in frontswap.
>
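For reference, the put/get flow quoted above can be modelled in a few lines of
userspace Python. This is only a sketch of the described semantics, not code
from the patch set: the class names, the capacity-based refusal rule and the
counters are all made up for illustration.

```python
# Userspace model only; none of these names come from the tmem patches.

class Frontswap:
    """Persistent pool: a put may be refused, but a stored page is never lost."""

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages  # hypothetical refusal rule
        self.pages = {}

    def put(self, offset, data):
        # The real reasons for refusing a put are invisible to the kernel;
        # a simple capacity limit stands in for them here.
        if offset not in self.pages and len(self.pages) >= self.capacity_pages:
            return False
        self.pages[offset] = data
        return True

    def get(self, offset):
        # Once a put has succeeded, a get on that page always succeeds.
        return self.pages[offset]


class SwapArea:
    def __init__(self, frontswap):
        self.frontswap = frontswap
        self.disk = {}             # stand-in for the real swap device
        self.in_frontswap = set()  # records where each swapped page lives
        self.disk_writes = 0
        self.disk_reads = 0

    def swap_out(self, offset, data):
        # Try frontswap first; fall back to the swap device if the put fails.
        if self.frontswap.put(offset, data):
            self.in_frontswap.add(offset)
            return
        self.in_frontswap.discard(offset)
        self.disk[offset] = data
        self.disk_writes += 1

    def swap_in(self, offset):
        # The recorded location decides between a get and a disk read.
        if offset in self.in_frontswap:
            return self.frontswap.get(offset)
        self.disk_reads += 1
        return self.disk[offset]
```

With a one-page pool, the first swapped-out page stays in frontswap and the
second goes to disk, so only the second costs a disk write and a disk read.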
I think the 'frontswap' part substantially overlaps the functionality provided
by 'ramzswap', a virtual block device driver recently added to
drivers/staging/ramzswap/. This device acts as a swap disk that compresses pages
and stores them in memory.
To provide frontswap functionality, ramzswap needs only a small change.
Instead of:
  compress --> alloc and store within guest
do:
  compress --> send out to hypervisor (tmem_put_page)
Also, the ramzswap driver supports multiple /dev/ramzswap{0,1,2...} devices.
Each of these devices can have a separate backing partition/file, which receives
incompressible pages and any pages written once the (per-device) memory limit is
exceeded.
When used on native systems, it uses the custom xvmalloc allocator, which is
specially designed to handle these compressed pages.
We can reuse all of this with just the minor change to ramzswap mentioned above.
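To illustrate how small the change is, here is a rough userspace model of the
write path. It is a sketch under several assumptions, not the driver's code:
zlib stands in for whatever compressor the driver uses, a plain dict stands in
for both the guest-side store (xvmalloc natively) and the hypervisor pool
(tmem_put_page), and the "incompressible" threshold of half a page is
hypothetical.

```python
import zlib

PAGE_SIZE = 4096


class RamzswapModel:
    """Toy model of one device; every name here is illustrative."""

    def __init__(self, store, memlimit_bytes):
        self.store = store                    # where compressed pages go
        self.memlimit_bytes = memlimit_bytes  # per-device memory limit
        self.used_bytes = 0
        self.backing = {}                     # stand-in for backing partition/file
        self.location = {}                    # index -> "store" or "backing"

    def write_page(self, index, page):
        assert len(page) == PAGE_SIZE
        self._forget(index)
        blob = zlib.compress(page)
        incompressible = len(blob) >= PAGE_SIZE // 2  # hypothetical threshold
        over_limit = self.used_bytes + len(blob) > self.memlimit_bytes
        if incompressible or over_limit:
            # Flush to the backing device, uncompressed.
            self.backing[index] = page
            self.location[index] = "backing"
            return
        # The only line that cares whether the store is local or remote.
        self.store[index] = blob
        self.used_bytes += len(blob)
        self.location[index] = "store"

    def read_page(self, index):
        if self.location[index] == "store":
            return zlib.decompress(self.store[index])
        return self.backing[index]

    def _forget(self, index):
        # Drop any previous copy of this page before it is rewritten.
        where = self.location.pop(index, None)
        if where == "store":
            self.used_bytes -= len(self.store.pop(index))
        elif where == "backing":
            del self.backing[index]
```

Switching between "store within guest" and "send out to hypervisor" then amounts
to passing a different store object to the constructor; the backing-device
fallback and the memory limit are untouched.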
> "Cleancache" can be thought of as a page-granularity victim cache for clean
> pages that the kernel's pageframe replacement algorithm (PFRA) would like
> to keep around, but can't since there isn't enough memory. So when the
> PFRA "evicts" a page, it first puts it into the cleancache via a call to
> tmem. And any time a filesystem reads a page from disk, it first attempts
> to get the page from cleancache. If it's there, a disk access is eliminated.
> If not, the filesystem just goes to the disk like normal. Cleancache is
> "ephemeral" so whether a page is kept in cleancache (between the "put" and
> the "get") is dependent on a number of factors that are invisible to
> the kernel.
Just an idea: as an alternative approach, we can create an 'in-memory compressed
storage' backend for FS-Cache. This way, all filesystems modified to use
FS-Cache can benefit from this backend. To make it virtualization friendly like
tmem, we can again provide a (per-cache?) option to either allocate from the
hypervisor, i.e. tmem_{put,get}_page(), or use [compress]+alloc natively.
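A minimal sketch of what such a backend has to provide, again as a userspace
model and not as FS-Cache or tmem code: a bounded store whose entries may vanish
without notice, optional compression, and a read path that falls back to disk on
a miss. All names below are hypothetical, and the oldest-first eviction is just
one possible policy.

```python
import zlib
from collections import OrderedDict


class EphemeralCache:
    """Bounded store; entries can disappear, so get() may return None."""

    def __init__(self, max_pages, compress=True):
        self.max_pages = max_pages
        self.compress = compress  # the "[compress]" option
        self.entries = OrderedDict()

    def put(self, key, page):
        blob = zlib.compress(page) if self.compress else page
        self.entries[key] = blob
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_pages:
            # Drop the oldest entry without telling the caller.
            self.entries.popitem(last=False)

    def get(self, key):
        blob = self.entries.get(key)
        if blob is None:
            return None
        return zlib.decompress(blob) if self.compress else blob


def evict_clean_page(cache, key, page):
    # Called when a clean page is evicted: offer it to the cache.
    cache.put(key, page)


def read_page(cache, disk, key, stats):
    # Try the cache first; a miss simply means a normal disk read.
    page = cache.get(key)
    if page is not None:
        stats["hits"] += 1
        return page
    stats["disk_reads"] += 1
    return disk[key]
```

Because a miss is always legal, the backend is free to shrink or drop its
contents at any time, which is what would let the hypervisor reclaim the memory.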
For the guest<-->hypervisor interface, maybe we can use virtio so that all
hypervisors can benefit? Not quite sure about this one.
Thanks,
Nitin