linux-mm.kvack.org archive mirror
* [RFC][PATCH 0/5] Memory merging driver for Linux
@ 2008-01-21 16:05 Izik Eidus
  2008-01-23 17:05 ` [kvm-devel] " Rik van Riel
  2008-01-23 23:10 ` Chris Wright
  0 siblings, 2 replies; 8+ messages in thread
From: Izik Eidus @ 2008-01-21 16:05 UTC (permalink / raw)
  To: kvm-devel, andrea, avi, dor.laor, linux-mm, yaniv

When kvm is used on production servers, it often runs the same guest
operating system more than once.
The idea of this module is to find the identical pages in different
guests and to share them so we can save memory.
Because many guests run identical operating systems, a lot of the
data in RAM is equal between the guests.

This module finds this identical data (pages) and merges it into one
single page.
This new page is write protected, so whenever a guest tries to
write to it, do_wp_page will duplicate the page.

This module simply goes over a list of pages that were registered and
finds the identical pages (using a hash table).
The pages that it scans are anonymous; each time it finds
identical pages it creates a file-mapped
(right now it is just kernel-allocated) page that will be the shared page.
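
A minimal user-space sketch of the scan described above, not the module's actual code: pages are modelled as bytes objects, and the names find_mergeable_pages and PAGE_SIZE are hypothetical. It hashes each registered page and then confirms with a full byte comparison, since a hash match alone does not prove that two pages are identical.

```python
import hashlib
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size for this illustration


def find_mergeable_pages(pages):
    """Map each page index to the index of the first identical page.

    `pages` is a list of bytes objects, one per registered page.
    Pages with no earlier duplicate map to themselves.
    """
    buckets = defaultdict(list)  # digest -> indices of distinct pages seen
    shared = {}
    for idx, data in enumerate(pages):
        digest = hashlib.sha1(data).digest()
        for candidate in buckets[digest]:
            # A hash match is only a hint; compare the full contents.
            if pages[candidate] == data:
                shared[idx] = candidate
                break
        else:
            # No identical page seen yet: this page becomes a candidate.
            buckets[digest].append(idx)
            shared[idx] = idx
    return shared


zero = bytes(PAGE_SIZE)
other = b"\x01" * PAGE_SIZE
mapping = find_mergeable_pages([zero, other, zero, zero])
print(mapping)
```

In this toy run the three zero pages all map to index 0, which plays the role of the shared page.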

For now I am missing swapping support (I will add it soon using non-linear
vmas).
 
This module can be used for any other purpose and works without kvm
(I used it for qemu).
To make it work for kvm, the mmu notifiers sent by Andrea should be used.

I added 2 new functions to the kernel.
One:
page_wrprotect() - makes the page read-only by setting the ptes that point
to it read-only.
Second:
replace_page() - replaces the pte mapping related to a vm area between two
pages.
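
To make the roles of the two functions concrete, here is a toy user-space model. It only borrows the names page_wrprotect and replace_page from the description above; the Pte class, the write_fault helper and all signatures are hypothetical and do not reflect the kernel code in the patches.

```python
from dataclasses import dataclass


@dataclass
class Pte:
    frame: int             # index of the physical frame this pte maps
    writable: bool = True


def page_wrprotect(ptes, frame):
    """Toy model: clear the write bit in every pte that maps `frame`."""
    for pte in ptes:
        if pte.frame == frame:
            pte.writable = False


def replace_page(pte, new_frame):
    """Toy model: repoint one pte at `new_frame`, keeping it read-only."""
    pte.frame = new_frame
    pte.writable = False


def write_fault(pte, memory):
    """Toy copy-on-write: give the faulting pte a private writable copy."""
    memory.append(bytes(memory[pte.frame]))
    pte.frame = len(memory) - 1
    pte.writable = True


memory = [b"same", b"same"]          # two frames with identical contents
guest_a, guest_b = Pte(0), Pte(1)    # one pte per guest
ptes = [guest_a, guest_b]

page_wrprotect(ptes, 0)              # protect the page that will be shared
replace_page(guest_b, 0)             # guest B now maps frame 0 as well
write_fault(guest_b, memory)         # guest B writes: it gets its own copy
```

After the last call guest A still maps the shared frame read-only, while guest B owns a fresh writable frame, which is the behaviour the cover letter attributes to do_wp_page.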

A few numbers:
For freshly started Windows guests I can share almost the whole memory (as
Windows zeroes all the pages),
so I can start many more Windows guests than I have memory for (as long
as no one touches it).

For Linux guests I was able to share 800MB+ across 4 CentOS guests that
each had 512MB of memory allocated
(again, this was without any workload, and they ran X).
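
As a quick check on what the CentOS figure means, the arithmetic can be written out. The helper name shared_fraction is hypothetical, and 800 is used as a lower bound because the figure reported is "800MB+".

```python
def shared_fraction(guests, mb_per_guest, shared_mb):
    """Return the fraction of total allocated guest memory that is shared."""
    return shared_mb / (guests * mb_per_guest)


total_mb = 4 * 512                       # memory allocated to all guests
fraction = shared_fraction(4, 512, 800)  # 800 / 2048
print(f"{total_mb} MB allocated, {fraction:.1%} shared")
```

So at least roughly 39% of the 2048MB allocated to the four idle guests was shared.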

-- 
woof.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [kvm-devel] [RFC][PATCH 0/5] Memory merging driver for Linux
  2008-01-21 16:05 [RFC][PATCH 0/5] Memory merging driver for Linux Izik Eidus
@ 2008-01-23 17:05 ` Rik van Riel
  2008-01-23 17:54   ` Andrea Arcangeli
  2008-01-24  5:38   ` Avi Kivity
  2008-01-23 23:10 ` Chris Wright
  1 sibling, 2 replies; 8+ messages in thread
From: Rik van Riel @ 2008-01-23 17:05 UTC (permalink / raw)
  To: Izik Eidus; +Cc: kvm-devel, andrea, avi, dor.laor, linux-mm, yaniv

On Mon, 21 Jan 2008 18:05:53 +0200
Izik Eidus <izike@qumranet.com> wrote:

> i added 2 new functions to the kernel
> one:
> page_wrprotect() make the page as read only by setting the ptes point to
> it as read only.
> second:
> replace_page() - replace the pte mapping related to vm area between two 
> pages

How will this work on CPUs with nested paging support, where the
CPU does the guest -> physical address translation?  (as opposed to
having shadow page tables)

Is it sufficient to mark the page read-only in the guest->physical
translation page table?

-- 
All rights reversed.


* Re: [kvm-devel] [RFC][PATCH 0/5] Memory merging driver for Linux
  2008-01-23 17:05 ` [kvm-devel] " Rik van Riel
@ 2008-01-23 17:54   ` Andrea Arcangeli
  2008-01-23 18:11     ` Izik Eidus
  2008-01-24  5:38   ` Avi Kivity
  1 sibling, 1 reply; 8+ messages in thread
From: Andrea Arcangeli @ 2008-01-23 17:54 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Izik Eidus, kvm-devel, avi, dor.laor, linux-mm, yaniv

On Wed, Jan 23, 2008 at 12:05:10PM -0500, Rik van Riel wrote:
> On Mon, 21 Jan 2008 18:05:53 +0200
> Izik Eidus <izike@qumranet.com> wrote:
> 
> > i added 2 new functions to the kernel
> > one:
> > page_wrprotect() make the page as read only by setting the ptes point to
> > it as read only.
> > second:
> > replace_page() - replace the pte mapping related to vm area between two 
> > pages
> 
> How will this work on CPUs with nested paging support, where the
> CPU does the guest -> physical address translation?  (opposed to
> having shadow page tables)

sptes resolve guest addresses to host physical addresses (what is
different is only which kind of guest address is being translated).

In fact, sptes are faster than nptes for memory-intensive number-crunching
workloads that do no pte mangling and no context switching. (A DBMS will
appreciate nptes instead ;)

> Is it sufficient to mark the page read-only in the guest->physical
> translation page table?

Yes, just like with sptes too. I guess nptes will also be managed as a
tlb even if they won't require many changes, but the mmu notifier
already firing in those two calls is what will keep both sptes and
nptes in sync with the main linux VM. The serialization against
get_user_pages, which refills the spte/npte layer in the
nonpresent-nofault case, of course happens through the PT lock, just
like for the regular linux page fault against a pte that is pte_none
for a little while but with the lock held (and set to write protect or
to the new value before releasing it). In fact this shows how the mmu
notifiers that connect the linux pte to the spte/npte work for more
than swapping.


* Re: [kvm-devel] [RFC][PATCH 0/5] Memory merging driver for Linux
  2008-01-23 17:54   ` Andrea Arcangeli
@ 2008-01-23 18:11     ` Izik Eidus
  0 siblings, 0 replies; 8+ messages in thread
From: Izik Eidus @ 2008-01-23 18:11 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rik van Riel, kvm-devel, avi, dor.laor, linux-mm, yaniv

Andrea Arcangeli wrote:
> On Wed, Jan 23, 2008 at 12:05:10PM -0500, Rik van Riel wrote:
>   
>> On Mon, 21 Jan 2008 18:05:53 +0200
>> Izik Eidus <izike@qumranet.com> wrote:
>>
>>     
>>> i added 2 new functions to the kernel
>>> one:
>>> page_wrprotect() make the page as read only by setting the ptes point to
>>> it as read only.
>>> second:
>>> replace_page() - replace the pte mapping related to vm area between two 
>>> pages
>>>       
>> How will this work on CPUs with nested paging support, where the
>> CPU does the guest -> physical address translation?  (opposed to
>> having shadow page tables)
>>     

Thanks for reviewing.

Nested page tables are somewhat different from shadow page tables:
instead of keeping another page table, as we do in the shadow code,
we keep another layer that translates the physical memory of the
guest into the physical memory of the host.
We are allowed to add access permissions to this new layer, so we can
mark the shared pages as read-only and vmexit on writes to them, so it
should work with that.
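
A rough user-space sketch of that extra layer, under heavy simplification: the table layout, the translate function and the VmExit exception are all hypothetical names for illustration, not kvm or hardware interfaces.

```python
class VmExit(Exception):
    """Stands in for an exit to the host caused by a guest access."""


def translate(guest_phys_page, nested_table, write):
    """Translate a guest physical page to a host physical page.

    `nested_table` maps guest physical page -> (host page, writable).
    A write to a read-only entry raises VmExit, modelling the exit that
    lets the host break the sharing before the write proceeds.
    """
    host_page, writable = nested_table[guest_phys_page]
    if write and not writable:
        raise VmExit(f"write to read-only guest page {guest_phys_page}")
    return host_page


# Guest page 7 is backed by shared host page 42 and marked read-only.
nested_table = {7: (42, False), 8: (43, True)}
print(translate(7, nested_table, write=False))
```

Reads of the shared page go straight through, and only a write attempt reaches the host.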
>
> sptes resolve guest addresses to host physical addresses (what is
> different is only which kind of guest address is being translated).
>
> sptes are faster than nptes for non pte-mangling non-context-switching
> memory intensive number crunching workloads infact. (DBMS will
> appreciate ntpes instead ;)
>
>   
>> Is it sufficient to mark the page read-only in the guest->physical
>> translation page table?
>>     
>
> Yes, just like with sptes too. I guess ntpes will also be managed as a
> tlb even if they won't require many changes, but the mmu notifier
> already firing in those two calls is what will keep both sptes and
> nptes in sync with the main linux VM. The serialization against
> get_user_pages that refills the spte/npte layer with
> nonpresent-nofault case of course happens through the PT lock, just
> like for the regular linux page fault against the pte that is pte_none
> for a little while but with the lock held (and set to write protect or
> new value before releasing it). This infact shows how the mmu
> notifiers that connects the linux pte to the spte/npte works for more
> than swapping.
>   
Yes, without mmu notifiers this driver can't work safely and effectively
for kvm;
it can only work for normal applications such as qemu without the mmu
notifiers.

-- 
woof.


* Re: [kvm-devel] [RFC][PATCH 0/5] Memory merging driver for Linux
  2008-01-21 16:05 [RFC][PATCH 0/5] Memory merging driver for Linux Izik Eidus
  2008-01-23 17:05 ` [kvm-devel] " Rik van Riel
@ 2008-01-23 23:10 ` Chris Wright
  2008-01-24  5:40   ` Avi Kivity
  1 sibling, 1 reply; 8+ messages in thread
From: Chris Wright @ 2008-01-23 23:10 UTC (permalink / raw)
  To: Izik Eidus; +Cc: kvm-devel, andrea, avi, dor.laor, linux-mm, yaniv

* Izik Eidus (izike@qumranet.com) wrote:
> this module find this identical data (pages) and merge them into one 
> single page
> this new page is write protected so in any case the guest will try to 
> write to it do_wp_page will duplicate the page

What happens if you've merged more pages than you can recover on write
faults?


* Re: [kvm-devel] [RFC][PATCH 0/5] Memory merging driver for Linux
  2008-01-23 17:05 ` [kvm-devel] " Rik van Riel
  2008-01-23 17:54   ` Andrea Arcangeli
@ 2008-01-24  5:38   ` Avi Kivity
  1 sibling, 0 replies; 8+ messages in thread
From: Avi Kivity @ 2008-01-24  5:38 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Izik Eidus, kvm-devel, andrea, dor.laor, linux-mm, yaniv

Rik van Riel wrote:
> On Mon, 21 Jan 2008 18:05:53 +0200
> Izik Eidus <izike@qumranet.com> wrote:
>
>   
>> i added 2 new functions to the kernel
>> one:
>> page_wrprotect() make the page as read only by setting the ptes point to
>> it as read only.
>> second:
>> replace_page() - replace the pte mapping related to vm area between two 
>> pages
>>     
>
> How will this work on CPUs with nested paging support, where the
> CPU does the guest -> physical address translation?  (opposed to
> having shadow page tables)
>
>   

Nested page tables are very similar to real-mode shadow paging: both
translate guest physical addresses to host physical addresses.

In any case, the merge driver is oblivious to the paging method used; it
works at the Linux pte level and relies on mmu notifiers to keep
everything in sync.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


* Re: [kvm-devel] [RFC][PATCH 0/5] Memory merging driver for Linux
  2008-01-23 23:10 ` Chris Wright
@ 2008-01-24  5:40   ` Avi Kivity
  2008-01-24  9:26     ` Izik Eidus
  0 siblings, 1 reply; 8+ messages in thread
From: Avi Kivity @ 2008-01-24  5:40 UTC (permalink / raw)
  To: Chris Wright; +Cc: Izik Eidus, kvm-devel, andrea, dor.laor, linux-mm, yaniv

Chris Wright wrote:
> * Izik Eidus (izike@qumranet.com) wrote:
>   
>> this module find this identical data (pages) and merge them into one 
>> single page
>> this new page is write protected so in any case the guest will try to 
>> write to it do_wp_page will duplicate the page
>>     
>
> What happens if you've merged more pages than you can recover on write
> faults?
>   

You start to swap.  Just like Linux when you start to write on fork()ed 
memory.

A management application may start taking measures, like inflating 
balloons and migrating to other hosts, but swapping is needed as a last 
resort measure.
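
The overcommit arithmetic behind this can be sketched as follows. The function name and its parameters are hypothetical, and the model is deliberately crude: it ignores that a shared page can be freed once its last mapper has taken a private copy.

```python
def pages_to_swap(total_frames, private_pages, shared_pages, broken_shares):
    """Return how many pages no longer fit in RAM after sharing is broken.

    private_pages -- pages that were never merged
    shared_pages  -- distinct shared pages currently resident
    broken_shares -- write faults that each needed a fresh private copy
    """
    needed = private_pages + shared_pages + broken_shares
    return max(0, needed - total_frames)


# With sharing intact everything fits; after 400 write faults it does not.
print(pages_to_swap(1000, 600, 100, 0))
print(pages_to_swap(1000, 600, 100, 400))
```

The second call reports 100 pages of excess demand, which is the point at which the host has to swap or a management application has to step in.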

-- 
Any sufficiently difficult bug is indistinguishable from a feature.


* Re: [kvm-devel] [RFC][PATCH 0/5] Memory merging driver for Linux
  2008-01-24  5:40   ` Avi Kivity
@ 2008-01-24  9:26     ` Izik Eidus
  0 siblings, 0 replies; 8+ messages in thread
From: Izik Eidus @ 2008-01-24  9:26 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Chris Wright, kvm-devel, andrea, dor.laor, linux-mm, yaniv

Avi Kivity wrote:
> Chris Wright wrote:
>> * Izik Eidus (izike@qumranet.com) wrote:
>>  
>>> this module find this identical data (pages) and merge them into one 
>>> single page
>>> this new page is write protected so in any case the guest will try 
>>> to write to it do_wp_page will duplicate the page
>>>     
>>
>> What happens if you've merged more pages than you can recover on write
>> faults?
>>   
>
> You start to swap.  Just like Linux when you start to write on 
> fork()ed memory.
>
> A management application may start taking measures, like inflating 
> balloons and migrating to other hosts, but swapping is needed as a 
> last resort measure.
>
Yes, write faults go into do_wp_page(), which in turn creates a
new anonymous/swappable page,
so it is safe.

-- 
woof.

