Re: KSM: current madvise rollup

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Izik Eidus <ieidus@redhat.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Rik van Riel <riel@redhat.com>, Chris Wright <chrisw@redhat.com>,
	Nick Piggin <nickpiggin@yahoo.com.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: KSM: current madvise rollup
Date: Mon, 13 Jul 2009 21:28:21 +0300	[thread overview]
Message-ID: <4A5B7CC5.4030908@redhat.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0907121459150.7417@sister.anvils>

Hugh Dickins wrote:
> On Sun, 12 Jul 2009, Izik Eidus wrote:
>   
>> On Sat, 11 Jul 2009 20:22:11 +0100 (BST)
>> Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
>>     
>>> We may want to do that anyway.  It concerned me a lot when I was
>>> first testing (and often saw kernel_pages_allocated greater than
>>> pages_shared - probably because of the original KSM's eagerness to
>>> merge forked pages, though I think there may have been more to it
>>> than that).  But seems much less of an issue now (that ratio is much
>>> healthier), and even less of an issue once KSM pages can be swapped.
>>> So I'm not bothering about it at the moment, but it may make sense.
>>>       
>
> I realized since writing that with the current statistics you really
> cannot tell how big an issue the orphaned (count 1) KSM pages are -
> good sharing of a few will completely hide non-sharing of many.
>
> But I've hacked in more stats (not something I'd care to share yet!),
> and those confirm that for my loads at least, the orphaned KSM pages
> are few compared with the shared ones.
>
>   
>> We could add patch like the below, but I think we should leave it as it
>> is now,
>>     
>
> I agree we should leave it as is for now.  My guess is that we'll
> prefer to leave them around, until approaching max_kernel_pages_alloc,
> pruning them only at that stage (rather as we free swap more aggressively
> when it's 50% full).  There may be benefit in not removing them too soon,
> there may be benefit in holding on to stable pages for longer (holding a
> reference in the stable tree for a while).  Or maybe not, just an idea.
>   

Well, I personaly dont see any real soultion to this problem without 
real swapping support,
So for now I dont mind not to deal with it at all (as long as the admin 
can decide how many unswappable pages he allow)
But, If you have a stronger opnion than me for that case, I really dont 
care to change it.

>   
>> and solve it all (like you have said) with the ksm pages
>> swapping support in next kernel release.
>> (Right now ksm can limit itself with max_kernel_pages_alloc)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index a0fbdb2..ee80861 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1261,8 +1261,13 @@ static void ksm_do_scan(unsigned int scan_npages)
>>  		rmap_item = scan_get_next_rmap_item(&page);
>>  		if (!rmap_item)
>>  			return;
>> -		if (!PageKsm(page) || !in_stable_tree(rmap_item))
>> +		if (!PageKsm(page) || !in_stable_tree(rmap_item)) {
>>  			cmp_and_merge_page(page, rmap_item);
>> +		} else if (page_mapcount(page) == 0) {
>>     
>
> If we did that (but we agree not for now), shouldn't it be
> 			   page_mapcount(page) == 1
> ?  The mapcount 0 ones already got freed by the zap/unmap code.
>   

Yea, I dont know what i thought when i compare it to zero..., 1 is the 
right value here indeed.

>   
>> +			break_cow(rmap_item->mm,
>> +				  rmap_item->address & PAGE_MASK);
>>     
>
> Just a note on that " & PAGE_MASK": it's unnecessary there and
> almost everywhere else.  One of the pleasures of putting flags into
> the bottom bits of the address, in code concerned with faulting, is
> that the faulting address can be anywhere within the page, so we
> don't have to bother to mask off the flags.
>
>   
>> +			remove_rmap_item_from_tree(rmap_item);
>> +		}
>>  		put_page(page);
>>  	}
>>  }
>>
>>     
>>> Oh, something that might be making it higher, that I didn't highlight
>>> (and can revert if you like, it was just more straightforward this
>>> way): with scan_get_next_rmap skipping the non-present ptes,
>>> pages_to_scan is currently a limit on the _present_ pages scanned in
>>> one batch.
>>>       
>> You mean that now when you say: pages_to_scan = 512, it wont count the
>> none present ptes as part of the counter, so if we have 500 not present
>> ptes in the begining and then 512 ptes later, before it used to call
>> cmp_and_merge_page() only for 12 pages while now it will get called on
>> 512 pages?
>>     
>
> If I understand you right, yes, before it would do those 500 absent then
> 512 present in two batches, first 512 (of which only 12 present) then 500;
> whereas now it'll skip the 500 absent without counting them, and handle
> the 512 present in that same one batch.
>
>   
>> If yes, then I liked this change, it is more logical from cpu
>> consumption point of view,
>>     
>
> Yes, although it does spend a little time on the absent ones, it should
> be much less time than it spends comparing or checksumming on present ones.
>   

Yea, compare to the other stuff it is minor...

>   
>> and in addition we have that cond_reched()
>> so I dont see a problem with this.
>>     
>
> Right, that cond_resched() is vital in this case.
>
> By the way, something else I didn't highlight, a significant benefit
> from avoiding get_user_pages(): that was doing a mark_page_accessed()
> on every present pte that it found, interfering with pageout decisions.
>   

That is a great value, i didn't even thought about that...

> Hugh
>   

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2009-07-13 18:00 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-06-29 14:14 Hugh Dickins
2009-06-30  9:52 ` Izik Eidus
2009-06-30 15:57   ` Hugh Dickins
2009-06-30 18:41     ` Izik Eidus
2009-07-01  2:03       ` Hugh Dickins
2009-07-01  9:50         ` Izik Eidus
2009-07-08 21:07           ` Hugh Dickins
2009-07-09 15:37             ` Izik Eidus
2009-07-10 13:43               ` Hugh Dickins
2009-07-10 22:42             ` Izik Eidus
2009-07-10 22:49               ` Izik Eidus
2009-07-11 19:22               ` Hugh Dickins
2009-07-11 21:22                 ` Izik Eidus
2009-07-12 14:44                   ` Hugh Dickins
2009-07-13 18:28                     ` Izik Eidus [this message]
2009-07-01 10:33         ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A5B7CC5.4030908@redhat.com \
    --to=ieidus@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=chrisw@redhat.com \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nickpiggin@yahoo.com.au \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox