From: Izik Eidus <ieidus@redhat.com>
To: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Rik van Riel <riel@redhat.com>, Chris Wright <chrisw@redhat.com>,
Nick Piggin <nickpiggin@yahoo.com.au>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: KSM: current madvise rollup
Date: Mon, 13 Jul 2009 21:28:21 +0300 [thread overview]
Message-ID: <4A5B7CC5.4030908@redhat.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0907121459150.7417@sister.anvils>
Hugh Dickins wrote:
> On Sun, 12 Jul 2009, Izik Eidus wrote:
>
>> On Sat, 11 Jul 2009 20:22:11 +0100 (BST)
>> Hugh Dickins <hugh.dickins@tiscali.co.uk> wrote:
>>
>>> We may want to do that anyway. It concerned me a lot when I was
>>> first testing (and often saw kernel_pages_allocated greater than
>>> pages_shared - probably because of the original KSM's eagerness to
>>> merge forked pages, though I think there may have been more to it
>>> than that). But seems much less of an issue now (that ratio is much
>>> healthier), and even less of an issue once KSM pages can be swapped.
>>> So I'm not bothering about it at the moment, but it may make sense.
>>>
>
> I realized since writing that with the current statistics you really
> cannot tell how big an issue the orphaned (count 1) KSM pages are -
> good sharing of a few will completely hide non-sharing of many.
>
> But I've hacked in more stats (not something I'd care to share yet!),
> and those confirm that for my loads at least, the orphaned KSM pages
> are few compared with the shared ones.
>
>
>> We could add patch like the below, but I think we should leave it as it
>> is now,
>>
>
> I agree we should leave it as is for now. My guess is that we'll
> prefer to leave them around, until approaching max_kernel_pages_alloc,
> pruning them only at that stage (rather as we free swap more aggressively
> when it's 50% full). There may be benefit in not removing them too soon,
> there may be benefit in holding on to stable pages for longer (holding a
> reference in the stable tree for a while). Or maybe not, just an idea.
>
Well, I personaly dont see any real soultion to this problem without
real swapping support,
So for now I dont mind not to deal with it at all (as long as the admin
can decide how many unswappable pages he allow)
But, If you have a stronger opnion than me for that case, I really dont
care to change it.
>
>> and solve it all (like you have said) with the ksm pages
>> swapping support in next kernel release.
>> (Right now ksm can limit itself with max_kernel_pages_alloc)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index a0fbdb2..ee80861 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -1261,8 +1261,13 @@ static void ksm_do_scan(unsigned int scan_npages)
>> rmap_item = scan_get_next_rmap_item(&page);
>> if (!rmap_item)
>> return;
>> - if (!PageKsm(page) || !in_stable_tree(rmap_item))
>> + if (!PageKsm(page) || !in_stable_tree(rmap_item)) {
>> cmp_and_merge_page(page, rmap_item);
>> + } else if (page_mapcount(page) == 0) {
>>
>
> If we did that (but we agree not for now), shouldn't it be
> page_mapcount(page) == 1
> ? The mapcount 0 ones already got freed by the zap/unmap code.
>
Yea, I dont know what i thought when i compare it to zero..., 1 is the
right value here indeed.
>
>> + break_cow(rmap_item->mm,
>> + rmap_item->address & PAGE_MASK);
>>
>
> Just a note on that " & PAGE_MASK": it's unnecessary there and
> almost everywhere else. One of the pleasures of putting flags into
> the bottom bits of the address, in code concerned with faulting, is
> that the faulting address can be anywhere within the page, so we
> don't have to bother to mask off the flags.
>
>
>> + remove_rmap_item_from_tree(rmap_item);
>> + }
>> put_page(page);
>> }
>> }
>>
>>
>>> Oh, something that might be making it higher, that I didn't highlight
>>> (and can revert if you like, it was just more straightforward this
>>> way): with scan_get_next_rmap skipping the non-present ptes,
>>> pages_to_scan is currently a limit on the _present_ pages scanned in
>>> one batch.
>>>
>> You mean that now when you say: pages_to_scan = 512, it wont count the
>> none present ptes as part of the counter, so if we have 500 not present
>> ptes in the begining and then 512 ptes later, before it used to call
>> cmp_and_merge_page() only for 12 pages while now it will get called on
>> 512 pages?
>>
>
> If I understand you right, yes, before it would do those 500 absent then
> 512 present in two batches, first 512 (of which only 12 present) then 500;
> whereas now it'll skip the 500 absent without counting them, and handle
> the 512 present in that same one batch.
>
>
>> If yes, then I liked this change, it is more logical from cpu
>> consumption point of view,
>>
>
> Yes, although it does spend a little time on the absent ones, it should
> be much less time than it spends comparing or checksumming on present ones.
>
Yea, compare to the other stuff it is minor...
>
>> and in addition we have that cond_reched()
>> so I dont see a problem with this.
>>
>
> Right, that cond_resched() is vital in this case.
>
> By the way, something else I didn't highlight, a significant benefit
> from avoiding get_user_pages(): that was doing a mark_page_accessed()
> on every present pte that it found, interfering with pageout decisions.
>
That is a great value, i didn't even thought about that...
> Hugh
>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-07-13 18:00 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-29 14:14 Hugh Dickins
2009-06-30 9:52 ` Izik Eidus
2009-06-30 15:57 ` Hugh Dickins
2009-06-30 18:41 ` Izik Eidus
2009-07-01 2:03 ` Hugh Dickins
2009-07-01 9:50 ` Izik Eidus
2009-07-08 21:07 ` Hugh Dickins
2009-07-09 15:37 ` Izik Eidus
2009-07-10 13:43 ` Hugh Dickins
2009-07-10 22:42 ` Izik Eidus
2009-07-10 22:49 ` Izik Eidus
2009-07-11 19:22 ` Hugh Dickins
2009-07-11 21:22 ` Izik Eidus
2009-07-12 14:44 ` Hugh Dickins
2009-07-13 18:28 ` Izik Eidus [this message]
2009-07-01 10:33 ` Andrea Arcangeli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A5B7CC5.4030908@redhat.com \
--to=ieidus@redhat.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=chrisw@redhat.com \
--cc=hugh.dickins@tiscali.co.uk \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nickpiggin@yahoo.com.au \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox