From: Mike Waychison <mikew@google.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Ying Han <yinghan@google.com>, Ingo Molnar <mingo@elte.hu>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
akpm <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Rohit Seth <rohitseth@google.com>,
Hugh Dickins <hugh@veritas.com>, "H. Peter Anvin" <hpa@zytor.com>,
edwintorok@gmail.com
Subject: Re: [RFC v1][PATCH]page_fault retry with NOPAGE_RETRY
Date: Thu, 27 Nov 2008 11:22:57 -0800
Message-ID: <492EF391.1040408@google.com>
In-Reply-To: <20081127101436.GI28285@wotan.suse.de>
Nick Piggin wrote:
> On Thu, Nov 27, 2008 at 11:00:07AM +0100, Peter Zijlstra wrote:
>> On Thu, 2008-11-27 at 01:28 -0800, Mike Waychison wrote:
>>
>>> Correct. I don't recall the numbers from the pathological cases we were
>>> seeing, but iirc, it was on the order of 10s of seconds, likely
>>> exacerbated by slower than usual disks. I've been digging through my
>>> inbox to find numbers without much success -- we've been using a variant
>>> of this patch since 2.6.11.
>>> We generally try to avoid such things, but sometimes it a) can't be
>>> easily avoided (third party libraries for instance) and b) when it hits
>>> us, it affects the overall health of the machine/cluster (the monitoring
>>> daemons get blocked, which isn't very healthy).
>> If it's only monitoring, there might be another solution. If you can keep
>> the required data in a separate (approximate) copy so that you don't
>> need mmap_sem at all to show them.
>>
>> If your mmap_sem is so contended your latencies are unacceptable, adding
>> more users to it - even statistics gathering, just isn't going to cure
>> the situation.
>>
>> Furthermore, /proc code usually isn't written with performance in mind,
>> so it's usually simple and robust code. Adding it to a 'hot'-path like
>> you're doing doesn't seem advisable.
>>
>> Also, releasing and re-acquiring mmap_sem can significantly add to the
>> cacheline bouncing that thing already has.
>
> Yes, it would be nice to reduce mmap_sem load regardless of any other
> fixes or problems. I guess they're not very worried about cacheline
> bouncing but more about hold time (how many sockets in these systems?
> 4 at most?)
>
> I guess it is the pagemap stuff that they use most heavily?
>
We aren't using pagemap yet. Reading /proc/pid/maps alone hurts.
> pagemap_read looks like it can use get_user_pages_fast. The smaps and
> clear_refs stuff might have been nicer if they could work on ranges
> like pagemap. Then they could avoid mmap_sem as well (although maps
> would need to be sampled and take mmap_sem I guess).
>
> One problem with dropping mmap_sem is that it hurts priority/fairness.
> And it opens a bit of a (maybe theoretical but not something to completely
> ignore) forward progress hole AFAIKS. If mmap_sem is very heavily
> contended, then the refault is going to take a while to get through,
> and then the page might get reclaimed, etc.

Right, this can be an issue. The way around it should be to minimize
the length of time any single lock holder can sit on it. Compare that
to what we have today, where we:

- sleep in a major fault with the read lock held,
- enqueue a writer behind it,
- and make all other faults wait on the rwsem.

The retry logic seems to be a lot better for forward progress.