[V4][PATCH 0/4]page fault retry with NOPAGE

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [V4][PATCH 0/4]page fault retry with NOPAGE_RETRY
@ 2009-04-13 19:44 Ying Han
  2009-04-13 19:57 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: Ying Han @ 2009-04-13 19:44 UTC (permalink / raw)
  To: linux-mm, linux-kernel, akpm, torvalds, Ingo Molnar,
	Mike Waychison, Rohit Seth, Hugh Dickins, Peter Zijlstra,
	H. Peter Anvin, Török Edwin, Lee Schermerhorn,
	Nick Piggin, Wu Fengguang

page fault retry with NOPAGE_RETRY
Allow major faults to drop the mmap_sem read lock while waiting for
synchronous disk read. This allows another thread which wishes to grab
down_write(mmap_sem) to proceed while the current is waiting the disk IO.

The patch replace the 'write' with flag of handle_mm_fault() to
FAULT_FLAG_RETRY as identify that the caller can tolerate the retry in the
filemap_fault call patch.

Compiled and booted on x86_64. (Andrew: can you help on other arches )

changelog[v4]:
- split the patch into four parts. the first two patches is a cleanup of
  Linus "untested" patch, which replace the boolen "writeaccess" to be a
  flag in handle_mm_fault. and the last two patches add the
  FAULT_FLAG_RETRY support.
- include all arches cleanups.
- add more comments as Andrew suggested.
- cleanups as Fengguang Wu suggested.

changelog[v3]:
- applied fixes and cleanups from Wu Fengguang.
filemap VM_FAULT_RETRY fixes
[PATCH 01/14] mm: fix find_lock_page_retry() return value parsing
[PATCH 02/14] mm: fix major/minor fault accounting on retried fault
[PATCH 04/14] mm: reduce duplicate page fault code
[PATCH 05/14] readahead: account mmap_miss for VM_FAULT_RETRY

- split the patch into two parts. first part includes FAULT_FLAG_RETRY
  support with no current user change. second part includes individual
  per-architecture cleanups that enable FAULT_FLAG_RETRY.
  currently there are mainly two users for handle_mm_fault, we enable
  FAULT_FLAG_RETRY for actual fault handler and leave get_user_pages
  unchanged.

changelog[v2]:
- reduce the runtime overhead by extending the 'write' flag of
  handle_mm_fault() to indicate the retry hint.
- add another two branches in filemap_fault with retry logic.
- replace find_lock_page with find_lock_page_retry to make the code
  cleaner.

Benchmarks:
case 1. one application has a high count of threads each faulting in
different pages of a hugefile. Benchmark indicate that this double data
structure walking in case of major fault results in << 1% performance hit.

case 2. add another thread in the above application which in a tight loop
of mmap()/munmap(). Here we measure loop count in the new thread while other
threads doing the same amount of work as case one. we got << 3% performance
hit on the Complete Time(benchmark value for case one) and 10% performance
improvement on the mmap()/munmap() counter.

This patch helps a lot in cases we have writer which is waitting behind all
readers, so it could execute much faster.

benchmarks from Wufengguang:
Just tested the sparse-random-read-on-sparse-file case, and found the
performance impact to be 0.4% (8.706s vs 8.744s) in the worst case.
Kind of acceptable.

without FAULT_FLAG_RETRY:
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse
3.28s user 5.39s system 99% cpu 8.692 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse
3.17s user 5.54s system 99% cpu 8.742 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse
3.18s user 5.48s system 99% cpu 8.684 total

FAULT_FLAG_RETRY:
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse
3.18s user 5.63s system 99% cpu 8.825 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse
3.22s user 5.47s system 99% cpu 8.718 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse
3.13s user 5.55s system 99% cpu 8.690 total

In the above faked workload, the mmap read page offsets are loaded from
stride-100 and performed on /mnt/btrfs-ram/sparse, which are created by:

	seq 0 100 1000000 > stride-100
	dd if=/dev/zero of=/mnt/btrfs-ram/sparse bs=1M count=1 seek=1024000

Signed-off-by: Mike Waychison <mikew@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ying Han <yinghan@google.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [V4][PATCH 0/4]page fault retry with NOPAGE_RETRY
  2009-04-13 19:44 [V4][PATCH 0/4]page fault retry with NOPAGE_RETRY Ying Han
@ 2009-04-13 19:57 ` Linus Torvalds
  2009-04-13 20:19   ` Linus Torvalds
  2009-04-13 21:04   ` Ying Han
  0 siblings, 2 replies; 4+ messages in thread
From: Linus Torvalds @ 2009-04-13 19:57 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, linux-kernel, akpm, Ingo Molnar, Mike Waychison,
	Rohit Seth, Hugh Dickins, Peter Zijlstra, H. Peter Anvin,
	Török Edwin, Lee Schermerhorn, Nick Piggin,
	Wu Fengguang



On Mon, 13 Apr 2009, Ying Han wrote:
> 
> Benchmarks:
> case 1. one application has a high count of threads each faulting in
> different pages of a hugefile. Benchmark indicate that this double data
> structure walking in case of major fault results in << 1% performance hit.
> 
> case 2. add another thread in the above application which in a tight loop
> of mmap()/munmap(). Here we measure loop count in the new thread while other
> threads doing the same amount of work as case one. we got << 3% performance
> hit on the Complete Time(benchmark value for case one) and 10% performance
> improvement on the mmap()/munmap() counter.
> 
> This patch helps a lot in cases we have writer which is waitting behind all
> readers, so it could execute much faster.

Hmm. I normally think of "<<" as "much smaller than", but the way you use 
it makes me wonder. In particular, "<< 3%" sounds very odd. If it's much 
smaller than 3%, I'd have expected "<< 1%" again. So it probably isn't.

> benchmarks from Wufengguang:
> Just tested the sparse-random-read-on-sparse-file case, and found the
> performance impact to be 0.4% (8.706s vs 8.744s) in the worst case.
> Kind of acceptable.

Well, have you tried the obvious optimization of _not_ doing the RETRY 
path when atomic_read(&mm->counter) == 1?

After all, if it's not a threaded app, and it doesn't have a possibility 
of concurrent mmap/fault, then why release the lock?

		Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [V4][PATCH 0/4]page fault retry with NOPAGE_RETRY
  2009-04-13 19:57 ` Linus Torvalds
@ 2009-04-13 20:19   ` Linus Torvalds
  2009-04-13 21:04   ` Ying Han
  1 sibling, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2009-04-13 20:19 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, linux-kernel, akpm, Ingo Molnar, Mike Waychison,
	Rohit Seth, Hugh Dickins, Peter Zijlstra, H. Peter Anvin,
	Török Edwin, Lee Schermerhorn, Nick Piggin,
	Wu Fengguang



On Mon, 13 Apr 2009, Linus Torvalds wrote:
> 
> Well, have you tried the obvious optimization of _not_ doing the RETRY 
> path when atomic_read(&mm->counter) == 1?
> 
> After all, if it's not a threaded app, and it doesn't have a possibility 
> of concurrent mmap/fault, then why release the lock?

Ok, so the counter is called 'mm_users', not 'counter'.

Anyway, I would try that in the arch patch, and just see what happens when 
you change the

	unsigned int fault_flags = FAULT_FLAG_RETRY;

into

	unsigned int fault_flags;

	..

	fault_flags = atomic_read(&mm->mm_users) > 1 ? FAULT_FLAG_RETRY : 0;

where you should probably do that mm dereference only after checking that 
you're not in atomic context or something like that (so move the 
assignment down).

The reason I'd suggest doing it in the caller of handle_mm_fault() rather 
than anywhere deeper in the call chain is that some callers _might_ really 
want to get the retry semantics even if they are the only ones. Imagine, 
for example, some kind of non-blocking "get_user_pages()" thing.

			Linus

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [V4][PATCH 0/4]page fault retry with NOPAGE_RETRY
  2009-04-13 19:57 ` Linus Torvalds
  2009-04-13 20:19   ` Linus Torvalds
@ 2009-04-13 21:04   ` Ying Han
  1 sibling, 0 replies; 4+ messages in thread
From: Ying Han @ 2009-04-13 21:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-mm, linux-kernel, akpm, Ingo Molnar, Mike Waychison,
	Rohit Seth, Hugh Dickins, Peter Zijlstra, H. Peter Anvin,
	Török Edwin, Lee Schermerhorn, Nick Piggin,
	Wu Fengguang

On Mon, Apr 13, 2009 at 12:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 13 Apr 2009, Ying Han wrote:
>>
>> Benchmarks:
>> case 1. one application has a high count of threads each faulting in
>> different pages of a hugefile. Benchmark indicate that this double data
>> structure walking in case of major fault results in << 1% performance hit.
>>
>> case 2. add another thread in the above application which in a tight loop
>> of mmap()/munmap(). Here we measure loop count in the new thread while other
>> threads doing the same amount of work as case one. we got << 3% performance
>> hit on the Complete Time(benchmark value for case one) and 10% performance
>> improvement on the mmap()/munmap() counter.
>>
>> This patch helps a lot in cases we have writer which is waitting behind all
>> readers, so it could execute much faster.
>
> Hmm. I normally think of "<<" as "much smaller than", but the way you use
> it makes me wonder. In particular, "<< 3%" sounds very odd. If it's much
> smaller than 3%, I'd have expected "<< 1%" again. So it probably isn't.

Yes, it should be "< 3%", i will make the change.

>> benchmarks from Wufengguang:
>> Just tested the sparse-random-read-on-sparse-file case, and found the
>> performance impact to be 0.4% (8.706s vs 8.744s) in the worst case.
>> Kind of acceptable.
>
> Well, have you tried the obvious optimization of _not_ doing the RETRY
> path when atomic_read(&mm->counter) == 1?
>
> After all, if it's not a threaded app, and it doesn't have a possibility
> of concurrent mmap/fault, then why release the lock?
>
>                Linus
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-04-13 21:04 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-13 19:44 [V4][PATCH 0/4]page fault retry with NOPAGE_RETRY Ying Han
2009-04-13 19:57 ` Linus Torvalds
2009-04-13 20:19   ` Linus Torvalds
2009-04-13 21:04   ` Ying Han

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox