* [PATCH] updated low-latency zap_page_range
@ 2002-07-25 0:29 Robert Love
2002-07-25 0:45 ` Andrew Morton
0 siblings, 1 reply; 8+ messages in thread
From: Robert Love @ 2002-07-25 0:29 UTC (permalink / raw)
To: akpm, torvalds; +Cc: riel, linux-kernel, linux-mm
Andrew and Linus,
The lock hold time in zap_page_range is horrid. This patch breaks the
work up into chunks and relinquishes the lock after each iteration.
This drastically lowers latency by creating a preemption point, as well
as lowering lock contention.
This patch is updated over the previous one: per Linus's suggestion, we
now call a new "cond_resched_lock()" function. Per Andrew's suggestion,
it checks the preempt_count so as not to allow an improper preemption.
However, we can no longer allow the conditional reschedule without
CONFIG_PREEMPT here (since there is then no way to know whether it is
safe), so it becomes a no-op.
This lowers the maximum latency in zap_page_range from 10-20ms (on a
dual Athlon - one of the worst latencies recorded) to unmeasurable.
I made a couple of other cleanups and optimizations:
	- remove the unneeded dir variable and call to pgd_offset - nothing
	  uses them anymore since the code was pushed into unmap_page_range
	- remove the duplicated start variable - it is the same as address
	- BUG -> BUG_ON in unmap_page_range
	- remove the redundant BUG from zap_page_range - the same check is
	  done in unmap_page_range
	- better comments
Patch is against 2.5.28, please apply.
Robert Love
diff -urN linux-2.5.28/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.5.28/include/linux/sched.h Wed Jul 24 14:03:20 2002
+++ linux/include/linux/sched.h Wed Jul 24 17:21:29 2002
@@ -888,6 +888,24 @@
__cond_resched();
}
+/*
+ * cond_resched_lock() - if a reschedule is pending, drop the given lock,
+ * call schedule, and on return reacquire the lock.
+ *
+ * Note: this assumes the given lock is the _only_ held lock and otherwise
+ * you are not atomic. The kernel preemption counter gives us "free"
+ * checking that this is really the only lock held -- let's use it.
+ */
+static inline void cond_resched_lock(spinlock_t * lock)
+{
+ if (need_resched() && preempt_count() == 1) {
+ _raw_spin_unlock(lock);
+ preempt_enable_no_resched();
+ __cond_resched();
+ spin_lock(lock);
+ }
+}
+
/* Reevaluate whether the task has signals pending delivery.
This is required every time the blocked sigset_t changes.
Athread cathreaders should have t->sigmask_lock. */
diff -urN linux-2.5.28/mm/memory.c linux/mm/memory.c
--- linux-2.5.28/mm/memory.c Wed Jul 24 14:03:27 2002
+++ linux/mm/memory.c Wed Jul 24 17:20:58 2002
@@ -390,8 +390,8 @@
{
pgd_t * dir;
- if (address >= end)
- BUG();
+ BUG_ON(address >= end);
+
dir = pgd_offset(vma->vm_mm, address);
tlb_start_vma(tlb, vma);
do {
@@ -402,33 +402,43 @@
tlb_end_vma(tlb, vma);
}
-/*
- * remove user pages in a given range.
+#define ZAP_BLOCK_SIZE (256 * PAGE_SIZE) /* how big a chunk we loop over */
+
+/**
+ * zap_page_range - remove user pages in a given range
+ * @vma: vm_area_struct holding the applicable pages
+ * @address: starting address of pages to zap
+ * @size: number of bytes to zap
*/
void zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size)
{
struct mm_struct *mm = vma->vm_mm;
mmu_gather_t *tlb;
- pgd_t * dir;
- unsigned long start = address, end = address + size;
+ unsigned long end, block;
- dir = pgd_offset(mm, address);
+ spin_lock(&mm->page_table_lock);
/*
- * This is a long-lived spinlock. That's fine.
- * There's no contention, because the page table
- * lock only protects against kswapd anyway, and
- * even if kswapd happened to be looking at this
- * process we _want_ it to get stuck.
+ * This was once a long-held spinlock. Now we break the
+ * work up into ZAP_BLOCK_SIZE units and relinquish the
+ * lock after each iteration. This drastically lowers
+ * lock contention and allows for a preemption point.
*/
- if (address >= end)
- BUG();
- spin_lock(&mm->page_table_lock);
- flush_cache_range(vma, address, end);
+ while (size) {
+ block = (size > ZAP_BLOCK_SIZE) ? ZAP_BLOCK_SIZE : size;
+ end = address + block;
+
+ flush_cache_range(vma, address, end);
+ tlb = tlb_gather_mmu(mm, 0);
+ unmap_page_range(tlb, vma, address, end);
+ tlb_finish_mmu(tlb, address, end);
+
+ cond_resched_lock(&mm->page_table_lock);
+
+ address += block;
+ size -= block;
+ }
- tlb = tlb_gather_mmu(mm, 0);
- unmap_page_range(tlb, vma, address, end);
- tlb_finish_mmu(tlb, start, end);
spin_unlock(&mm->page_table_lock);
}
* Re: [PATCH] updated low-latency zap_page_range
2002-07-25 0:29 [PATCH] updated low-latency zap_page_range Robert Love
@ 2002-07-25 0:45 ` Andrew Morton
2002-07-25 1:16 ` Robert Love
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2002-07-25 0:45 UTC (permalink / raw)
To: Robert Love; +Cc: torvalds, riel, linux-kernel, linux-mm
Robert Love wrote:
>
> ...
> +static inline void cond_resched_lock(spinlock_t * lock)
> +{
> + if (need_resched() && preempt_count() == 1) {
> + _raw_spin_unlock(lock);
> + preempt_enable_no_resched();
> + __cond_resched();
> + spin_lock(lock);
> + }
> +}
Maybe I'm being thick. How come a simple spin_unlock() in here
won't do the right thing?
And this won't _really_ compile to nothing with CONFIG_PREEMPT=n,
will it? It just does nothing because preempt_count() is zero?
* Re: [PATCH] updated low-latency zap_page_range
2002-07-25 0:45 ` Andrew Morton
@ 2002-07-25 1:16 ` Robert Love
2002-07-25 1:19 ` Andrew Morton
2002-07-25 1:21 ` Linus Torvalds
0 siblings, 2 replies; 8+ messages in thread
From: Robert Love @ 2002-07-25 1:16 UTC (permalink / raw)
To: Andrew Morton; +Cc: torvalds, riel, linux-kernel, linux-mm
On Wed, 2002-07-24 at 17:45, Andrew Morton wrote:
> Robert Love wrote:
> >
> > +static inline void cond_resched_lock(spinlock_t * lock)
> > +{
> > + if (need_resched() && preempt_count() == 1) {
> > + _raw_spin_unlock(lock);
> > + preempt_enable_no_resched();
> > + __cond_resched();
> > + spin_lock(lock);
> > + }
> > +}
>
> Maybe I'm being thick. How come a simple spin_unlock() in here
> won't do the right thing?
It will, but we will check need_resched twice. And preempt_count
again. My original version just did the "unlock; lock" combo and thus
the checking was automatic... but if we want to check before we unlock,
we might as well be optimal about it.
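Roughly, with CONFIG_PREEMPT the plain unlock/relock pair would expand to
something like this (a simplified sketch, not the exact 2.5 spinlock
macros):

    spin_unlock(lock);               /* ~ _raw_spin_unlock(lock);
                                      *   preempt_enable();
                                      *     -> decrements the count and
                                      *        re-checks need_resched(),
                                      *        possibly scheduling here */
    __cond_resched();                /* checks need_resched() again */
    spin_lock(lock);

    /* versus the open-coded version, which unlocks and makes the single
     * reschedule check by hand: */
    _raw_spin_unlock(lock);
    preempt_enable_no_resched();     /* decrement only, no extra check */
    __cond_resched();
    spin_lock(lock);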
> And this won't _really_ compile to nothing with CONFIG_PREEMPT=n,
> will it? It just does nothing because preempt_count() is zero?
I hope it compiles to nothing! There is a false in an if... oh, wait,
to preserve possible side-effects gcc will keep the need_resched() call
so I guess we should reorder it as:
if (preempt_count() == 1 && need_resched())
Then we get "if (0 && ..)" which should hopefully be evaluated away.
Then the inline is empty and nothing need be done.
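For what it is worth, the hoped-for folding looks like this (hypothetical
!CONFIG_PREEMPT definitions, for illustration only -- not the actual
2.5.28 macros):

    #define preempt_count()  0                     /* imagine it were constant */
    #define need_resched()   (current->need_resched)

    if (preempt_count() == 1 && need_resched()) {
            /* drop the lock, schedule, retake the lock */
    }

    /* "0 == 1" is false at compile time and && short-circuits, so
     * need_resched() is never evaluated and the whole body can be
     * discarded, leaving the inline empty. With the original order,
     * need_resched() sits in front of the constant-false test, which is
     * what prompted the worry about gcc keeping that evaluation. */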
Robert Love
* Re: [PATCH] updated low-latency zap_page_range
2002-07-25 1:16 ` Robert Love
@ 2002-07-25 1:19 ` Andrew Morton
2002-07-25 1:21 ` Linus Torvalds
1 sibling, 0 replies; 8+ messages in thread
From: Andrew Morton @ 2002-07-25 1:19 UTC (permalink / raw)
To: Robert Love; +Cc: torvalds, riel, linux-kernel, linux-mm
Robert Love wrote:
>
> ...
> I hope it compiles to nothing! There is a false in an if... oh, wait,
> to preserve possible side-effects gcc will keep the need_resched() call
> so I guess we should reorder it as:
>
> if (preempt_count() == 1 && need_resched())
>
> Then we get "if (0 && ..)" which should hopefully be evaluated away.
> Then the inline is empty and nothing need be done.
I think someone changed the definition of preempt_count()
while you weren't looking.
Plus it's used as an lvalue :(
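Schematically, the conflict is something like this (simplified, not the
exact 2.5 definitions):

    #define preempt_count()  (current->preempt_count)   /* assumed shape */

    /* ...and other paths write through it, e.g. on irq entry: */
    preempt_count()++;                                  /* an lvalue use */

    /* so it cannot simply be redefined to a literal 0 for
     * !CONFIG_PREEMPT builds without breaking those writers. */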
* Re: [PATCH] updated low-latency zap_page_range
2002-07-25 1:16 ` Robert Love
2002-07-25 1:19 ` Andrew Morton
@ 2002-07-25 1:21 ` Linus Torvalds
2002-07-25 1:29 ` Robert Love
2002-07-25 1:39 ` george anzinger
1 sibling, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2002-07-25 1:21 UTC (permalink / raw)
To: Robert Love; +Cc: Andrew Morton, riel, linux-kernel, linux-mm
On 24 Jul 2002, Robert Love wrote:
>
> if (preempt_count() == 1 && need_resched())
>
> Then we get "if (0 && ..)" which should hopefully be evaluated away.
I think preempt_count() is not unconditionally 0 for non-preemptible
kernels, so I don't think this is a compile-time constant.
That may be a bug in preempt_count(), of course.
Linus
* Re: [PATCH] updated low-latency zap_page_range
2002-07-25 1:21 ` Linus Torvalds
@ 2002-07-25 1:29 ` Robert Love
2002-07-25 1:39 ` george anzinger
1 sibling, 0 replies; 8+ messages in thread
From: Robert Love @ 2002-07-25 1:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Andrew Morton, riel, linux-kernel, linux-mm
On Wed, 2002-07-24 at 18:21, Linus Torvalds wrote:
> > Then we get "if (0 && ..)" which should hopefully be evaluated away.
>
> I think preempt_count() is not unconditionally 0 for non-preemptible
> kernels, so I don't think this is a compile-time constant.
>
> That may be a bug in preempt_count(), of course.
Oh, it was until Ingo's IRQ rewrite... we do not want it unconditional
anymore. Here is a new version which defines an empty
"cond_resched_lock()" for !CONFIG_PREEMPT [1].
Good for you guys?
Robert Love
[1] You may ask, why not just have it drop the lock and reschedule
unconditionally if !CONFIG_PREEMPT? The answer is because in this
function, as in many others, we do not know the call chain and the locks
held. Most callers of zap_page_range() hold no locks on call -- but one
does. This is one reason I would prefer to just unconditionally drop
the locks and have each kernel do the right thing.
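To make the footnote concrete with a made-up call site (purely
hypothetical -- not the actual in-tree caller alluded to above):

    spin_lock(&some_other_lock);        /* imaginary lock held by a caller */
    zap_page_range(vma, address, size);
    spin_unlock(&some_other_lock);

    /* An unconditional drop-and-schedule inside the loop would sleep
     * while this caller is still atomic. With CONFIG_PREEMPT the
     * preempt_count() == 1 test notices the second lock and skips the
     * reschedule; without CONFIG_PREEMPT there is no counter to consult,
     * so the only safe choice is for cond_resched_lock() to do nothing. */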
diff -urN linux-2.5.28/include/linux/sched.h linux/include/linux/sched.h
--- linux-2.5.28/include/linux/sched.h Wed Jul 24 14:03:20 2002
+++ linux/include/linux/sched.h Wed Jul 24 18:26:06 2002
@@ -888,6 +888,34 @@
__cond_resched();
}
+#ifdef CONFIG_PREEMPT
+
+/*
+ * cond_resched_lock() - if a reschedule is pending, drop the given lock,
+ * call schedule, and on return reacquire the lock.
+ *
+ * Note: this assumes the given lock is the _only_ held lock and otherwise
+ * you are not atomic. The kernel preemption counter gives us "free"
+ * checking that this is really the only lock held -- let's use it.
+ */
+static inline void cond_resched_lock(spinlock_t * lock)
+{
+ if (need_resched() && preempt_count() == 1) {
+ _raw_spin_unlock(lock);
+ preempt_enable_no_resched();
+ __cond_resched();
+ spin_lock(lock);
+ }
+}
+
+#else
+
+static inline void cond_resched_lock(spinlock_t * lock)
+{
+}
+
+#endif
+
/* Reevaluate whether the task has signals pending delivery.
This is required every time the blocked sigset_t changes.
Athread cathreaders should have t->sigmask_lock. */
diff -urN linux-2.5.28/mm/memory.c linux/mm/memory.c
--- linux-2.5.28/mm/memory.c Wed Jul 24 14:03:27 2002
+++ linux/mm/memory.c Wed Jul 24 18:24:48 2002
@@ -390,8 +390,8 @@
{
pgd_t * dir;
- if (address >= end)
- BUG();
+ BUG_ON(address >= end);
+
dir = pgd_offset(vma->vm_mm, address);
tlb_start_vma(tlb, vma);
do {
@@ -402,33 +402,43 @@
tlb_end_vma(tlb, vma);
}
-/*
- * remove user pages in a given range.
+#define ZAP_BLOCK_SIZE (256 * PAGE_SIZE) /* how big a chunk we loop over */
+
+/**
+ * zap_page_range - remove user pages in a given range
+ * @vma: vm_area_struct holding the applicable pages
+ * @address: starting address of pages to zap
+ * @size: number of bytes to zap
*/
void zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size)
{
struct mm_struct *mm = vma->vm_mm;
mmu_gather_t *tlb;
- pgd_t * dir;
- unsigned long start = address, end = address + size;
+ unsigned long end, block;
- dir = pgd_offset(mm, address);
+ spin_lock(&mm->page_table_lock);
/*
- * This is a long-lived spinlock. That's fine.
- * There's no contention, because the page table
- * lock only protects against kswapd anyway, and
- * even if kswapd happened to be looking at this
- * process we _want_ it to get stuck.
+ * This was once a long-held spinlock. Now we break the
+ * work up into ZAP_BLOCK_SIZE units and relinquish the
+ * lock after each iteration. This drastically lowers
+ * lock contention and allows for a preemption point.
*/
- if (address >= end)
- BUG();
- spin_lock(&mm->page_table_lock);
- flush_cache_range(vma, address, end);
+ while (size) {
+ block = (size > ZAP_BLOCK_SIZE) ? ZAP_BLOCK_SIZE : size;
+ end = address + block;
+
+ flush_cache_range(vma, address, end);
+ tlb = tlb_gather_mmu(mm, 0);
+ unmap_page_range(tlb, vma, address, end);
+ tlb_finish_mmu(tlb, address, end);
+
+ cond_resched_lock(&mm->page_table_lock);
+
+ address += block;
+ size -= block;
+ }
- tlb = tlb_gather_mmu(mm, 0);
- unmap_page_range(tlb, vma, address, end);
- tlb_finish_mmu(tlb, start, end);
spin_unlock(&mm->page_table_lock);
}
* Re: [PATCH] updated low-latency zap_page_range
2002-07-25 1:21 ` Linus Torvalds
2002-07-25 1:29 ` Robert Love
@ 2002-07-25 1:39 ` george anzinger
2002-07-25 5:19 ` Linus Torvalds
1 sibling, 1 reply; 8+ messages in thread
From: george anzinger @ 2002-07-25 1:39 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Robert Love, Andrew Morton, riel, linux-kernel, linux-mm
Linus Torvalds wrote:
>
> On 24 Jul 2002, Robert Love wrote:
> >
> > if (preempt_count() == 1 && need_resched())
> >
> > Then we get "if (0 && ..)" which should hopefully be evaluated away.
>
> I think preempt_count() is not unconditionally 0 for non-preemptible
> kernels, so I don't think this is a compile-time constant.
>
> That may be a bug in preempt_count(), of course.
>
Didn't we just put bh_count and irq_count in the same
word???
--
George Anzinger george@mvista.com
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml
* Re: [PATCH] updated low-latency zap_page_range
2002-07-25 1:39 ` george anzinger
@ 2002-07-25 5:19 ` Linus Torvalds
0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2002-07-25 5:19 UTC (permalink / raw)
To: george anzinger; +Cc: Robert Love, Andrew Morton, riel, linux-kernel, linux-mm
On Wed, 24 Jul 2002, george anzinger wrote:
> >
> > That may be a bug in preempt_count(), of course.
> >
> Didn't we just put bh_count and irq_count in the same
> word???
Yes. But that doesn't mean that the "preempt_count()" macro necessarily
needs to reflect that.
In particular, we have separate macros for getting the irq bits from that
shared word ("irq_count()" etc). Right now they happen to use the
"preempt_count()" macro, but that's not really fundamental.
No big deal either way, I suspect.
Linus