linux-mm.kvack.org archive mirror
* Re: [PATCH] mm: disable preemption in apply_to_pte_range
       [not found] ` <4994C052.9060907@goop.org>
@ 2009-02-13  0:55   ` Andrew Morton
  2009-02-13  1:39     ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2009-02-13  0:55 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Nick Piggin, linux-mm

On Thu, 12 Feb 2009 16:35:30 -0800
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> Jeremy Fitzhardinge wrote:
> > commit 79d9c90453a7bc9e7613ae889a97ff6b44ab8380
> 
> Scratch that.

Whew.  Version 1 did an obvious GFP_KERNEL allocation inside
preempt_disable().

>  This instead.
>     J
> 
>     mm: disable preemption in apply_to_pte_range
>     
>     Lazy mmu mode needs preemption disabled, so if we're applying to
>     init_mm (which doesn't require any pte locks), then explicitly
>     disable preemption.  (Do it unconditionally after checking we've
>     successfully done the allocation to simplify the error handling.)
>     
>     Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index baa999e..b80cc31 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1718,6 +1718,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>  
>  	BUG_ON(pmd_huge(*pmd));
>  
> +	preempt_disable();
>  	arch_enter_lazy_mmu_mode();
>  
>  	token = pmd_pgtable(*pmd);
> @@ -1729,6 +1730,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>  
>  	arch_leave_lazy_mmu_mode();
> +	preempt_enable();
>  
>  	if (mm != &init_mm)
>  		pte_unmap_unlock(pte-1, ptl);
> 

This weakens the apply_to_page_range() utility by newly requiring that
the callback function be callable under preempt_disable() if the target
mm is init_mm.  I guess we can live with that.

It's OK for the two present in-tree callers.  There might of course be
out-of-tree callers which break, but it is unlikely.

The patch should include a comment explaining why there is a random
preempt_disable() in this function.


Why is apply_to_page_range() exported to modules, btw?  I can find no
modules which need it.  Unexporting that function would make the
proposed weakening even less serious.


The patch assumes that
arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() must have
preemption disabled for all architectures.  Is this a sensible
assumption?

If so, should we do the preempt_disable/enable within those functions? 
Probably not worth the cost, I guess..
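
For concreteness, that alternative would look roughly like the sketch
below (hypothetical inline helpers, not the real arch hooks):

/* sketch: fold the preemption handling into the hooks themselves */
static inline void sketch_enter_lazy_mmu_mode(void)
{
        preempt_disable();
        /* arm the per-cpu batching machinery here */
}

static inline void sketch_leave_lazy_mmu_mode(void)
{
        /* flush any pending batched updates, then let preemption back in */
        preempt_enable();
}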


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13  0:55   ` [PATCH] mm: disable preemption in apply_to_pte_range Andrew Morton
@ 2009-02-13  1:39     ` Jeremy Fitzhardinge
  2009-02-13 11:48       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-13  1:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Nick Piggin, linux-mm

Andrew Morton wrote:
> This weakens the apply_to_page_range() utility by newly requiring that
> the callback function be callable under preempt_disable() if the target
> mm is init_mm.  I guess we can live with that.
>
> It's OK for the two present in-tree callers.  There might of course be
> out-of-tree callers which break, but it is unlikely.
>
> The patch should include a comment explaining why there is a random
> preempt_disable() in this function.
>   

I cuddled them up to their corresponding arch_X_lazy_mmu_mode calls to 
get this across, but I guess some prose would be helpful here.
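
For example, the commented hunk could end up reading something like
this (the wording below is only a sketch, not proposed final text):

	BUG_ON(pmd_huge(*pmd));

	/*
	 * Lazy mmu mode batches pte updates in (typically per-cpu)
	 * state, so the task must not be preempted or migrated while
	 * a batch is open.  For user mms the pte lock already pins us;
	 * for init_mm there is no pte lock, hence the explicit
	 * preempt_disable()/preempt_enable() around the lazy section.
	 */
	preempt_disable();
	arch_enter_lazy_mmu_mode();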

> Why is apply_to_page_range() exported to modules, btw?  I can find no
> modules which need it.  Unexporting that function would make the
> proposed weakening even less serious.
>   

I have some yet-to-be-upstreamed code that can use it from modules.

> The patch assumes that
> arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() must have
> preemption disabled for all architectures.  Is this a sensible
> assumption?
>   

In general the model for lazy updates is that you're batching the 
updates in some queue somewhere, which is almost certainly a piece of 
percpu state being maintained by someone.  It's therefore broken and/or 
meaningless to have the code making the updates wandering between cpus 
for the duration of the lazy updates.
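
As a very rough sketch of that model (the names, sizes and the
flush_batch() helper below are all hypothetical, not the actual Xen
code):

struct pte_update {				/* one queued update */
        pte_t *ptep;
        pte_t val;
};

struct update_batch {
        unsigned int nr;
        struct pte_update q[64];
};
static DEFINE_PER_CPU(struct update_batch, update_batch);

static void flush_batch(struct update_batch *b);	/* e.g. one hypercall */

static void queue_pte_update(pte_t *ptep, pte_t val)
{
        /* only safe because the caller cannot migrate mid-batch */
        struct update_batch *b = &__get_cpu_var(update_batch);

        b->q[b->nr].ptep = ptep;
        b->q[b->nr].val = val;
        if (++b->nr == ARRAY_SIZE(b->q))
                flush_batch(b);			/* resets b->nr */
}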

> If so, should we do the preempt_disable/enable within those functions? 
> Probably not worth the cost, I guess.

The specific rules are that 
arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to be 
holding the appropriate pte locks for the ptes you're updating, so 
preemption is naturally disabled in that case.

This all goes a bit strange with init_mm's non-requirement for taking 
pte locks.  The caller has to arrange for some kind of serialization on 
updating the range in question, and that could be a mutex.  Explicitly 
disabling preemption in enter_lazy_mmu_mode would make sense for this 
case, but it would be redundant for the common case of batched updates 
to usermode ptes.

    J


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13  1:39     ` Jeremy Fitzhardinge
@ 2009-02-13 11:48       ` Peter Zijlstra
  2009-02-13 13:30         ` Nick Piggin
  2009-02-13 17:24         ` Jeremy Fitzhardinge
  0 siblings, 2 replies; 11+ messages in thread
From: Peter Zijlstra @ 2009-02-13 11:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, linux-kernel, Nick Piggin, linux-mm, Ingo Molnar

On Thu, 2009-02-12 at 17:39 -0800, Jeremy Fitzhardinge wrote:

> In general the model for lazy updates is that you're batching the 
> updates in some queue somewhere, which is almost certainly a piece of 
> percpu state being maintained by someone.  Its therefore broken and/or 
> meaningless to have the code making the updates wandering between cpus 
> for the duration of the lazy updates.
> 
> > If so, should we do the preempt_disable/enable within those functions? 
> > Probably not worth the cost, I guess.
> 
> The specific rules are that 
> arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to be 
> holding the appropriate pte locks for the ptes you're updating, so 
> preemption is naturally disabled in that case.

Right, except on -rt where the pte lock is a mutex.

> This all goes a bit strange with init_mm's non-requirement for taking 
> pte locks.  The caller has to arrange for some kind of serialization on 
> updating the range in question, and that could be a mutex.  Explicitly 
> disabling preemption in enter_lazy_mmu_mode would make sense for this 
> case, but it would be redundant for the common case of batched updates 
> to usermode ptes.

I really utterly hate how you just plonk preempt_disable() in there
unconditionally and without very clear comments on how and why.

I'd rather we'd fix up the init_mm to also have a pte lock.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13 11:48       ` Peter Zijlstra
@ 2009-02-13 13:30         ` Nick Piggin
  2009-02-13 14:16           ` Peter Zijlstra
  2009-02-13 17:24         ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 11+ messages in thread
From: Nick Piggin @ 2009-02-13 13:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jeremy Fitzhardinge, Andrew Morton, linux-kernel, linux-mm, Ingo Molnar

On Friday 13 February 2009 22:48:30 Peter Zijlstra wrote:
> On Thu, 2009-02-12 at 17:39 -0800, Jeremy Fitzhardinge wrote:
> > In general the model for lazy updates is that you're batching the
> > updates in some queue somewhere, which is almost certainly a piece of
> > percpu state being maintained by someone.  Its therefore broken and/or
> > meaningless to have the code making the updates wandering between cpus
> > for the duration of the lazy updates.
> >
> > > If so, should we do the preempt_disable/enable within those functions?
> > > Probably not worth the cost, I guess.
> >
> > The specific rules are that
> > arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to be
> > holding the appropriate pte locks for the ptes you're updating, so
> > preemption is naturally disabled in that case.
>
> Right, except on -rt where the pte lock is a mutex.
>
> > This all goes a bit strange with init_mm's non-requirement for taking
> > pte locks.  The caller has to arrange for some kind of serialization on
> > updating the range in question, and that could be a mutex.  Explicitly
> > disabling preemption in enter_lazy_mmu_mode would make sense for this
> > case, but it would be redundant for the common case of batched updates
> > to usermode ptes.
>
> I really utterly hate how you just plonk preempt_disable() in there
> unconditionally and without very clear comments on how and why.

And even on mainline kernels, builds without the lazy mmu mode stuff
don't need preemption disabled here either, so it is technically a
regression in those cases too.

> I'd rather we'd fix up the init_mm to also have a pte lock.

Well that wouldn't fix -rt; there would need to be a preempt_disable
within arch_enter_lazy_mmu_mode(), which I think is the cleanest
solution.

And, hmm... this makes me wonder what is being applied to what? Are
the callers of apply_to_pte_range on init_mm doing the correct
locking? I'd not be surprised if not. Jeremy, did you notice any
particular backtraces?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13 13:30         ` Nick Piggin
@ 2009-02-13 14:16           ` Peter Zijlstra
  2009-02-13 14:30             ` Nick Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2009-02-13 14:16 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jeremy Fitzhardinge, Andrew Morton, linux-kernel, linux-mm, Ingo Molnar

On Sat, 2009-02-14 at 00:30 +1100, Nick Piggin wrote:
> On Friday 13 February 2009 22:48:30 Peter Zijlstra wrote:
> > On Thu, 2009-02-12 at 17:39 -0800, Jeremy Fitzhardinge wrote:
> > > In general the model for lazy updates is that you're batching the
> > > updates in some queue somewhere, which is almost certainly a piece of
> > > percpu state being maintained by someone.  Its therefore broken and/or
> > > meaningless to have the code making the updates wandering between cpus
> > > for the duration of the lazy updates.
> > >
> > > > If so, should we do the preempt_disable/enable within those functions?
> > > > Probably not worth the cost, I guess.
> > >
> > > The specific rules are that
> > > arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to be
> > > holding the appropriate pte locks for the ptes you're updating, so
> > > preemption is naturally disabled in that case.
> >
> > Right, except on -rt where the pte lock is a mutex.
> >
> > > This all goes a bit strange with init_mm's non-requirement for taking
> > > pte locks.  The caller has to arrange for some kind of serialization on
> > > updating the range in question, and that could be a mutex.  Explicitly
> > > disabling preemption in enter_lazy_mmu_mode would make sense for this
> > > case, but it would be redundant for the common case of batched updates
> > > to usermode ptes.
> >
> > I really utterly hate how you just plonk preempt_disable() in there
> > unconditionally and without very clear comments on how and why.
> 
> And even on mainline kernels, builds without the lazy mmu mode stuff
> don't need preemption disabled here either, so it is technically a
> regression in those cases too.

Well, normally we'd be holding the pte lock, which on regular kernels
already disables preemption, as Jeremy noted. So in that respect it
doesn't change things too much.

It's just that slapping preempt_disable()s around like there's no
tomorrow is horridly annoying; it's like using the BKL -- there's no data
affinity whatsoever, so trying to unravel the dependencies a year
later when you notice it's a latency concern is a massive pain in the
backside.

> > I'd rather we'd fix up the init_mm to also have a pte lock.
> 
> Well that wouldn't fix -rt; there would need to be a preempt_disable
> within arch_enter_lazy_mmu_mode(), which I think is the cleanest
> solution.

Hmm, so you're saying we need to be cpu-affine for the lazy mmu stuff?
Otherwise -rt would just convert the init_mm pte lock to a mutex along
with all other pte locks and there'd be no issue.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13 14:16           ` Peter Zijlstra
@ 2009-02-13 14:30             ` Nick Piggin
  2009-02-13 14:38               ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Nick Piggin @ 2009-02-13 14:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jeremy Fitzhardinge, Andrew Morton, linux-kernel, linux-mm, Ingo Molnar

On Saturday 14 February 2009 01:16:51 Peter Zijlstra wrote:
> On Sat, 2009-02-14 at 00:30 +1100, Nick Piggin wrote:
> > On Friday 13 February 2009 22:48:30 Peter Zijlstra wrote:
> > > On Thu, 2009-02-12 at 17:39 -0800, Jeremy Fitzhardinge wrote:
> > > > In general the model for lazy updates is that you're batching the
> > > > updates in some queue somewhere, which is almost certainly a piece of
> > > > percpu state being maintained by someone.  Its therefore broken
> > > > and/or meaningless to have the code making the updates wandering
> > > > between cpus for the duration of the lazy updates.
> > > >
> > > > > If so, should we do the preempt_disable/enable within those
> > > > > functions? Probably not worth the cost, I guess.
> > > >
> > > > The specific rules are that
> > > > arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to
> > > > be holding the appropriate pte locks for the ptes you're updating, so
> > > > preemption is naturally disabled in that case.
> > >
> > > Right, except on -rt where the pte lock is a mutex.
> > >
> > > > This all goes a bit strange with init_mm's non-requirement for taking
> > > > pte locks.  The caller has to arrange for some kind of serialization
> > > > on updating the range in question, and that could be a mutex. 
> > > > Explicitly disabling preemption in enter_lazy_mmu_mode would make
> > > > sense for this case, but it would be redundant for the common case of
> > > > batched updates to usermode ptes.
> > >
> > > I really utterly hate how you just plonk preempt_disable() in there
> > > unconditionally and without very clear comments on how and why.
> >
> > And even on mainline kernels, builds without the lazy mmu mode stuff
> > don't need preemption disabled here either, so it is technically a
> > regression in those cases too.
>
> Well, normally we'd be holding the pte lock, which on regular kernels
> already disable preemption, as Jeremy noted. So in that respect it
> doesn't change things too much.

But not (necessarily) in the init_mm case.


> Its just that slapping preempt_disable()s around like there's not
> tomorrow is horridly annoying, its like using the BKL -- there's no data
> affinity what so ever, so trying to unravel the dependencies a year
> later when you notice its a latency concern is a massive pain in the
> backside.

Or like using memory barriers. Any of them are OK if they're properly
commented though, I guess.


> > > I'd rather we'd fix up the init_mm to also have a pte lock.
> >
> > Well that wouldn't fix -rt; there would need to be a preempt_disable
> > within arch_enter_lazy_mmu_mode(), which I think is the cleanest
> > solution.
>
> Hmm, so you're saying we need to be cpu-affine for the lazy mmu stuff?
> Otherwise a -rt would just convert the init_mm pte lock to a mutex along
> with all other pte locks and there'd be no issue.

Well I don't see any other reason why it should have to use preempt_disable.
Not necessarily just cpu-affine, but perhaps it is using per-cpu data in
a non-trivial way, so it cannot get switched out either.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13 14:30             ` Nick Piggin
@ 2009-02-13 14:38               ` Peter Zijlstra
  2009-02-13 17:41                 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2009-02-13 14:38 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jeremy Fitzhardinge, Andrew Morton, linux-kernel, linux-mm, Ingo Molnar

On Sat, 2009-02-14 at 01:30 +1100, Nick Piggin wrote:
> On Saturday 14 February 2009 01:16:51 Peter Zijlstra wrote:
> > On Sat, 2009-02-14 at 00:30 +1100, Nick Piggin wrote:
> > > On Friday 13 February 2009 22:48:30 Peter Zijlstra wrote:
> > > > On Thu, 2009-02-12 at 17:39 -0800, Jeremy Fitzhardinge wrote:
> > > > > In general the model for lazy updates is that you're batching the
> > > > > updates in some queue somewhere, which is almost certainly a piece of
> > > > > percpu state being maintained by someone.  Its therefore broken
> > > > > and/or meaningless to have the code making the updates wandering
> > > > > between cpus for the duration of the lazy updates.
> > > > >
> > > > > > If so, should we do the preempt_disable/enable within those
> > > > > > functions? Probably not worth the cost, I guess.
> > > > >
> > > > > The specific rules are that
> > > > > arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to
> > > > > be holding the appropriate pte locks for the ptes you're updating, so
> > > > > preemption is naturally disabled in that case.
> > > >
> > > > Right, except on -rt where the pte lock is a mutex.
> > > >
> > > > > This all goes a bit strange with init_mm's non-requirement for taking
> > > > > pte locks.  The caller has to arrange for some kind of serialization
> > > > > on updating the range in question, and that could be a mutex. 
> > > > > Explicitly disabling preemption in enter_lazy_mmu_mode would make
> > > > > sense for this case, but it would be redundant for the common case of
> > > > > batched updates to usermode ptes.
> > > >
> > > > I really utterly hate how you just plonk preempt_disable() in there
> > > > unconditionally and without very clear comments on how and why.
> > >
> > > And even on mainline kernels, builds without the lazy mmu mode stuff
> > > don't need preemption disabled here either, so it is technically a
> > > regression in those cases too.
> >
> > Well, normally we'd be holding the pte lock, which on regular kernels
> > already disable preemption, as Jeremy noted. So in that respect it
> > doesn't change things too much.
> 
> But not (necessarily) in the init_mm case.

Right.

> > Its just that slapping preempt_disable()s around like there's not
> > tomorrow is horridly annoying, its like using the BKL -- there's no data
> > affinity what so ever, so trying to unravel the dependencies a year
> > later when you notice its a latency concern is a massive pain in the
> > backside.
> 
> Or like using memory barriers. Any of them are OK if they're properly
> commented though, I guess.

Yes, given sufficient comments and a good reason most things can be
gotten away with ;-)

> > > > I'd rather we'd fix up the init_mm to also have a pte lock.
> > >
> > > Well that wouldn't fix -rt; there would need to be a preempt_disable
> > > within arch_enter_lazy_mmu_mode(), which I think is the cleanest
> > > solution.
> >
> > Hmm, so you're saying we need to be cpu-affine for the lazy mmu stuff?
> > Otherwise a -rt would just convert the init_mm pte lock to a mutex along
> > with all other pte locks and there'd be no issue.
> 
> Well I don't see any other reason why it should have to use preempt_disable.
> Not necessarily just cpu-affine, but perhaps it is using per-cpu data in
> non-trivial way so cannot get switched out either.

If the lazy mmu code relies on per-cpu data, then it should be the lazy
mmu code's responsibility to ensure stuff is properly serialized, e.g. it
should do get_cpu_var() and put_cpu_var().

Those constructs can usually be converted to preemptible variants quite
easily, as they clearly show what data needs to be protected.
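
Something along those lines, say (a sketch only; update_batch and
flush_batch() are the same hypothetical per-cpu batch and flush helper
sketched earlier in the thread):

static void lazy_mmu_enter(void)
{
        /* grabs this cpu's batch and disables preemption */
        struct update_batch *b = &get_cpu_var(update_batch);

        b->nr = 0;				/* start a fresh batch */
}

static void lazy_mmu_leave(void)
{
        flush_batch(&__get_cpu_var(update_batch));
        put_cpu_var(update_batch);		/* re-enables preemption */
}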


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13 11:48       ` Peter Zijlstra
  2009-02-13 13:30         ` Nick Piggin
@ 2009-02-13 17:24         ` Jeremy Fitzhardinge
  2009-02-14  9:56           ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-13 17:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, linux-kernel, Nick Piggin, linux-mm, Ingo Molnar

Peter Zijlstra wrote:
>> The specific rules are that 
>> arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to be 
>> holding the appropriate pte locks for the ptes you're updating, so 
>> preemption is naturally disabled in that case.
>>     
>
> Right, except on -rt where the pte lock is a mutex.
>   

Hm, that's interesting.  The requirement isn't really "no preemption", 
its "must not migrate to another cpu".  Is there a better way to express 
that?

>> This all goes a bit strange with init_mm's non-requirement for taking 
>> pte locks.  The caller has to arrange for some kind of serialization on 
>> updating the range in question, and that could be a mutex.  Explicitly 
>> disabling preemption in enter_lazy_mmu_mode would make sense for this 
>> case, but it would be redundant for the common case of batched updates 
>> to usermode ptes.
>>     
>
> I really utterly hate how you just plonk preempt_disable() in there
> unconditionally and without very clear comments on how and why.
>   

Well, there's the commit comment.  They're important, right?  That's why 
we spend time writing good commit comments?  So they get read?  ;)

OK, I'll add a comment, particularly if there's a more precise way to 
express "no migration".

> I'd rather we'd fix up the init_mm to also have a pte lock.
>   
Yes, I don't like the init_mm-exceptionalism either.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13 14:38               ` Peter Zijlstra
@ 2009-02-13 17:41                 ` Jeremy Fitzhardinge
  2009-02-14  9:46                   ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Jeremy Fitzhardinge @ 2009-02-13 17:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Nick Piggin, Andrew Morton, linux-kernel, linux-mm, Ingo Molnar

Peter Zijlstra wrote:
> If the lazy mmu code relies on per-cpu data, then it should be the lazy
> mmu's responsibility to ensure stuff is properly serialized. Eg. it
> should do get_cpu_var() and put_cpu_var().
>
> Those constructs can usually be converted to preemptable variants quite
> easily, as it clearly shows what data needs to be protected.
>   

At the moment the lazy update stuff is inherently cpu-affine.  The basic 
model is that you can amortize the cost of individual update operations 
(via hypercall, for example) by batching them up.  That batch is almost 
certainly a piece of percpu state (in Xen's case it's maintained on the 
kernel side as per-cpu data, but in VMI it happens somewhere under their 
ABI), and so we can't allow switching to another cpu while lazy update 
mode is active.

Preemption is also problematic because if we're doing lazy updates and 
we switch to another task, it will likely get very confused if its 
pagetable updates get deferred until some arbitrary point in the future...

So at the moment, we just disable preemption, and take advantage of the 
existing work to make sure pagetable updates don't stay non-preemptible for 
too long.  This has been fine so far, because almost all the work on 
using lazy mmu updates has focused on usermode mappings.

But I can see how this is problematic from your perspective.  One thing 
we could consider is making the lazy mmu mode a per-task property, so if 
we get preempted we can flush any pending changes and safely switch to 
another task, and then re-enable it when we get scheduled in again.  
(This may already be possible with the existing paravirt-ops hooks in 
switch_to.)
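
For instance, something built on the existing preempt notifiers could
look like the sketch below (the lazy_mmu_active flag and flush_batch()
helper are hypothetical):

/* flush the open batch when a lazy-mode task is scheduled out */
static void lazy_mmu_sched_out(struct preempt_notifier *pn,
                               struct task_struct *next)
{
        if (current->lazy_mmu_active)		/* hypothetical task flag */
                flush_batch(&__get_cpu_var(update_batch));
}

/* nothing pending to replay; just resume batching on the new cpu */
static void lazy_mmu_sched_in(struct preempt_notifier *pn, int cpu)
{
}

static struct preempt_ops lazy_mmu_preempt_ops = {
        .sched_in  = lazy_mmu_sched_in,
        .sched_out = lazy_mmu_sched_out,
};

The notifier would only need to be registered for a task (via
preempt_notifier_register()) while it actually has lazy mode enabled.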

In this specific case, if the lazy mmu updates / non-preemptible section 
is really causing heartburn, we can just back it out for now.

    J


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13 17:41                 ` Jeremy Fitzhardinge
@ 2009-02-14  9:46                   ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2009-02-14  9:46 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, Andrew Morton, linux-kernel, linux-mm, Ingo Molnar

On Fri, 2009-02-13 at 09:41 -0800, Jeremy Fitzhardinge wrote:
> Peter Zijlstra wrote:
> > If the lazy mmu code relies on per-cpu data, then it should be the lazy
> > mmu's responsibility to ensure stuff is properly serialized. Eg. it
> > should do get_cpu_var() and put_cpu_var().
> >
> > Those constructs can usually be converted to preemptable variants quite
> > easily, as it clearly shows what data needs to be protected.
> >   
> 
> At the moment the lazy update stuff is inherently cpu-affine.  The basic 
> model is that you can amortize the cost of individual update operations 
> (via hypercall, for example) by batching them up.  That batch is almost 
> certainly a piece of percpu state (in Xen's case its maintained on the 
> kernel side as per-cpu data,

What we often do for -rt is put a lock in such per-cpu data, and take
the data that is cpu-local when we start and acquire the lock; thereafter
we keep operating on that data no matter where we get migrated to.

That way you're optimistically per-cpu but still freely schedulable.
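
Roughly like this, for example (again just a sketch with made-up
names):

struct rt_update_batch {
        spinlock_t lock;	/* becomes a sleeping lock on -rt */
        unsigned int nr;
        /* queued updates live here */
};
static DEFINE_PER_CPU(struct rt_update_batch, rt_update_batch) = {
        .lock = __SPIN_LOCK_UNLOCKED(rt_update_batch.lock),
};

static struct rt_update_batch *batch_begin(void)
{
        /* start with whatever batch happens to be local right now ... */
        struct rt_update_batch *b = &per_cpu(rt_update_batch, raw_smp_processor_id());

        /*
         * ... but pin it with the lock rather than with preemption, so
         * we can keep using it even if we get migrated meanwhile.
         */
        spin_lock(&b->lock);
        return b;
}

static void batch_end(struct rt_update_batch *b)
{
        /* flush the queued updates here, then drop the lock */
        spin_unlock(&b->lock);
}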

>  but in VMI it happens somewhere under their 
> ABI), and so we can't allow switching to another cpu while lazy update 
> mode is active.

Sounds like VMI is a bit of a problem there, luckily no sane person uses
that since it's all closed-source cruft ;-)

It might just end up meaning you cannot run an RT kernel on a VMI host --
something that's daft anyway.

> Preemption is also problematic because if we're doing lazy updates and 
> we switch to another task, it will likely get very confused if its 
> pagetable updates get deferred until some arbitrary point in the future...

Who will get confused? The preempted task doesn't care, as it's not
running, and once it gets back to running we'll happily continue poking
at the page-tables and complete the operation like normal.

> So at the moment, we just disable preemption, and take advantage of the 
> existing work to make sure pagetable updates are not non-preemptible for 
> too long.  This has been fine so far, because almost all the work on 
> using lazy mmu updates has focused on usermode mappings.
> 
> But I can see how this is problematic from your perspective.  One thing 
> we could consider is making the lazy mmu mode a per-task property, so if 
> we get preempted we can flush any pending changes and safely switch to 
> another task, and then reenable it when we get scheduled in again.  
> (This may be already possible with the existing paravirt-ops hooks in 
> switch_to.)

That sounds like a fine option; we have these preempt notifiers for
exactly such cases, I think.

> In this specific case, if the lazy mmu updates / non-preemptable section 
> is really causing heartburn, we can just back it out for now.

I'm not sure; I haven't run an -rt kernel with your patch applied, so
I don't know whether it will explode.  However I do dislike simply
requiring preemption disabled there, when it's clearly something that's
specific to the lazy mmu stuff.

So I think that no matter how we're going to do it, it should be done in
the lazy mmu hooks.  If that requires changing the
arch_{enter,leave}_lazy_mmu_mode() signature to take an mm_struct
pointer, then so be it.
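
I.e. something of this shape (sketch only):

struct mm_struct;

/*
 * Let the hooks see which mm is being updated, so the lazy mmu
 * implementation can decide for itself whether it needs to pin the
 * task (kernel mappings) or can lean on the pte lock (user mappings).
 */
void arch_enter_lazy_mmu_mode(struct mm_struct *mm);
void arch_leave_lazy_mmu_mode(struct mm_struct *mm);

apply_to_pte_range() and friends would then just pass down the mm they
are already given.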


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: disable preemption in apply_to_pte_range
  2009-02-13 17:24         ` Jeremy Fitzhardinge
@ 2009-02-14  9:56           ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2009-02-14  9:56 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Andrew Morton, linux-kernel, Nick Piggin, linux-mm, Ingo Molnar

On Fri, 2009-02-13 at 09:24 -0800, Jeremy Fitzhardinge wrote:
> Peter Zijlstra wrote:
> >> The specific rules are that 
> >> arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() require you to be 
> >> holding the appropriate pte locks for the ptes you're updating, so 
> >> preemption is naturally disabled in that case.
> >>     
> >
> > Right, except on -rt where the pte lock is a mutex.
> >   
> 
> Hm, that's interesting.  The requirement isn't really "no preemption", 
> its "must not migrate to another cpu".  Is there a better way to express 
> that?

Not really; in the past something like migrate_disable() has been
proposed, but that's problematic in that it can generate latencies
that are _very_ hard to track down, so we've always resisted that and
found other ways.

> >> This all goes a bit strange with init_mm's non-requirement for taking 
> >> pte locks.  The caller has to arrange for some kind of serialization on 
> >> updating the range in question, and that could be a mutex.  Explicitly 
> >> disabling preemption in enter_lazy_mmu_mode would make sense for this 
> >> case, but it would be redundant for the common case of batched updates 
> >> to usermode ptes.
> >>     
> >
> > I really utterly hate how you just plonk preempt_disable() in there
> > unconditionally and without very clear comments on how and why.
> >   
> 
> Well, there's the commit comment.  They're important, right?  That's why 
> we spend time writing good commit comments?  So they get read?  ;)

Andrew taught me that indeed, but still when looking at the code it's
good to have some text there explaining things too.


^ permalink raw reply	[flat|nested] 11+ messages in thread


Thread overview: 11+ messages
     [not found] <4994BCF0.30005@goop.org>
     [not found] ` <4994C052.9060907@goop.org>
2009-02-13  0:55   ` [PATCH] mm: disable preemption in apply_to_pte_range Andrew Morton
2009-02-13  1:39     ` Jeremy Fitzhardinge
2009-02-13 11:48       ` Peter Zijlstra
2009-02-13 13:30         ` Nick Piggin
2009-02-13 14:16           ` Peter Zijlstra
2009-02-13 14:30             ` Nick Piggin
2009-02-13 14:38               ` Peter Zijlstra
2009-02-13 17:41                 ` Jeremy Fitzhardinge
2009-02-14  9:46                   ` Peter Zijlstra
2009-02-13 17:24         ` Jeremy Fitzhardinge
2009-02-14  9:56           ` Peter Zijlstra
