* [PATCH mm] fix swapoff breakage; however...
@ 2007-09-17 18:57 Hugh Dickins
  2007-09-17 19:12 ` Balbir Singh
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2007-09-17 18:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Balbir Singh, linux-kernel, linux-mm

rc4-mm1's memory-controller-memory-accounting-v7.patch broke swapoff:
it extended unuse_pte_range's boolean "found" return code to allow an
error return too; but ended up returning found (1) as an error.
Replace that by success (0) before it gets to the upper level.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
---
More fundamentally, it looks like any container brought over its limit in
unuse_pte will abort swapoff: that doesn't seem "contained" to me.
Maybe unuse_pte should just let containers go over their limits without
error?  Or swap should be counted along with RSS?  Needs reconsideration.

 mm/swapfile.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- 2.6.23-rc4-mm1/mm/swapfile.c	2007-09-07 13:09:42.000000000 +0100
+++ linux/mm/swapfile.c	2007-09-17 15:14:47.000000000 +0100
@@ -642,7 +642,7 @@ static int unuse_mm(struct mm_struct *mm
 			break;
 	}
 	up_read(&mm->mmap_sem);
-	return ret;
+	return (ret < 0)? ret: 0;
 }
 
 /*

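For readers tracing the return convention: below is a simplified sketch of roughly what unuse_mm() looks like with this fix applied (abbreviated for illustration; the anon_vma check and locking details of the real code are elided, and this is not the actual -mm source).  unuse_vma() and the levels beneath it return 1 when the swap entry was found, 0 when it was not, and with the memory-controller patch may now also return a negative errno; the caller of unuse_mm() treats any non-zero value as failure, so the positive "found" value must be folded back to 0 here.

static int unuse_mm(struct mm_struct *mm, swp_entry_t entry,
                    struct page *page)
{
        struct vm_area_struct *vma;
        int ret = 0;

        down_read(&mm->mmap_sem);
        for (vma = mm->mmap; vma; vma = vma->vm_next) {
                ret = unuse_vma(vma, entry, page);  /* 1, 0 or -errno */
                if (ret)
                        break;
        }
        up_read(&mm->mmap_sem);
        /* "found" (1) is success; only propagate real errors */
        return (ret < 0) ? ret : 0;
}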

* Re: [PATCH mm] fix swapoff breakage; however...
  2007-09-17 18:57 [PATCH mm] fix swapoff breakage; however Hugh Dickins
@ 2007-09-17 19:12 ` Balbir Singh
  2007-09-17 19:51   ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Balbir Singh @ 2007-09-17 19:12 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel, linux-mm

Hugh Dickins wrote:
> rc4-mm1's memory-controller-memory-accounting-v7.patch broke swapoff:
> it extended unuse_pte_range's boolean "found" return code to allow an
> error return too; but ended up returning found (1) as an error.
> Replace that by success (0) before it gets to the upper level.
> 
> Signed-off-by: Hugh Dickins <hugh@veritas.com>
> ---
> More fundamentally, it looks like any container brought over its limit in
> unuse_pte will abort swapoff: that doesn't seem "contained" to me.
> Maybe unuse_pte should just let containers go over their limits without
> error?  Or swap should be counted along with RSS?  Needs reconsideration.
> 
>  mm/swapfile.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- 2.6.23-rc4-mm1/mm/swapfile.c	2007-09-07 13:09:42.000000000 +0100
> +++ linux/mm/swapfile.c	2007-09-17 15:14:47.000000000 +0100
> @@ -642,7 +642,7 @@ static int unuse_mm(struct mm_struct *mm
>  			break;
>  	}
>  	up_read(&mm->mmap_sem);
> -	return ret;
> +	return (ret < 0)? ret: 0;

Thanks for catching this. There are three possible solutions:

1. Account each RSS page with a probable swap cache page, double
   the RSS accounting to ensure that swapoff will not fail.
2. Account for the RSS page just once, do not account swap cache
   pages
3. Follow your suggestion and let containers go over their limits
   without error

With the current approach, a container over its limit will not
be able to call swapoff successfully; is that bad?

We plan to implement per-container/per-cpuset swap in the future.
Given that, isn't this expected functionality?  A container that is
over its limit cannot really swapoff a swap device.  If we allow its
pages to be unused anyway, we could end up with a container that
exceeds its limit by a significant amount as a result of swapoff.


>  }
> 
>  /*


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [PATCH mm] fix swapoff breakage; however...
  2007-09-17 19:12 ` Balbir Singh
@ 2007-09-17 19:51   ` Hugh Dickins
  2007-09-17 20:48     ` Balbir Singh
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2007-09-17 19:51 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Andrew Morton, linux-kernel, linux-mm

On Tue, 18 Sep 2007, Balbir Singh wrote:
> Hugh Dickins wrote:
> > More fundamentally, it looks like any container brought over its limit in
> > unuse_pte will abort swapoff: that doesn't seem "contained" to me.
> > Maybe unuse_pte should just let containers go over their limits without
> > error?  Or swap should be counted along with RSS?  Needs reconsideration.
> 
> Thanks for catching this. There are three possible solutions:
> 
> 1. Account each RSS page with a probable swap cache page, double
>    the RSS accounting to ensure that swapoff will not fail.
> 2. Account for the RSS page just once, do not account swap cache
>    pages

Neither of those makes sense to me, but I may be misunderstanding.

What would make sense is (what I meant when I said swap counted
along with RSS) not to count pages out and back in as they go out
to swap and come back in, but just to keep count of instantiated pages.

I say "make sense" meaning that the numbers could be properly
accounted; but it may well be unpalatable to treat fast RAM as
equal to slow swap.
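
A rough illustration of that idea (the structure and function names below are invented for this sketch, they are not from any posted patch): the container is charged once when an anonymous page is first instantiated and uncharged only when the mapping finally goes away, so swap-out, swap-in and swapoff never touch the counter and can never push a container over its limit.

/* Hypothetical sketch: count instantiated pages (RSS + swapped out). */
struct container_counter {
        long usage;     /* pages instantiated, whether in RAM or on swap */
        long limit;
};

static int container_charge(struct container_counter *cnt)
{
        if (cnt->usage >= cnt->limit)
                return -ENOMEM;         /* only on first instantiation */
        cnt->usage++;
        return 0;
}

static void container_uncharge(struct container_counter *cnt)
{
        cnt->usage--;                   /* pte and swap entry both gone */
}

/* swap-out, swap-in and swapoff make no calls into the counter at all */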

> 3. Follow your suggestion and let containers go over their limits
>    without error
> 
> With the current approach, a container over its limit will not
> be able to call swapoff successfully; is that bad?

That's not so bad.  What's bad is that anyone else with the
CAP_SYS_ADMIN to swapoff is liable to be prevented by containers
going over their limits.

Hugh


* Re: [PATCH mm] fix swapoff breakage; however...
  2007-09-17 19:51   ` Hugh Dickins
@ 2007-09-17 20:48     ` Balbir Singh
  2007-09-17 22:36       ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Balbir Singh @ 2007-09-17 20:48 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel, linux-mm

Hugh Dickins wrote:
> On Tue, 18 Sep 2007, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>> More fundamentally, it looks like any container brought over its limit in
>>> unuse_pte will abort swapoff: that doesn't seem "contained" to me.
>>> Maybe unuse_pte should just let containers go over their limits without
>>> error?  Or swap should be counted along with RSS?  Needs reconsideration.
>> Thanks for catching this. There are three possible solutions:
>>
>> 1. Account each RSS page with a probable swap cache page, double
>>    the RSS accounting to ensure that swapoff will not fail.
>> 2. Account for the RSS page just once, do not account swap cache
>>    pages
> 
> Neither of those makes sense to me, but I may be misunderstanding.
> 
> What would make sense is (what I meant when I said swap counted
> along with RSS) not to count pages out and back in as they go out
> to swap and come back in, but just to keep count of instantiated pages.
> 

I am not sure how you define instantiated pages. I suspect that
you mean RSS + pages swapped out (swap_pte)?

> I say "make sense" meaning that the numbers could be properly
> accounted; but it may well be unpalatable to treat fast RAM as
> equal to slow swap.
> 
>> 3. Follow your suggestion and let containers go over their limits
>>    without error
>>
>> With the current approach, a container over its limit will not
>> be able to call swapoff successfully; is that bad?
> 
> That's not so bad.  What's bad is that anyone else with the
> CAP_SYS_ADMIN to swapoff is liable to be prevented by containers
> going over their limits.
> 

If a swapoff is going to push a container over its limit, then
we break the container and the isolation it provides. Upon swapoff
failure, maybe we could get the container to print a nice
little warning so that anyone else with CAP_SYS_ADMIN can fix the
container limit and retry swapoff.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


* Re: [PATCH mm] fix swapoff breakage; however...
  2007-09-17 20:48     ` Balbir Singh
@ 2007-09-17 22:36       ` Hugh Dickins
  2007-09-18  4:23         ` Balbir Singh
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2007-09-17 22:36 UTC (permalink / raw)
  To: Balbir Singh; +Cc: Andrew Morton, linux-kernel, linux-mm

On Tue, 18 Sep 2007, Balbir Singh wrote:
> Hugh Dickins wrote:
> > 
> > What would make sense is (what I meant when I said swap counted
> > along with RSS) not to count pages out and back in as they go out
> > to swap and come back in, but just to keep count of instantiated pages.
> > 
> 
> I am not sure how you define instantiated pages. I suspect that
> you mean RSS + pages swapped out (swap_pte)?

That's it.  (Whereas file pages are counted out when paged out,
then counted back in when paged back in.)

> If a swapoff is going to push a container over its limit, then
> we break the container and the isolation it provides.

Is it just my traditional bias that makes me prefer that you break
your container rather than my swapoff?  I'm not sure.

> Upon swapoff
> failure, maybe we could get the container to print a nice
> little warning so that anyone else with CAP_SYS_ADMIN can fix the
> container limit and retry swapoff.

And then they hit the next one... rather like trying to work out
the dependencies of packages for oneself: a very tedious process.

If the swapoff succeeds, that does mean there was actually room
in memory (+ other swap) for everyone, even if some have gone over
their nominal limits.  (But if the swapoff runs out of memory in
the middle, yes, it might well have assigned the memory unfairly.)

The appropriate answer may depend on what you do when a container
tries to fault in one more page than its limit.  Apparently just
fail it (no attempt to page out another page from that container).

So, if the whole system is under memory pressure, kswapd will
be keeping the RSS of all tasks low, and they won't reach their
limits; whereas if the system is not under memory pressure,
tasks will easily approach their limits and so fail.

Please tell me my understanding is wrong!

Hugh


* Re: [PATCH mm] fix swapoff breakage; however...
  2007-09-17 22:36       ` Hugh Dickins
@ 2007-09-18  4:23         ` Balbir Singh
  0 siblings, 0 replies; 6+ messages in thread
From: Balbir Singh @ 2007-09-18  4:23 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel, linux-mm

Hugh Dickins wrote:
> On Tue, 18 Sep 2007, Balbir Singh wrote:
>> Hugh Dickins wrote:
>>> What would make sense is (what I meant when I said swap counted
>>> along with RSS) not to count pages out and back in as they go out
>>> to swap and come back in, but just to keep count of instantiated pages.
>>>
>> I am not sure how you define instantiated pages. I suspect that
>> you mean RSS + pages swapped out (swap_pte)?
> 
> That's it.  (Whereas file pages are counted out when paged out,
> then counted back in when paged back in.)
> 
>> If a swapoff is going to push a container over its limit, then
>> we break the container and the isolation it provides.
> 
> Is it just my traditional bias that makes me prefer that you break
> your container rather than my swapoff?  I'm not sure.
>


:-) Please see my response below

>> Upon swapoff
>> failure, maybe we could get the container to print a nice
>> little warning so that anyone else with CAP_SYS_ADMIN can fix the
>> container limit and retry swapoff.
> 
> And then they hit the next one... rather like trying to work out
> the dependencies of packages for oneself: a very tedious process.
> 

Yes, but here's the overall picture of what is happening:

1. The system administrator set up a memory container to contain
   a group of applications.
2. The administrator tried to swapoff one or a group of swap files/
   devices.
3. Operation 2 failed due to a container being above its limit,
   which implies that at some point the container went over its
   limit and some of its pages were swapped out.

During swapoff, we try to account for pages coming back into the
container.  Our charging routine does try to reclaim pages, which
in turn implies it will use another swap device or reclaim page
cache; if both fail, we return -ENOMEM.
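
Roughly, the sequence described above looks like the following sketch (all names here are hypothetical, purely to illustrate the ordering: reclaim first, -ENOMEM only if reclaim makes no progress):

/* Hypothetical sketch of the charge attempt made during swapoff. */
struct counter { long usage, limit; };

/* tries to push pages to another swap device or drop page cache;
   returns non-zero if it freed something */
static int reclaim_from_container(struct counter *cnt);

static int charge_on_swapin(struct counter *cnt)
{
        int retries = 5;                        /* arbitrary bound */

        while (cnt->usage >= cnt->limit) {
                if (reclaim_from_container(cnt))
                        continue;               /* freed something, re-check */
                if (--retries == 0)
                        return -ENOMEM;         /* this is what aborts swapoff */
        }
        cnt->usage++;                           /* account the page coming back */
        return 0;
}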

Given that the system administrator has set up the container and
the swap devices, I feel that he is in better control of what
to do with the system when swapoff fails.

In the future we plan to implement per-container swap (a feature
desired by several people).  Assuming that administrators use
per-container swap, failing at the limit sounds like the right
way to go forward.

> If the swapoff succeeds, that does mean there was actually room
> in memory (+ other swap) for everyone, even if some have gone over
> their nominal limits.  (But if the swapoff runs out of memory in
> the middle, yes, it might well have assigned the memory unfairly.)
> 

Yes, precisely my point: the administrator is the best person
to decide how to assign memory to containers.  Would it help
to add a container tunable that says it's OK for this container
to go over its limit during a swapoff?

> The appropriate answer may depend on what you do when a container
> tries to fault in one more page than its limit.  Apparently just
> fail it (no attempt to page out another page from that container).
> 

The problem with that approach is that applications will fail
in the middle of their task.  They would never get a chance
to run to completion; they would always get killed in the middle.
We want to be able to reclaim pages from the container and
let the application continue.

> So, if the whole system is under memory pressure, kswapd will
> be keeping the RSS of all tasks low, and they won't reach their
> limits; whereas if the system is not under memory pressure,
> tasks will easily approach their limits and so fail.
> 

Tasks failing at their limit does not sound good unless we are out
of all backup memory (slow storage).  We would rather still let the
application run, although slowly.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

