* [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim
@ 2012-04-03 23:34 David Rientjes
2012-04-12 14:01 ` Michal Hocko
2012-04-26 8:45 ` Michal Hocko
0 siblings, 2 replies; 7+ messages in thread
From: David Rientjes @ 2012-04-03 23:34 UTC (permalink / raw)
To: Andrew Morton; +Cc: KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm
For systems with high CONFIG_NODES_SHIFT, checking nodes_intersect() for
each thread's set of allowed nodes is very expensive. It's unnecessary
to do this twice for each thread, once in select_bad_process() and once
in oom_badness(). We've already filtered unkillable threads at the point
where oom_badness() is called.
oom_badness() must still check if a thread is a kthread, however, to
ensure /proc/pid/oom_score doesn't report one as killable.
This significantly speeds up the tasklist iteration when there are a
large number of threads on the system and CONFIG_NODES_SHIFT is high.
Signed-off-by: David Rientjes <rientjes@google.com>
---
mm/oom_kill.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -151,13 +151,16 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
return NULL;
}
+static bool is_unkillable_kthread(struct task_struct *p)
+{
+ return is_global_init(p) || (p->flags & PF_KTHREAD);
+}
+
/* return true if the task is not adequate as candidate victim task. */
static bool oom_unkillable_task(struct task_struct *p,
const struct mem_cgroup *memcg, const nodemask_t *nodemask)
{
- if (is_global_init(p))
- return true;
- if (p->flags & PF_KTHREAD)
+ if (is_unkillable_kthread(p))
return true;
/* When mem_cgroup_out_of_memory() and p is not member of the group */
@@ -185,7 +188,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
{
long points;
- if (oom_unkillable_task(p, memcg, nodemask))
+ if (is_unkillable_kthread(p))
return 0;
p = find_lock_task_mm(p);
@@ -478,9 +481,8 @@ static void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
if (child->mm == p->mm)
continue;
- /*
- * oom_badness() returns 0 if the thread is unkillable
- */
+ if (oom_unkillable_task(child, memcg, nodemask))
+ continue;
child_points = oom_badness(child, memcg, nodemask,
totalpages);
if (child_points > victim_points) {
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim
2012-04-03 23:34 [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim David Rientjes
@ 2012-04-12 14:01 ` Michal Hocko
2012-04-24 23:09 ` David Rientjes
2012-04-26 8:45 ` Michal Hocko
1 sibling, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2012-04-12 14:01 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm
On Tue 03-04-12 16:34:36, David Rientjes wrote:
> For systems with high CONFIG_NODES_SHIFT, checking nodes_intersect() for
> each thread's set of allowed nodes is very expensive. It's unnecessary
> to do this twice for each thread, once in select_bad_process() and once
> in oom_badness(). We've already filtered unkillable threads at the point
> where oom_badness() is called.
>
> oom_badness() must still check if a thread is a kthread, however, to
> ensure /proc/pid/oom_score doesn't report one as killable.
>
> This significantly speeds up the tasklist iteration when there are a
> large number of threads on the system and CONFIG_NODES_SHIFT is high.
Looks correct but I am not sure I like the subtle dependency between
oom_unkillable_task and oom_badness which is a result of this change.
We do not need it for proc oom_score because we are feeding it with NULL
cgroup and nodemask but we really care in other cases.
I do agree that the test duplication is not nice and it can be expensive
but this subtleness is not nice either.
Wouldn't it make more sense to extract __oom_badness without the checks
and make it explicit that the function can be called only for killable
tasks (namely only select_bad_process would use it)?
Something like (untested):
---
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 46bf2ed5..a9df008 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -171,23 +171,10 @@ static bool oom_unkillable_task(struct task_struct *p,
return false;
}
-/**
- * oom_badness - heuristic function to determine which candidate task to kill
- * @p: task struct of which task we should calculate
- * @totalpages: total present RAM allowed for page allocation
- *
- * The heuristic for determining which task to kill is made to be as simple and
- * predictable as possible. The goal is to return the highest value for the
- * task consuming the most memory to avoid subsequent oom failures.
- */
-unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
+/* can be used only for tasks which are killable as per oom_unkillable_task */
+static unsigned int __oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
const nodemask_t *nodemask, unsigned long totalpages)
{
- long points;
-
- if (oom_unkillable_task(p, memcg, nodemask))
- return 0;
-
p = find_lock_task_mm(p);
if (!p)
return 0;
@@ -239,6 +226,26 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
return (points < 1000) ? points : 1000;
}
+/**
+ * oom_badness - heuristic function to determine which candidate task to kill
+ * @p: task struct of which task we should calculate
+ * @totalpages: total present RAM allowed for page allocation
+ *
+ * The heuristic for determining which task to kill is made to be as simple and
+ * predictable as possible. The goal is to return the highest value for the
+ * task consuming the most memory to avoid subsequent oom failures.
+ */
+unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
+ const nodemask_t *nodemask, unsigned long totalpages)
+{
+ long points;
+
+ if (oom_unkillable_task(p, memcg, nodemask))
+ return 0;
+
+ return __oom_badness(p, memcg, nodemask, totalpages);
+}
+
/*
* Determine the type of allocation constraint.
*/
@@ -366,7 +373,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
}
}
- points = oom_badness(p, memcg, nodemask, totalpages);
+ points = __oom_badness(p, memcg, nodemask, totalpages);
if (points > *ppoints) {
chosen = p;
*ppoints = points;
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim
2012-04-12 14:01 ` Michal Hocko
@ 2012-04-24 23:09 ` David Rientjes
2012-04-25 8:06 ` Michal Hocko
0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2012-04-24 23:09 UTC (permalink / raw)
To: Michal Hocko; +Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm
On Thu, 12 Apr 2012, Michal Hocko wrote:
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 46bf2ed5..a9df008 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -171,23 +171,10 @@ static bool oom_unkillable_task(struct task_struct *p,
> return false;
> }
>
> -/**
> - * oom_badness - heuristic function to determine which candidate task to kill
> - * @p: task struct of which task we should calculate
> - * @totalpages: total present RAM allowed for page allocation
> - *
> - * The heuristic for determining which task to kill is made to be as simple and
> - * predictable as possible. The goal is to return the highest value for the
> - * task consuming the most memory to avoid subsequent oom failures.
> - */
> -unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> +/* can be used only for tasks which are killable as per oom_unkillable_task */
> +static unsigned int __oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> const nodemask_t *nodemask, unsigned long totalpages)
> {
> - long points;
> -
> - if (oom_unkillable_task(p, memcg, nodemask))
> - return 0;
> -
> p = find_lock_task_mm(p);
> if (!p)
> return 0;
> @@ -239,6 +226,26 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> return (points < 1000) ? points : 1000;
> }
>
> +/**
> + * oom_badness - heuristic function to determine which candidate task to kill
> + * @p: task struct of which task we should calculate
> + * @totalpages: total present RAM allowed for page allocation
> + *
> + * The heuristic for determining which task to kill is made to be as simple and
> + * predictable as possible. The goal is to return the highest value for the
> + * task consuming the most memory to avoid subsequent oom failures.
> + */
> +unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> + const nodemask_t *nodemask, unsigned long totalpages)
> +{
> + long points;
> +
> + if (oom_unkillable_task(p, memcg, nodemask))
> + return 0;
> +
> + return __oom_badness(p, memcg, nodemask, totalpages);
> +}
> +
> /*
> * Determine the type of allocation constraint.
> */
> @@ -366,7 +373,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
> }
> }
>
> - points = oom_badness(p, memcg, nodemask, totalpages);
> + points = __oom_badness(p, memcg, nodemask, totalpages);
> if (points > *ppoints) {
> chosen = p;
> *ppoints = points;
No, the way I had it written is correct: the above unnecessarily checks
for membership in a memcg or intersection with a set of allowable nodes
for child threads in oom_kill_process(). With a lot of children and with
a CONFIG_NODES_SHIFT significantly large (the prerequisite for this patch
to make any difference), that's too costly to do.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim
2012-04-24 23:09 ` David Rientjes
@ 2012-04-25 8:06 ` Michal Hocko
2012-04-25 20:59 ` David Rientjes
0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2012-04-25 8:06 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm
On Tue 24-04-12 16:09:14, David Rientjes wrote:
> On Thu, 12 Apr 2012, Michal Hocko wrote:
>
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 46bf2ed5..a9df008 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -171,23 +171,10 @@ static bool oom_unkillable_task(struct task_struct *p,
> > return false;
> > }
> >
> > -/**
> > - * oom_badness - heuristic function to determine which candidate task to kill
> > - * @p: task struct of which task we should calculate
> > - * @totalpages: total present RAM allowed for page allocation
> > - *
> > - * The heuristic for determining which task to kill is made to be as simple and
> > - * predictable as possible. The goal is to return the highest value for the
> > - * task consuming the most memory to avoid subsequent oom failures.
> > - */
> > -unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> > +/* can be used only for tasks which are killable as per oom_unkillable_task */
> > +static unsigned int __oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> > const nodemask_t *nodemask, unsigned long totalpages)
> > {
> > - long points;
> > -
> > - if (oom_unkillable_task(p, memcg, nodemask))
> > - return 0;
> > -
> > p = find_lock_task_mm(p);
> > if (!p)
> > return 0;
> > @@ -239,6 +226,26 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> > return (points < 1000) ? points : 1000;
> > }
> >
> > +/**
> > + * oom_badness - heuristic function to determine which candidate task to kill
> > + * @p: task struct of which task we should calculate
> > + * @totalpages: total present RAM allowed for page allocation
> > + *
> > + * The heuristic for determining which task to kill is made to be as simple and
> > + * predictable as possible. The goal is to return the highest value for the
> > + * task consuming the most memory to avoid subsequent oom failures.
> > + */
> > +unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> > + const nodemask_t *nodemask, unsigned long totalpages)
> > +{
> > + long points;
> > +
> > + if (oom_unkillable_task(p, memcg, nodemask))
> > + return 0;
> > +
> > + return __oom_badness(p, memcg, nodemask, totalpages);
> > +}
> > +
> > /*
> > * Determine the type of allocation constraint.
> > */
> > @@ -366,7 +373,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
> > }
> > }
> >
> > - points = oom_badness(p, memcg, nodemask, totalpages);
> > + points = __oom_badness(p, memcg, nodemask, totalpages);
> > if (points > *ppoints) {
> > chosen = p;
> > *ppoints = points;
>
> No, the way I had it written is correct: the above unnecessarily checks
> for membership in a memcg or intersection with a set of allowable nodes
> for child threads in oom_kill_process().
your patch does
if (oom_unkillable_task(child, memcg, nodemask))
continue;
oom_badness((child, memcg, nodemask,
totalpages);
in oom_kill_process so the check is very same. Or am I missing
something?
> With a lot of children and with
> a CONFIG_NODES_SHIFT significantly large (the prerequisite for this patch
> to make any difference), that's too costly to do.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim
2012-04-25 8:06 ` Michal Hocko
@ 2012-04-25 20:59 ` David Rientjes
2012-04-26 8:44 ` Michal Hocko
0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2012-04-25 20:59 UTC (permalink / raw)
To: Michal Hocko; +Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm
On Wed, 25 Apr 2012, Michal Hocko wrote:
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 46bf2ed5..a9df008 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -171,23 +171,10 @@ static bool oom_unkillable_task(struct task_struct *p,
> > > return false;
> > > }
> > >
> > > -/**
> > > - * oom_badness - heuristic function to determine which candidate task to kill
> > > - * @p: task struct of which task we should calculate
> > > - * @totalpages: total present RAM allowed for page allocation
> > > - *
> > > - * The heuristic for determining which task to kill is made to be as simple and
> > > - * predictable as possible. The goal is to return the highest value for the
> > > - * task consuming the most memory to avoid subsequent oom failures.
> > > - */
> > > -unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> > > +/* can be used only for tasks which are killable as per oom_unkillable_task */
> > > +static unsigned int __oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> > > const nodemask_t *nodemask, unsigned long totalpages)
> > > {
> > > - long points;
> > > -
> > > - if (oom_unkillable_task(p, memcg, nodemask))
> > > - return 0;
> > > -
> > > p = find_lock_task_mm(p);
> > > if (!p)
> > > return 0;
> > > @@ -239,6 +226,26 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> > > return (points < 1000) ? points : 1000;
> > > }
> > >
> > > +/**
> > > + * oom_badness - heuristic function to determine which candidate task to kill
> > > + * @p: task struct of which task we should calculate
> > > + * @totalpages: total present RAM allowed for page allocation
> > > + *
> > > + * The heuristic for determining which task to kill is made to be as simple and
> > > + * predictable as possible. The goal is to return the highest value for the
> > > + * task consuming the most memory to avoid subsequent oom failures.
> > > + */
> > > +unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> > > + const nodemask_t *nodemask, unsigned long totalpages)
> > > +{
> > > + long points;
> > > +
> > > + if (oom_unkillable_task(p, memcg, nodemask))
> > > + return 0;
> > > +
> > > + return __oom_badness(p, memcg, nodemask, totalpages);
> > > +}
> > > +
> > > /*
> > > * Determine the type of allocation constraint.
> > > */
> > > @@ -366,7 +373,7 @@ static struct task_struct *select_bad_process(unsigned int *ppoints,
> > > }
> > > }
> > >
> > > - points = oom_badness(p, memcg, nodemask, totalpages);
> > > + points = __oom_badness(p, memcg, nodemask, totalpages);
> > > if (points > *ppoints) {
> > > chosen = p;
> > > *ppoints = points;
> >
> > No, the way I had it written is correct: the above unnecessarily checks
> > for membership in a memcg or intersection with a set of allowable nodes
> > for child threads in oom_kill_process().
>
> your patch does
> if (oom_unkillable_task(child, memcg, nodemask))
> continue;
> oom_badness((child, memcg, nodemask,
> totalpages);
>
> in oom_kill_process so the check is very same. Or am I missing
> something?
>
Here you go again.
Why would you ever do something like this?
/proc/pid/oom_score certainly doesn't care about cpusets or memcg and
exports only oom scores in a global context, anything else would be
inconsistent. It only cares about whether the thread is init or another
kthread because they are ineligible. So let's leave /proc/pid/oom_score
out of this.
That's the function of oom_badness(): to assign a point value for a
specific process to determine the highest priority for oom kill. It
shouldn't care about the context of the oom kill; and that's why
/proc/pid/oom_score, which is always global, doesn't care.
Now tell me what's clearer in terms of the code: calling
oom_unkillable_task() to determine the context of the oom kill explicitly
where it matters or calling either oom_badness() or __oom_badness() and
remembering what the heck the difference between the two is.
You're patch also wouldn't compile because you've removed the declaration
of "points" from __oom_badness(), which actually uses it, to
oom_badness(), which doesn't use it, for no apparent reason.
If this sounds frustrated, then it certainly is. This patch has stalled
out three weeks from being in -mm because of your broken patch suggestion.
My patch fixes an issue that people with very large systems and a high
CONFIG_NODES_SHIFT will encounter and possibly cause timeouts and we run
with it internally as part of a fix for a faster oom killer because this
problem actually manifests itself in real-world situations.
And here you are, just like when you wanted to rework a patch of mine and
rewrite the changelog so Andrew mistakenly sent it to Linus as a fix for a
patch that wasn't even in his tree, suggesting broken (and admittedly
untested) patches as some kind of cleanup that actually just makes the
code harder to understand if you're reading it.
This patch fixes a real-world issue that has been tested on thousands of
machines. Please keep your little untested follow-up changes that you
think make it look better to yourself so that this patch can get merged
and help people out.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim
2012-04-25 20:59 ` David Rientjes
@ 2012-04-26 8:44 ` Michal Hocko
0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2012-04-26 8:44 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm
On Wed 25-04-12 13:59:28, David Rientjes wrote:
[...]
> /proc/pid/oom_score certainly doesn't care about cpusets or memcg and
> exports only oom scores in a global context, anything else would be
> inconsistent. It only cares about whether the thread is init or another
> kthread because they are ineligible.
OK, your patch makes more sense in this context because the allowed
nodes check is skipped completely (memcg test is disabled already by
memcg==NULL). You are right that this is inconsistent because we did
considered allowed nodes of the process reading the file but we ignored
memcg it belongs to. So it is neither local view of the reading process
nor the global view.
The changelog doesn't mention this side effect and it wasn't obvious to
me.
> So let's leave /proc/pid/oom_score out of this.
>
> That's the function of oom_badness(): to assign a point value for a
> specific process to determine the highest priority for oom kill. It
> shouldn't care about the context of the oom kill; and that's why
> /proc/pid/oom_score, which is always global, doesn't care.
>
> Now tell me what's clearer in terms of the code: calling
> oom_unkillable_task() to determine the context of the oom kill explicitly
> where it matters or calling either oom_badness() or __oom_badness() and
> remembering what the heck the difference between the two is.
OK, fair enough. I was trapped in the mindset where oom_badness took
care of the task filtering as well. That's why I added another layer
(__oom_badness) which only calculates the score.
But you are right it could end up being more confusing than explicitly
requiring to _always_ check the context before calling oom_badness.
> You're patch also wouldn't compile because you've removed the declaration
> of "points" from __oom_badness(), which actually uses it, to
> oom_badness(), which doesn't use it, for no apparent reason.
The patch was just an illustration and I made it explicit by noting it
was untested.
Thanks and I'm sorry if this was a high priority fix that got stalled
just because of my query.
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim
2012-04-03 23:34 [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim David Rientjes
2012-04-12 14:01 ` Michal Hocko
@ 2012-04-26 8:45 ` Michal Hocko
1 sibling, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2012-04-26 8:45 UTC (permalink / raw)
To: David Rientjes
Cc: Andrew Morton, KAMEZAWA Hiroyuki, KOSAKI Motohiro, linux-mm
On Tue 03-04-12 16:34:36, David Rientjes wrote:
> For systems with high CONFIG_NODES_SHIFT, checking nodes_intersect() for
> each thread's set of allowed nodes is very expensive. It's unnecessary
> to do this twice for each thread, once in select_bad_process() and once
> in oom_badness(). We've already filtered unkillable threads at the point
> where oom_badness() is called.
>
> oom_badness() must still check if a thread is a kthread, however, to
> ensure /proc/pid/oom_score doesn't report one as killable.
>
> This significantly speeds up the tasklist iteration when there are a
> large number of threads on the system and CONFIG_NODES_SHIFT is high.
>
> Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/oom_kill.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -151,13 +151,16 @@ struct task_struct *find_lock_task_mm(struct task_struct *p)
> return NULL;
> }
>
> +static bool is_unkillable_kthread(struct task_struct *p)
> +{
> + return is_global_init(p) || (p->flags & PF_KTHREAD);
> +}
> +
> /* return true if the task is not adequate as candidate victim task. */
> static bool oom_unkillable_task(struct task_struct *p,
> const struct mem_cgroup *memcg, const nodemask_t *nodemask)
> {
> - if (is_global_init(p))
> - return true;
> - if (p->flags & PF_KTHREAD)
> + if (is_unkillable_kthread(p))
> return true;
>
> /* When mem_cgroup_out_of_memory() and p is not member of the group */
> @@ -185,7 +188,7 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
> {
> long points;
>
> - if (oom_unkillable_task(p, memcg, nodemask))
> + if (is_unkillable_kthread(p))
> return 0;
>
> p = find_lock_task_mm(p);
> @@ -478,9 +481,8 @@ static void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
>
> if (child->mm == p->mm)
> continue;
> - /*
> - * oom_badness() returns 0 if the thread is unkillable
> - */
> + if (oom_unkillable_task(child, memcg, nodemask))
> + continue;
> child_points = oom_badness(child, memcg, nodemask,
> totalpages);
> if (child_points > victim_points) {
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2012-04-26 8:45 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-03 23:34 [patch] mm, oom: avoid checking set of allowed nodes twice when selecting a victim David Rientjes
2012-04-12 14:01 ` Michal Hocko
2012-04-24 23:09 ` David Rientjes
2012-04-25 8:06 ` Michal Hocko
2012-04-25 20:59 ` David Rientjes
2012-04-26 8:44 ` Michal Hocko
2012-04-26 8:45 ` Michal Hocko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox