From: Miao Xie <miaox@cn.fujitsu.com>
To: Nick Piggin <npiggin@suse.de>
Cc: David Rientjes <rientjes@google.com>,
Lee Schermerhorn <lee.schermerhorn@hp.com>,
Paul Menage <menage@google.com>,
Linux-Kernel <linux-kernel@vger.kernel.org>,
Linux-MM <linux-mm@kvack.org>, Li Zefan <lizf@cn.fujitsu.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily
Date: Wed, 31 Mar 2010 17:54:34 +0800 [thread overview]
Message-ID: <4BB31BDA.8080203@cn.fujitsu.com> (raw)
In-Reply-To: <20100311110317.GL5812@laptop>
on 2010-3-11 19:03, Nick Piggin wrote:
>> Ok, I try to make a new patch by using seqlock.
>
> Well... I do think seqlocks would be a bit simpler because they don't
> require this checking and synchronizing of this patch.
>
> But you are right: on non-x86 architectures seqlocks would probably be
> more costly than your patch in the fastpaths. Unless you can avoid
> using the seqlock in fastpaths and just have callers handle the rare
> case of an empty nodemask.
>
> cpuset_node_allowed_*wall doesn't need anything because it is just
> interested in one bit in the mask.
>
> cpuset_mem_spread_node doesn't matter because it will loop around and
> try again if it doesn't find any nodes online.
>
> cpuset_mems_allowed seems totally broken anyway
>
> etc.
>
> This approach might take a little more work, but I think it might be the
> best way.
Hi, Nick Piggin
The following is the new patch made by your idea. Could you review it?
Thanks!
Miao
---
kernel/cpuset.c | 11 +++++++++--
mm/mempolicy.c | 20 +++++++++++++++++---
mm/mmzone.c | 15 +++++++++++----
3 files changed, 37 insertions(+), 9 deletions(-)
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index d109467..fbb1f1c 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -952,8 +952,6 @@ static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
static void cpuset_change_task_nodemask(struct task_struct *tsk,
nodemask_t *newmems)
{
- nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
- mpol_rebind_task(tsk, &tsk->mems_allowed);
mpol_rebind_task(tsk, newmems);
tsk->mems_allowed = *newmems;
}
@@ -2442,6 +2440,15 @@ int cpuset_mem_spread_node(void)
node = next_node(current->cpuset_mem_spread_rotor, current->mems_allowed);
if (node == MAX_NUMNODES)
node = first_node(current->mems_allowed);
+
+ /*
+ * if node is still MAX_NUMNODES, that means the kernel allocator saw
+ * an empty nodemask. In that case, the kernel allocator allocate
+ * memory on the current node.
+ */
+ if (unlikely(node == MAX_NUMNODES))
+ node = numa_node_id();
+
current->cpuset_mem_spread_rotor = node;
return node;
}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 8034abd..75e819e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1449,8 +1449,16 @@ static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy)
* the first node in the mask instead.
*/
if (unlikely(gfp & __GFP_THISNODE) &&
- unlikely(!node_isset(nd, policy->v.nodes)))
+ unlikely(!node_isset(nd, policy->v.nodes))) {
nd = first_node(policy->v.nodes);
+ /*
+ * When rebinding task->mempolicy, th kernel allocator
+ * may see an empty nodemask, and first_node() returns
+ * MAX_NUMNODES, In that case, we will use current node.
+ */
+ if (unlikely(nd == MAX_NUMNODES))
+ nd = numa_node_id();
+ }
break;
case MPOL_INTERLEAVE: /* should not happen */
break;
@@ -1522,17 +1530,21 @@ unsigned slab_node(struct mempolicy *policy)
static unsigned offset_il_node(struct mempolicy *pol,
struct vm_area_struct *vma, unsigned long off)
{
- unsigned nnodes = nodes_weight(pol->v.nodes);
+ nodemask_t tmp_nodes;
+ unsigned nnodes;
unsigned target;
int c;
int nid = -1;
+ tmp_nodes = pol->v.nodes;
+ nnodes = nodes_weight(tmp_nodes);
if (!nnodes)
return numa_node_id();
+
target = (unsigned int)off % nnodes;
c = 0;
do {
- nid = next_node(nid, pol->v.nodes);
+ nid = next_node(nid, tmp_nodes);
c++;
} while (c <= target);
return nid;
@@ -1631,7 +1643,9 @@ bool init_nodemask_of_mempolicy(nodemask_t *mask)
case MPOL_BIND:
/* Fall through */
case MPOL_INTERLEAVE:
+ task_lock(current);
*mask = mempolicy->v.nodes;
+ task_unlock(current);
break;
default:
diff --git a/mm/mmzone.c b/mm/mmzone.c
index f5b7d17..43ac21b 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -58,6 +58,7 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
nodemask_t *nodes,
struct zone **zone)
{
+ nodemask_t tmp_nodes;
/*
* Find the next suitable zone to use for the allocation.
* Only filter based on nodemask if it's set
@@ -65,10 +66,16 @@ struct zoneref *next_zones_zonelist(struct zoneref *z,
if (likely(nodes == NULL))
while (zonelist_zone_idx(z) > highest_zoneidx)
z++;
- else
- while (zonelist_zone_idx(z) > highest_zoneidx ||
- (z->zone && !zref_in_nodemask(z, nodes)))
- z++;
+ else {
+ tmp_nodes = *nodes;
+ if (nodes_empty(tmp_nodes))
+ while (zonelist_zone_idx(z) > highest_zoneidx)
+ z++;
+ else
+ while (zonelist_zone_idx(z) > highest_zoneidx ||
+ (z->zone && !zref_in_nodemask(z, &tmp_nodes)))
+ z++;
+ }
*zone = zonelist_zone(z);
return z;
--
1.6.5.2
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-03-31 9:54 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-08 10:10 Miao Xie
2010-03-08 21:46 ` David Rientjes
2010-03-09 7:25 ` Miao Xie
2010-03-11 8:15 ` Nick Piggin
2010-03-11 10:33 ` Miao Xie
2010-03-11 11:03 ` Nick Piggin
2010-03-25 10:23 ` Miao Xie
2010-03-25 12:56 ` Miao Xie
2010-03-25 13:33 ` [PATCH] [PATCH -mmotm] cpuset,mm: use seqlock to protect task->mempolicy and mems_allowed (v2) (was: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily) Miao Xie
2010-03-28 5:30 ` Bob Liu
2010-03-31 19:42 ` Andrew Morton
2010-03-31 9:54 ` Miao Xie [this message]
2010-03-31 10:34 ` [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily David Rientjes
2010-04-01 2:16 ` Miao Xie
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4BB31BDA.8080203@cn.fujitsu.com \
--to=miaox@cn.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=lee.schermerhorn@hp.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizf@cn.fujitsu.com \
--cc=menage@google.com \
--cc=npiggin@suse.de \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox