Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily

From: Miao Xie <miaox@cn.fujitsu.com>
To: David Rientjes <rientjes@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>,
	Nick Piggin <npiggin@suse.de>, Paul Menage <menage@google.com>,
	Linux-Kernel <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH V2 4/4] cpuset,mm: update task's mems_allowed lazily
Date: Tue, 09 Mar 2010 15:25:54 +0800
Message-ID: <4B95F802.9020308@cn.fujitsu.com>
In-Reply-To: <alpine.DEB.2.00.1003081330370.18502@chino.kir.corp.google.com>

on 2010-3-9 5:46, David Rientjes wrote:
[snip]
>> Considering the change of task->mems_allowed is not frequent, so in this patch,
>> I use two variables as a tag to indicate whether task->mems_allowed need be
>> update or not. And before setting the tag, cpuset caches the new mask of every
>> task at its task_struct.
>>
> 
> So what exactly is the benefit of 58568d2 from last June that caused this 
> issue to begin with?  It seems like this entire patchset is a revert of 
> that commit.  So why shouldn't we just revert that one commit and then add 
> the locking and updating necessary for configs where
> MAX_NUMNODES > BITS_PER_LONG on top?

My concern is the consistency of task->mempolicy with task->mems_allowed, even on
configs where MAX_NUMNODES <= BITS_PER_LONG.

The problem I am worried about is the following:
When the kernel allocator allocates pages for a task, it first reads task->mempolicy
to pick a node, then checks whether that node is permitted by
task->mems_allowed.

But without this patch, ->mempolicy and ->mems_allowed are not updated at the same
time, so the kernel allocator may see inconsistent values of ->mempolicy
and ->mems_allowed. For example, the allocator may pick a node from the old mempolicy,
then check it against a new mems_allowed that doesn't intersect
the old mempolicy.

So I made this patchset.
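
The inconsistency described above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: nodemask_model_t, struct
task_model and all function names are hypothetical, a nodemask is reduced to
one unsigned long (the MAX_NUMNODES <= BITS_PER_LONG case), and "pick a node
from the mempolicy" is simplified to "take the lowest set bit".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: one bit per NUMA node in a single word. */
typedef unsigned long nodemask_model_t;
#define MODEL_NODE_BITS ((int)(CHAR_BIT * sizeof(nodemask_model_t)))

struct task_model {
	nodemask_model_t policy_nodes;	/* stands in for the nodes of task->mempolicy */
	nodemask_model_t mems_allowed;	/* stands in for task->mems_allowed */
};

/* Lowest node set in the mask, or -1 if the mask is empty. */
static int first_policy_node(nodemask_model_t mask)
{
	for (int node = 0; node < MODEL_NODE_BITS; node++)
		if (mask & (1UL << node))
			return node;
	return -1;
}

/* True if the node is set in the task's mems_allowed. */
static bool node_allowed(const struct task_model *t, int node)
{
	if (node < 0)
		return false;
	assert(node < MODEL_NODE_BITS);
	return (t->mems_allowed >> node) & 1UL;
}

/*
 * Allocator view in the window where mems_allowed already holds the new
 * mask but the policy still holds the old nodes: pick a node from the
 * policy, then check it against mems_allowed.
 */
static bool alloc_sees_consistent_state(nodemask_model_t old_policy,
					nodemask_model_t new_allowed)
{
	struct task_model t = {
		.policy_nodes = old_policy,
		.mems_allowed = new_allowed,
	};

	return node_allowed(&t, first_policy_node(t.policy_nodes));
}
```

With an old policy of nodes {0,1} (0x3) and a new mems_allowed of nodes {2,3}
(0xC), the check fails because the two masks do not intersect. Once both hold
0xC, it succeeds.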

>> +/**
>> + * cpuset_update_task_mems_allowed - update task memory placement
>> + *
>> + * If the current task's mems_allowed_for_update and mempolicy_for_update are
>> + * changed by cpuset behind our backs, update current->mems_allowed,
>> + * mems_generation and task NUMA mempolicy to the new value.
>> + *
>> + * Call WITHOUT mems_lock held.
>> + * 
>> + * This routine is needed to update the pre-task mems_allowed and mempolicy
>> + * within the tasks context, when it is trying to allocate memory.
>> + */
>> +static __always_inline void cpuset_update_task_mems_allowed(void)
>> +{
>> +	struct task_struct *tsk = current;
>> +	unsigned long flags;
>> +
>> +	if (unlikely(tsk->mems_generation != tsk->mems_generation_for_update)) {
>> +		task_mems_lock_irqsave(tsk, flags);
>> +		tsk->mems_allowed = tsk->mems_allowed_for_update;
>> +		tsk->mems_generation = tsk->mems_generation_for_update;
>> +		task_mems_unlock_irqrestore(tsk, flags);
> 
> By this synchronization, you're guaranteeing that no other kernel code 
> ever reads tsk->mems_allowed when tsk != current?  Otherwise, you're 
> simply protecting the store to tsk->mems_allowed here and not serializing 
> on the loads that can return empty nodemasks.

I guarantee that no other kernel code changes tsk->mems_allowed when tsk != current,
so every task can safely read its own tsk->mems_allowed without a lock.

I will use mems_lock to protect it when another task reads it.
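
The ownership rule above (only the task itself writes ->mems_allowed, cpuset
writes only the cached copy) can be sketched as a userspace model. This is an
illustration under assumptions, not the patch itself: struct task_model,
model_lock() and the other function names are hypothetical, the spinlock is
replaced by a C11 atomic_flag, the mempolicy update is left out, and the
unlocked generation check is a plain read as in the quoted code (a truly
concurrent userspace version would need an atomic load there).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef unsigned long nodemask_model_t;	/* hypothetical one-word nodemask */

struct task_model {
	atomic_flag mems_lock;			/* stands in for mems_lock */
	nodemask_model_t mems_allowed_for_update;	/* cache, written by cpuset */
	int mems_generation_for_update;		/* bumped by cpuset */
	nodemask_model_t mems_allowed;		/* written only by the owner task */
	int mems_generation;			/* generation the owner last copied */
};

static void task_model_init(struct task_model *t, nodemask_model_t mask)
{
	atomic_flag_clear(&t->mems_lock);
	t->mems_allowed_for_update = mask;
	t->mems_allowed = mask;
	t->mems_generation_for_update = 0;
	t->mems_generation = 0;
}

static void model_lock(struct task_model *t)
{
	while (atomic_flag_test_and_set_explicit(&t->mems_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void model_unlock(struct task_model *t)
{
	atomic_flag_clear_explicit(&t->mems_lock, memory_order_release);
}

/* cpuset side: touch only the cached mask and bump the tag. */
static void cpuset_model_change_nodemask(struct task_model *t,
					 nodemask_model_t newmems)
{
	model_lock(t);
	t->mems_allowed_for_update = newmems;
	t->mems_generation_for_update++;
	model_unlock(t);
}

/* Owner task side, called before allocating; returns true if it refreshed. */
static bool task_model_update_mems_allowed(struct task_model *t)
{
	/* Fast path: no lock taken while the generations match. */
	if (t->mems_generation == t->mems_generation_for_update)
		return false;

	model_lock(t);
	t->mems_allowed = t->mems_allowed_for_update;
	t->mems_generation = t->mems_generation_for_update;
	assert(t->mems_generation == t->mems_generation_for_update);
	model_unlock(t);
	return true;
}
```

In this model a cpuset change leaves ->mems_allowed untouched until the owner
task next calls task_model_update_mems_allowed(), which is the lazy update the
patch subject refers to.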

>> +	/* Protection of ->mems_allowed_for_update */
>> +	spinlock_t mems_lock;
>> +	/*
>> +	 * This variable(mems_allowed_for_update) are just used for caching
>> +	 * memory placement information.
>> +	 *
>> +	 * ->mems_allowed are used by the kernel allocator.
>> +	 */
>> +	nodemask_t mems_allowed_for_update;	/* Protected by mems_lock */
> 
> Another nodemask_t in struct task_struct for this?  And for all configs, 
> including those that can do atomic updates to mems_allowed?

Yes, for all configs.

> 
>> +
>> +	/*
>> +	 * Increment this integer everytime ->mems_allowed_for_update is
>> +	 * changed by cpuset. Task can compare this number with mems_generation,
>> +	 * and if they are not the same, mems_allowed_for_update is changed and
>> +	 * ->mems_allowed must be updated. In this way, tasks can avoid having
>> +	 * to lock and reload mems_allowed_for_update unless it is changed.
>> +	 */
>> +	int mems_generation_for_update;
>> +	/*
>> +	 * After updating mems_allowed, set mems_generation to
>> +	 * mems_generation_for_update.
>> +	 */
>> +	int mems_generation;
> 
> I don't see why you need two mems_generation numbers, one should belong in 
> the task's cpuset.  Then you can compare tsk->mems_generation to 
> task_cs(tsk)->mems_generation at cpuset_update_task_memory_state() if you 
> set tsk->mems_generation = task_cs(tsk)->mems_generation on 
> cpuset_attach() or update_nodemask().

Done that way, we would have to take rcu_read_lock() to protect the task's cs, and
performance would suffer, although the cost of the RCU read lock is very small.

Thanks!
Miao