linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Zhongkun He <hezhongkun.hzk@bytedance.com>
Cc: corbet@lwn.net, akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, wuyun.abel@bytedance.com
Subject: Re: [External] Re: [RFC] mm: add new syscall pidfd_set_mempolicy()
Date: Wed, 12 Oct 2022 11:02:01 +0200	[thread overview]
Message-ID: <Y0aCiYMQ4liL2azT@dhcp22.suse.cz> (raw)
In-Reply-To: <582cf257-bc0d-c96e-e72e-9164cff4fce1@bytedance.com>

On Wed 12-10-22 15:55:44, Zhongkun He wrote:
> Hi  michal, thanks for your reply and suggestiones.
> 
> > Please add some explanation why the cpuset interface is not usable for
> > that usecase.
> OK.
> 
> > > To solve the issue, this patch introduces a new syscall
> > > pidfd_set_mempolicy(2).  it sets the NUMA memory policy of the thread
> > > specified in pidfd.
> > > 
> > > In current process context there is no locking because only the process
> > > accesses its own memory policy, so task_work is used in
> > > pidfd_set_mempolicy() to update the mempolicy of the process specified
> > > in pidfd, avoid using locks and race conditions.
> > 
> > Why cannot you alter kernel_set_mempolicy (and do_set_mempolicy) to
> > accept a task rather than operate on current?
> 
> I have tried it before this patch, but I found a problem.The allocation and
> update of mempolicy are in the current context, so it is not protected by
> any lock.But when the mempolicy is modified by other processes, the race
> condition appears.
> Say something like the following
> 
> 	pidfd_set_mempolicy	     target task stack
> 				       alloc_pages
> 					mpol = get_task_policy;
> 	 task_lock(task);
> 	 old = task->mempolicy;
> 	 task->mempolicy = new;
> 	 task_unlock(task);
> 	 mpol_put(old);			
> 					page = __alloc_pages(mpol);
> There is a situation that when the old mempolicy is released, the target
> task is still using the policy.It would be better if there are suggestions
> on this case.

Yes, this will require some refactoring and one potential way is to make
mpol ref counting unconditional. The conditional ref. counting has
already caused issues in the past and the code is rather hard to follow
anyway. I am not really sure this optimization is worth it.

Another option would be to block the pidfd side of things on completion
which would wake it up from the task_work context but I would rather
explore the ref counting approach first and only if this is proven to be
too expensive to go with hacks like this.
-- 
Michal Hocko
SUSE Labs


  reply	other threads:[~2022-10-12  9:02 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-10  9:48 Zhongkun He
2022-10-10 16:22 ` Frank van der Linden
2022-10-11 15:00   ` Michal Hocko
2022-10-11 17:22     ` Frank van der Linden
2022-10-11 19:29       ` Michal Hocko
2022-10-12  3:14         ` Abel Wu
2022-10-12 12:34         ` Vinicius Petrucci
2022-10-12 13:07           ` Michal Hocko
2022-10-12 13:23             ` Michal Hocko
2022-10-12 16:51           ` Frank van der Linden
2022-10-11 14:57 ` Michal Hocko
2022-10-12  7:55   ` [External] " Zhongkun He
2022-10-12  9:02     ` Michal Hocko [this message]
2022-10-12 11:22       ` Zhongkun He
2022-10-12 12:15         ` Michal Hocko
2022-10-13 10:44           ` Zhongkun He
2022-10-13 11:26             ` Michal Hocko
2022-10-13 12:50               ` Zhongkun He
2022-10-13 13:17                 ` Michal Hocko
2022-10-13 13:42                   ` Zhongkun He
2022-10-12  4:16 ` Bagas Sanjaya
2022-10-12  8:18   ` [External] " Zhongkun He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y0aCiYMQ4liL2azT@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=hezhongkun.hzk@bytedance.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=wuyun.abel@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox