From: Zhongkun He <hezhongkun.hzk@bytedance.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: corbet@lwn.net, mhocko@suse.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
linux-doc@vger.kernel.org
Subject: Re: [External] Re: [PATCH v2] mm: add new syscall pidfd_set_mempolicy().
Date: Mon, 14 Nov 2022 00:41:21 +0800 [thread overview]
Message-ID: <a44f794e-fe60-e261-3631-9107822d5c36@bytedance.com> (raw)
In-Reply-To: <20221111112732.30e1696bcd0d5b711c188a9a@linux-foundation.org>
Hi Andrew, thanks for your replay.
> This sounds a bit suspicious. Please share much more detail about
> these races. If we proced with this design then mpol_put_async()
> shouild have comments which fully describe the need for the async free.
>
> How do we *know* that these races are fully prevented with this
> approach? How do we know that mpol_put_async() won't free the data
> until the race window has fully passed?
A mempolicy can be either associated with a process or with a VMA.
All vma manipulation is somewhat protected by a down_read on
mmap_lock.In process context there is no locking because only
the process accesses its own state before.
Now we need to change the process context mempolicy specified
in pidfd. the mempolicy may about to be freed by
pidfd_set_mempolicy() while alloc_pages() is using it,
the race condition appears.
process context mempolicy is used in:
alloc_pages()
alloc_pages_bulk_array_mempolicy()
policy_mbind_nodemask()
mempolicy_slab_node()
.....
Say something like the following:
pidfd_set_mempolicy() target task stack:
alloc_pages:
mpol = p->mempolicy;
task_lock(task);
old = task->mempolicy;
task->mempolicy = new;
task_unlock(task);
mpol_put(old);
/*old mpol has been freed.*/
policy_node(...., mpol)
__alloc_pages(mpol);
To reduce the use of locks and atomic operations(mpol_get/put)
in the hot path,task_work is used in mpol_put_async(),
when the target task exit to user mode, the process context
mempolicy is not used anymore, mpol_free_async()
will be called as task_work to free mempolicy in
target context.
> Also, in some situations mpol_put_async() will free the data
> synchronously anyway, so aren't these races still present?
> If the task has run exit_task_work(),task_work_add() will fail.
we can free the mempolicy directly because mempolicy is not used.
>
> Secondly, why was the `flags' argument added? We might use it one day?
> For what purpose? I mean, every syscall could have a does-nothing
> `flags' arg, but we don't do that. What's the plan here?
>
I found that some functions use 'flags' for scalability, such
as process_madvise(), set_mempolicy_home_node(). back to our case, This
operation has per-thread rather than per-process semantic ,we could use
flags to switch for future extension if any. but I'm not sure.
Thanks.
next prev parent reply other threads:[~2022-11-13 16:41 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-11 8:40 Zhongkun He
2022-11-11 19:27 ` Andrew Morton
2022-11-13 16:41 ` Zhongkun He [this message]
2022-11-14 11:44 ` [External] " Michal Hocko
2022-11-14 11:46 ` Michal Hocko
2022-11-14 17:52 ` Michal Hocko
2022-11-14 15:12 ` Zhongkun He
2022-11-14 18:12 ` Michal Hocko
2022-11-15 7:39 ` Zhongkun He
2022-11-16 11:28 ` Zhongkun He
2022-11-16 14:57 ` Michal Hocko
2022-11-17 7:19 ` Zhongkun He
2022-11-21 14:38 ` Michal Hocko
2022-11-22 8:33 ` Zhongkun He
2022-11-22 8:40 ` Michal Hocko
2022-11-14 9:24 ` Zhongkun He
2022-11-12 2:09 ` kernel test robot
2022-11-16 7:04 ` Huang, Ying
2022-11-16 9:38 ` [External] " Zhongkun He
2022-11-16 9:44 ` Michal Hocko
2022-11-17 6:29 ` Huang, Ying
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a44f794e-fe60-e261-3631-9107822d5c36@bytedance.com \
--to=hezhongkun.hzk@bytedance.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=linux-api@vger.kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox