From: Michal Hocko <mhocko@suse.com>
To: Zhongkun He <hezhongkun.hzk@bytedance.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
corbet@lwn.net, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-api@vger.kernel.org, linux-doc@vger.kernel.org
Subject: Re: [External] Re: [PATCH v2] mm: add new syscall pidfd_set_mempolicy().
Date: Mon, 14 Nov 2022 19:12:37 +0100 [thread overview]
Message-ID: <Y3KFFfMFE55lVdNZ@dhcp22.suse.cz> (raw)
In-Reply-To: <3a3b4f5b-14d1-27d8-7727-cf23da90988f@bytedance.com>

On Mon 14-11-22 23:12:00, Zhongkun He wrote:
> Sorry, Michal. I don't know if my expression is accurate.
> >
> > We shouldn't really rely on mmap_sem for this IMO.
>
> Yes, we should rely on mmap_sem for vma->vm_policy, but not for the
> process context policy (task->mempolicy).

But the caller has no way to know which kind of policy is returned, so
the locking cannot be conditional on the policy type.

> > There is alloc_lock
> > (aka task lock) that makes sure the policy is stable so that caller can
> > atomically take a reference and hold on the policy. And we do not do
> > that consistently and this should be fixed.
>
> I saw some explanations in the doc ("numa_memory_policy.rst") and the
> comments (mempolicy.h) of why locks and references are not used in
> page allocation:
>
> In process context there is no locking because only the process accesses
> its own state.
>
> During run-time "usage" of the policy, we attempt to minimize atomic
> operations on the reference count, as this can lead to cache lines
> bouncing between cpus and NUMA nodes.

Yes, this is all understood, but the level of the overhead is not really
clear. So the question is whether this will induce a visible overhead,
because from the maintainability point of view it is much less costly to
have a clear lifetime model. Right now we have a mix of reference
counting and per-task rules which is rather subtle and easy to get
wrong. In an ideal world get_vma_policy would always return a reference
counted policy or NULL. If we really need to optimize for cache line
bouncing we can go with per-cpu reference counters (something that was
not available at the time the mempolicy code was introduced).

So I am not saying that the task_work based solution is not possible; I
just think this looks like a good opportunity to move away from the
existing subtle model.

--
Michal Hocko
SUSE Labs

Thread overview: 21+ messages
2022-11-11 8:40 Zhongkun He
2022-11-11 19:27 ` Andrew Morton
2022-11-13 16:41 ` [External] " Zhongkun He
2022-11-14 11:44 ` Michal Hocko
2022-11-14 11:46 ` Michal Hocko
2022-11-14 17:52 ` Michal Hocko
2022-11-14 15:12 ` Zhongkun He
2022-11-14 18:12 ` Michal Hocko [this message]
2022-11-15 7:39 ` Zhongkun He
2022-11-16 11:28 ` Zhongkun He
2022-11-16 14:57 ` Michal Hocko
2022-11-17 7:19 ` Zhongkun He
2022-11-21 14:38 ` Michal Hocko
2022-11-22 8:33 ` Zhongkun He
2022-11-22 8:40 ` Michal Hocko
2022-11-14 9:24 ` Zhongkun He
2022-11-12 2:09 ` kernel test robot
2022-11-16 7:04 ` Huang, Ying
2022-11-16 9:38 ` [External] " Zhongkun He
2022-11-16 9:44 ` Michal Hocko
2022-11-17 6:29 ` Huang, Ying