From: "Huang, Ying" <ying.huang@intel.com>
To: peterz@infradead.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>, Mel Gorman <mgorman@suse.de>,
Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Dave Hansen <dave.hansen@intel.com>,
Andi Kleen <ak@linux.intel.com>, Michal Hocko <mhocko@suse.com>,
David Rientjes <rientjes@google.com>
Subject: Re: [RFC] autonuma: Migrate on fault among multiple bound nodes
Date: Wed, 16 Sep 2020 16:46:37 +0800
Message-ID: <87pn6mrtw2.fsf@yhuang-dev.intel.com>
In-Reply-To: <20200916081052.GI2674@hirez.programming.kicks-ass.net> (peterz@infradead.org's message of "Wed, 16 Sep 2020 10:10:52 +0200")
Hi, Peter,
Thanks for your comments!
peterz@infradead.org writes:
> On Wed, Sep 16, 2020 at 08:59:36AM +0800, Huang Ying wrote:
>
>> So in this patch, if MPOL_BIND is used to bind the memory of the
>> application to multiple nodes, and in the hint page fault handler both
>> the faulting page node and the accessing node are in the policy
>> nodemask, the page will be tried to be migrated to the accessing node
>> to reduce the cross-node accessing.
>
> Seems fair enough..
>
>> Questions:
>>
>> Sysctl knob kernel.numa_balancing can enable/disable AutoNUMA
>> optimizing globally. And now, it appears that the explicit NUMA
>> memory policy specifying (e.g. via numactl, mbind(), etc.) acts like
>> an implicit per-thread/VMA knob to enable/disable the AutoNUMA
>> optimizing for the thread/VMA. Although this looks like a side effect
>> instead of an API, from commit fc3147245d19 ("mm: numa: Limit NUMA
>> scanning to migrate-on-fault VMAs"), this is used by some users? So
>> the question is, do we need an explicit per-thread/VMA knob to
>> enable/disable AutoNUMA optimizing for the thread/VMA? Or just use
>> the global knob, either optimize all thread/VMAs as long as the
>> explicitly specified memory policies are respected, or don't optimize
>> at all.
>
> I don't understand the question; that commit is not about disabling numa
> balancing, it's about avoiding pointless work and overhead. What's the
> point of scanning memory if you're not going to be allowed to move it
> anyway.
Because this patch enables migration among the bound nodes, scanning
such VMAs is no longer pointless, although it may add overhead.
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>> Cc: Dave Hansen <dave.hansen@intel.com>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: David Rientjes <rientjes@google.com>
>> ---
>> mm/mempolicy.c | 43 +++++++++++++++++++++++++++++++------------
>> 1 file changed, 31 insertions(+), 12 deletions(-)
>>
>> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> index eddbe4e56c73..a941eab2de24 100644
>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -1827,6 +1827,13 @@ static struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
>> return pol;
>> }
>>
>> +static bool mpol_may_mof(struct mempolicy *pol)
>> +{
>> + /* May migrate among bound nodes for MPOL_BIND */
>> + return pol->flags & MPOL_F_MOF ||
>> + (pol->mode == MPOL_BIND && nodes_weight(pol->v.nodes) > 1);
>> +}
>
> This is weird, why not just set F_MOF on the policy?
>
> In fact, why wouldn't something like:
>
> mbind(.mode=MPOL_BIND, .flags=MPOL_MF_LAZY);
>
> work today? Afaict MF_LAZY will unconditionally result in F_MOF.
There are some subtle differences.
- LAZY appears unnecessary for the per-task memory policy set via
set_mempolicy(), while migrating among multiple bound nodes appears
reasonable as a per-task memory policy.
- LAZY also means moving pages that are not on the bound nodes to the
bound nodes if memory is available. Some users may want that only
if should_numa_migrate_memory() returns true.
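To make the comparison concrete, the check that mpol_may_mof() adds in the RFC can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct model_policy, the MODEL_* constants and the 64-bit nodemask are stand-ins invented for illustration, not the kernel's definitions or values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins only; these are NOT the kernel's values. */
enum model_mode { MODEL_MPOL_DEFAULT, MODEL_MPOL_BIND };
#define MODEL_F_MOF 0x1u

struct model_policy {
	enum model_mode mode;
	unsigned int flags;
	uint64_t nodes;		/* bit n set: node n is in the policy nodemask */
};

/* Count the nodes in the mask, playing the role of nodes_weight(). */
static int model_nodes_weight(uint64_t nodes)
{
	int n = 0;

	for (; nodes; nodes &= nodes - 1)
		n++;
	return n;
}

/*
 * Model of the RFC's predicate: migrate-on-fault is allowed either when
 * the policy carries the MOF flag, or when the policy is a bind policy
 * that spans more than one node.
 */
bool model_may_mof(const struct model_policy *pol)
{
	return (pol->flags & MODEL_F_MOF) ||
	       (pol->mode == MODEL_MPOL_BIND &&
		model_nodes_weight(pol->nodes) > 1);
}
```

In this model a bind policy over a single node without the flag is never a migration candidate, while a bind policy over two or more nodes always is, even without LAZY.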
>> @@ -2494,20 +2503,30 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>> break;
>>
>> case MPOL_BIND:
>> /*
>> + * Allows binding to multiple nodes. If both current and
>> + * accessing nodes are in policy nodemask, migrate to
>> + * accessing node to optimize page placement. Otherwise,
>> + * use current page if in policy nodemask or MPOL_F_MOF not
>> + * set, else select nearest allowed node, if any. If no
>> + * allowed nodes, use current [!misplaced].
>> */
>> + if (node_isset(curnid, pol->v.nodes)) {
>> + if (node_isset(thisnid, pol->v.nodes)) {
>> + moron = true;
>> + polnid = thisnid;
>> + } else {
>> + goto out;
>> + }
>> + } else if (!(pol->flags & MPOL_F_MOF)) {
>> goto out;
>> + } else {
>> + z = first_zones_zonelist(
>> node_zonelist(numa_node_id(), GFP_HIGHUSER),
>> gfp_zone(GFP_HIGHUSER),
>> &pol->v.nodes);
>> + polnid = zone_to_nid(z->zone);
>> + }
>> break;
>>
>> default:
>
> Did that want to be this instead? I don't think I follow the other
> changes.
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index eddbe4e56c73..2a64913f9ac6 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2501,8 +2501,11 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
> * else select nearest allowed node, if any.
> * If no allowed nodes, use current [!misplaced].
> */
> - if (node_isset(curnid, pol->v.nodes))
> + if (node_isset(curnid, pol->v.nodes)) {
> + if (node_isset(thisnid, pol->v.nodes))
> + goto moron;
> goto out;
> + }
> z = first_zones_zonelist(
> node_zonelist(numa_node_id(), GFP_HIGHUSER),
> gfp_zone(GFP_HIGHUSER),
> @@ -2516,6 +2519,7 @@ int mpol_misplaced(struct page *page, struct vm_area_struct *vma, unsigned long
>
> /* Migrate the page towards the node whose CPU is referencing it */
> if (pol->flags & MPOL_F_MORON) {
> +moron:
> polnid = thisnid;
>
> if (!should_numa_migrate_memory(current, page, curnid, thiscpu))
Yes. This looks better if we can just use F_MOF.
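As a sanity check of that control flow, the MPOL_BIND decision can be modelled in userspace as below. This is a sketch under assumptions, not the kernel code: model_bind_misplaced() and MODEL_NO_NODE are hypothetical names, the nodemask is a plain 64-bit mask, the result of should_numa_migrate_memory() and the nearest allowed node are passed in as parameters, and the policy is assumed to have F_MOF set and MPOL_F_MORON clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NO_NODE (-1)	/* hypothetical "leave the page where it is" value */

static bool model_node_isset(int nid, uint64_t mask)
{
	return nid >= 0 && nid < 64 && ((mask >> nid) & 1u);
}

/*
 * curnid:          node the page currently lives on
 * thisnid:         node of the CPU that took the hint fault
 * bound:           nodes in the bind policy, one bit per node
 * nearest_allowed: nearest node in 'bound', used when the page is outside it
 * should_migrate:  stand-in for the should_numa_migrate_memory() result
 *
 * Returns the node to migrate to, or MODEL_NO_NODE if the page stays.
 */
int model_bind_misplaced(int curnid, int thisnid, uint64_t bound,
			 int nearest_allowed, bool should_migrate)
{
	if (model_node_isset(curnid, bound)) {
		/* Accessing node is outside the policy: page is fine as is. */
		if (!model_node_isset(thisnid, bound))
			return MODEL_NO_NODE;
		/* Both nodes are bound: follow the accessing CPU if worthwhile. */
		if (!should_migrate)
			return MODEL_NO_NODE;
		return curnid != thisnid ? thisnid : MODEL_NO_NODE;
	}
	/* Page is outside the bound nodes: pull it to the nearest allowed one. */
	return nearest_allowed;
}
```

With nodes 0 and 1 bound, a page on node 0 accessed from node 1 moves to node 1 only when the migration heuristic agrees, and an access from an unbound node never moves a page that is already on a bound node.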
Best Regards,
Huang, Ying