From: Michal Hocko <mhocko@kernel.org>
To: Pan Zhang <zhangpan26@huawei.com>
Cc: akpm@linux-foundation.org, vbabka@suse.cz, rientjes@google.com,
jgg@ziepe.ca, aarcange@redhat.com, yang.shi@linux.alibaba.com,
zhongjiang@huawei.com, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Cristopher Lameter <cl@linux.com>
Subject: Re: [PATCH] mm: mempolicy: fix the absence of the last bit of nodemask
Date: Mon, 14 Oct 2019 11:12:43 +0200
Message-ID: <20191014091243.GD317@dhcp22.suse.cz>
In-Reply-To: <1570882789-20579-1-git-send-email-zhangpan26@huawei.com>
[Cc Christopher - the initial email is http://lkml.kernel.org/r/1570882789-20579-1-git-send-email-zhangpan26@huawei.com]
On Sat 12-10-19 20:19:48, Pan Zhang wrote:
> When I want to use set_mempolicy to get the memory from each node on the numa machine,
> and the MPOL_INTERLEAVE flag seems to achieve this goal.
> However, during the test, it was found that the use result of node was unbalanced.
> The memory was allocated evenly from the nodes except the last node,
> which obviously did not match the expectations.
>
> You can test as follows:
> 1. Create a file that needs to be mmapped:
> dd if=/dev/zero of=./test count=1024 bs=1M
This will already populate the page cache, and if the file fits into memory
(which seems to be the case in your example output) then your later mmap
will not allocate any new memory.
I suspect that using
numactl --interleave 0,1 dd if=/dev/zero of=./test count=1024 bs=1M
will produce output much closer to your expectation. Right?
[...]
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4ae967b..a23509f 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1328,9 +1328,11 @@ static int get_nodes(nodemask_t *nodes, const unsigned long __user *nmask,
> unsigned long nlongs;
> unsigned long endmask;
>
> - --maxnode;
> nodes_clear(*nodes);
> - if (maxnode == 0 || !nmask)
> + /*
> + * If the user specified only one node, no need to set nodemask
> + */
> + if (maxnode - 1 == 0 || !nmask)
> return 0;
> if (maxnode > PAGE_SIZE*BITS_PER_BYTE)
> return -EINVAL;
I am afraid this is the wrong fix. The code is really hard to grasp, but my
understanding is that the caller is supposed to provide a maxnode one larger
than the number of bits in the nodemask. So if you want 2 nodes then maxnode
should be 3.
Have a look at libnuma (which is the reference implementation):
static void setpol(int policy, struct bitmask *bmp)
{
	if (set_mempolicy(policy, bmp->maskp, bmp->size + 1) < 0)
		numa_error("set_mempolicy");
}
The semantics are quite awkward, but they have been that way for years.
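To make the off-by-one concrete, here is a rough user-space sketch of how I read the trimming in get_nodes(). It is a simplified model, not the kernel code: the sketch_* names are made up for illustration, it handles only a mask that fits in a single unsigned long, and it skips the copy from user space and the PAGE_SIZE bound.

```c
#include <assert.h>
#include <limits.h>

/* Bits in one unsigned long; stands in for the kernel's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: the maxnode a caller should pass so that nodes
 * 0..highest_node are all honoured. This mirrors libnuma's "size + 1".
 */
static unsigned long sketch_maxnode_for(unsigned long highest_node)
{
	/* highest_node + 1 bits in the mask, plus one more for the decrement. */
	return highest_node + 2;
}

/*
 * Hypothetical single-word model of the mask trimming: returns the part
 * of user_mask that survives for a given maxnode argument.
 */
static unsigned long sketch_effective_mask(unsigned long user_mask,
					   unsigned long maxnode)
{
	unsigned long endmask;

	--maxnode;		/* the decrement the patch wants to drop */
	if (maxnode == 0)
		return 0;	/* treated as an empty nodemask */

	/* Single-word sketch only; also trips on maxnode == 0 wrapping. */
	assert(maxnode <= SKETCH_BITS_PER_LONG);

	if (maxnode % SKETCH_BITS_PER_LONG == 0)
		endmask = ~0UL;
	else
		endmask = (1UL << (maxnode % SKETCH_BITS_PER_LONG)) - 1;

	/* Bits at or above the decremented maxnode are silently dropped. */
	return user_mask & endmask;
}
```

With a mask of 0x3 (nodes 0 and 1), passing maxnode == 2 leaves only node 0 in this model, which would match the unbalanced result you reported, while maxnode == 3 keeps both nodes.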
--
Michal Hocko
SUSE Labs