From: "lixinhai.lxh@gmail.com" <lixinhai.lxh@gmail.com>
To: "Hugh Dickins" <hughd@google.com>,
xinhai.li <xinhai.li@outlook.com>,
lixinhai_lxh <lixinhai_lxh@126.com>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
"Michal Hocko" <mhocko@kernel.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Linux API" <linux-api@vger.kernel.org>
Subject: Re: [PATCH] mm: allow unmapped hole at head side of mbind range
Date: Mon, 28 Oct 2019 16:12:59 +0800
Message-ID: <2019102816125759600417@gmail.com>
In-Reply-To: <alpine.LSU.2.11.1910241900070.1096@eggly.anvils>
On 2019-10-28 at 15:14:51 Hugh Dickins wrote:
>On Thu, 24 Oct 2019, Vlastimil Babka wrote:
>
>> + linux-api
>>
>> On 10/24/19 9:35 AM, Li Xinhai wrote:
>> > From: Li Xinhai <xinhai.li@outlook.com>
>> >
>> > mbind_range() silently ignores an unmapped hole in the middle or at the
>> > tail of the specified range, but reports EFAULT if the hole is at the head.
>>
>>
>> Hmm that's unfortunate. mbind() manpage says:
>>
>> EFAULT Part or all of the memory range specified by nodemask and maxnode
>> points outside your accessible address space. Or, there was an unmapped
>> hole in the specified memory range specified by addr and len.
>>
>> That sounds like any hole inside the specified range should return
>> EFAULT.
>
>Yes (though an exception is allowed when restoring to default).
>
>> But perhaps it can be also interpreted as you suggest, that the
>> whole range is an unmapped hole. There's some risk of breaking existing
>> userspace if we change it either way.
>>
>> > It is more reasonable to silently ignore holes in any part of the
>> > range, and only report EFAULT if the whole range is a hole.
>> >
>> > Signed-off-by: Li Xinhai <xinhai.li@outlook.com>
>
>Xinhai, I'm sceptical about this patch: is it something you found
>by code inspection, or something you found when using mbind()?
>
I encountered an issue when using mbind() (my issue was with the nodemask
parameter), and then found this special range check in mbind_range().

>I've not looked long enough to be certain, nor experimented, but:
>
>mbind_range() is only one stage of the mbind() syscall implementation,
>and is preceded by queue_pages_range(): look what queue_pages_test_walk()
>does when MPOL_MF_DISCONTIG_OK not set.
>
>My impression is that mbind_range() is merely correcting an omission
>from the checks already made by queue_pages_test_walk() (an odd way
>to proceed, I admit: would be better to check initially than later).
>
>I do think that you should not make this change without considering
>MPOL_MF_DISCONTIG_OK and its intention.
>
>Hugh
>
The following program was used to reveal the issues:

#include <stddef.h>
#include <sys/mman.h>
#include <numaif.h>

int main(int argc, char *argv[])
{
	void *mapAddr;	/* arithmetic on void * below is a GNU C extension */
	unsigned long nodemask;

	/* map 6 pages; 1<<12 = 4096 is the assumed page size */
	mapAddr = mmap(NULL, 6*(1<<12), PROT_READ|PROT_WRITE,
		       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

	// BIND and leave 2 pages as hole in the middle
	nodemask = 0x1;
	mbind(mapAddr, 6*(1<<12), MPOL_BIND, &nodemask, 2, 0);
	munmap(mapAddr+2*(1<<12), 2*(1<<12));

	// part 1
	// head hole: range starts one page before the mapping
	mbind(mapAddr-1*(1<<12), 2*(1<<12), MPOL_DEFAULT, NULL, 0, 0);
	// tail hole: the third page was unmapped above
	mbind(mapAddr, 3*(1<<12), MPOL_DEFAULT, NULL, 0, 0);

	// part 2
	nodemask = 0x2;
	// head hole: range starts inside the unmapped pages
	mbind(mapAddr+3*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);
	// tail hole: range extends one page past the mapping
	mbind(mapAddr+4*(1<<12), 3*(1<<12), MPOL_BIND, &nodemask, 3, 0);
	// whole range is unmapped
	mbind(mapAddr+3*(1<<12), 1*(1<<12), MPOL_BIND, &nodemask, 3, 0);
	// no hole
	mbind(mapAddr+4*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);

	return 0;
}

syscall results:
83 mmap(NULL, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd24e13000
84 mbind(0x7fbd24e13000, 24576, MPOL_BIND, [0x0000000000000001], 2, 0) = 0
85 munmap(0x7fbd24e15000, 8192) = 0
// part 1
86 mbind(0x7fbd24e12000, 8192, MPOL_DEFAULT, NULL, 0, 0) = -1 EFAULT (Bad address)
87 mbind(0x7fbd24e13000, 12288, MPOL_DEFAULT, NULL, 0, 0) = 0
// part 2
88 mbind(0x7fbd24e16000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
89 mbind(0x7fbd24e17000, 12288, MPOL_BIND, [0x0000000000000002], 3, 0) = 0
90 mbind(0x7fbd24e16000, 4096, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
91 mbind(0x7fbd24e17000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = 0

The results on line 86 and line 89 are not correct (the other lines are as
expected):
line 86: a hole at the head of the range was reported as an error; this
should succeed for MPOL_DEFAULT;
line 89: a hole at the tail of the range was reported as success; this
should fail for the !MPOL_DEFAULT cases.

My patch only corrected the line 86 case and did not handle the line 89
case. It would be better to detect valid and invalid holes for the
MPOL_DEFAULT and !MPOL_DEFAULT cases in the queue_pages_range() phase.

I will prepare a new patch that fulfills the Linux API description:
1. for MPOL_DEFAULT, a hole in any part of the specified range is allowed;
2. for !MPOL_DEFAULT, a hole in any part of the specified range is not allowed.
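
The two rules above can be modelled in user space roughly as follows. This
is only a sketch and not kernel code: struct region, range_has_hole() and
expected_mbind() are hypothetical names made up for illustration, and the
mapped regions are assumed to be sorted by address and non-overlapping. The
case where the whole range is unmapped is not treated specially, since the
rules as stated do not single it out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA: a mapped interval [start, end). */
struct region {
	unsigned long start;
	unsigned long end;
};

/*
 * Return true if any address in [start, end) is not covered by the
 * regions. Assumes regions are sorted by address and do not overlap.
 */
static bool range_has_hole(const struct region *r, int n,
			   unsigned long start, unsigned long end)
{
	unsigned long pos = start;	/* lowest address not yet known to be mapped */
	int i;

	assert(start <= end);

	for (i = 0; i < n && pos < end; i++) {
		if (r[i].end <= pos)
			continue;	/* region lies entirely below pos */
		if (r[i].start > pos)
			return true;	/* gap before this region */
		pos = r[i].end;		/* covered up to the end of this region */
	}
	return pos < end;		/* anything left over is a tail hole */
}

/* Result the two rules would give for a request: 0 or -EFAULT. */
static int expected_mbind(bool is_default, const struct region *r, int n,
			  unsigned long start, unsigned long len)
{
	if (!is_default && range_has_hole(r, n, start, start + len))
		return -EFAULT;	/* rule 2: no hole allowed */
	return 0;		/* rule 1, or a fully mapped range */
}
```

With the layout from the results above, i.e. regions
[0x7fbd24e13000, 0x7fbd24e15000) and [0x7fbd24e17000, 0x7fbd24e19000) and
assuming the neighbouring pages are unmapped, expected_mbind() gives 0 for
the line 86 request and -EFAULT for the line 89 request, and agrees with
the observed results on the other lines.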
Xinhai
(BTW, I am adding two more mail accounts of mine to check which works best
for this mailing list...)

>> > ---
>> >
>> > mm/mempolicy.c | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> >
>> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> > index 4ae967bcf954..ae160d9936d9 100644
>> > --- a/mm/mempolicy.c
>> > +++ b/mm/mempolicy.c
>> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>> > unsigned long vmend;
>> >
>> > vma = find_vma(mm, start);
>> > - if (!vma || vma->vm_start > start)
>> > + if (!vma || vma->vm_start >= end)
>> > return -EFAULT;
>> >
>> > prev = vma->vm_prev;
>> >
>