linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "lixinhai.lxh@gmail.com" <lixinhai.lxh@gmail.com>
To: "Hugh Dickins" <hughd@google.com>,
	 xinhai.li <xinhai.li@outlook.com>,
	 lixinhai_lxh <lixinhai_lxh@126.com>
Cc: "Vlastimil Babka" <vbabka@suse.cz>,
	 "Michal Hocko" <mhocko@kernel.org>,
	 "linux-mm@kvack.org" <linux-mm@kvack.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	 "Linux API" <linux-api@vger.kernel.org>
Subject: Re: [PATCH] mm: allow unmapped hole at head side of mbind range
Date: Mon, 28 Oct 2019 16:12:59 +0800	[thread overview]
Message-ID: <2019102816125759600417@gmail.com> (raw)
In-Reply-To: <alpine.LSU.2.11.1910241900070.1096@eggly.anvils>

On 2019-10-28 at 15:14:51 Hugh Dickins wrote:
>On Thu, 24 Oct 2019, Vlastimil Babka wrote:
>
>> + linux-api
>>
>> On 10/24/19 9:35 AM, Li Xinhai wrote:
>> > From: Li Xinhai  <xinhai.li@outlook.com>
>> >
>> > mbind_range silently ignore unmapped hole at middle and tail of the
>> > specified range, but report EFAULT if hole at head side.
>>
>>
>> Hmm that's unfortunate. mbind() manpage says:
>>
>> EFAULT Part or all of the memory range specified by nodemask and maxnode
>> points outside your accessible address space.  Or, there was an unmapped
>> hole in  the  specified  memory range specified by addr and len.
>>
>> That sounds like any hole inside the specified range should return
>> EFAULT.
>
>Yes (though an exception is allowed when restoring to default).
>
>> But perhaps it can be also interpreted as you suggest, that the
>> whole range is an unmapped hole.  There's some risk of breaking existing
>> userspace if we change it either way.
>>
>> > It is more reasonable to support silently ignore holes at any part of
>> > the range, only report EFAULT if the whole range is in hole.
>> >
>> > Signed-off-by: Li Xinhai <xinhai.li@outlook.com>
>
>Xinhai, I'm sceptical about this patch: is it something you found
>by code inspection, or something you found when using mbind()?
> 
I encountered issue when using mbind (my issue was about using nodemask 
parameter), and then found this special range checking in mbind_range().

>I've not looked long enough to be certain, nor experimented, but:
>
>mbind_range() is only one stage of the mbind() syscall implementation,
>and is preceded by queue_pages_range(): look what queue_pages_test_walk()
>does when MPOL_MF_DISCONTIG_OK not set.
>
>My impression is that mbind_range() is merely correcting an omission
>from the checks already made my queue_pages_test_walk() (an odd way
>to proceed, I admit: would be better to check initially than later).
>
>I do think that you should not make this change without considering
>MPOL_MF_DISCONTIG_OK and its intention.
>
>Hugh
> 

A program was used to reveal issues as below.

#include <stddef.h>
#include <sys/mman.h>
#include <numaif.h>

int main(int argc, char *argv[])
{
    void *mapAddr;
    unsigned long nodemask;

    mapAddr = mmap(NULL, 6*(1<<12), PROT_READ|PROT_WRITE, MAP_PRIVATE|
        MAP_ANONYMOUS, -1, 0);

    // BIND and leave 2 pages as hole in the middle
    nodemask = 0x1;
    mbind(mapAddr, 6*(1<<12), MPOL_BIND, &nodemask, 2, 0);
    munmap(mapAddr+2*(1<<12), 2*(1<<12));

    // part 1
    mbind(mapAddr-1*(1<<12), 2*(1<<12), MPOL_DEFAULT, NULL, 0, 0);
    mbind(mapAddr, 3*(1<<12), MPOL_DEFAULT, NULL, 0, 0);

    // part 2
    nodemask = 0x2;
    mbind(mapAddr+3*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+4*(1<<12), 3*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+3*(1<<12), 1*(1<<12), MPOL_BIND, &nodemask, 3, 0);
    mbind(mapAddr+4*(1<<12), 2*(1<<12), MPOL_BIND, &nodemask, 3, 0);

    return 0;
}

syscall results:
     83 mmap(NULL, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbd24e13000
     84 mbind(0x7fbd24e13000, 24576, MPOL_BIND, [0x0000000000000001], 2, 0) = 0
     85 munmap(0x7fbd24e15000, 8192)            = 0
// part 1
     86 mbind(0x7fbd24e12000, 8192, MPOL_DEFAULT, NULL, 0, 0) = -1 EFAULT (Bad address)
     87 mbind(0x7fbd24e13000, 12288, MPOL_DEFAULT, NULL, 0, 0) = 0
// part 2
     88 mbind(0x7fbd24e16000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
     89 mbind(0x7fbd24e17000, 12288, MPOL_BIND, [0x0000000000000002], 3, 0) = 0
     90 mbind(0x7fbd24e16000, 4096, MPOL_BIND, [0x0000000000000002], 3, 0) = -1 EFAULT (Bad address)
     91 mbind(0x7fbd24e17000, 8192, MPOL_BIND, [0x0000000000000002], 3, 0) = 0

The results on line 86 and line 89 were not correct (other lines were expected):
line 86: hole at head side of range was reported as error; this should 
  sucess for MPOL_DEFAULT;
line 89: hole at tail side of range was reported as success; this should 
  fail for !MPOL_DEFAULT cases;
  
My patch only corrected line 86 case, but didn't handle line 89 case. It 
is better to detect valid or invalid hole for MPOL_DEFAULT and 
!MPOL_DEFAULT cases in queue_pages_range phase.

New patch will be prepared, and fullfill the linux API description:
1. for MPOL_DEFAULT, hole at any part of specified range is allowed;
2. for !MPOL_DEFAULT, hole at any part of specified range is not allowed.

Xinhai
(BTW, I am adding two more mail accounts of mine to check which is best for 
this mailling list...)

>> > ---
>> >
>> >  mm/mempolicy.c | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> >
>> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
>> > index 4ae967bcf954..ae160d9936d9 100644
>> > --- a/mm/mempolicy.c
>> > +++ b/mm/mempolicy.c
>> > @@ -738,7 +738,7 @@ static int mbind_range(struct mm_struct *mm, unsigned long start,
>> >         unsigned long vmend;
>> > 
>> >         vma = find_vma(mm, start);
>> > -       if (!vma || vma->vm_start > start)
>> > +       if (!vma || vma->vm_start >= end)
>> >                 return -EFAULT;
>> > 
>> >         prev = vma->vm_prev;
>> >
>

  reply	other threads:[~2019-10-28  8:13 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-24  7:35 Li Xinhai
2019-10-24 10:48 ` Vlastimil Babka
2019-10-25  2:32   ` Hugh Dickins
2019-10-28  8:12     ` lixinhai.lxh [this message]
2019-10-24 12:25 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2019102816125759600417@gmail.com \
    --to=lixinhai.lxh@gmail.com \
    --cc=hughd@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lixinhai_lxh@126.com \
    --cc=mhocko@kernel.org \
    --cc=vbabka@suse.cz \
    --cc=xinhai.li@outlook.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox