From: Eric B Munson <emunson@akamai.com>
To: Davide Libenzi <davidel@xmailserver.org>,
David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Joern Engel <joern@logfs.org>, Jianguo Wu <wujianguo@huawei.com>,
linux-mm@kvack.org,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [patch][resend] MAP_HUGETLB munmap fails with size not 2MB aligned
Date: Thu, 26 Mar 2015 10:08:59 -0400 [thread overview]
Message-ID: <551412FB.4090406@akamai.com> (raw)
In-Reply-To: <alpine.DEB.2.10.1503260431290.2755@mbplnx>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 03/26/2015 07:56 AM, Davide Libenzi wrote:
> On Wed, 25 Mar 2015, David Rientjes wrote:
>
>> I looked at this thread at http://marc.info/?t=141392508800001
>> since I didn't have it in my mailbox, and I didn't get a chance
>> to actually run your test code.
>>
>> In short, I think what you're saying is that
>>
>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) munmap(ptr,
>> 4KB) == EINVAL
>
> I am not sure you have read the email correctly:
>
> munmap(mmap(size, HUGETLB), size) = EFAIL
>
> For every size not multiple of the huge page size. Whereas:
>
> munmap(mmap(size, HUGETLB), ALIGN(size, HUGEPAGE_SIZE)) = OK
I think Davide is right here, this is a long existing bug in the
MAP_HUGETLB implementation. Specifically, the mmap man page says:
All pages containing a part of the indicated range are unmapped, and
subsequent references to these pages will generate SIGSEGV.
I realize that huge pages may not have been considered by those that
wrote the spec. But if I read this I would assume that all pages,
regardless of size, touched by the munmap() request should be unmapped.
Please include
Acked-by: Eric B Munson <emunson@akamai.com>
to the original patch. I would like to see the mmap man page adjusted
to make note of this behavior as well.
>
>
>> Respecting the mmap(2) POSIX specification? I don't think
>> mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...) mapping 2MB violates
>> POSIX.1-2001 and not only because it obviously doesn't address
>> MAP_HUGETLB, but I don't think the spec says the system cannot
>> map more memory than len.
>>
>> Using MAP_HUGETLB is really more a library function than anything
>> else since you could easily implement the same behavior in a
>> library. That function includes aligning len to the hugepage
>> size, so doing
>>
>> ptr = mmap(..., 4KB, ..., MAP_HUGETLB | ..., ...)
>>
>> is the equivalent to doing
>>
>> ptr = mmap(..., hugepage_size, ..., MAP_HUGETLB | ..., ...)
>>
>> and that doesn't violate any spec. But your patch doesn't change
>> mmap() at all, so let's forget about that.
>
> That is what every mmap() implementation does, irrespectively of
> any page size. And that is also what the POSIX spec states. The
> size will be automatically rounded up to a multiple of the
> underline physical page size. The problem is not mmap() though, in
> this case.
>
>
>> The question you pose is whether munmap(ptr, 4KB) should succeed
>> for a hugetlb vma and in your patch you align this to the
>> hugepage size of the vma in the same manner that munmap(ptr, 2KB)
>> would be aligned to PAGE_SIZE for a non-hugetlb vma.
>>
>> The munmap() spec says the whole pages that include any part of
>> the passed length should be unmapped. In spirit, I would agree
>> with you that the page size for the vma is the hugepage size so
>> that would be what would be unmapped.
>>
>> But that's going by a spec that doesn't address hugepages and is
>> worded in a way that {PAGE_SIZE} is the base unit that both
>> mmap() and munmap() is done. It carries no notion of variable
>> page sizes and how hugepages should be handled with respect to
>> pages of {PAGE_SIZE} length. So I think this is beyond the scope
>> of the spec: any length is aligned to PAGE_SIZE, but the munmap()
>> behavior for hugetlb vmas is not restricted.
>>
>> It would seem too dangerous at this point to change the behavior
>> of munmap(ptr, 4KB) on a hugetlb vma and that userspace bugs
>> could actually arise from aligning to the hugepage size.
>
> You mean, there is an harder failure than the current failure? :)
>
>
>> Some applications purposefully reserve hugetlb pages by mmap()
>> and never munmap() them so they have exclusive access to
>> hugepages that were allocated either at boot or runtime by the
>> sysadmin. If they depend on the return value of munmap() to
>> determine if memory to free is memory dynamically allocated by
>> the application or reserved as hugetlb memory, then this would
>> cause them to break. I can't say for certain that no such
>> application exists.
>
> The fact that certain applications will seldomly call an API,
> should be no reason for API to have bugs, or at the very least, a
> bahviour which not only in not documented in the man pages, but
> also totally unrespectful of the normal mmap/munmap semantics.
> Again, the scenario that you are picturing, is one where an
> application relies on a permanent (that is what it is - it always
> fails unless the munmap size is multiple than huge page size)
> failure of munmap, to do some productive task. An munmap() of huge
> page aligned size, will succeed in both case (vanilla, and patch).
>
>
>> Since hugetlb memory is beyond the scope of the POSIX.1-2001
>> munmap() specification, and there's a potential userspace
>> breakage if the length becomes hugepage aligned, I think the
>> do_unmap() implementation is correct as it stands.
>
> If the length is huge page aligned, it will be working with or
> without patch applied. The problem is for the other 2097151 out of
> 2097152 cases, where length is not indeed aligned to 2MB (or
> whatever hugepage size is for the architecture).
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAEBAgAGBQJVFBL4AAoJELbVsDOpoOa9KoIQAIi/gAmfaQMvv9GEskFMM8Su
x3RKDf7ldHfDYbrwqKBllp/Hp3hrF0IZM5PFvq7edOEAYbgSkhIQAu0QdI2RdVlU
iKCjk4/RI2sAtNjHXZ7XrPSipBhWmbWc83cCTo38ZCbaWRwsiZ0p5t7U/K1I35tb
IKJPu8vTU3d1VupNp2Fse7VD/ImwO2HWMPXCkTT8KUcyVVKLWGQ37cSsN4jPbkQP
yqHhpzGX9P8Qkh0Hs6BFl6kjRheIKIhTD4o8Yq5kJGgXH6O8r09riMIquhogCru+
TFFVW97zjgyjRqSYE3GHu2oCdc4LLuAoRxxDMzaRqcw1C3nqKmtjB3wJEf/Pcina
UcbTlqdDjDAHy6adNb06k2q2WNNn3CdoAQIRs/mjSvdDMN+gh7TDWKMXqtv4AODc
3xedLGL2bidHCzmgrxDU1hkRjMR8DoW2MCayQoOSIpOwVhXeA2koNbgHpJD3Gboh
0y9FvLNoXMQkBi8eksatfT/kT4xn25F7OMwdP5u0euGidXsOSLz4p/87bBJpCzmr
CptQ/V+T7HgMglby8fLfZK9GE5CuwskMzrevF2cUAhyVIkXoiItD6FAh9v6SOjbh
jy4Ctq2xkCbAKhsmrUnoUOVRNlnJ8m0I9Eq1gyWHy6qU3UDmY6XBXbJEPERRbn2T
MZfuVLJLSR4Nrm7fdHoL
=C3GM
-----END PGP SIGNATURE-----
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-03-26 14:09 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-10-23 2:26 Davide Libenzi
2015-03-26 0:47 ` Hugh Dickins
2015-03-26 1:06 ` Davide Libenzi
2015-03-26 3:17 ` David Rientjes
2015-03-26 11:56 ` Davide Libenzi
2015-03-26 14:08 ` Eric B Munson [this message]
2015-03-30 16:03 ` KOSAKI Motohiro
2015-03-30 20:32 ` Hugh Dickins
2015-03-26 19:15 ` David Rientjes
2015-03-26 19:39 ` Davide Libenzi
2015-03-26 20:03 ` David Rientjes
2015-03-27 9:47 ` Vlastimil Babka
2015-03-27 13:51 ` Eric B Munson
2015-03-27 9:45 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=551412FB.4090406@akamai.com \
--to=emunson@akamai.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=davidel@xmailserver.org \
--cc=hughd@google.com \
--cc=joern@logfs.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
--cc=wujianguo@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox