linux-mm.kvack.org archive mirror
From: "C.Wehrmeyer" <c.wehrmeyer@gmx.de>
To: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	linux-mm@kvack.org, linux-kernel <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: PROBLEM: Remapping hugepages mappings causes kernel to return EINVAL
Date: Tue, 24 Oct 2017 09:41:46 +0200
Message-ID: <0c934e18-5436-792f-2b2c-ebca3ae2d786@gmx.de>
In-Reply-To: <20171023180232.luayzqacnkepnm57@dhcp22.suse.cz>

On 2017-10-23 20:02, Michal Hocko wrote:
> On Mon 23-10-17 19:52:27, C.Wehrmeyer wrote:
> [...]
>>> or you can mmap a larger block and
>>> munmap the initial unaligned part.
>>
>> And how is that supposed to be transparent? When I hear "transparent" I
>> think of a mechanism which I can put under a system so that it benefits from
>> it, while the system does not notice or at least does not need to be aware
>> of it. The system also does not need to be changed for it.
> 
> How do you expect to get a huge page when the mapping itself is not
> properly aligned?

There are four ways I can think of off the top of my head, but only 
one of them would actually be transparent.

1. Provide a flag to mmap, which might be something other than 
MAP_HUGETLB. After all, your question was only about properly aligned 
mappings: we don't want to *force* the kernel to reserve hugepages, we 
just want it to provide the proper alignment in this case. That wouldn't 
be very transparent, but it would be the easiest route to take (and mmap 
already kind of supports such a thing).

2. Based on the transparent_hugepage/enabled setting, always hand out 
properly aligned mappings. In this case madvise(MADV_HUGEPAGE) becomes 
obsolete; after all, it's mmap that decides what kind of addresses we 
get. First getting *some* mapping that isn't properly aligned for 
hugepages and *then* trying to mitigate the damage with another syscall 
not only defies the meaning of "transparent", but might also be hard to 
implement kernel-side. Let's say I map 8 MiB of memory without mmap 
knowing that I'd prefer it to be backed by THPs. I could either go your 
route or go with my third option. Your route means mapping 8 MiB plus 
some extra, trimming at the beginning and the end, and then telling 
madvise that all of it is now going to be hugepages. That is something 
the kernel could easily do itself, since it knows the actual page size, 
and without all the context switches one incurs by mapping, munmapping, 
munmapping *again* and then *madvising* the actual memory.
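For reference, the userspace workaround discussed above ("mmap a larger block and munmap the initial unaligned part") can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes 2 MiB huge pages (as on x86-64), a length that is a multiple of 2 MiB, and the helper name map_thp_aligned() is hypothetical. It makes exactly the four syscalls listed above: mmap, munmap, munmap again, and madvise.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2 MiB PMD-sized huge pages, as on x86-64. */
#define HPAGE_SIZE ((size_t)2 << 20)

/*
 * Hypothetical helper: map `len` bytes of anonymous memory starting on a
 * HPAGE_SIZE boundary, then hint that the range should use THPs.
 * `len` is assumed to be a multiple of HPAGE_SIZE. Returns NULL on failure.
 */
static void *map_thp_aligned(size_t len)
{
    /* Syscall 1: over-allocate by one huge page so an aligned start exists inside. */
    size_t padded = len + HPAGE_SIZE;
    char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = (uintptr_t)raw;
    uintptr_t aligned = (start + HPAGE_SIZE - 1) & ~(uintptr_t)(HPAGE_SIZE - 1);
    size_t head = aligned - start;       /* unaligned bytes before the boundary */
    size_t tail = padded - head - len;   /* leftover bytes after the range */

    /* Syscalls 2 and 3: trim the unaligned head and the leftover tail. */
    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)aligned + len, tail);

    /* Syscall 4: THP hint. Failure (e.g. THP disabled) is ignored here;
       the memory is still usable with base pages. */
    (void)madvise((void *)aligned, len, MADV_HUGEPAGE);

    assert(aligned % HPAGE_SIZE == 0);
    return (void *)aligned;
}
```

Because mmap only guarantees base-page alignment, padding by one full huge page guarantees that a 2 MiB boundary exists within the first 2 MiB of the mapping; head and tail together always add up to exactly HPAGE_SIZE.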

3. I map 8 MiB, get some misaligned address from mmap, and then try to 
mitigate the damage by telling madvise that all of it is now supposed to 
use hugepages. The dumb way of implementing this would be to split the 
mapping: the section at the beginning has 256 4-KiB pages, the next one 
uses 3 2-MiB pages, and the last section has 256 4-KiB pages again 
(or some such), adding up to 8 MiB. I don't even know if Linux 
supports variable-page-size mappings, and of course we're still carrying 
around 512 4-KiB pages that could easily have been mapped into one 
2-MiB page, which is why I call it the dumb way.
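The page accounting in option 3 can be checked with a small helper that splits a range into a base-page head, a huge-page middle and a base-page tail. This is only an illustrative sketch assuming 4 KiB base pages and 2 MiB huge pages; the names split_range() and struct split are hypothetical and do not correspond to any kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BASE_PAGE ((uintptr_t)4 << 10)   /* assumed 4 KiB base pages */
#define HUGE_PAGE ((uintptr_t)2 << 20)   /* assumed 2 MiB huge pages */

/* Hypothetical result type: how a range would be carved up. */
struct split {
    size_t head_small;  /* 4 KiB pages before the first 2 MiB boundary */
    size_t huge;        /* whole 2 MiB pages that fit inside the range */
    size_t tail_small;  /* 4 KiB pages after the last 2 MiB boundary */
};

/* Split [addr, addr + len); addr and len must be multiples of BASE_PAGE. */
static struct split split_range(uintptr_t addr, size_t len)
{
    struct split s = {0, 0, 0};
    uintptr_t end = addr + len;
    uintptr_t hstart = (addr + HUGE_PAGE - 1) & ~(HUGE_PAGE - 1); /* round up */
    uintptr_t hend = end & ~(HUGE_PAGE - 1);                      /* round down */

    assert(addr % BASE_PAGE == 0 && len % BASE_PAGE == 0);

    if (hstart >= hend) {
        /* No complete huge page fits: everything stays on base pages. */
        s.head_small = len / BASE_PAGE;
        return s;
    }
    s.head_small = (hstart - addr) / BASE_PAGE;
    s.huge = (hend - hstart) / HUGE_PAGE;
    s.tail_small = (end - hend) / BASE_PAGE;
    return s;
}
```

For an 8 MiB range starting 1 MiB past a 2 MiB boundary (e.g. 0x40100000), this yields 256 + 3 + 256 pages, matching the example in option 3; for a 2 MiB-aligned start it yields 0 + 4 + 0, i.e. four whole 2 MiB pages.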

4. Like 3), but a wee bit smarter: introduce another system call that 
works like madvise(MADV_HUGEPAGE), but have it return the address of a 
properly aligned mapping, thus giving userspace 4 genuine 2-MiB pages. 
Just like 3), that wouldn't be transparent, but at least it costs only 4 
context switches and doesn't leave us with half-baked hugepages. However, 
this approach would effectively just be 1), only more complicated and 
less transparent.

tl;dr:

1. Provide mmap with some sort of flag (which would be redundant IMHO) 
in order to hand out properly aligned mappings (not transparent, but the 
current MAP_HUGETLB flag isn't either).
2. Based on the THP enablement status, always hand out properly aligned 
mappings, and just fall back to smaller pages if hugepages couldn't be 
allocated (truly transparent).
3. Map memory, then tell madvise to make as many hugepages out of it 
as possible while still keeping the initial mapping (not transparent, 
and not sure Linux can actually do that).
4. Introduce a new system call (not transparent from the get-go) to hand 
out properly aligned mappings, or to make them properly aligned while 
the mapping is transformed from not-properly-aligned to properly-aligned.

Your call.

