From: David Rientjes <rientjes@google.com>
To: Daniel Micay <danielmicay@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Aliaksey Kandratsenka <alkondratenko@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Shaohua Li <shli@fb.com>,
linux-mm@kvack.org, linux-api@vger.kernel.org,
Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Mel Gorman <mel@csn.ul.ie>, Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@suse.cz>,
Andy Lutomirski <luto@amacapital.net>,
"google-perftools@googlegroups.com"
<google-perftools@googlegroups.com>
Subject: Re: [PATCH] mremap: add MREMAP_NOHOLE flag --resend
Date: Wed, 25 Mar 2015 19:31:13 -0700 (PDT)
Message-ID: <alpine.DEB.2.10.1503251914260.16714@chino.kir.corp.google.com>
In-Reply-To: <551351CA.3090803@gmail.com>
On Wed, 25 Mar 2015, Daniel Micay wrote:
> > With tcmalloc, it's simple to always expand the heap by mmaping 2MB ranges
> > for size classes <= 2MB, allocate its own metadata from an arena that is
> > also expanded in 2MB range, and always do madvise(MADV_DONTNEED) for the
> > longest span on the freelist when it does periodic memory freeing back to
> > the kernel, and even better if the freed memory splits at most one
> > hugepage. When memory is pulled from the freelist of memory that has
> > already been returned to the kernel, you can return a span that will make
> > it eligible to be collapsed into a hugepage based on your setting of
> > max_ptes_none, trying to consolidate the memory as much as possible. If
> > your malloc is implemented in a way to understand the benefit of
> > hugepages, and how much memory you're willing to sacrifice (max_ptes_none)
> > for it, then you should _never_ be increasing memory usage by 50%.
>
> If khugepaged was the only source of huge pages, sure. The primary
> source of huge pages is the heuristic handing out an entire 2M page on
> the first page fault in a 2M range.
>
The behavior is a property of what you brk() or mmap() to expand your
heap; you can intentionally require it to fault hugepages, or not fault
hugepages, without any special madvise().

With the example above, the implementation I wrote specifically tries to
sbrk() in 2MB regions, and it hands out allocator metadata via a memory
arena that does the same thing. Memory is treated as being on the normal
freelist, so it is considered resident (the same as faulting 4KB and then
freeing it, before tcmalloc does madvise(MADV_DONTNEED)), and we naturally
prefer to hand that out before going to the returned freelist, or to
mmap() as a fallback.
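
As a rough sketch of the pattern described above (this is not the tcmalloc code), a 2MB-aligned region can be obtained and later returned to the kernel as shown below. It uses mmap() rather than sbrk() to keep the example self-contained. The helper names map_hugepage_aligned() and release_span() and the HUGEPAGE_SIZE constant are made up for illustration, and a 2MB hugepage size is assumed.

```c
#define _GNU_SOURCE /* for MAP_ANONYMOUS and madvise() under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: 2MB hugepages, as on x86-64. */
#define HUGEPAGE_SIZE ((size_t)2 << 20)

/*
 * Hypothetical helper: map `len` bytes (expected to be a multiple of 2MB)
 * at a 2MB-aligned address by over-mapping and trimming the excess.
 * Returns NULL on failure.
 */
static void *map_hugepage_aligned(size_t len)
{
    size_t padded = len + HUGEPAGE_SIZE;
    char *raw = mmap(NULL, padded, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t addr = (uintptr_t)raw;
    uintptr_t aligned = (addr + HUGEPAGE_SIZE - 1) & ~(uintptr_t)(HUGEPAGE_SIZE - 1);
    size_t head = aligned - addr;        /* unaligned bytes before the region */
    size_t tail = padded - head - len;   /* leftover bytes after the region */

    if (head)
        munmap(raw, head);
    if (tail)
        munmap((char *)aligned + len, tail);
    return (void *)aligned;
}

/*
 * Hypothetical helper: give a span's pages back to the kernel while keeping
 * the address range mapped. The next touch faults in zero-filled memory.
 */
static int release_span(void *span, size_t len)
{
    return madvise(span, len, MADV_DONTNEED);
}
```

Whether the first touch of such a region is backed by a hugepage depends on the system's thp settings; the sketch only arranges the alignment that makes it possible.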

There will always be fragmentation in your normal freelist spans, so
there's always wasted memory (with or without thp). There should never be
a case where you're always mapping 2MB-aligned regions and then only
touching a small portion of each; for >2MB size classes you could easily
map only the size required, and you would never get an excess of memory
due to thp at fault.
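
One way to read the sizing rule above, as a sketch only: requests up to 2MB grow the heap in 2MB steps, while larger requests are rounded up to the base page size and no further. The function name mapping_length() and both constants are assumptions for illustration (4KB base pages, 2MB hugepages), not taken from any allocator.

```c
#include <stddef.h>

/* Assumptions: 4KB base pages and 2MB hugepages. */
#define BASE_PAGE_SIZE ((size_t)4096)
#define HUGEPAGE_SIZE  ((size_t)2 << 20)

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * Hypothetical policy: how many bytes to map for a given request.
 * Small size classes expand the heap by a whole 2MB range; anything
 * larger maps only what it needs, rounded to the base page size.
 */
static size_t mapping_length(size_t request)
{
    if (request <= HUGEPAGE_SIZE)
        return HUGEPAGE_SIZE;
    return round_up(request, BASE_PAGE_SIZE);
}
```

Under this policy a request of 2MB plus one byte maps 2MB plus 4KB, so at most the first 2MB of that mapping can be faulted as a hugepage and the remainder is a single small page.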

I think this may be tangential to the thread, though, since this has
nothing to do with mremap() or any new mremap() flag.

If the thp faulting behavior is going to be changed, then it would need to
be something that is opted into, and not via any system tunable or
madvise() flag. It would probably need to be a prctl(), like
PR_SET_THP_DISABLE, that would control only fault behavior.
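
For reference, the existing PR_SET_THP_DISABLE interface is used roughly as sketched below. It disables thp for the calling process entirely; a fault-only control like the one suggested above does not exist, and this sketch does not implement one. The wrapper names thp_disable() and thp_is_disabled() are made up for illustration.

```c
#include <sys/prctl.h>

/*
 * Hypothetical wrapper: set (on != 0) or clear (on == 0) the per-process
 * "THP disable" flag. Returns 0 on success, -1 with errno set on failure.
 * The unused prctl() arguments must be zero.
 */
static int thp_disable(int on)
{
    return prctl(PR_SET_THP_DISABLE, on ? 1L : 0L, 0L, 0L, 0L);
}

/*
 * Hypothetical wrapper: query the flag. Returns a non-zero value if thp is
 * disabled for the calling process, 0 if not, or -1 with errno set on failure.
 */
static int thp_is_disabled(void)
{
    return prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L);
}
```

The flag is inherited across fork() and preserved across execve(), so a launcher can set it before starting the real workload.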