From: Steven Sistare <steven.sistare@oracle.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	linux_lkml_grp@oracle.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Toshi Kani <toshi.kani@hpe.com>, Boaz Harrosh <boazh@netapp.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] mm: align anon mmap for THP
Date: Mon, 14 Jan 2019 14:26:26 -0500
Message-ID: <7d1ccbc3-7dad-99de-1b15-77bb1196f9a3@oracle.com>
In-Reply-To: <50c6abdc-b906-d16a-2f8f-8647b3d129aa@oracle.com>

On 1/14/2019 1:54 PM, Mike Kravetz wrote:
> On 1/14/19 7:35 AM, Steven Sistare wrote:
>> On 1/11/2019 6:28 PM, Mike Kravetz wrote:
>>> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
>>>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
>>>>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
>>>>> to get an address returned by mmap() suitably aligned for THP.  It seems
>>>>> that if mmap is asking for a mapping length greater than huge page
>>>>> size, it should align the returned address to huge page size.
>>
>> A better heuristic would be to return an aligned address if the length
>> is a multiple of the huge page size.  The gap (if any) between the end of
>> the previous VMA and the start of this VMA would be filled by subsequent
>> smaller mmap requests.  The new behavior would need to become part of the
>> mmap interface definition so apps can rely on it and omit their hoop-jumping
>> code.
> 
> Yes, the heuristic really should be 'length is a multiple of the huge page
> size'.  As you mention, this would still leave gaps.  I need to look closer
> but this may not be any worse than the trick of mapping an area with rounded
> up length and then unmapping pages at the beginning.
> 
> When I sent this out, the thought in the back of my mind was that this doesn't
> really matter unless there is some type of alignment guarantee.  Otherwise,
> user space code needs to continue employing its own code to check/force
> alignment.  Making matters somewhat worse is that I do not believe there is a
> C interface to query huge page size.  I thought there was discussion about
> adding one, but I cannot find it.

Right. Solaris provides getpagesizes().
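
For illustration only, here is a minimal sketch of how user space can
approximate such a query on Linux today.  It assumes sysfs is mounted at
/sys and that the running kernel exports the hpage_pmd_size file there;
thp_size() is a hypothetical helper name, not an existing libc interface.

```c
#include <stdio.h>

/* Assumption: the kernel was built with THP support and exports this file. */
#define THP_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"

/*
 * Hypothetical helper: return the THP (PMD-level) huge page size in bytes,
 * or 0 if the file is missing or unreadable.
 */
static size_t thp_size(void)
{
	unsigned long size = 0;
	FILE *f = fopen(THP_SIZE_PATH, "r");

	if (!f)
		return 0;
	if (fscanf(f, "%lu", &size) != 1)
		size = 0;
	fclose(f);
	return (size_t)size;
}
```

A caller should treat a return value of 0 as "unknown" and fall back to
whatever default it already uses.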

>> Personally I would like to see a new MAP_ALIGN flag and treat the addr
>> argument as the alignment (like Solaris), but I am told that adding flags
>> is problematic because old kernels accept undefined flag bits from userland
>> without complaint, so their behavior would change.
> 
> Well, a flag would clearly define desired behavior.
> 
> As others have mentioned, there are mechanisms in place that allow user
> space code to get the alignment it wants.  However, it is at the expense of
> an additional system call or two.  Perhaps the question is, "Is it worth
> defining new behavior to eliminate this overhead?".
> 
> One other thing to consider is that at mmap time, we likely do not know if
> the vma will/can use THP.  We would know if system wide THP configuration
> is set to never or always.  However, I 'think' the default for most distros
> is madvise.  Therefore, it is not until a subsequent madvise call that we
> know THP will be employed.  If the application code will need to make this
> separate madvise call, then perhaps it is not too much to expect that it
> take explicit action to optimally align the mapping.

True.  It is annoying to write the extra code, but the power user will do it.

The heuristic alignment would primarily benefit applications that are not as
carefully optimized.
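
For reference, the hoop-jumping under discussion looks roughly like the
sketch below: map a rounded-up length, unmap the unaligned head and the
leftover tail, then ask for THP with madvise.  mmap_aligned() is a
hypothetical helper name, and the sketch assumes that len is a multiple of
the base page size and that align is a power of two no smaller than the
base page size.  It costs up to two munmap calls plus the madvise on top of
the mmap itself.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: return len bytes of anonymous memory whose start
 * address is a multiple of align, or MAP_FAILED on error.
 * Assumes len is page-size aligned and align is a power of two that is
 * at least the base page size.
 */
static void *mmap_aligned(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	if (len == 0 || len > SIZE_MAX - align)
		return MAP_FAILED;

	/* Over-allocate so an aligned window of len bytes must fit inside. */
	size_t span = len + align;
	char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return MAP_FAILED;

	/* Round the start up to the next multiple of align. */
	uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
	size_t head = start - (uintptr_t)raw;
	size_t tail = span - head - len;

	/* Trim the unaligned head and the unused tail. */
	if (head)
		munmap(raw, head);
	if (tail)
		munmap((char *)start + len, tail);

#ifdef MADV_HUGEPAGE
	/* Advisory only; a failure here still leaves a usable mapping. */
	(void)madvise((void *)start, len, MADV_HUGEPAGE);
#endif
	return (void *)start;
}
```

With the heuristic, a call such as mmap_aligned(4UL << 20, 2UL << 20) would
reduce to a plain mmap of 4 MiB whenever the length is a multiple of the
huge page size.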

- Steve
