From: "Jörn Engel" <joern@purestorage.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Uday Shankar <ushankar@purestorage.com>,
	Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [bug report?] unintuitive behavior when mapping over hugepage-backed PROT_NONE regions
Date: Fri, 7 Feb 2025 11:35:40 -0800
Message-ID: <Z6ZgjJnL0utjKy_P@cork>
In-Reply-To: <b113c695-4c8d-4276-bdc4-409195b636dd@lucifer.local>

On Fri, Feb 07, 2025 at 01:12:33PM +0000, Lorenzo Stoakes wrote:
> 
> So TL;DR is - aggregate operations failing means any or all of the
> operation failed, you can no longer rely on the mapping state being what
> you expected.

Coming back to the "what should the interface be?" question, I can see
three reasonable answers:
1. Failure should result in no change.  We have a bug and will fix it.
2. Failure should result in no change.  But fixing things is exceedingly
   hard and we may have to live with current reality for a long time.
3. Failure should result in undefined behavior.

I think you convincingly argue against the first answer.  It might still
be useful to also argue against the third answer.


For background, I wrote a somewhat weird memory allocator in 2017,
called "big_allocate".  The underlying problem is that your favorite
malloc tends to do a reasonable job for small to medium objects, but
eventually gives up and calls mmap()/munmap() for large objects.  With a
heavily threaded process, the combination of mmap_sem and TLB shootdown
via IPI is a big performance-killer.  The solution is a specialized
allocator for large objects instead of mmap()/munmap().

The original (and still current) design of big_allocate has a mapping
structure somewhat similar to "struct page" in the kernel.  It relies on
having a large chunk of virtual memory space that it directly controls,
so that it can have a simple 1:1 mapping between virtual memory and
"struct page".

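A minimal sketch of such a 1:1 mapping, to make the idea concrete.  This
is not big_allocate's actual code; the names (struct arena, struct
chunk_desc, CHUNK_SHIFT) and the 2 MiB chunk size are assumptions chosen
for illustration.  Because the allocator owns one contiguous range, a
descriptor lookup is plain arithmetic on the address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical granularity: one descriptor per 2 MiB chunk. */
#define CHUNK_SHIFT 21

/* Hypothetical per-chunk metadata, playing the role of "struct page". */
struct chunk_desc {
    uint32_t flags;
    uint32_t run_len;
};

/* One contiguous virtual range plus its descriptor array. */
struct arena {
    uintptr_t base;            /* start of the reserved range */
    size_t nchunks;            /* number of chunks in the range */
    struct chunk_desc *descs;  /* descs[i] describes chunk i */
};

/* Address -> descriptor.  Returns NULL if p is outside the arena. */
static struct chunk_desc *addr_to_desc(const struct arena *a, const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    size_t idx;

    if (addr < a->base)
        return NULL;
    idx = (addr - a->base) >> CHUNK_SHIFT;
    if (idx >= a->nchunks)
        return NULL;
    return &a->descs[idx];
}

/* Descriptor -> start address of the chunk it describes. */
static void *desc_to_addr(const struct arena *a, const struct chunk_desc *d)
{
    assert(d >= a->descs && (size_t)(d - a->descs) < a->nchunks);
    return (void *)(a->base + ((uintptr_t)(d - a->descs) << CHUNK_SHIFT));
}
```

Neither direction needs a tree or hash lookup, which is the point of
controlling the whole range.
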
To get a large chunk of virtual memory space, big_allocate does a
PROT_NONE mmap().  It then later does a PROT_READ|PROT_WRITE mmap() over
parts of that range to allocate memory.  Often combined with MAP_HUGETLB,
for obvious performance reasons.  (Side note: I wish a PROT_RW shorthand
existed in the headers.)
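
The reserve-then-map pattern, sketched in C.  This is an illustration,
not big_allocate's code: reserve_region() and commit_range() are
hypothetical helper names, and the exact flag set (MAP_NORESERVE on the
reservation, MAP_FIXED on the second call) is an assumption about how
one would typically write it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve len bytes of address space with no access rights.
 * Returns NULL on failure. */
static void *reserve_region(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Map read-write memory over part of an existing reservation.
 * MAP_FIXED replaces whatever is currently mapped at [addr, addr+len).
 * extra_flags may be 0 or, for example, MAP_HUGETLB; with MAP_HUGETLB
 * both addr and len must be aligned to the huge page size.
 * Returns 0 on success, -1 on failure.  As discussed in this thread,
 * the state of the range after a failure should not be relied on. */
static int commit_range(void *addr, size_t len, int extra_flags)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | extra_flags,
                   -1, 0);
    if (p == MAP_FAILED)
        return -1;
    assert(p == addr);  /* MAP_FIXED maps at addr or fails */
    return 0;
}
```

A caller would reserve once at startup and then call commit_range() on
sub-ranges as allocations come in.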

If memory serves, big_allocate resulted in a 2-3% macrobenchmark
improvement.

Current big_allocate has a number of ugly warts I rather dislike.  One
of those warts is that you now have existing users that rely on mmap()
over existing PROT_NONE mappings working.  At least with the special set
of conditions we care about.

I have some plans to rewrite big_allocate with a different design.  But
for now we have existing code that may make your life harder than you
wished for.

Jörn

--
Without congressional action or a strong judicial precedent, I would
_strongly_ recommend against anyone trusting their private data to a
company with physical ties to the United States.
-- Ladar Levison

