From: David Hildenbrand <david@redhat.com>
To: Brendan Jackman <jackmanb@google.com>,
peterz@infradead.org, bp@alien8.de, dave.hansen@linux.intel.com,
mingo@redhat.com, tglx@linutronix.de
Cc: akpm@linux-foundation.org, derkling@google.com,
junaids@google.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, reijiw@google.com, rientjes@google.com,
rppt@kernel.org, vbabka@suse.cz, x86@kernel.org,
yosry.ahmed@linux.dev, Patrick Roy <roypat@amazon.co.uk>,
Zi Yan <ziy@nvidia.com>
Subject: Re: [Discuss] First steps for ASI (ASI is fast again)
Date: Thu, 2 Oct 2025 09:45:37 +0200
Message-ID: <44082771-a35b-4e8d-b08a-bd8cd340c9f2@redhat.com>
In-Reply-To: <20250812173109.295750-1-jackmanb@google.com>
> I won't re-hash the details of the problem here (see [1]) but in short: file
> pages aren't mapped into the physmap as seen from ASI's restricted address space.
> This causes a major overhead when e.g. read()ing files. The solution we've
> always envisaged (and which I very hastily tried to describe at LSF/MM/BPF this
> year) was to simply stop read() etc from touching the physmap.
>
> This is achieved in this prototype by a mechanism that I've called the "ephmap".
> The ephmap is a special region of the kernel address space that is local to the
> mm (much like the "proclocal" idea from 2019 [2]). Users of the ephmap API can
> allocate a subregion of this, and provide pages that get mapped into their
> subregion. These subregions are CPU-local. This means that it's cheap to tear
> these mappings down, so they can be removed immediately after use (eph =
> "ephemeral"), eliminating the need for complex/costly tracking data structures.
>
> (You might notice the ephmap is extremely similar to kmap_local_page() - see the
> commit that introduces it ("x86: mm: Introduce the ephmap") for discussion).
>
> The ephmap can then be used for accessing file pages. It's also a generic
> mechanism for accessing sensitive data, for example it could be used for
> zeroing sensitive pages, or if necessary for copy-on-write of user pages.
>
At some point we discussed how to make secretmem pages movable, so
that we end up with fewer unmovable pages in the system.

Secretmem pages have their direct map entries removed once allocated,
and restored once freed (truncated from the page cache).

In order to migrate them we would have to map them temporarily, and we
obviously don't want to temporarily map them back into the direct map.

Maybe the ephmap could be used for that use case, too; a rough sketch
of what I have in mind is below.
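Completely made-up function names (ephmap_map_page()/ephmap_unmap()
stand in for whatever the final API ends up looking like), and this
glosses over the fact that the current prototype only allows a single
ephmap allocation per context:

	/*
	 * Rough sketch only: migrate a secretmem page (which has no
	 * direct map entry) to a freshly allocated destination page
	 * without the secret data ever becoming reachable through the
	 * direct map.  TLB flushing and error unwinding glossed over.
	 */
	static int secretmem_migrate_page_sketch(struct page *dst,
						 struct page *src)
	{
		void *from, *to;

		/* Pull dst out of the direct map before copying into it. */
		if (set_direct_map_invalid_noflush(dst))
			return -ENOMEM;

		/* CPU-local, torn down immediately, like kmap_local_page(). */
		from = ephmap_map_page(src);
		to = ephmap_map_page(dst);
		copy_page(to, from);
		ephmap_unmap(to);
		ephmap_unmap(from);

		/*
		 * src's direct map entry gets restored when the old page
		 * is eventually freed, just like secretmem does today.
		 */
		return 0;
	}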
Another, similar use case would be guest_memfd taking an approach like
the one secretmem took: removing the direct map. While guest_memfd
does not support page migration yet, there are prototypes that allow
migrating pages for non-CoCo (IOW: ordinary) VMs.

Maybe the ephmap could be used there, too.
I guess an interesting question would be: which mm to use when we are
migrating a page from a random context (memory offlining, page
compaction, memory-failure handling, alloc_contig_pages(), ...); see
the sketch below.
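Purely to illustrate that question (nothing below exists, it is just
how I picture the problem): if the ephmap lives in an mm-local region,
the mapping path needs an mm_struct to install the temporary PTEs
into, and most of those contexts run in kernel threads where
current->mm is NULL:

	static struct mm_struct *ephmap_mm_for_migration(void)
	{
		/* Process context, e.g. compaction in an allocation path. */
		if (current->mm)
			return current->mm;

		/*
		 * kcompactd, memory offlining, memory-failure handling
		 * etc. have no user mm.  Fall back to init_mm?  Use a
		 * dedicated mm?  Either way there are consequences for
		 * TLB flushing and for what becomes visible in the
		 * ASI-restricted address space.
		 */
		return &init_mm;
	}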
[...]
>
> Despite my title these numbers are kinda disappointing to be honest, it's not
> where I wanted to be by now,
"ASI is faster again" :)
> but it's still an order-of-magnitude better than
> where we were for native FIO a few months ago. I believe almost all of this
> remaining slowdown is due to unnecessary ASI exits, the key areas being:
>
> - On every context_switch(). Google's internal implementation has fixed this (we
> only really need it when switching mms).
>
> - Whenever zeroing sensitive pages from the allocator. This could potentially be
> solved with the ephmap but requires a bit of care to avoid opening CPU attack
> windows.
>
> - In copy-on-write for user pages. The ephmap could also help here but the
> current implementation doesn't support it (it only allows one allocation at a
> time per context).
>
But only the first point would actually be relevant for the FIO
benchmark, I assume, right?

So how confident are you that this is really going to be solvable? Or,
to ask from another angle: how much slowdown do you expect and target
long-term?
--
Cheers
David / dhildenb