From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Kees Cook <keescook@chromium.org>,
Mike Rapoport <rppt@linux.ibm.com>,
Peter Zijlstra <peterz@infradead.org>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Vlastimil Babka <vbabka@suse.cz>,
linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [RFC PATCH 0/3] Prototype for direct map awareness in page allocator
Date: Wed, 11 May 2022 16:50:44 +0900 [thread overview]
Message-ID: <Yntq1IhbwjyAHTON@hyeyoo> (raw)
In-Reply-To: <YnCzQJk8Mu1848tG@kernel.org>
On Mon, May 02, 2022 at 09:44:48PM -0700, Mike Rapoport wrote:
> On Sat, Apr 30, 2022 at 01:44:16PM +0000, Hyeonggon Yoo wrote:
> > On Tue, Apr 26, 2022 at 06:21:57PM +0300, Mike Rapoport wrote:
> > > Hello Hyeonggon,
> > >
> > > On Tue, Apr 26, 2022 at 05:54:49PM +0900, Hyeonggon Yoo wrote:
> > > > On Thu, Jan 27, 2022 at 10:56:05AM +0200, Mike Rapoport wrote:
> > > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > > >
> > > > > Hi,
> > > > >
> > > > > This is a second attempt to make page allocator aware of the direct map
> > > > > layout and allow grouping of the pages that must be mapped at PTE level in
> > > > > the direct map.
> > > > >
> > > >
> > > > Hello Mike, this may be a silly question...
> > > >
> > > > Looking at the implementation of set_memory*(), they only split
> > > > PMD/PUD-sized entries. But why not _merge_ them when all entries
> > > > have the same permissions after changing the permission of one entry?
> > > >
> > > > I think grouping __GFP_UNMAPPED allocations would help reduce
> > > > direct map fragmentation, but IMHO merging split entries seems
> > > > better done in those helpers than in the page allocator.
> > >
> > > Maybe; I didn't get as far as trying to merge split entries in the direct
> > > map. IIRC, Kirill sent a patch for collapsing huge pages in the direct map
> > > some time ago, but there still was something that had to initiate the
> > > collapse.
> >
> > But in this case the buddy allocator's view of the direct map is quite
> > limited. It cannot merge 2M entries into a 1G entry, as it does not
> > support allocations that large. It also cannot merge entries for pages
> > freed during boot, as they were not allocated from the page allocator.
> >
> > And it will become harder when pages in MIGRATE_UNMAPPED are borrowed
> > from another migrate type....
> >
> > So it would be nice if we could efficiently merge mappings in
> > change_page_attr_set(). That approach can handle the cases above.
> >
> > I think in this case grouping allocations and merging mappings
> > should be done separately.
>
> I've added the provision to merge the mappings in __free_one_page() because
> at that spot we know for sure we can replace multiple PTEs with a single
> PMD.
>
> I'm not saying there should be no additional mechanism for collapsing
> direct map pages, but I don't know when and how it should be invoked.
>
I'm still thinking about a way to accurately track the number of split
pages, because tracking it only in the CPA code may be inaccurate when
the kernel page table is changed outside of CPA.
In case you wonder, my code is available at:
https://github.com/hygoni/linux/tree/merge-mapping-v1r3
it also adds vmstat items:
# cat /proc/vmstat | grep direct_map
direct_map_level2_splits 1079
direct_map_level3_splits 6
direct_map_level1_merges 1079
direct_map_level2_merges 6
Thanks,
Hyeonggon
> > > > For example:
> > > > 1) set_memory_ro() splits 1 RW PMD entry into 511 RW PTE
> > > > entries and 1 RO PTE entry.
> > > >
> > > > 2) before freeing the pages, we call set_memory_rw() and we have
> > > > 512 RW PTE entries. Then we can merge them into 1 RW PMD entry.
> > >
> > > For this we need to check permissions of all 512 pages to make sure we can
> > > use a PMD entry to map them.
> >
> > Of course that may be slow. Maybe one way to optimize this is to use some
> > bits in struct page, something like: each bit of page->direct_map_split
> > (unsigned long) is set when at least one of the corresponding
> > (PTRS_PER_PTE = 512) / (BITS_PER_LONG = 64) = 8 entries has special
> > permissions.
> >
> > Then we just need to set the corresponding bit when splitting mappings,
> > and iterate over the 8 entries when changing permissions back (and unset
> > the bit once all 8 entries have the usual permissions). We can decide to
> > merge by checking whether page->direct_map_split is zero.
> >
> > When scanning, 8 entries would fit into one cacheline.
> >
> > Any other ideas?
> >
> > > Not sure that doing the scan in each set_memory call won't cause an overall
> > > slowdown.
> >
> > I think we can evaluate it by measuring boot time and bpf/module
> > load/unload time.
> >
> > Is there any other workload that is directly affected
> > by performance of set_memory*()?
> >
> > > > 3) after 2) we can do the same thing with PMD-sized entries
> > > > and merge them into 1 PUD entry if 512 PMD entries have
> > > > the same permissions.
> > > > [...]
> > > > > Mike Rapoport (3):
> > > > > mm/page_alloc: introduce __GFP_UNMAPPED and MIGRATE_UNMAPPED
> > > > > mm/secretmem: use __GFP_UNMAPPED to allocate pages
> > > > > EXPERIMENTAL: x86/module: use __GFP_UNMAPPED in module_alloc
> > > > --
> > > > Thanks,
> > > > Hyeonggon
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
>
> --
> Sincerely yours,
> Mike.
--
Thanks,
Hyeonggon