linux-mm.kvack.org archive mirror
From: Aaron Lu <aaron.lu@intel.com>
To: Dave Hansen <dave.hansen@intel.com>,
	Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Song Liu <song@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [RFC PATCH 0/4] x86/mm/cpa: merge small mappings whenever possible
Date: Mon,  8 Aug 2022 22:56:45 +0800
Message-ID: <20220808145649.2261258-1-aaron.lu@intel.com>

This is an early RFC. While all reviews are welcome, reviewing this code
now will be a waste of time for the x86 subsystem maintainers. I would,
however, appreciate a preliminary review from the folks on the To and Cc
lists. I'm posting it to the list in case anyone else is interested in
seeing this early version.

Dave Hansen: I need your ack before this goes to the maintainers.

Here it goes:

On x86_64, Linux keeps a direct mapping of almost all physical memory.
For performance reasons, this mapping usually uses large pages (2M or
1G, depending on hardware capability) with read, write and non-execute
protection (RW+NX).

In some cases, pages have to change their protection to read-only and
executable (RO+X), e.g. pages that host module code or bpf progs. When
the protection of such pages is changed, the large mapping that covers
them first has to be split into 4K mappings, and then the protection of
the individual 4K pages is changed accordingly: unaffected pages keep
their original RW+NX protection while affected pages become RO+X.
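
To make the split step concrete, here is a minimal userspace sketch. It
is not the code in set_memory.c: the sim_* names, the two protection
bits and the flat 512-entry table are all assumptions made for
illustration, and real PTEs carry many more bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 2M mapping is covered by 512 4K entries. */
#define SIM_PTRS_PER_PTE 512

/* Hypothetical protection bits, not the real x86 PTE layout. */
enum {
	SIM_WRITE = 1u << 0,
	SIM_NX    = 1u << 1,
	SIM_RW_NX = SIM_WRITE | SIM_NX,	/* default direct-map protection */
	SIM_RO_X  = 0,			/* read-only and executable */
};

struct sim_pte {
	uint64_t pfn;		/* physical frame number mapped by this entry */
	unsigned int prot;
};

struct sim_large {
	uint64_t base_pfn;	/* first frame, aligned to 512 frames */
	unsigned int prot;
};

/* Split: every 4K entry inherits the frame and protection of the large mapping. */
static void sim_split(const struct sim_large *large,
		      struct sim_pte table[SIM_PTRS_PER_PTE])
{
	assert(large->base_pfn % SIM_PTRS_PER_PTE == 0);

	for (size_t i = 0; i < SIM_PTRS_PER_PTE; i++) {
		table[i].pfn = large->base_pfn + i;
		table[i].prot = large->prot;
	}
}

/* Change protection of entries [first, first + count) only; others are untouched. */
static void sim_set_prot(struct sim_pte table[SIM_PTRS_PER_PTE],
			 size_t first, size_t count, unsigned int prot)
{
	assert(first <= SIM_PTRS_PER_PTE && count <= SIM_PTRS_PER_PTE - first);

	for (size_t i = first; i < first + count; i++)
		table[i].prot = prot;
}
```

Calling sim_split() on a RW+NX large mapping and then
sim_set_prot(table, 3, 2, SIM_RO_X) leaves entries 3 and 4 as RO+X and
the other 510 entries as RW+NX, which is the state described above.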

This split causes a problem: the large mapping remains split even
after the affected pages' protection is changed back to RW+NX, e.g.
when the module is unloaded or the bpf progs are freed. After the
system has run for a long time, more and more large mappings can end up
split, causing more and more dTLB misses and hurting overall system
performance[1].

For this reason, people have tried techniques to reduce the harm of
large mappings being split, like bpf_prog_pack[2], which packs multiple
bpf progs into a single page instead of allocating a page and changing
its protection for each bpf prog. This approach makes large mapping
splits happen far less often.

This patchset addresses the problem in another way: it merges split
mappings back into a large mapping when the protections of all entries
in the split page table become the same again, e.g. when a page whose
protection was changed to RO+X has its protection changed back to RW+NX
(due to module unload, bpf prog free, etc.) and all other entries are
also RW+NX.
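
The merge condition can be sketched in userspace as below. This is only
an illustration under stated assumptions, not the implementation in this
series: the sim_* names and protection bits are hypothetical, and the
alignment and contiguity checks are an extra assumption of the sketch
(for a table produced by splitting a direct-map large page they hold
trivially).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one 2M mapping is covered by 512 4K entries. */
#define SIM_PTRS_PER_PTE 512

/* Hypothetical protection bits, not the real x86 PTE layout. */
enum {
	SIM_WRITE = 1u << 0,
	SIM_NX    = 1u << 1,
	SIM_RW_NX = SIM_WRITE | SIM_NX,
	SIM_RO_X  = 0,
};

struct sim_pte {
	uint64_t pfn;		/* physical frame number mapped by this entry */
	unsigned int prot;
};

/*
 * Return true if the 512 entries can be replaced by one large mapping:
 * all entries share the same protection, and the frames are contiguous
 * starting from a 512-frame-aligned base. On success, the base frame and
 * the common protection are written to *base_pfn and *prot.
 */
static bool sim_can_merge(const struct sim_pte table[SIM_PTRS_PER_PTE],
			  uint64_t *base_pfn, unsigned int *prot)
{
	assert(table && base_pfn && prot);

	if (table[0].pfn % SIM_PTRS_PER_PTE != 0)
		return false;

	for (size_t i = 1; i < SIM_PTRS_PER_PTE; i++) {
		if (table[i].prot != table[0].prot)
			return false;	/* one differing entry blocks the merge */
		if (table[i].pfn != table[0].pfn + i)
			return false;	/* not physically contiguous */
	}

	*base_pfn = table[0].pfn;
	*prot = table[0].prot;
	return true;
}
```

A single RO+X entry makes sim_can_merge() return false; once that entry
goes back to RW+NX, the check succeeds again and the table could be
replaced by one large RW+NX mapping.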

One final note: with features like bpf_prog_pack, there can be far
fewer large mapping splits IIUC; also, this patchset cannot help while
the page whose protection was changed stays in use. So my take on the
large mapping split problem is: to get the most value out of keeping
large mappings intact, features like bpf_prog_pack are important. This
patchset can further reduce the number of split large mappings once an
in-use page with special protection finally gets released.

[1]: http://lkml.kernel.org/r/CAPhsuW4eAm9QrAxhZMJu-bmvHnjWjuw86gFZzTHRaMEaeFhAxw@mail.gmail.com
[2]: https://lore.kernel.org/lkml/20220204185742.271030-1-song@kernel.org/

Aaron Lu (4):
  x86/mm/cpa: restore global bit when page is present
  x86/mm/cpa: merge splitted direct mapping when possible
  x86/mm/cpa: add merge event counter
  x86/mm/cpa: add a test interface to split direct map

 arch/x86/mm/pat/set_memory.c  | 411 +++++++++++++++++++++++++++++++++-
 include/linux/mm_types.h      |   6 +
 include/linux/page-flags.h    |   6 +
 include/linux/vm_event_item.h |   2 +
 mm/vmstat.c                   |   2 +
 5 files changed, 420 insertions(+), 7 deletions(-)

-- 
2.37.1




Thread overview: 16+ messages
2022-08-08 14:56 Aaron Lu [this message]
2022-08-08 14:56 ` [RFC PATCH 1/4] x86/mm/cpa: restore global bit when page is present Aaron Lu
2022-08-11  5:21   ` Hyeonggon Yoo
2022-08-11  8:16     ` Lu, Aaron
2022-08-11 11:30       ` Hyeonggon Yoo
2022-08-11 12:28         ` Aaron Lu
2022-08-08 14:56 ` [RFC PATCH 2/4] x86/mm/cpa: merge splitted direct mapping when possible Aaron Lu
2022-08-08 14:56 ` [RFC PATCH 3/4] x86/mm/cpa: add merge event counter Aaron Lu
2022-08-08 14:56 ` [TEST NOT_FOR_MERGE 4/4] x86/mm/cpa: add a test interface to split direct map Aaron Lu
2022-08-09 10:04 ` [RFC PATCH 0/4] x86/mm/cpa: merge small mappings whenever possible Kirill A. Shutemov
2022-08-09 14:58   ` Aaron Lu
2022-08-09 17:56     ` Kirill A. Shutemov
2022-08-11  4:50 ` Hyeonggon Yoo
2022-08-11  7:50   ` Lu, Aaron
2022-08-13 16:05   ` Mike Rapoport
2022-08-16  6:33     ` Aaron Lu
