linux-mm.kvack.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mike Rapoport <rppt@linux.ibm.com>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] Page table manipulation primitives
Date: Fri, 7 Feb 2020 11:40:06 -0800
Message-ID: <20200207194006.GF8731@bombadil.infradead.org>
In-Reply-To: <20200207174553.mx6onurbvhgn7w5p@box>

On Fri, Feb 07, 2020 at 08:45:53PM +0300, Kirill A. Shutemov wrote:
> On Thu, Feb 06, 2020 at 09:34:10AM -0800, Matthew Wilcox wrote:
> > On Thu, Feb 06, 2020 at 06:57:41PM +0200, Mike Rapoport wrote:
> > > While updating the architectures to properly use 5-level folded page tables
> > > without <asm-generic/?level-fixup.h> and <asm-generic/pgtable-nop4d-hack.h>
> > > I wondered if we can do better than explicitly name each and every level of
> > > the page table, open-code traversal of all the layers numerous times and
> > > have copied do_something_pXd_range().
> > > 
> > > Then I've come across Kirill's "Proof-of-concept: better(?) page-table
> > > manipulation API" [1], but as far as I could see there was no progress
> > > since then.
> > > 
> > > I'd like to resurrect the topic and try to see if we can come up with
> > > actually better page table manipulation API.
> > > 
> > > [1] https://lore.kernel.org/lkml/20180424154355.mfjgkf47kdp2by4e@black.fi.intel.com/
> 
> I played a bit more with it after that, but got distracted by other stuff.
> I'll see if I can come up with an update.
> 
> > I don't think this approach helps support 64k pages on ARM
> 
> Could you specify what such support would require?

For 64kB pages with a base 4kB page size, you set a special bit (the
contiguous hint) in 16 adjacent, naturally aligned PTEs.  When the MMU sees
that bit set, it can use a single 64kB TLB entry.  So I think what we want
for a fully generic interface is:

void set_vpte_at(struct mm_struct *, unsigned long addr, vpte_iter *, vpte_t,
		unsigned int order);

(maybe we don't need an 'order' here; perhaps it's embedded in the vpte_iter)
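
To make the 16-PTE case concrete, here is a minimal userspace sketch of what
such a call might do for order 4.  Everything prefixed toy_ is hypothetical:
the table is a plain array, the contiguous bit is placed at bit 52 purely for
illustration, and only orders 0 and 4 are modelled.  It is not kernel code
and ignores locking, TLB maintenance and break-before-make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: none of these names exist in the kernel. */
#define TOY_PAGE_SHIFT   12                     /* 4kB base pages */
#define TOY_PTRS_PER_PTE 512                    /* entries in one PTE table */
#define TOY_CONT_ORDER   4                      /* 16 x 4kB = 64kB */
#define TOY_PTE_CONT     (1ULL << 52)           /* stand-in for the contiguous hint */
#define TOY_PHYS_MASK    0x0000FFFFFFFFF000ULL  /* output-address bits of an entry */

typedef uint64_t toy_pte_t;

/*
 * Install one mapping of (4kB << order) bytes at addr in a single PTE table.
 * pte holds the physical address of the first page plus attribute bits.
 * Returns 0 on success, -1 for an unsupported order or a misaligned address.
 */
static int toy_set_vpte_at(toy_pte_t *table, unsigned long addr,
                           toy_pte_t pte, unsigned int order)
{
    unsigned long nr, size, idx, i;

    assert(table != NULL);
    if (order != 0 && order != TOY_CONT_ORDER)
        return -1;

    nr = 1UL << order;
    size = nr << TOY_PAGE_SHIFT;

    /* Both the virtual and the physical address must be size-aligned. */
    if ((addr & (size - 1)) || ((pte & TOY_PHYS_MASK) & (size - 1)))
        return -1;

    idx = (addr >> TOY_PAGE_SHIFT) % TOY_PTRS_PER_PTE;
    for (i = 0; i < nr; i++) {
        /* Each slot maps the next 4kB page of the physical range. */
        toy_pte_t entry = pte + ((toy_pte_t)i << TOY_PAGE_SHIFT);

        /* Every entry in the group carries the hint, not just the first. */
        if (order == TOY_CONT_ORDER)
            entry |= TOY_PTE_CONT;
        table[idx + i] = entry;
    }
    return 0;
}
```

The caller passes a single entry and an order; the helper fans it out over
the adjacent slots, so generic code never needs to know that the hardware
wants 16 of them.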

> > , for example,
> > so it doesn't solve enough problems to be worth doing.  I'd favour
> > an interface which looked more like this:
> > 
> > 	vpte_iter iter;
> > 	vpte_t vpte;
> > 
> > 	vpte_iter_for_each(vpte, iter, start, end, flags) {
> > 		unsigned char order = vpte_order(&iter);
> > 		... do things based on vpte and order ...
> > 	}
> 
> It looks like just a higher-level API that can be provided over my
> approach. Maybe it should be the default go-to. But I find it useful to be
> able to go into low-level details where it matters.

I think the key difference is that I would not embed the 'order' in the
vpte, but keep it in the iter.  I don't know that every architecture has
the ability to tell from a union { pte_t, pmd_t, pud_t, p4d_t, pgd_t }
which of the levels it is.
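
A rough sketch of that split, assuming 4kB base pages and 512 entries per
table (the x86-64 geometry).  The thread names vpte_t, vpte_iter and
vpte_order() but does not define them, so the toy_ types and fields below
are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the types named in the thread. */
typedef struct { uint64_t val; } toy_vpte_t;  /* raw entry; says nothing about its level */

enum toy_level { TOY_PTE = 0, TOY_PMD = 1, TOY_PUD = 2 };

struct toy_vpte_iter {
    unsigned long addr;    /* virtual address the cursor is at */
    enum toy_level level;  /* recorded by the walker as it descends */
};

#define TOY_PAGE_SIZE      4096UL
#define TOY_BITS_PER_LEVEL 9   /* assumption: 512 entries per table */

/* The order comes from where the walker is, never from the entry's bits. */
static unsigned char toy_vpte_order(const struct toy_vpte_iter *iter)
{
    assert(iter != NULL);
    return (unsigned char)(iter->level * TOY_BITS_PER_LEVEL);
}

/* Bytes mapped by the entry the cursor currently points at. */
static unsigned long toy_vpte_size(const struct toy_vpte_iter *iter)
{
    return TOY_PAGE_SIZE << toy_vpte_order(iter);
}
```

With this layout an architecture that cannot distinguish a pmd_t from a
pte_t by inspection loses nothing, because the caller only ever asks the
iter.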

Looking at the code you provided, another difference is that your method
involves a recursive call for each level of the page tables.  I'd rather
express these kinds of things as "I would like to iterate over each
page table entry in this range" than "Have I got to the bottom?  If not,
recursively call myself".  IOW vpte_iter_for_each() would work its way
down to the lowest level, and keep track of where it is in the iter,
so when moving to the next entry in the tree, it knows whether to go up
before going sideways, and then down as far as it needs to.
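
The down / sideways / up movement can be sketched without recursion in a toy
userspace model.  All names here are hypothetical, the tree has three levels
of four entries each to keep it small, addresses are counted in base pages,
and each entry carries an explicit kind tag that real page tables do not
have:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BITS    2                    /* 4 entries per table */
#define TOY_ENTRIES (1 << TOY_BITS)
#define TOY_LEVELS  3                    /* 0 = lowest level, 2 = top */
#define TOY_MAX     (1UL << (TOY_BITS * TOY_LEVELS))  /* pages in the address space */

enum toy_kind { TOY_NONE = 0, TOY_LEAF, TOY_TABLE };

struct toy_entry {
    enum toy_kind kind;
    union {
        unsigned long val;        /* payload when kind == TOY_LEAF */
        struct toy_entry *table;  /* next level down when kind == TOY_TABLE */
    } u;
};

struct toy_iter {
    struct toy_entry *table[TOY_LEVELS];  /* table being walked at each level */
    int level;                            /* level the cursor is currently at */
    unsigned long addr, end;              /* in units of base pages */
    unsigned int order;                   /* order of the last entry returned */
};

static unsigned int toy_shift(int level) { return TOY_BITS * level; }

static void toy_iter_init(struct toy_iter *it, struct toy_entry *root,
                          unsigned long start, unsigned long end)
{
    assert(start <= end && end <= TOY_MAX);
    it->level = TOY_LEVELS - 1;
    it->table[it->level] = root;
    it->addr = start;
    it->end = end;
    it->order = 0;
}

/* Advance to the next populated leaf in [addr, end); false once exhausted. */
static bool toy_iter_next(struct toy_iter *it, unsigned long *val)
{
    while (it->addr < it->end) {
        int level = it->level;
        unsigned int idx = (it->addr >> toy_shift(level)) & (TOY_ENTRIES - 1);
        struct toy_entry *e = &it->table[level][idx];
        unsigned long span;

        if (e->kind == TOY_TABLE && level > 0) {
            /* Down: remember the child table and retry one level lower. */
            it->table[level - 1] = e->u.table;
            it->level = level - 1;
            continue;
        }

        /* Sideways: step to the start of the next entry at this level. */
        span = 1UL << toy_shift(level);
        it->addr = (it->addr | (span - 1)) + 1;

        /* Up: an index that wrapped to 0 means this table is finished. */
        while (it->level < TOY_LEVELS - 1 &&
               ((it->addr >> toy_shift(it->level)) & (TOY_ENTRIES - 1)) == 0)
            it->level++;

        if (e->kind == TOY_LEAF) {
            it->order = toy_shift(level);
            *val = e->u.val;
            return true;
        }
    }
    return false;
}
```

A caller loops on toy_iter_next() and reads it.order after each hit.  Leaves
found at level 0 report order 0, a leaf at level 1 reports order 2, and a
leaf at the top level reports order 4, all from the same loop and with no
per-level code in the caller.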

Whatever we come up with, we should be able to collapse away the levels
which aren't needed, and support whatever non-PTE-level TLB orders the
hardware supports without forcing support for those orders on x86 code.

I don't have a good solution for how to express the 'copy_pt_range' in
your example, where we need to iterate two mms at the same time.  Maybe
that's a special iterator which does exactly that.


Thread overview: 4+ messages
2020-02-06 16:57 Mike Rapoport
2020-02-06 17:34 ` Matthew Wilcox
2020-02-07 17:45   ` Kirill A. Shutemov
2020-02-07 19:40     ` Matthew Wilcox [this message]
