Re: Inaccessible pages & folios

linux-mm.kvack.org archive mirror

From: Matthew Wilcox <willy@infradead.org>
To: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: linux-mm@kvack.org, linux-s390@vger.kernel.org
Subject: Re: Inaccessible pages & folios
Date: Mon, 12 Apr 2021 13:43:41 +0100
Message-ID: <20210412124341.GJ2531743@casper.infradead.org>
In-Reply-To: <20210412141809.36c349d6@ibm-vm>

On Mon, Apr 12, 2021 at 02:18:09PM +0200, Claudio Imbrenda wrote:
> On Fri, 9 Apr 2021 20:40:59 +0100
> Matthew Wilcox <willy@infradead.org> wrote:
> > I'm going to change __test_set_page_writeback() to take a folio [3]
> > and now I'm wondering what interface you'd like to use.  My
> > preference would be to rename arch_make_page_accessible() to
> > arch_make_folio_accessible() and pass a folio, at which time you
> > would make the entire folio (however many pages might be in it)
> > accessible.  If you would rather, we can leave the interface as
> > arch_make_page_accessible(), in which case we'll just call it N times
> > in __test_set_page_writeback() (and I won't need to touch gup.c).
> 
> For the rename case, how would you handle gup.c?

At first, I'd turn it into arch_make_folio_accessible(page_folio(page));
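
As a rough userspace illustration of that interim call (not kernel code):
the struct layouts, the back-pointer from page to folio, the `accessible`
flag and the `toy_` helpers below are all assumptions made for the sketch.
Only the names page_folio() and arch_make_folio_accessible() come from
this thread, and their bodies here are toy stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 8

struct folio;

/* Toy stand-in for struct page; NOT the kernel definition. */
struct page {
	struct folio *folio;	/* assumed back-pointer; the kernel derives this from the head page */
	bool accessible;	/* assumed flag replacing the arch-specific state */
};

/* Toy stand-in for a folio: a fixed-size run of pages. */
struct folio {
	size_t nr_pages;
	struct page pages[TOY_MAX_PAGES];
};

/* Hypothetical helper: set up a folio whose pages all start inaccessible. */
static void toy_folio_init(struct folio *folio, size_t nr_pages)
{
	if (nr_pages > TOY_MAX_PAGES)
		nr_pages = TOY_MAX_PAGES;
	folio->nr_pages = nr_pages;
	for (size_t i = 0; i < nr_pages; i++) {
		folio->pages[i].folio = folio;
		folio->pages[i].accessible = false;
	}
}

/* Toy page_folio(): any page, head or tail, maps to its containing folio. */
static struct folio *page_folio(struct page *page)
{
	return page->folio;
}

/* Toy arch hook: makes every page in the folio accessible. */
static int arch_make_folio_accessible(struct folio *folio)
{
	for (size_t i = 0; i < folio->nr_pages; i++)
		folio->pages[i].accessible = true;
	return 0;
}

/* Hypothetical call site that only has a struct page in hand, as gup.c does. */
static int toy_gup_make_accessible(struct page *page)
{
	return arch_make_folio_accessible(page_folio(page));
}
```

The point of the sketch is only that a caller holding an arbitrary page
ends up operating on the whole folio, which is exactly the behaviour
change being discussed.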

Eventually, gup.c needs to become folio-aware.  I haven't spent too much
time thinking about it, but code written like this:

                page = pte_page(pte);
                head = try_grab_compound_head(page, 1, flags);
                if (!head)
                        goto pte_unmap;
                if (unlikely(pte_val(pte) != pte_val(*ptep))) {
                        put_compound_head(head, 1, flags);
                        goto pte_unmap;
                }
                VM_BUG_ON_PAGE(compound_head(page) != head, page);

is just crying out for use of folios.  Also, some of the gup callers
would much prefer to work in terms of folios than individual struct pages
(imagine an RDMA adapter that wants to pin several gigabytes of memory
that's allocated using hugetlbfs for example).

> Consider that arch_make_page_accessible deals (typically) with KVM
> guest pages. Once you bundle up the pages in folios, you can have
> different pages in the same folio with different properties.

So what you're saying is that the host might allocate, e.g., a 1GB folio
for a guest, then the guest splits that up into smaller chunks (e.g. 1MB),
and would only want one of those small chunks accessible to the hypervisor?

> In case of failure, you could end up with a folio with some pages
> processed and some not processed. Would you stop at the first error?
> What would the state of the folio be? On s390x we use the PG_arch_1 bit
> to mark secure pages, how would that work with folios?
> 
> and how are fault handlers affected by this folio conversion? would
> they still work on pages, or would that also work on folios? on s390x
> we use the arch_make_page_accessible function in some fault handlers.

Folios can be mapped into userspace at an unaligned offset.  So we still
have to work in pages, at least for now.  We might have some optimised
path for aligned folios later.

> a possible approach maybe would be to keep the _page variant, and add a
> _folio wrapper around it

Yes, we can do that.  It's what I'm currently doing for
flush_dcache_folio().
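
A minimal userspace sketch of such a wrapper, under stated assumptions:
the struct layouts, the `secure` flag (standing in for PG_arch_1) and the
`busy` failure condition are invented for illustration, and stopping at
the first error is just one possible policy, not something settled in
this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PAGES 8

/* Toy stand-in for struct page; NOT the kernel definition. */
struct page {
	bool secure;	/* assumed stand-in for the PG_arch_1 "secure page" marker */
	bool busy;	/* assumed condition that makes the per-page call fail */
};

/* Toy stand-in for a folio: a fixed-size run of pages. */
struct folio {
	size_t nr_pages;
	struct page pages[TOY_MAX_PAGES];
};

/* Toy per-page variant: clears the secure marker, or fails with -EBUSY. */
static int arch_make_page_accessible(struct page *page)
{
	if (!page->secure)
		return 0;		/* already accessible, nothing to do */
	if (page->busy)
		return -EBUSY;
	page->secure = false;
	return 0;
}

/*
 * Folio wrapper around the per-page variant. It stops at the first
 * error (an assumed policy), so on failure the pages before the failing
 * one have been processed and the rest have not.
 */
static int arch_make_folio_accessible(struct folio *folio)
{
	for (size_t i = 0; i < folio->nr_pages; i++) {
		int ret = arch_make_page_accessible(&folio->pages[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

With a three-page folio whose middle page is busy, the wrapper returns
-EBUSY with page 0 converted and pages 1 and 2 still secure, which is the
mixed state asked about above. A caller would have to either tolerate
that or retry.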

> for s390x the PG_arch_1 is very important to prevent protected pages
> from being fed to I/O, as in that case Very Bad Things™ would happen.
> 
> sorry for the wall of questions, but I actually like your folio
> approach and I want to understand it better, so we can find a way to
> make everything work well together

Great!

> > PS: The prototype is in gfp.h.  That's not really appropriate; gfp.h
> > is about allocating memory, and this call really has nothing to do
> > with memory allocation.  I think mm.h is a better place for it, if
> > you can't find a better header file than that.
> 
> I had put it there because arch_alloc_page and arch_free_page are also
> there, and the behaviour, from a kernel point of view, is similar
> (inaccessible/unallocated pages will trigger a fault).
> 
> I actually do not have a preference regarding where the prototype
> lives, as long as everything works. If you think mm.h is more
> appropriate, go for it :)

Heh, I see how you got there from the implementor's point of view ;-)
I'll move it ...
