From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, linux-s390@vger.kernel.org
Subject: Re: Inaccessible pages & folios
Date: Thu, 15 Apr 2021 11:28:14 +0200
Message-ID: <20210415112814.303f7f02@ibm-vm>
In-Reply-To: <20210412135514.GK2531743@casper.infradead.org>

On Mon, 12 Apr 2021 14:55:14 +0100
Matthew Wilcox <willy@infradead.org> wrote:
[...]
>
> I was only thinking about the page cache case ...
>
> access_ret = arch_make_page_accessible(page);
> /*
>  * If writeback has been triggered on a page that cannot be made
>  * accessible, it is too late to recover here.
>  */
> VM_BUG_ON_PAGE(access_ret != 0, page);
>
> ... where it seems all pages _can_ be made accessible.

yes, for that case it is straightforward

> > also, I assume you keep the semantic difference between get_page and
> > pin_page? that's also very important for us
>
> I haven't changed anything in gup.c yet. Just trying to get the page
> cache to suck less right now.

fair enough :)

> > > So what you're saying is that the host might allocate, eg a 1GB
> > > folio for a guest, then the guest splits that up into smaller
> > > chunks (eg 1MB), and would only want one of those small chunks
> > > accessible to the hypervisor?
> >
> > qemu will allocate a big chunk of memory, and I/O would happen only
> > on small chunks (depending on what the guest does). I don't know
> > how swap and pagecache would behave in the folio scenario.
> >
> > Also consider that currently we need 4k hardware pages for protected
> > guests (so folios would be ok, as long as they are backed by small
> > pages)
> >
> > How and when are folios created actually?
> >
> > is there a way to prevent creation of multi-page folios?
>
> Today there's no way to create multi-page folios because I haven't
> submitted the patch to add alloc_folio() and friends:
>
> https://git.infradead.org/users/willy/pagecache.git/commitdiff/4fe26f7a28ffdc850cd016cdaaa74974c59c5f53
>
> We do have a way to allocate compound pages and add them to the page
> cache, but that's only in use by tmpfs/shmem.
>
> What will happen is that (for filesystems which support multipage
> folios), they'll be allocated by the page cache. I expect other
> places will start to use folios after that (eg anonymous memory), but
> I don't know where all those places will be. I hope not to be
> involved in that!
>
> The general principle, though, is that the overhead of tracking
> memory in page-sized units is too high, and we need to use larger
> units by default. There are occasions when we need to do things to
> memory in smaller units, and for those, we can choose to either
> handle sub-folio things, or we can split a folio apart into smaller
> folios.
>
> > > > a possible approach maybe would be to keep the _page variant,
> > > > and add a _folio wrapper around it
> > >
> > > Yes, we can do that. It's what I'm currently doing for
> > > flush_dcache_folio().
> >
> > where would the page flags be stored? as I said, we really depend on
> > that bit to be set correctly to prevent potentially disruptive I/O
> > errors. It's ok if the bit overindicates protection (non-protected
> > pages can be marked as protected), but protected pages must at all
> > times have the bit set.
> >
> > the reason why this hook exists at all, is to prevent secure pages
> > from being accidentally (or maliciously) fed into I/O
>
> You can still use PG_arch_1 on the sub-pages of a folio. It's one of
> the things you'll have to decide, actually. Does setting PG_arch_1 on
> the head page of the folio indicate that the entire page is
> accessible, or just that the head page is accessible? Different page
> flags have made different decisions here.

ok then, I think the simplest and safest thing to do right now is to
keep the flag on each page

in short:

* pagecache -> you can put a loop or introduce a _folio wrapper around
  arch_make_page_accessible
* gup.c -> won't be touched for now, but when the time comes, the
  PG_arch_1 bit should be set for each page
Thread overview: 6+ messages
2021-04-09 19:40 Matthew Wilcox
2021-04-12 12:18 ` Claudio Imbrenda
2021-04-12 12:43 ` Matthew Wilcox
2021-04-12 13:37 ` Claudio Imbrenda
2021-04-12 13:55 ` Matthew Wilcox
2021-04-15 9:28 ` Claudio Imbrenda [this message]