Date: Mon, 12 Apr 2021 13:43:41 +0100
From: Matthew Wilcox
To: Claudio Imbrenda
Cc: linux-mm@kvack.org, linux-s390@vger.kernel.org
Subject: Re: Inaccessible pages & folios
Message-ID: <20210412124341.GJ2531743@casper.infradead.org>
References: <20210409194059.GW2531743@casper.infradead.org> <20210412141809.36c349d6@ibm-vm>
In-Reply-To: <20210412141809.36c349d6@ibm-vm>

On Mon, Apr 12, 2021 at 02:18:09PM +0200, Claudio Imbrenda wrote:
> On Fri, 9 Apr 2021 20:40:59 +0100
> Matthew Wilcox wrote:
> > I'm going to change __test_set_page_writeback() to take a folio [3]
> > and now I'm wondering what interface you'd like to use. My
> > preference would be to rename arch_make_page_accessible() to
> > arch_make_folio_accessible() and pass a folio, at which time you
> > would make the entire folio (however many pages might be in it)
> > accessible. If you would rather, we can leave the interface as
> > arch_make_page_accessible(), in which case we'll just call it N times
> > in __test_set_page_writeback() (and I won't need to touch gup.c).
>
> For the rename case, how would you handle gup.c?

At first, I'd turn it into
arch_make_folio_accessible(page_folio(page));

Eventually, gup.c needs to become folio-aware. I haven't spent too much
time thinking about it, but code written like this:

	page = pte_page(pte);
	head = try_grab_compound_head(page, 1, flags);
	if (!head)
		goto pte_unmap;
	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
		put_compound_head(head, 1, flags);
		goto pte_unmap;
	}
	VM_BUG_ON_PAGE(compound_head(page) != head, page);

is just crying out for use of folios. Also, some of the gup callers
would much prefer to work in terms of folios than individual struct
pages (imagine an RDMA adapter that wants to pin several gigabytes of
memory that's allocated using hugetlbfs for example).

> Consider that arch_make_page_accessible deals (typically) with KVM
> guest pages. Once you bundle up the pages in folios, you can have
> different pages in the same folio with different properties.

So what you're saying is that the host might allocate, eg a 1GB folio
for a guest, then the guest splits that up into smaller chunks (eg 1MB),
and would only want one of those small chunks accessible to the
hypervisor?

> In case of failure, you could end up with a folio with some pages
> processed and some not processed. Would you stop at the first error?
> What would the state of the folio be? On s390x we use the PG_arch_1 bit
> to mark secure pages, how would that work with folios?
>
> and how are fault handlers affected by this folio conversion? would
> they still work on pages, or would that also work on folios? on s390x
> we use the arch_make_page_accessible function in some fault handlers.

Folios can be mapped into userspace at an unaligned offset. So we still
have to work in pages, at least for now. We might have some optimised
path for aligned folios later.

> a possible approach maybe would be to keep the _page variant, and add a
> _folio wrapper around it

Yes, we can do that. It's what I'm currently doing for
flush_dcache_folio().

> for s390x the PG_arch_1 is very important to prevent protected pages
> from being fed to I/O, as in that case Very Bad Things™ would happen.
>
> sorry for the wall of questions, but I actually like your folio
> approach and I want to understand it better, so we can find a way to
> make everything work well together

Great!

> > PS: The prototype is in gfp.h. That's not really appropriate; gfp.h
> > is about allocating memory, and this call really has nothing to do
> > with memory allocation. I think mm.h is a better place for it, if
> > you can't find a better header file than that.
>
> I had put it there because arch_alloc_page and arch_free_page are also
> there, and the behaviour, from a kernel point of view, is similar
> (unaccessible/unallocated pages will trigger a fault).
>
> I actually do not have a preference regarding where the prototype
> lives, as long as everything works. If you think mm.h is more
> appropriate, go for it :)

Heh, I see how you got there from the implementors point of view ;-)
I'll move it ...
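
Illustrative note: the "keep the _page variant, and add a _folio wrapper
around it" idea from the thread could be sketched roughly as below. This
is a standalone userspace mock, not kernel code and not anything posted
in the thread. The struct page and struct folio definitions are
simplified stand-ins, the fail_export flag is a hypothetical test hook,
and the done out-parameter is an assumption added for illustration. The
sketch stops at the first error, which is only one possible answer to
the partial-failure question raised above; the thread does not settle
it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; NOT the kernel's struct page / struct folio. */
struct page {
	bool secure;		/* plays the role of PG_arch_1 on s390x */
	bool fail_export;	/* hypothetical hook: force a per-page failure */
};

struct folio {
	size_t nr_pages;	/* number of pages in this folio */
	struct page *pages;	/* the pages, in order */
};

/* Mock of the existing per-page hook: 0 on success, negative errno on error. */
static int arch_make_page_accessible(struct page *page)
{
	if (!page->secure)
		return 0;	/* already accessible, nothing to do */
	if (page->fail_export)
		return -EBUSY;
	page->secure = false;
	return 0;
}

/*
 * Hypothetical _folio wrapper around the _page variant. It stops at the
 * first failing page and reports, via *done, how many pages were made
 * accessible before that, so the caller can see the partial state.
 */
static int arch_make_folio_accessible(struct folio *folio, size_t *done)
{
	size_t i;
	int ret = 0;

	assert(folio != NULL);
	for (i = 0; i < folio->nr_pages; i++) {
		ret = arch_make_page_accessible(&folio->pages[i]);
		if (ret)
			break;
	}
	if (done)
		*done = i;
	return ret;
}
```

With this policy, a failure in the middle leaves the earlier pages
accessible and the later ones untouched, which is exactly the mixed
state the thread worries about; a real implementation would have to
decide whether that is acceptable or whether the folio must be handled
page by page at the call site.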