From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zhaoyang Huang <huangzhaoyang@gmail.com>
Date: Wed, 24 Nov 2021 16:08:32 +0800
Subject: Re: [RFC PATCH] mm: introduce alloc hook to apply PTE_CONT
To: Ard Biesheuvel
Cc: Catalin Marinas, Will Deacon, Anshuman Khandual, Andrew Morton,
	Nicholas Piggin, Mike Rapoport, Pavel Tatashin, Christophe Leroy,
	Jonathan Marek, Zhaoyang Huang, Linux Memory Management List,
	Linux Kernel Mailing List
References: <1637658760-5813-1-git-send-email-huangzhaoyang@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Nov 23, 2021 at 5:58 PM Ard Biesheuvel wrote:
>
> On Tue, 23 Nov 2021 at 10:13, Huangzhaoyang wrote:
> >
> > From: Zhaoyang Huang
> >
> > Since there is no PTE_CONT when rodata_full in ARM64, introducing a
> > hook function to apply PTE_CONT on the proper page blocks.
> >
>
> Given the discussion around your previous patch, I would expect a
> meticulous explanation here why it is guaranteed to be safe to
> manipulate the PTE_CONT attribute like this, and how the proposed
> logic is correct for all supported page sizes.
>
> Without using an intermediate invalid mapping for the entire range,
> this is never going to work reliably (this is the break-before-make
> requirement). And given that marking the entire block invalid will
> create intermediate states that are not permitted (a valid PTE_CONT
> mapping and an invalid ~PTE_CONT mapping covering the same VA), the
> only way to apply changes like these is to temporarily switch all CPUs
> to a different translation via TTBR1. And this is not going to happen.

As there is no safe way to modify PTE_CONT on a live mapping, please
disregard all previous patches except the current one.

>
> Also, you never replied to my question regarding the use case and the
> performance gain.

In our Android-based system, the multimedia cases suffer most from the
small PTE granularity, since they allocate high-order page blocks quite
a lot. The performance gap is even visible.

>
> In summary, NAK to this patch or any of the previous ones regarding
> PTE_CONT. If you do insist on pursuing this further, please provide an
> elaborate and rock solid explanation why your approach is 100% valid
> and correct (for all page sizes). And make sure you include an
> explanation how your changes comply with the architectural
> break-before-make requirements around PTE_CONT attributes.

IMHO, it is safe to modify the page block's PTEs under
*arch_alloc/free_pages*, as no one else is aware of those pages at that
point. Furthermore, I do think a TLB flush and barriers are needed for
synchronization. With regard to the page-size issue, I think replacing
the hard-coded constants with the CONT_PTE_* macros could absorb the
difference and make it correct (a standalone sketch of this arithmetic,
and of the break-before-make sequence discussed above, follows the
quoted patch at the end):
-	if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
-		order -= 4;
+	if ((order >= CONT_PTE_SHIFT) && (addr & ~CONT_PTE_MASK) == 0) {
+		order -= CONT_PTE_SHIFT;
 		do {
 			cont_pte_low_bound = addr & CONT_PTE_MASK;
 			__change_memory_common(cont_pte_low_bound,
 				(~CONT_PTE_MASK + 1), __pgprot(PTE_CONT), __pgprot(0));
 			addr = (u64)page_address(page);
-			page += 4;
+			page += CONT_PTES;
 			order--;
 		} while (order >= 0);

> >
> > ---
> >  arch/arm64/include/asm/page.h |  5 +++++
> >  arch/arm64/mm/pageattr.c      | 45 +++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 50 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
> > index f98c91b..53cdd09 100644
> > --- a/arch/arm64/include/asm/page.h
> > +++ b/arch/arm64/include/asm/page.h
> > @@ -46,6 +46,11 @@ struct page *alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
> >
> >  #include
> >
> > +#define HAVE_ARCH_ALLOC_PAGE
> > +#define HAVE_ARCH_FREE_PAGE
> > +
> > +extern void arch_alloc_page(struct page *page, int order);
> > +extern void arch_free_page(struct page *page, int order);
> >  #endif /* !__ASSEMBLY__ */
> >
> >  #define VM_DATA_DEFAULT_FLAGS	(VM_DATA_FLAGS_TSK_EXEC | VM_MTE_ALLOWED)
> >
> > diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> > index a3bacd7..815a06d 100644
> > --- a/arch/arm64/mm/pageattr.c
> > +++ b/arch/arm64/mm/pageattr.c
> > @@ -239,3 +239,48 @@ bool kernel_page_present(struct page *page)
> >  	ptep = pte_offset_kernel(pmdp, addr);
> >  	return pte_valid(READ_ONCE(*ptep));
> >  }
> > +
> > +void arch_alloc_page(struct page *page, int order)
> > +{
> > +	unsigned long addr;
> > +	unsigned long cont_pte_low_bound;
> > +
> > +	if (!rodata_full)
> > +		return;
> > +
> > +	addr = (u64)page_address(page);
> > +	if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
> > +		order -= 4;
> > +		do {
> > +			cont_pte_low_bound = addr & CONT_PTE_MASK;
> > +			__change_memory_common(cont_pte_low_bound,
> > +				(~CONT_PTE_MASK + 1), __pgprot(PTE_CONT), __pgprot(0));
> > +			addr = (u64)page_address(page);
> > +			page += 4;
> > +			order--;
> > +		}while (order >= 0);
> > +	}
> > +}
> > +
> > +void arch_free_page(struct page *page, int order)
> > +{
> > +	unsigned long addr;
> > +	unsigned long cont_pte_low_bound;
> > +
> > +	if (!rodata_full)
> > +		return;
> > +
> > +	addr = (u64)page_address(page);
> > +	if ((order >= 4) && (addr & ~CONT_PTE_MASK) == 0) {
> > +		order -= 4;
> > +		do {
> > +			cont_pte_low_bound = addr & CONT_PTE_MASK;
> > +			__change_memory_common(cont_pte_low_bound,
> > +				(~CONT_PTE_MASK + 1), __pgprot(0), __pgprot(PTE_CONT));
> > +			addr = (u64)page_address(page);
> > +			page += 4;
> > +			order--;
> > +		}while (order >= 0);
> > +	}
> > +}
> > +
> > --
> > 1.9.1
> >
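
For reference, a standalone sketch of the alignment and order arithmetic
behind the snippet above (illustrative only, not part of the patch: the
address and order values are made up, and the macro values assume a
4K-page configuration, where a contiguous range is 16 entries covering
64K). One caveat: in recent kernels CONT_PTE_SHIFT includes PAGE_SHIFT,
so the minimum qualifying buddy order is CONT_PTE_SHIFT - PAGE_SHIFT
rather than CONT_PTE_SHIFT itself.

/*
 * Standalone illustration (plain C, not kernel code): when does an
 * order-N allocation qualify for PTE_CONT, and how many contiguous
 * ranges does it span? Values assume 4K pages.
 */
#include <stdio.h>

#define PAGE_SHIFT	12
#define CONT_PTE_SHIFT	(4 + PAGE_SHIFT)			/* 64K range */
#define CONT_PTES	(1 << (CONT_PTE_SHIFT - PAGE_SHIFT))	/* 16 entries */
#define CONT_PTE_SIZE	(1UL << CONT_PTE_SHIFT)
#define CONT_PTE_MASK	(~(CONT_PTE_SIZE - 1))

int main(void)
{
	unsigned long addr = 0xffff000012340000UL;	/* made up, 64K-aligned */
	int order = 6;					/* 64 pages */
	int min_order = CONT_PTE_SHIFT - PAGE_SHIFT;	/* page-count shift, 4 */

	/* Qualify only CONT_PTE_SIZE-aligned blocks of >= CONT_PTES pages. */
	if (order >= min_order && (addr & ~CONT_PTE_MASK) == 0)
		printf("order-%d block at %#lx spans %lu contiguous ranges\n",
		       order, addr, (1UL << order) / CONT_PTES);
	return 0;
}

For 16K and 64K granules the contiguous range is 128 and 32 entries
respectively, which the same expressions absorb once the macro values
change.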
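
And to make Ard's break-before-make point concrete: changing PTE_CONT on
a live mapping would, at minimum, require the sequence sketched below
(hypothetical helper, assuming 4K pages; set_pte(), pte_mkcont() and
flush_tlb_kernel_range() are existing arm64 primitives, the rest is
illustrative). The window between steps 1 and 3, during which the range
is unmapped, is exactly why this cannot be done while other CPUs may be
accessing it.

/*
 * Illustrative sketch only, not from the patch: break-before-make for
 * one contiguous range (remap_cont_sketch is a hypothetical name).
 */
static void remap_cont_sketch(pte_t *ptep, unsigned long addr)
{
	pte_t old[CONT_PTES];
	int i;

	/* Snapshot the current (non-contiguous) entries. */
	for (i = 0; i < CONT_PTES; i++)
		old[i] = READ_ONCE(ptep[i]);

	/* 1. Break: replace every entry in the range with an invalid one. */
	for (i = 0; i < CONT_PTES; i++)
		set_pte(ptep + i, __pte(0));

	/* 2. Flush stale translations on all CPUs (includes the barriers). */
	flush_tlb_kernel_range(addr, addr + CONT_PTE_SIZE);

	/* 3. Make: write the entries back with the contiguous bit set. */
	for (i = 0; i < CONT_PTES; i++)
		set_pte(ptep + i, pte_mkcont(old[i]));
}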