From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mail-qk0-f199.google.com (mail-qk0-f199.google.com [209.85.220.199])
	by kanga.kvack.org (Postfix) with ESMTP id B7FA36B038A
	for ; Thu, 16 Mar 2017 09:15:39 -0400 (EDT)
Received: by mail-qk0-f199.google.com with SMTP id o135so38655415qke.3
	for ; Thu, 16 Mar 2017 06:15:39 -0700 (PDT)
Received: from mx1.redhat.com (mx1.redhat.com. [209.132.183.28])
	by mx.google.com with ESMTPS id j14si3868569qtj.57.2017.03.16.06.15.38
	for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 16 Mar 2017 06:15:38 -0700 (PDT)
Subject: Re: [RFC PATCH v2 14/32] x86: mm: Provide support to use memblock
 when spliting large pages
References: <148846752022.2349.13667498174822419498.stgit@brijesh-build-machine>
 <148846771545.2349.9373586041426414252.stgit@brijesh-build-machine>
 <20170310110657.hophlog2juw5hpzz@pd.tnic>
From: Paolo Bonzini 
Message-ID: 
Date: Thu, 16 Mar 2017 14:15:19 +0100
MIME-Version: 1.0
In-Reply-To: 
Content-Type: multipart/mixed; boundary="------------7AC6D09ABF2C56474372EF15"
Sender: owner-linux-mm@kvack.org
List-ID: 
To: Brijesh Singh , Borislav Petkov 
Cc: simon.guinot@sequanux.org, linux-efi@vger.kernel.org, kvm@vger.kernel.org,
 rkrcmar@redhat.com, matt@codeblueprint.co.uk, linux-pci@vger.kernel.org,
 linus.walleij@linaro.org, gary.hook@amd.com, linux-mm@kvack.org,
 paul.gortmaker@windriver.com, hpa@zytor.com, cl@linux.com,
 dan.j.williams@intel.com, aarcange@redhat.com, sfr@canb.auug.org.au,
 andriy.shevchenko@linux.intel.com, herbert@gondor.apana.org.au,
 bhe@redhat.com, xemul@parallels.com, joro@8bytes.org, x86@kernel.org,
 peterz@infradead.org, piotr.luc@intel.com, mingo@redhat.com,
 msalter@redhat.com, ross.zwisler@linux.intel.com, dyoung@redhat.com,
 thomas.lendacky@amd.com, jroedel@suse.de, keescook@chromium.org,
 arnd@arndb.de, toshi.kani@hpe.com, mathieu.desnoyers@efficios.com,
 luto@kernel.org, devel@linuxdriverproject.org, bhelgaas@google.com,
 tglx@linutronix.de, mchehab@kernel.org, iamjoonsoo.kim@lge.com,
 labbott@fedoraproject.org, tony.luck@intel.com, alexandre.bounine@idt.com,
 kuleshovmail@gmail.com, linux-kernel@vger.kernel.org, mcgrof@kernel.org,
 mst@redhat.com, linux-crypto@vger.kernel.org, tj@kernel.org,
 akpm@linux-foundation.org, davem@davemloft.net

This is a multi-part message in MIME format.
--------------7AC6D09ABF2C56474372EF15
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 10/03/2017 23:41, Brijesh Singh wrote:
>> Maybe there's a reason this fires:
>>
>> WARNING: modpost: Found 2 section mismatch(es).
>> To see full details build your kernel with:
>> 'make CONFIG_DEBUG_SECTION_MISMATCH=y'
>>
>> WARNING: vmlinux.o(.text+0x48edc): Section mismatch in reference from
>> the function __change_page_attr() to the function
>> .init.text:memblock_alloc()
>> The function __change_page_attr() references
>> the function __init memblock_alloc().
>> This is often because __change_page_attr lacks a __init
>> annotation or the annotation of memblock_alloc is wrong.
>>
>> WARNING: vmlinux.o(.text+0x491d1): Section mismatch in reference from
>> the function __change_page_attr() to the function
>> .meminit.text:memblock_free()
>> The function __change_page_attr() references
>> the function __meminit memblock_free().
>> This is often because __change_page_attr lacks a __meminit
>> annotation or the annotation of memblock_free is wrong.
>>
>> But maybe Paolo might have an even better idea...
> 
> I am sure he will have better idea :)

Not sure if it's better or worse, but an alternative idea is to turn
__change_page_attr and __change_page_attr_set_clr inside out, so that:

1) the alloc_pages/__free_page calls happen in __change_page_attr_set_clr;

2) __change_page_attr_set_clr overall does not become more complex.

Then you can introduce __early_change_page_attr_set_clr and/or
early_kernel_map_pages_in_pgd, for use in your next patches.  They would
use the memblock allocator instead of alloc_pages/__free_page.  (A rough
sketch of what such an early variant could look like is appended after
the patch below.)

The attached patch is compile-tested only and almost certainly has some
thinko in it.  But it even shaves a few lines off the code, so the idea
might have some merit.

Paolo

--------------7AC6D09ABF2C56474372EF15
Content-Type: text/x-patch;
 name="alloc-in-cpa-set-clr.patch"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename="alloc-in-cpa-set-clr.patch"

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 28d42130243c..953c8e697562 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -490,11 +490,12 @@ static void __set_pmd_pte(pte_t *kpte, unsigned long address, pte_t pte)
 }

 static int
-try_preserve_large_page(pte_t *kpte, unsigned long address,
+try_preserve_large_page(pte_t **p_kpte, unsigned long address,
 			struct cpa_data *cpa)
 {
 	unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn;
-	pte_t new_pte, old_pte, *tmp;
+	pte_t *kpte = *p_kpte;
+	pte_t new_pte, old_pte;
 	pgprot_t old_prot, new_prot, req_prot;
 	int i, do_split = 1;
 	enum pg_level level;
@@ -507,8 +508,8 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 	 * Check for races, another CPU might have split this page
 	 * up already:
 	 */
-	tmp = _lookup_address_cpa(cpa, address, &level);
-	if (tmp != kpte)
+	*p_kpte = _lookup_address_cpa(cpa, address, &level);
+	if (*p_kpte != kpte)
 		goto out_unlock;

 	switch (level) {
@@ -634,17 +635,18 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 	unsigned int i, level;
 	pte_t *tmp;
 	pgprot_t ref_prot;
+	int retry = 1;

+	if (!debug_pagealloc_enabled())
+		spin_lock(&cpa_lock);
 	spin_lock(&pgd_lock);
 	/*
 	 * Check for races, another CPU might have split this page
 	 * up for us already:
 	 */
 	tmp = _lookup_address_cpa(cpa, address, &level);
-	if (tmp != kpte) {
-		spin_unlock(&pgd_lock);
-		return 1;
-	}
+	if (tmp != kpte)
+		goto out;

 	paravirt_alloc_pte(&init_mm, page_to_pfn(base));

@@ -671,10 +673,11 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 		break;

 	default:
-		spin_unlock(&pgd_lock);
-		return 1;
+		goto out;
 	}

+	retry = 0;
+
 	/*
 	 * Set the GLOBAL flags only if the PRESENT flag is set
 	 * otherwise pmd/pte_present will return true even on a non
@@ -718,28 +721,34 @@ __split_large_page(struct cpa_data *cpa, pte_t *kpte, unsigned long address,
 	 * going on.
 	 */
 	__flush_tlb_all();
-	spin_unlock(&pgd_lock);

-	return 0;
-}
-
-static int split_large_page(struct cpa_data *cpa, pte_t *kpte,
-			    unsigned long address)
-{
-	struct page *base;
+out:
+	spin_unlock(&pgd_lock);

+	/*
+	 * Do a global flush tlb after splitting the large page
+	 * and before we do the actual change page attribute in the PTE.
+	 *
+	 * With out this, we violate the TLB application note, that says
+	 * "The TLBs may contain both ordinary and large-page
+	 * translations for a 4-KByte range of linear addresses. This
+	 * may occur if software modifies the paging structures so that
+	 * the page size used for the address range changes. If the two
+	 * translations differ with respect to page frame or attributes
+	 * (e.g., permissions), processor behavior is undefined and may
+	 * be implementation-specific."
+	 *
+	 * We do this global tlb flush inside the cpa_lock, so that we
+	 * don't allow any other cpu, with stale tlb entries change the
+	 * page attribute in parallel, that also falls into the
+	 * just split large page entry.
+	 */
+	if (!retry)
+		flush_tlb_all();
 	if (!debug_pagealloc_enabled())
 		spin_unlock(&cpa_lock);
-	base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
-	if (!debug_pagealloc_enabled())
-		spin_lock(&cpa_lock);
-	if (!base)
-		return -ENOMEM;
-
-	if (__split_large_page(cpa, kpte, address, base))
-		__free_page(base);
-	return 0;
+	return retry;
 }

 static bool try_to_free_pte_page(pte_t *pte)
@@ -1166,30 +1175,26 @@ static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
 	}
 }

-static int __change_page_attr(struct cpa_data *cpa, int primary)
+static int __change_page_attr(struct cpa_data *cpa, pte_t **p_kpte, unsigned long address,
+			      int primary)
 {
-	unsigned long address;
-	int do_split, err;
 	unsigned int level;
 	pte_t *kpte, old_pte;
+	int err = 0;

-	if (cpa->flags & CPA_PAGES_ARRAY) {
-		struct page *page = cpa->pages[cpa->curpage];
-		if (unlikely(PageHighMem(page)))
-			return 0;
-		address = (unsigned long)page_address(page);
-	} else if (cpa->flags & CPA_ARRAY)
-		address = cpa->vaddr[cpa->curpage];
-	else
-		address = *cpa->vaddr;
-repeat:
-	kpte = _lookup_address_cpa(cpa, address, &level);
-	if (!kpte)
-		return __cpa_process_fault(cpa, address, primary);
+	if (!debug_pagealloc_enabled())
+		spin_lock(&cpa_lock);
+	*p_kpte = kpte = _lookup_address_cpa(cpa, address, &level);
+	if (!kpte) {
+		err = __cpa_process_fault(cpa, address, primary);
+		goto out;
+	}

 	old_pte = *kpte;
-	if (pte_none(old_pte))
-		return __cpa_process_fault(cpa, address, primary);
+	if (pte_none(old_pte)) {
+		err = __cpa_process_fault(cpa, address, primary);
+		goto out;
+	}

 	if (level == PG_LEVEL_4K) {
 		pte_t new_pte;
@@ -1228,59 +1233,27 @@ static int __change_page_attr(struct cpa_data *cpa, int primary)
 			cpa->flags |= CPA_FLUSHTLB;
 		}
 		cpa->numpages = 1;
-		return 0;
+		goto out;
 	}

 	/*
 	 * Check, whether we can keep the large page intact
 	 * and just change the pte:
 	 */
-	do_split = try_preserve_large_page(kpte, address, cpa);
-	/*
-	 * When the range fits into the existing large page,
-	 * return. cp->numpages and cpa->tlbflush have been updated in
-	 * try_large_page:
-	 */
-	if (do_split <= 0)
-		return do_split;
-
-	/*
-	 * We have to split the large page:
-	 */
-	err = split_large_page(cpa, kpte, address);
-	if (!err) {
-		/*
-		 * Do a global flush tlb after splitting the large page
-		 * and before we do the actual change page attribute in the PTE.
-		 *
-		 * With out this, we violate the TLB application note, that says
-		 * "The TLBs may contain both ordinary and large-page
-		 * translations for a 4-KByte range of linear addresses. This
-		 * may occur if software modifies the paging structures so that
-		 * the page size used for the address range changes. If the two
-		 * translations differ with respect to page frame or attributes
-		 * (e.g., permissions), processor behavior is undefined and may
-		 * be implementation-specific."
-		 *
-		 * We do this global tlb flush inside the cpa_lock, so that we
-		 * don't allow any other cpu, with stale tlb entries change the
-		 * page attribute in parallel, that also falls into the
-		 * just split large page entry.
-		 */
-		flush_tlb_all();
-		goto repeat;
-	}
+	err = try_preserve_large_page(p_kpte, address, cpa);

+out:
+	if (!debug_pagealloc_enabled())
+		spin_unlock(&cpa_lock);
 	return err;
 }

 static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias);

-static int cpa_process_alias(struct cpa_data *cpa)
+static int cpa_process_alias(struct cpa_data *cpa, unsigned long vaddr)
 {
 	struct cpa_data alias_cpa;
 	unsigned long laddr = (unsigned long)__va(cpa->pfn << PAGE_SHIFT);
-	unsigned long vaddr;
 	int ret;

 	if (!pfn_range_is_mapped(cpa->pfn, cpa->pfn + 1))
@@ -1290,16 +1263,6 @@ static int cpa_process_alias(struct cpa_data *cpa)
 	 * No need to redo, when the primary call touched the direct
 	 * mapping already:
 	 */
-	if (cpa->flags & CPA_PAGES_ARRAY) {
-		struct page *page = cpa->pages[cpa->curpage];
-		if (unlikely(PageHighMem(page)))
-			return 0;
-		vaddr = (unsigned long)page_address(page);
-	} else if (cpa->flags & CPA_ARRAY)
-		vaddr = cpa->vaddr[cpa->curpage];
-	else
-		vaddr = *cpa->vaddr;
-
 	if (!(within(vaddr, PAGE_OFFSET,
 		    PAGE_OFFSET + (max_pfn_mapped << PAGE_SHIFT)))) {

@@ -1338,33 +1301,64 @@ static int cpa_process_alias(struct cpa_data *cpa)
 	return 0;
 }

+static unsigned long cpa_address(struct cpa_data *cpa, unsigned long numpages)
+{
+	/*
+	 * Store the remaining nr of pages for the large page
+	 * preservation check.
+	 */
+	/* for array changes, we can't use large page */
+	cpa->numpages = 1;
+	if (cpa->flags & CPA_PAGES_ARRAY) {
+		struct page *page = cpa->pages[cpa->curpage];
+		if (unlikely(PageHighMem(page)))
+			return -EINVAL;
+		return (unsigned long)page_address(page);
+	} else if (cpa->flags & CPA_ARRAY) {
+		return cpa->vaddr[cpa->curpage];
+	} else {
+		cpa->numpages = numpages;
+		return *cpa->vaddr;
+	}
+}
+
+static void cpa_advance(struct cpa_data *cpa)
+{
+	if (cpa->flags & (CPA_PAGES_ARRAY | CPA_ARRAY))
+		cpa->curpage++;
+	else
+		*cpa->vaddr += cpa->numpages * PAGE_SIZE;
+}
+
 static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
 {
 	unsigned long numpages = cpa->numpages;
+	unsigned long vaddr;
+	struct page *base;
+	pte_t *kpte;
 	int ret;

 	while (numpages) {
-		/*
-		 * Store the remaining nr of pages for the large page
-		 * preservation check.
-		 */
-		cpa->numpages = numpages;
-		/* for array changes, we can't use large page */
-		if (cpa->flags & (CPA_ARRAY | CPA_PAGES_ARRAY))
-			cpa->numpages = 1;
-
-		if (!debug_pagealloc_enabled())
-			spin_lock(&cpa_lock);
-		ret = __change_page_attr(cpa, checkalias);
-		if (!debug_pagealloc_enabled())
-			spin_unlock(&cpa_lock);
-		if (ret)
-			return ret;
-
-		if (checkalias) {
-			ret = cpa_process_alias(cpa);
-			if (ret)
+		vaddr = cpa_address(cpa, numpages);
+		if (!IS_ERR_VALUE(vaddr)) {
+repeat:
+			ret = __change_page_attr(cpa, &kpte, vaddr, checkalias);
+			if (ret < 0)
 				return ret;
+			if (ret) {
+				base = alloc_page(GFP_KERNEL|__GFP_NOTRACK);
+				if (!base)
+					return -ENOMEM;
+				if (__split_large_page(cpa, kpte, vaddr, base))
+					__free_page(base);
+				goto repeat;
+			}
+
+			if (checkalias) {
+				ret = cpa_process_alias(cpa, vaddr);
+				if (ret < 0)
+					return ret;
+			}
 		}

 		/*
@@ -1374,11 +1368,7 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
 		 */
 		BUG_ON(cpa->numpages > numpages || !cpa->numpages);
 		numpages -= cpa->numpages;
-		if (cpa->flags & (CPA_PAGES_ARRAY | CPA_ARRAY))
-			cpa->curpage++;
-		else
-			*cpa->vaddr += cpa->numpages * PAGE_SIZE;
-
+		cpa_advance(cpa);
 	}
 	return 0;
 }

--------------7AC6D09ABF2C56474372EF15--
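
For illustration, here is a rough, untested sketch of the
__early_change_page_attr_set_clr() variant mentioned above.  It is not
part of the attached patch; the function name, the assumption that the
attached patch is applied (cpa_address(), cpa_advance(), the reworked
__change_page_attr() and __split_large_page()), and the use of the
phys_addr_t-based memblock_alloc()/memblock_free() interfaces from
<linux/memblock.h> are all illustrative assumptions:

/*
 * Sketch only -- not part of the attached patch.  A boot-time variant
 * of the __change_page_attr_set_clr() loop that takes the page needed
 * for splitting a large page from memblock instead of alloc_page().
 * Since memblock_alloc() is __init and memblock_free() is __meminit,
 * the caller can itself be __init, which is what the section mismatch
 * warnings quoted at the top of this mail are asking for.  Alias
 * processing is omitted for brevity.
 */
static int __init __early_change_page_attr_set_clr(struct cpa_data *cpa)
{
	unsigned long numpages = cpa->numpages;
	unsigned long vaddr;
	phys_addr_t base_phys;
	struct page *base;
	pte_t *kpte;
	int ret;

	while (numpages) {
		vaddr = cpa_address(cpa, numpages);
		if (!IS_ERR_VALUE(vaddr)) {
repeat:
			/* 1 == primary mapping, as in the regular path. */
			ret = __change_page_attr(cpa, &kpte, vaddr, 1);
			if (ret < 0)
				return ret;
			if (ret) {
				/* Large page must be split: take a page from memblock. */
				base_phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
				if (!base_phys)
					return -ENOMEM;
				base = pfn_to_page(PHYS_PFN(base_phys));
				if (__split_large_page(cpa, kpte, vaddr, base))
					memblock_free(base_phys, PAGE_SIZE);
				goto repeat;
			}
		}

		BUG_ON(cpa->numpages > numpages || !cpa->numpages);
		numpages -= cpa->numpages;
		cpa_advance(cpa);
	}
	return 0;
}

The only difference from the reworked __change_page_attr_set_clr() in
the attached patch is where the 4-KByte page for the split comes from,
so the early path stays structurally identical to the regular one.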