From: Ding Tianhong <dingtianhong@huawei.com>
To: Nicholas Piggin <npiggin@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>, <linux-mm@kvack.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>,
Christoph Hellwig <hch@infradead.org>,
Jonathan Cameron <Jonathan.Cameron@Huawei.com>,
<linux-arch@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linuxppc-dev@lists.ozlabs.org>,
Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings
Date: Tue, 26 Jan 2021 19:48:45 +0800
Message-ID: <a84836bb-d913-eb58-cd16-b268f479bd8b@huawei.com>
In-Reply-To: <1611653945.t3oot63nwn.astroid@bobo.none>

On 2021/1/26 17:47, Nicholas Piggin wrote:
> Excerpts from Ding Tianhong's message of January 26, 2021 4:59 pm:
>> On 2021/1/26 12:45, Nicholas Piggin wrote:
>>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>>> support PMD-sized vmap mappings.
>>>
>>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>>> or larger, and fall back to small pages if that was unsuccessful.
>>>
>>> Architectures must ensure that any arch-specific vmalloc allocations
>>> that require PAGE_SIZE mappings (e.g., module allocations vs strict
>>> module rwx) use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>>>
>>> When hugepage vmalloc mappings are enabled in the next patch, this
>>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
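As a quick check of the quoted TLB figures, the ratio of misses before and after comes to roughly 28.5. The commit message rounds this to "nearly 30x":

```latex
\frac{59{,}800}{2{,}100} \approx 28.5
```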
>>>
>>> This can result in more internal fragmentation and memory overhead for a
>>> given allocation, so a nohugevmalloc boot option is added to disable it.
>>>
>>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>>> ---
>>> arch/Kconfig | 11 ++
>>> include/linux/vmalloc.h | 21 ++++
>>> mm/page_alloc.c | 5 +-
>>> mm/vmalloc.c | 215 +++++++++++++++++++++++++++++++---------
>>> 4 files changed, 205 insertions(+), 47 deletions(-)
>>>
>>> diff --git a/arch/Kconfig b/arch/Kconfig
>>> index 24862d15f3a3..eef170e0c9b8 100644
>>> --- a/arch/Kconfig
>>> +++ b/arch/Kconfig
>>> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>> config HAVE_ARCH_HUGE_VMAP
>>> bool
>>>
>>> +#
>>> +# Archs that select this would be capable of PMD-sized vmaps (i.e.,
>>> +# arch_vmap_pmd_supported() returns true), and they must make no assumptions
>>> +# that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
>>> +# can be used to prohibit arch-specific allocations from using hugepages to
>>> +# help with this (e.g., modules may require it).
>>> +#
>>> +config HAVE_ARCH_HUGE_VMALLOC
>>> + depends on HAVE_ARCH_HUGE_VMAP
>>> + bool
>>> +
>>> config ARCH_WANT_HUGE_PMD_SHARE
>>> bool
>>>
>>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>>> index 99ea72d547dc..93270adf5db5 100644
>>> --- a/include/linux/vmalloc.h
>>> +++ b/include/linux/vmalloc.h
>>> @@ -25,6 +25,7 @@ struct notifier_block; /* in notifier.h */
>>> #define VM_NO_GUARD 0x00000040 /* don't add guard page */
>>> #define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
>>> #define VM_MAP_PUT_PAGES 0x00000100 /* put pages and free array in vfree */
>>> +#define VM_NO_HUGE_VMAP 0x00000200 /* force PAGE_SIZE pte mapping */
>>>
>>> /*
>>> * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
>>> @@ -59,6 +60,9 @@ struct vm_struct {
>>> unsigned long size;
>>> unsigned long flags;
>>> struct page **pages;
>>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
>>> + unsigned int page_order;
>>> +#endif
>>> unsigned int nr_pages;
>>> phys_addr_t phys_addr;
>>> const void *caller;
>> Hi Nicholas:
>>
>> A suggestion :)
>>
>> The page order is only used to indicate that a vm area is mapped with huge pages, and it is
>> only valid when the size is at least PMD_SIZE. Could we use a vm flag for that instead, e.g. a
>> new flag named VM_HUGEPAGE? That would not change struct vm_struct, and it would make it easier
>> for me to backport this series to our own branches (based on the LTS version).
>
> Hmm, it might be possible. I'm not sure if 1GB vmallocs will be used any
> time soon (or maybe they will for edge case configurations? It would be
> trivial to add support for).
>
1GB vmallocs would be really crazy, but they may be used in the future. :)
> The other concern I have is that Christophe IIRC was asking about
> implementing a mapping for PPC which used TLB mappings that were
> different than kernel page table tree size. Although I guess we could
> deal with that when it comes.
>
I haven't checked the PPC platform, but I agree with you.
> I like the flexibility of page_order though. How hard would it be for
> you to do the backport with VM_HUGEPAGE yourself?
>
Yes, I can do that with VM_HUGEPAGE in my own branch.
> I should also say, thanks for all the review and testing from the Huawei
> team. Do you have an x86 patch?
I have only enabled and used it on x86 and aarch64 platforms; this patch series
really helps us a lot. Thanks.
Ding
> Thanks,
> Nick