From: Muchun Song <songmuchun@bytedance.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
Joao Martins <joao.m.martins@oracle.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Oscar Salvador <osalvador@suse.de>,
David Hildenbrand <david@redhat.com>,
Miaohe Lin <linmiaohe@huawei.com>,
David Rientjes <rientjes@google.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Naoya Horiguchi <naoya.horiguchi@linux.dev>,
Michal Hocko <mhocko@suse.com>,
Matthew Wilcox <willy@infradead.org>,
Xiongchun Duan <duanxiongchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>,
muchun.song@linux.dev
Subject: Re: [External] Re: [PATCH v2 09/11] hugetlb: batch PMD split for bulk vmemmap dedup
Date: Wed, 6 Sep 2023 17:11:08 +0800
Message-ID: <CAMZfGtU2HX4UR1T2HW75xY70ZMSOdzNZ2py=EggoBYqP_1+QFg@mail.gmail.com>
In-Reply-To: <0b0609d8-bc87-0463-bafd-9613f0053039@linux.dev>
On Wed, Sep 6, 2023 at 4:25 PM Muchun Song <muchun.song@linux.dev> wrote:
>
>
>
> On 2023/9/6 05:44, Mike Kravetz wrote:
> > From: Joao Martins <joao.m.martins@oracle.com>
> >
> > In an effort to minimize the number of TLB flushes, batch all PMD splits
> > belonging to a range of pages so that only one (global) TLB flush is
> > performed.
> >
> > Rebased and updated by Mike Kravetz
> >
> > Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> > ---
> > mm/hugetlb_vmemmap.c | 72 +++++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 68 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index a715712df831..d956551699bc 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -37,7 +37,7 @@ struct vmemmap_remap_walk {
> > struct list_head *vmemmap_pages;
> > };
> >
> > -static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
> > +static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start, bool flush)
> > {
> > pmd_t __pmd;
> > int i;
> > @@ -80,7 +80,8 @@ static int split_vmemmap_huge_pmd(pmd_t *pmd, unsigned long start)
> > /* Make pte visible before pmd. See comment in pmd_install(). */
> > smp_wmb();
> > pmd_populate_kernel(&init_mm, pmd, pgtable);
> > - flush_tlb_kernel_range(start, start + PMD_SIZE);
> > + if (flush)
> > + flush_tlb_kernel_range(start, start + PMD_SIZE);
> > } else {
> > pte_free_kernel(&init_mm, pgtable);
> > }
> > @@ -127,11 +128,20 @@ static int vmemmap_pmd_range(pud_t *pud, unsigned long addr,
> > do {
> > int ret;
> >
> > - ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK);
> > + ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
> > + walk->remap_pte != NULL);
>
> It is better to have @walk->remap_pte only indicate whether we should walk
> down to the last page table level. I suggest reusing VMEMMAP_NO_TLB_FLUSH
> to indicate whether we should flush the TLB at the PMD level. It'll be
> clearer.
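
To make it concrete, something like the sketch below is what I have in
mind. It assumes struct vmemmap_remap_walk grows a flags field carrying
VMEMMAP_NO_TLB_FLUSH (as in the later patch of this series); untested,
just to illustrate:

	/* In vmemmap_pmd_range(): flush at PMD level unless asked not to. */
	ret = split_vmemmap_huge_pmd(pmd, addr & PMD_MASK,
				     !(walk->flags & VMEMMAP_NO_TLB_FLUSH));

	/* In vmemmap_remap_split(): split only, defer the TLB flush. */
	struct vmemmap_remap_walk walk = {
		.remap_pte	= NULL,
		.flags		= VMEMMAP_NO_TLB_FLUSH,
	};
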
>
> > if (ret)
> > return ret;
> >
> > next = pmd_addr_end(addr, end);
> > +
> > + /*
> > + * We are only splitting, not remapping the hugetlb vmemmap
> > + * pages.
> > + */
> > + if (!walk->remap_pte)
> > + continue;
> > +
> > vmemmap_pte_range(pmd, addr, next, walk);
> > } while (pmd++, addr = next, addr != end);
> >
> > @@ -198,7 +208,8 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
> > return ret;
> > } while (pgd++, addr = next, addr != end);
> >
> > - flush_tlb_kernel_range(start, end);
> > + if (walk->remap_pte)
> > + flush_tlb_kernel_range(start, end);
> >
> > return 0;
> > }
> > @@ -297,6 +308,35 @@ static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> > set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> > }
> >
> > +/**
> > + * vmemmap_remap_split - split the vmemmap virtual address range [@start, @end)
> > + * backing PMDs of the directmap into PTEs
> > + * @start: start address of the vmemmap virtual address range that we want
> > + * to remap.
> > + * @end: end address of the vmemmap virtual address range that we want to
> > + * remap.
> > + * @reuse: reuse address.
> > + *
> > + * Return: %0 on success, negative error code otherwise.
> > + */
> > +static int vmemmap_remap_split(unsigned long start, unsigned long end,
> > + unsigned long reuse)
> > +{
> > + int ret;
> > + struct vmemmap_remap_walk walk = {
> > + .remap_pte = NULL,
> > + };
> > +
> > + /* See the comment in the vmemmap_remap_free(). */
> > + BUG_ON(start - reuse != PAGE_SIZE);
> > +
> > + mmap_read_lock(&init_mm);
> > + ret = vmemmap_remap_range(reuse, end, &walk);
> > + mmap_read_unlock(&init_mm);
> > +
> > + return ret;
> > +}
> > +
> > /**
> > * vmemmap_remap_free - remap the vmemmap virtual address range [@start, @end)
> > * to the page which @reuse is mapped to, then free vmemmap
> > @@ -602,11 +642,35 @@ void hugetlb_vmemmap_optimize(const struct hstate *h, struct page *head)
> > free_vmemmap_page_list(&vmemmap_pages);
> > }
> >
> > +static void hugetlb_vmemmap_split(const struct hstate *h, struct page *head)
> > +{
> > + unsigned long vmemmap_start = (unsigned long)head, vmemmap_end;
> > + unsigned long vmemmap_reuse;
> > +
> > + if (!vmemmap_should_optimize(h, head))
> > + return;
> > +
> > + vmemmap_end = vmemmap_start + hugetlb_vmemmap_size(h);
> > + vmemmap_reuse = vmemmap_start;
> > + vmemmap_start += HUGETLB_VMEMMAP_RESERVE_SIZE;
> > +
> > + /*
> > + * Split PMDs on the vmemmap virtual address range [@vmemmap_start,
> > + * @vmemmap_end]
> > + */
> > + vmemmap_remap_split(vmemmap_start, vmemmap_end, vmemmap_reuse);
> > +}
> > +
> > void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
> > {
> > struct folio *folio;
> > LIST_HEAD(vmemmap_pages);
> >
> > + list_for_each_entry(folio, folio_list, lru)
> > + hugetlb_vmemmap_split(h, &folio->page);
>
> Maybe it is reasonable to add a return value to hugetlb_vmemmap_split()
> to indicate whether it succeeded. If it fails, it must be OOM, in which
> case there is no sense in continuing to split the page tables and
> optimize the vmemmap pages for the subsequent folios, right?
Sorry, correcting myself: it is still reasonable to continue optimizing
the vmemmap pages for the folios that were already split, since that
optimization should succeed because their vmemmap page tables have been
split successfully beforehand.

It even seems we should keep optimizing after hugetlb_vmemmap_split()
fails: the optimization frees vmemmap pages, so we would have more memory
to continue splitting afterwards. But that would make
hugetlb_vmemmap_optimize_folios() a little more complex. I'd like to
hear your opinions here.
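
To make the tradeoff concrete, a rough (untested) sketch of
hugetlb_vmemmap_optimize_folios(), assuming hugetlb_vmemmap_split() is
changed to return the error from vmemmap_remap_split(); stopping the
split loop early is just one possible policy:

	void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head *folio_list)
	{
		struct folio *folio;
		LIST_HEAD(vmemmap_pages);

		list_for_each_entry(folio, folio_list, lru) {
			/*
			 * A failure to split can only be -ENOMEM (page table
			 * allocation), so stop splitting here. Still optimize
			 * below: the already-split folios will succeed, and
			 * the vmemmap pages freed there may provide the memory
			 * needed to split the remaining folios later.
			 */
			if (hugetlb_vmemmap_split(h, &folio->page))
				break;
		}

		flush_tlb_all();

		list_for_each_entry(folio, folio_list, lru)
			__hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages);

		free_vmemmap_page_list(&vmemmap_pages);
	}
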
Thanks.
>
> Thanks.
>
> > +
> > + flush_tlb_all();
> > +
> > list_for_each_entry(folio, folio_list, lru)
> > __hugetlb_vmemmap_optimize(h, &folio->page, &vmemmap_pages);
> >
>