From: Muchun Song <muchun.song@linux.dev>
To: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Kravetz <mike.kravetz@oracle.com>,
Linux-MM <linux-mm@kvack.org>, Yuan Can <yuancan@huawei.com>
Subject: Re: [PATCH resend] mm: hugetlb_vmemmap: use bulk allocator in alloc_vmemmap_page_list()
Date: Thu, 7 Sep 2023 14:35:23 +0800 [thread overview]
Message-ID: <37EB1F18-83E9-4E40-9694-3C004D04D9C4@linux.dev> (raw)
In-Reply-To: <4d13cd1e-4d5f-4270-a720-3c8098b1f62c@huawei.com>
> On Sep 6, 2023, at 22:58, Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>
>
> On 2023/9/6 22:32, Muchun Song wrote:
>>> On Sep 6, 2023, at 17:33, Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>
>>>
>>>
>>> On 2023/9/6 11:25, Muchun Song wrote:
>>>>> On Sep 6, 2023, at 11:13, Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 2023/9/6 10:47, Matthew Wilcox wrote:
>>>>>> On Tue, Sep 05, 2023 at 06:35:08PM +0800, Kefeng Wang wrote:
>>>>>>> alloc_vmemmap_page_list() needs to allocate 4095 pages (1G) or 7
>>>>>>> pages (2M) at once, so let's add a bulk allocator variant,
>>>>>>> alloc_pages_bulk_list_node(), and switch alloc_vmemmap_page_list()
>>>>>>> to use it to speed up page allocation.
>>>>>> Argh, no, please don't do this.
>>>>>> Iterating a linked list is _expensive_. It is about 10x quicker to
>>>>>> iterate an array than a linked list. Adding the list_head option
>>>>>> to __alloc_pages_bulk() was a colossal mistake. Don't perpetuate it.
>>>>>> These pages are going into an array anyway. Don't put them on a list
>>>>>> first.
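
For reference, both bulk-allocation forms being contrasted here funnel into
__alloc_pages_bulk(); only the output container differs. (The 4095/1G and
7/2M figures above follow from the vmemmap size of a hugetlb page: 1G / 4K =
262144 struct pages * 64 bytes = 4096 base pages, and 2M / 4K = 512 * 64
bytes = 8 base pages, one of which is kept.) Below is a minimal,
illustration-only sketch of the two helpers that exist in gfp.h as of this
kernel; bulk_api_sketch() is a made-up name, and the proposed
alloc_pages_bulk_list_node() is not shown because it does not exist upstream:

    /*
     * Illustrative only: allocate and immediately free a handful of pages
     * both ways, to show the two calling conventions side by side.
     */
    static void bulk_api_sketch(int nid)
    {
        struct page *arr[8] = { NULL }; /* zeroed: the API skips populated slots */
        LIST_HEAD(list);
        struct page *page, *next;
        unsigned long nr, i;

        /* Array form: pages land directly in the caller-provided array. */
        nr = alloc_pages_bulk_array_node(GFP_KERNEL, nid, ARRAY_SIZE(arr), arr);
        for (i = 0; i < nr; i++)
            __free_page(arr[i]);

        /*
         * List form: pages are chained through page->lru, so the caller has
         * to walk the list again, which is the extra traversal objected to
         * above.
         */
        nr = alloc_pages_bulk_list(GFP_KERNEL, 8, &list);
        list_for_each_entry_safe(page, next, list, lru)
            __free_page(page);
    }
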
>>>>>
>>>>> struct vmemmap_remap_walk - walk vmemmap page table
>>>>>
>>>>> * @vmemmap_pages: the list head of the vmemmap pages that can be freed
>>>>> * or is mapped from.
>>>>>
>>>>> At present, struct vmemmap_remap_walk uses a list for the vmemmap page table walk, so do you mean we need to change vmemmap_pages from a list to an array first and then use the array bulk API, and even kill the list bulk API?
>>>> It'll be a little complex for hugetlb_vmemmap. Would it be reasonable to
>>>> directly use __alloc_pages_bulk in hugetlb_vmemmap itself?
>>>
>>>
>>> We could use alloc_pages_bulk_array_node() here without introducing a new
>>> alloc_pages_bulk_list_node(), and only focus on accelerating page allocation
>>> for now.
>>>
>> No. Using alloc_pages_bulk_array_node() will add more complexity for
>> hugetlb_vmemmap (you need to allocate an array first), and the path you
>> optimized is only a control path where the optimization is at the millisecond
>> level. So I don't think there is great value in doing this.
> I tried it; yes, it is a little complex:
>
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 4b9734777f69..5f502e18f950 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -377,26 +377,53 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> return ret;
> }
>
> +static int vmemmap_bulk_alloc_pages(gfp_t gfp, int nid, unsigned int nr_pages,
> + struct page **pages)
> +{
> + unsigned int last, allocated = 0;
> +
> + do {
> + last = allocated;
> +
> + allocated = alloc_pages_bulk_array_node(gfp, nid, nr_pages, pages);
> + if (allocated == last)
> + goto err;
> +
> + } while (allocated < nr_pages);
> +
> + return 0;
> +err:
> + for (allocated = 0; allocated < nr_pages; allocated++) {
> + if (pages[allocated])
> + __free_page(pages[allocated]);
> + }
> +
> + return -ENOMEM;
> +}
> +
> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> struct list_head *list)
> {
> gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
> unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
> int nid = page_to_nid((struct page *)start);
> - struct page *page, *next;
> + struct page **pages;
> + int ret = -ENOMEM;
> +
> + pages = kzalloc(array_size(nr_pages, sizeof(struct page *)), gfp_mask);
> + if (!pages)
> + return ret;
> +
> + ret = vmemmap_bulk_alloc_pages(gfp_mask, nid, nr_pages, pages);
> + if (ret)
> + goto out;
>
> while (nr_pages--) {
> - page = alloc_pages_node(nid, gfp_mask, 0);
> - if (!page)
> - goto out;
> - list_add_tail(&page->lru, list);
> + list_add_tail(&pages[nr_pages]->lru, list);
> }
> -
> - return 0;
> out:
> - list_for_each_entry_safe(page, next, list, lru)
> - __free_page(page);
> - return -ENOMEM;
> + kfree(pages);
> + return ret;
> }
>
> or just use __alloc_pages_bulk in it, but as Matthew said, we should
> avoid list usage; the list API needs to be cleaned up and no one should
> use it. Or make no change, since it is not a hot path :)
>
> Thanks.
Let's keep it unchanged, then.
Thanks.
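
For the record, here is a rough sketch of what "just use __alloc_pages_bulk
in it" could look like while keeping the existing list-based interface. It is
only an illustration of the idea discussed above, not a posted or tested
patch, and the retry/error handling is an assumption:

    static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
                                       struct list_head *list)
    {
        gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
        unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
        int nid = page_to_nid((struct page *)start);
        unsigned long allocated = 0, last;
        struct page *page, *next;

        do {
            last = allocated;
            /*
             * In list mode, __alloc_pages_bulk() chains the new pages onto
             * @list via page->lru and returns how many pages it allocated
             * in this call.
             */
            allocated += __alloc_pages_bulk(gfp_mask, nid, NULL,
                                            nr_pages - allocated,
                                            list, NULL);
            if (allocated == last)  /* no forward progress, give up */
                goto out;
        } while (allocated < nr_pages);

        return 0;
    out:
        list_for_each_entry_safe(page, next, list, lru)
            __free_page(page);
        return -ENOMEM;
    }

The list is still walked later when the pages are remapped, so the array
argument above still applies; this variant only avoids adding a new wrapper
around __alloc_pages_bulk().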
Thread overview: 13+ messages
2023-09-05 10:35 Kefeng Wang
2023-09-06 2:41 ` Muchun Song
2023-09-06 2:47 ` Matthew Wilcox
2023-09-06 3:13 ` Kefeng Wang
2023-09-06 3:14 ` Matthew Wilcox
2023-09-06 12:32 ` Kefeng Wang
2023-09-06 3:25 ` Muchun Song
2023-09-06 9:33 ` Kefeng Wang
2023-09-06 14:32 ` Muchun Song
2023-09-06 14:58 ` Kefeng Wang
2023-09-07 6:35 ` Muchun Song [this message]
2023-10-25 9:32 ` Mel Gorman
2023-10-25 13:50 ` kernel test robot