From: Joao Martins <joao.m.martins@oracle.com>
To: Muchun Song <muchun.song@linux.dev>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
Muchun Song <songmuchun@bytedance.com>,
Mike Kravetz <mike.kravetz@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2] mm/hugetlb_vmemmap: remap head page to newly allocated page
Date: Wed, 9 Nov 2022 13:15:36 +0000
Message-ID: <1dcf6d8c-dcb0-4f52-8fcc-9aa56bd0cdd6@oracle.com>
In-Reply-To: <A5CDFA33-D30F-4E58-AE84-01191C792D96@linux.dev>
On 09/11/2022 02:42, Muchun Song wrote:
>> On Nov 8, 2022, at 18:38, Joao Martins <joao.m.martins@oracle.com> wrote:
>> On 08/11/2022 09:13, Muchun Song wrote:
>>>> On Nov 7, 2022, at 23:39, Joao Martins <joao.m.martins@oracle.com> wrote:
>>>> + int nid = page_to_nid((struct page *)start);
>>>> + struct page *page = NULL;
>>>> +
>>>> + /*
>>>> + * Allocate a new head vmemmap page to avoid breaking a contiguous
>>>> + * block of struct page memory when freeing it back to page allocator
>>>> + * in free_vmemmap_page_list(). This will allow the likely contiguous
>>>> + * struct page backing memory to be kept contiguous, allowing for
>>>> + * more allocations of hugepages. Fall back to the currently
>>>> + * mapped head page should it fail to allocate.
>>>> + */
>>>> + if (IS_ALIGNED((unsigned long)start, PAGE_SIZE))
>>>
>>> I'm curious why we need this check. IIUC, this is unnecessary.
>>>
>>
>> So if the start of the vmemmap range we remap (the head page) isn't the
>> first struct page in that vmemmap page, then we would corrupt the other
>> struct pages in it that are unrelated to hugetlb? That was my thinking.
>
> Actually, the @start address should always be aligned to PAGE_SIZE. If not,
> vmemmap_remap_range() will complain. So the check can be removed.
>
True, but that complaint is a VM_BUG_ON().
I can remove the check here, but rather than crashing I would
suggest we detect the error in the caller (vmemmap_remap_free())
before proceeding.
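
Something along these lines near the top of vmemmap_remap_free(), before
the walk is started (untested sketch; the exact error handling depends on
how the caller wants to treat a misaligned range):

	if (WARN_ON_ONCE(!IS_ALIGNED(start, PAGE_SIZE) ||
			 !IS_ALIGNED(end, PAGE_SIZE)))
		return -EINVAL;	/* skip the optimization instead of BUG()ing */

That keeps the VM_BUG_ON() in vmemmap_remap_range() as a last resort while
failing gracefully for a misaligned range.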
>>
>>>> + page = alloc_pages_node(nid, gfp_mask, 0);
>>>> + walk.head_page = page;
>>>>
>>>> /*
>>>> * In order to make remapping routine most efficient for the huge pages,
>>>> --
>>>> 2.17.2
>>>>
>>>
>>> I have implemented a version based on yours which does not introduce the
>>> ->head_page field (not tested whether it works). It seems simpler.
>>>
>>
>> Let me try it out with the adjustment below.
>>
>>> Thanks.
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index c98805d5b815..8ee94f6a6697 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -202,12 +202,7 @@ static int vmemmap_remap_range(unsigned long start, unsigned long end,
>>> return ret;
>>> } while (pgd++, addr = next, addr != end);
>>>
>>> - /*
>>> - * We only change the mapping of the vmemmap virtual address range
>>> - * [@start + PAGE_SIZE, end), so we only need to flush the TLB which
>>> - * belongs to the range.
>>> - */
>>> - flush_tlb_kernel_range(start + PAGE_SIZE, end);
>>> + flush_tlb_kernel_range(start, end);
>>>
>>> return 0;
>>> }
>>> @@ -246,6 +241,12 @@ static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
>>> pte_t entry = mk_pte(walk->reuse_page, pgprot);
>>> struct page *page = pte_page(*pte);
>>>
>>> + /* The case of remapping the head vmemmap page. */
I changed this comment to:
/* Remapping the head page requires r/w */
>>> + if (unlikely(addr == walk->reuse_addr)) {
>>
>> You replaced the head_page with a check against reuse_addr, but that is
>> set on the tail page. So if we want to rely on reuse_addr, perhaps it is
>> best if we do:
>>
>> if (unlikely(addr == (walk->reuse_addr - PAGE_SIZE))) {
>> ...
>> }
>
> I don't think so. The @addr here should be equal to @walk->reuse_addr
> when vmemmap_remap_pte() is first called, since @addr is not updated
> by vmemmap_pte_range(). Right?
>
Ah, yes -- I misread it. I was confusing this with the scheme that reuses
2 vmemmap pages rather than the current one that reuses 1 (which is the
latest, yes), and thus reuse_addr does point at the head page.
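
IIUC, vmemmap_pte_range() only skips the first PTE while @walk->reuse_page
is still unset, so with your version pre-populating reuse_page with the
newly allocated page, the walk starts remapping right at the reuse address
and the first remap_pte() call indeed sees @addr == @walk->reuse_addr.
Roughly (sketch of the existing logic, bookkeeping such as nr_walked
omitted):

	if (!walk->reuse_page) {
		/* old behaviour: record the reuse page and skip over it */
		walk->reuse_page = pte_page(*pte);
		addr += PAGE_SIZE;
		pte++;
	}

	for (; addr != end; addr += PAGE_SIZE, pte++)
		walk->remap_pte(pte, addr, walk);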
OK, this should be much cleaner with the adjustments you proposed, and it
works just as well as far as I have tested.
I'll respin v3.
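
For reference, I expect vmemmap_remap_pte() to end up looking roughly like
the below in v3 (untested sketch; the final hunk may also need to keep the
freshly allocated head page off the to-be-freed vmemmap_pages list):

static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
			      struct vmemmap_remap_walk *walk)
{
	/*
	 * Remap the tail pages as read-only to catch illegal write
	 * operations to them.
	 */
	pgprot_t pgprot = PAGE_KERNEL_RO;
	struct page *page = pte_page(*pte);
	pte_t entry;

	/* Remapping the head page requires r/w */
	if (unlikely(addr == walk->reuse_addr))
		pgprot = PAGE_KERNEL;

	entry = mk_pte(walk->reuse_page, pgprot);
	list_add_tail(&page->lru, walk->vmemmap_pages);
	set_pte_at(&init_mm, addr, pte, entry);
}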