From: Mike Kravetz <mike.kravetz@oracle.com>
To: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko <mhocko@suse.com>,
	Oscar Salvador <osalvador@suse.de>, Zi Yan <ziy@nvidia.com>,
	David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 0/3] hugetlb: add demote/split page functionality
Date: Tue, 9 Mar 2021 09:11:11 -0800	[thread overview]
Message-ID: <ebb19eb5-ae9e-22f1-4e19-e5fce32c695c@oracle.com> (raw)
In-Reply-To: <29cb78c5-4fca-0f0a-c603-0c75f9f50d05@redhat.com>

On 3/9/21 1:01 AM, David Hildenbrand wrote:
> On 09.03.21 01:18, Mike Kravetz wrote:
>> To address these issues, introduce the concept of hugetlb page demotion.
>> Demotion provides a means of 'in place' splitting a hugetlb page to
>> pages of a smaller size.  For example, on x86 one 1G page can be
>> demoted to 512 2M pages.  Page demotion is controlled via sysfs files.
>> - demote_size    Read-only target page size for demotion
>> - demote         Writable number of hugetlb pages to be demoted
>>
>> Only hugetlb pages which are free at the time of the request can be demoted.
>> Demotion does not add to the complexity of surplus pages.  Demotion also
>> honors reserved huge pages.  Therefore, when a value is written to the sysfs
>> demote file, that value is only the maximum number of pages which will be
>> demoted; it is possible fewer will actually be demoted.
>>
>> If demote_size is PAGESIZE, demote will simply free pages to the buddy
>> allocator.
> 
> With the vmemmap optimizations you will have to rework the vmemmap layout. How is that handled? Couldn't it happen that you are half-way through splitting a PUD into PMDs when you realize that you cannot allocate vmemmap pages for properly handling the remaining PMDs? What would happen then?
> 
> Or are you planning on making both features mutually exclusive?
> 
> Of course, one approach would be first completely restoring the vmemmap for the whole PUD (allocating more pages than necessary in the end) and then freeing individual pages again when optimizing the layout per PMD.
> 

You are right about the need to address this issue.  Patch 3 has the
comment:

+	/*
+	 * Note for future:
+	 * When support for reducing vmemmap of huge pages is added, we
+	 * will need to allocate vmemmap pages here and could fail.
+	 */

The simplest approach would be to restore the entire vmemmap for the
larger page and then free the unneeded parts again for each smaller
page after the split.  We could hook into the existing vmemmap
reduction code with just a few calls.  The demote/split would fail if
the allocation fails.  However, this is not optimal.
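
Roughly something like this (completely untested; alloc_huge_page_vmemmap()
and free_huge_page_vmemmap() are just stand-ins for whatever calls the
vmemmap reduction series ends up exporting):

	static int demote_free_huge_page(struct hstate *h, struct hstate *target,
					 struct page *page)
	{
		int i, nr = pages_per_huge_page(h) / pages_per_huge_page(target);

		/*
		 * Restore the full vmemmap for the source page first.  This
		 * can fail; if it does, give up before modifying anything.
		 */
		if (alloc_huge_page_vmemmap(h, page))
			return -ENOMEM;

		/* split 'page' into 'nr' pages of the target size (omitted) */

		/* re-apply the vmemmap reduction to each smaller page */
		for (i = 0; i < nr; i++)
			free_huge_page_vmemmap(target,
					page + i * pages_per_huge_page(target));

		return 0;
	}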

Ideally, the code would compute how many vmemmap pages are needed
after the split, allocate those, and then construct the vmemmap
appropriately when creating the smaller pages.
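
To put rough numbers on that for x86, assuming the scheme in the current
vmemmap reduction series (keep 2 of the 8 vmemmap pages for a 2MB page,
and 2 of the 4096 for a 1GB page):

	1GB page, optimized:           2 resident vmemmap pages
	512 2MB pages, optimized:    512 * 2 = 1024 resident vmemmap pages

So demoting one 1GB page would require allocating 1024 - 2 = 1022
vmemmap pages up front.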

I think we would want to always do the allocation of vmemmap pages up
front and not even start the split process if the allocation fails.
There is no sense starting something we may not be able to finish.
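
Something like the following for the up front allocation (again untested,
and vmemmap_pages_per_hpage() is a made-up helper returning the number of
resident vmemmap pages for a huge page of the given hstate):

	static int demote_prealloc_vmemmap(struct hstate *h,
					   struct hstate *target,
					   struct list_head *vmemmap_pages)
	{
		long need;
		struct page *page;

		/* pages needed by the split result minus what we have now */
		need = vmemmap_pages_per_hpage(target) *
			(pages_per_huge_page(h) / pages_per_huge_page(target)) -
			vmemmap_pages_per_hpage(h);

		while (need-- > 0) {
			page = alloc_page(GFP_KERNEL);
			if (!page)
				return -ENOMEM;	/* caller frees the list */
			list_add(&page->lru, vmemmap_pages);
		}
		return 0;
	}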

I purposely did not address that here, as I first wanted to get
feedback on the usefulness of the demote functionality.
-- 
Mike Kravetz

