From: Zi Yan <ziy@nvidia.com>
To: linux-mm@kvack.org, "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>,
	Hugh Dickins <hughd@google.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	Yang Shi <yang@os.amperecomputing.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Yu Zhao <yuzhao@google.com>, John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org, Zi Yan <ziy@nvidia.com>
Subject: [RFC PATCH 0/1] Buddy allocator like folio split
Date: Tue,  8 Oct 2024 18:37:47 -0400	[thread overview]
Message-ID: <20241008223748.555845-1-ziy@nvidia.com> (raw)

Hi all,

Matthew and I have discussed a different way of splitting large
folios. Instead of splitting one folio uniformly into smaller folios of
the same order, a buddy-allocator-like split can reduce the total number
of resulting folios, reduce the amount of memory needed for the
multi-index xarray split, and keep more large folios after a split. In
addition, both Hugh[1] and Ryan[2] made similar suggestions before.

The patch is an initial implementation. It passes simple order-9 to
lower-order split tests for anonymous folios and pagecache folios.
There are still a lot of TODOs before it can go upstream, but I would
like to gather feedback first.

Design
===

folio_split() splits a large folio in the same way as the buddy allocator
splits a large free page for allocation. The purpose is to minimize the
number of folios after the split. For example, if a user wants to free the
3rd subpage in an order-9 folio, folio_split() will split the order-9 folio
as:
O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon,
O-1,      O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is pagecache,
since anon folios do not support order-1 yet.
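
For illustration, here is a minimal standalone sketch (plain userspace C,
not the kernel code) that enumerates the orders such a split produces when
isolating one page. For an order-9 folio and page index 2 it prints, from
largest to smallest, O-8 down to O-1 plus the two final order-0 pieces,
i.e. the pagecache case above; the anon case additionally splits the
remaining order-1 piece into two order-0 pieces.

#include <stdio.h>

/*
 * Userspace illustration only: print the orders produced by a
 * buddy-allocator-like split that isolates page @target inside
 * a folio of order @order.
 */
static void buddy_split_orders(unsigned int order, unsigned long target)
{
        while (order > 0) {
                unsigned long half = 1UL << (order - 1);

                order--;
                /* the half not containing @target is finished at this order */
                printf("O-%u\n", order);
                if (target >= half)
                        target -= half; /* descend into the right half */
        }
        printf("O-0\n");        /* the isolated target page itself */
}

int main(void)
{
        buddy_split_orders(9, 2);       /* 3rd subpage of an order-9 folio */
        return 0;
}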

The split process is similar to the existing approach:
1. Unmap all page mappings (split PMD mappings if they exist);
2. Split metadata like memcg, page owner, and page alloc tag;
3. Copy metadata in struct folio to subpages, but instead of splitting
   the whole folio into multiple smaller ones of the same order in one
   shot, this approach splits the folio iteratively. Taking the example
   above, it first splits the original order-9 folio into two order-8
   folios, then splits the left order-8 folio into two order-7 folios,
   and so on (see the sketch after this list);
4. Post-process split folios, e.g., update mapping->i_pages for pagecache,
   adjust folio refcounts, and add split folios to the corresponding list;
5. Remap split folios;
6. Unlock split folios.
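
A rough sketch of the iterative loop in step 3 (the helper name is
hypothetical, not the actual patch code):

static void split_folio_iteratively(struct folio *folio, struct page *page,
                                    unsigned int new_order)
{
        unsigned int order;

        for (order = folio_order(folio); order > new_order; order--) {
                /*
                 * __split_folio_in_half() is a hypothetical helper that
                 * turns @folio into two order-(order - 1) folios and
                 * fixes up the struct folio metadata of both halves.
                 */
                __split_folio_in_half(folio);
                /* keep descending into the half that contains @page */
                folio = page_folio(page);
        }
}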


TODOs
===

1. For anon folios, the code needs to check the enabled mTHP orders and
only split a large folio into enabled orders. But this might be up for
debate, since if no mTHP order is enabled, the folio will be split into
order-0 folios. A possible shape for the check is sketched below.
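
Assuming a bitmask with bit i set when order-i is enabled (the name is
hypothetical), each buddy piece could be clamped to the nearest enabled
order at or below it, then split further to that order:

static unsigned int clamp_to_enabled_order(unsigned int order,
                                           unsigned long enabled_orders)
{
        /* order-0 is always allowed as the fallback */
        while (order && !(enabled_orders & (1UL << order)))
                order--;
        return order;
}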

2. Use xas_nomem() instead of xas_split_alloc(), as Matthew suggested. The
issue I am having is that when I use the xas_set_order(), xas_store(),
xas_nomem(), xas_split() pattern, xas->xa_alloc is NULL instead of
pointing to allocated memory. I must be getting something wrong and need
some help from Matthew with it.
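
For reference, one reconstruction of the pattern listed above (simplified;
the locking and the placement of the retry are my guesses, not the actual
code under discussion):

        XA_STATE(xas, &folio->mapping->i_pages, folio->index);

        do {
                xas_lock_irq(&xas);
                xas_set_order(&xas, folio->index, new_order);
                xas_store(&xas, folio);
                xas_split(&xas, folio, old_order);
                xas_unlock_irq(&xas);
        } while (xas_nomem(&xas, GFP_KERNEL));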

3. Use folio_split() in pagecache truncate and do more testing.

4. Add shmem support if it turns out to be needed.

5. Currently, the inputs of folio_split() are the original folio, the new
order, and a page pointer that tells where to split to the new order. For
truncate, better inputs might be two page pointers marking the start and
end of the split range, letting folio_split() figure out the new orders
itself.
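
In signature form, the two alternatives would look roughly like this (the
second variant is hypothetical, and the two obviously cannot coexist under
the same name):

/* current RFC interface, as described above */
int folio_split(struct folio *folio, unsigned int new_order,
                struct page *split_at);

/*
 * Hypothetical truncate-oriented alternative: folio_split() would
 * derive the minimal set of orders from the [start, end] range.
 */
int folio_split(struct folio *folio, struct page *start,
                struct page *end);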


Any comments and/or suggestions are welcome. Thanks.



[1] https://lore.kernel.org/linux-mm/9dd96da-efa2-5123-20d4-4992136ef3ad@google.com/
[2] https://lore.kernel.org/linux-mm/cbb1d6a0-66dd-47d0-8733-f836fe050374@arm.com/

Zi Yan (1):
  mm/huge_memory: buddy allocator like folio_split()

 mm/huge_memory.c | 648 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 647 insertions(+), 1 deletion(-)

-- 
2.45.2


