linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Balbir Singh <balbirs@nvidia.com>
To: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org
Cc: dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	"Karol Herbst" <kherbst@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Shuah Khan" <shuah@kernel.org>, "Barry Song" <baohua@kernel.org>,
	"Baolin Wang" <baolin.wang@linux.alibaba.com>,
	"Ryan Roberts" <ryan.roberts@arm.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Peter Xu" <peterx@redhat.com>, "Zi Yan" <ziy@nvidia.com>,
	"Kefeng Wang" <wangkefeng.wang@huawei.com>,
	"Jane Chu" <jane.chu@oracle.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Donet Tom" <donettom@linux.ibm.com>
Subject: Re: [RFC 07/11] mm/memremap: Add folio_split support
Date: Thu, 10 Jul 2025 09:34:45 +1000	[thread overview]
Message-ID: <5f107fe7-e189-476b-837c-2668a79c9776@nvidia.com> (raw)
In-Reply-To: <9e8771ef-e734-4d56-aa20-c3fdba0fd5ab@redhat.com>

On 7/9/25 00:31, David Hildenbrand wrote:
> On 06.03.25 05:42, Balbir Singh wrote:
>> When a zone device page is split (via huge pmd folio split). The
>> driver callback for folio_split is invoked to let the device driver
>> know that the folio size has been split into a smaller order.
>>
>> The HMM test driver has been updated to handle the split, since the
>> test driver uses backing pages, it requires a mechanism of reorganizing
>> the backing pages (backing pages are used to create a mirror device)
>> again into the right sized order pages. This is supported by exporting
>> prep_compound_page().
>>
>> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
>> ---
>>   include/linux/memremap.h |  7 +++++++
>>   include/linux/mm.h       |  1 +
>>   lib/test_hmm.c           | 35 +++++++++++++++++++++++++++++++++++
>>   mm/huge_memory.c         |  5 +++++
>>   mm/page_alloc.c          |  1 +
>>   5 files changed, 49 insertions(+)
>>
>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> index 11d586dd8ef1..2091b754f1da 100644
>> --- a/include/linux/memremap.h
>> +++ b/include/linux/memremap.h
>> @@ -100,6 +100,13 @@ struct dev_pagemap_ops {
>>        */
>>       int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn,
>>                     unsigned long nr_pages, int mf_flags);
>> +
>> +    /*
>> +     * Used for private (un-addressable) device memory only.
>> +     * This callback is used when a folio is split into
>> +     * a smaller folio
> 
> Confusing. When a folio is split, it is split into multiple folios.
> 
> So when will this be invoked?
> 

It is invoked when a folio splits in mm/huge_memory.c. This allows the device
driver to update any metadata it's tracking w.r.t original folio in zone_device_data

>> +     */
>> +    void (*folio_split)(struct folio *head, struct folio *tail);
> 
> head and tail are really suboptimal termonology. They refer to head and tail pages, which is not really the case with folios (in the long run).
> 

Will rename them to original_folio and new_folio if that helps with readability

>>   };
>>     #define PGMAP_ALTMAP_VALID    (1 << 0)
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 98a67488b5fe..3d0e91e0a923 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1415,6 +1415,7 @@ static inline struct folio *virt_to_folio(const void *x)
>>   void __folio_put(struct folio *folio);
>>     void split_page(struct page *page, unsigned int order);
>> +void prep_compound_page(struct page *page, unsigned int order);
>>   void folio_copy(struct folio *dst, struct folio *src);
>>   int folio_mc_copy(struct folio *dst, struct folio *src);
>>   diff --git a/lib/test_hmm.c b/lib/test_hmm.c
>> index a81d2f8a0426..18b6a7b061d7 100644
>> --- a/lib/test_hmm.c
>> +++ b/lib/test_hmm.c
>> @@ -1640,10 +1640,45 @@ static vm_fault_t dmirror_devmem_fault(struct vm_fault *vmf)
>>       return ret;
>>   }
>>   +
>> +static void dmirror_devmem_folio_split(struct folio *head, struct folio *tail)
>> +{
>> +    struct page *rpage = BACKING_PAGE(folio_page(head, 0));
>> +    struct folio *new_rfolio;
>> +    struct folio *rfolio;
>> +    unsigned long offset = 0;
>> +
>> +    if (!rpage) {
>> +        folio_page(tail, 0)->zone_device_data = NULL;
>> +        return;
>> +    }
>> +
>> +    offset = folio_pfn(tail) - folio_pfn(head);
>> +    rfolio = page_folio(rpage);
>> +    new_rfolio = page_folio(folio_page(rfolio, offset));
>> +
>> +    folio_page(tail, 0)->zone_device_data = folio_page(new_rfolio, 0);
>> +
>> +    if (folio_pfn(tail) - folio_pfn(head) == 1) {
>> +        if (folio_order(head))
>> +            prep_compound_page(folio_page(rfolio, 0),
>> +                        folio_order(head));
>> +        folio_set_count(rfolio, 1);
>> +    }
>> +    clear_compound_head(folio_page(new_rfolio, 0));
>> +    if (folio_order(tail))
>> +        prep_compound_page(folio_page(new_rfolio, 0),
>> +                        folio_order(tail));
>> +    folio_set_count(new_rfolio, 1);
>> +    folio_page(new_rfolio, 0)->mapping = folio_page(rfolio, 0)->mapping;
>> +    tail->pgmap = head->pgmap;
> 
> Most of this doesn't look like it should be the responsibility of this callback.
> 
> Setting up a new folio after the split (messing with compound pages etc) really should not be the responsibility of this callback.
> 
> So no, this looks misplaced.
> 

We do need a callback for drivers to do the right thing. In this case if you look at lib/test_hmm.c,
device pages are emulated via backing pages (real folios allocated from system memory). Hence, you
see all the changes here. I can try and simplify this going forward.

>> +}
>> +
>>   static const struct dev_pagemap_ops dmirror_devmem_ops = {
>>       .page_free    = dmirror_devmem_free,
>>       .migrate_to_ram    = dmirror_devmem_fault,
>>       .page_free    = dmirror_devmem_free,
>> +    .folio_split    = dmirror_devmem_folio_split,
>>   };
>>     static int dmirror_device_init(struct dmirror_device *mdevice, int id)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 995ac8be5709..518a70d1b58a 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3655,6 +3655,11 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>                           MTHP_STAT_NR_ANON, 1);
>>               }
>>   +            if (folio_is_device_private(origin_folio) &&
>> +                    origin_folio->pgmap->ops->folio_split)
>> +                origin_folio->pgmap->ops->folio_split(
>> +                    origin_folio, release);
> 
> Absolutely ugly. I think we need a wrapper for the
> 
Will do

>> +
>>               /*
>>                * Unfreeze refcount first. Additional reference from
>>                * page cache.
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 17ea8fb27cbf..563f7e39aa79 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -573,6 +573,7 @@ void prep_compound_page(struct page *page, unsigned int order)
>>         prep_compound_head(page, order);
>>   }
>> +EXPORT_SYMBOL_GPL(prep_compound_page);
> 
> Hmmm, that is questionable. We don't want arbitrary modules to make use of that.
> 
> Another sign that you are exposing the wrong functionality/interface (folio_split) to modules.
> 

prep_compound_page is required for generic THP support. In our case the driver lib/test_hmm.c has no
real device pages, just actual folio pages backing it. When the split occurs, we need to ensure
the pgmap entries are correct, the mapping is right and the backing folio is set to the right order.

I tried copying the pages to new folios (but I can't allocate in the split context), I'll see
if I can get rid of this requirement.

Thanks,
Balbir Singh



  reply	other threads:[~2025-07-09 23:34 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-06  4:42 [RFC 00/11] THP support for zone device pages Balbir Singh
2025-03-06  4:42 ` [RFC 01/11] mm/zone_device: support large zone device private folios Balbir Singh
2025-03-06 23:02   ` Alistair Popple
2025-07-08 13:37   ` David Hildenbrand
2025-07-09  5:25     ` Balbir Singh
2025-03-06  4:42 ` [RFC 02/11] mm/migrate_device: flags for selecting device private THP pages Balbir Singh
2025-07-08 13:41   ` David Hildenbrand
2025-07-09  5:25     ` Balbir Singh
2025-03-06  4:42 ` [RFC 03/11] mm/thp: zone_device awareness in THP handling code Balbir Singh
2025-07-08 14:10   ` David Hildenbrand
2025-07-09  6:06     ` Alistair Popple
2025-07-09 12:30     ` Balbir Singh
2025-03-06  4:42 ` [RFC 04/11] mm/migrate_device: THP migration of zone device pages Balbir Singh
2025-03-06  9:24   ` Mika Penttilä
2025-03-06 21:35     ` Balbir Singh
2025-03-06  4:42 ` [RFC 05/11] mm/memory/fault: Add support for zone device THP fault handling Balbir Singh
2025-07-08 14:40   ` David Hildenbrand
2025-07-09 23:26     ` Balbir Singh
2025-03-06  4:42 ` [RFC 06/11] lib/test_hmm: test cases and support for zone device private THP Balbir Singh
2025-03-06  4:42 ` [RFC 07/11] mm/memremap: Add folio_split support Balbir Singh
2025-03-06  8:16   ` Mika Penttilä
2025-03-06 21:42     ` Balbir Singh
2025-03-06 22:36   ` Alistair Popple
2025-07-08 14:31   ` David Hildenbrand
2025-07-09 23:34     ` Balbir Singh [this message]
2025-03-06  4:42 ` [RFC 08/11] mm/thp: add split during migration support Balbir Singh
2025-07-08 14:38   ` David Hildenbrand
2025-07-08 14:46     ` Zi Yan
2025-07-08 14:53       ` David Hildenbrand
2025-03-06  4:42 ` [RFC 09/11] lib/test_hmm: add test case for split pages Balbir Singh
2025-03-06  4:42 ` [RFC 10/11] selftests/mm/hmm-tests: new tests for zone device THP migration Balbir Singh
2025-03-06  4:42 ` [RFC 11/11] gpu/drm/nouveau: Add THP migration support Balbir Singh
2025-03-06 23:08 ` [RFC 00/11] THP support for zone device pages Matthew Brost
2025-03-06 23:20   ` Balbir Singh
2025-07-04 13:52     ` Francois Dugast
2025-07-04 16:17       ` Zi Yan
2025-07-06  1:25         ` Balbir Singh
2025-07-06 16:34           ` Francois Dugast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5f107fe7-e189-476b-837c-2668a79c9776@nvidia.com \
    --to=balbirs@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=baohua@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=dakr@kernel.org \
    --cc=david@redhat.com \
    --cc=donettom@linux.ibm.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=jane.chu@oracle.com \
    --cc=jglisse@redhat.com \
    --cc=kherbst@redhat.com \
    --cc=linux-mm@kvack.org \
    --cc=lyude@redhat.com \
    --cc=nouveau@lists.freedesktop.org \
    --cc=peterx@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=shuah@kernel.org \
    --cc=simona@ffwll.ch \
    --cc=wangkefeng.wang@huawei.com \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox