Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing


From: Balbir Singh <balbirs@nvidia.com>
To: Zi Yan <ziy@nvidia.com>
Cc: "Matthew Brost" <matthew.brost@intel.com>,
	"Mika Penttilä" <mpenttil@redhat.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Alistair Popple" <apopple@nvidia.com>,
	"David Hildenbrand" <david@kernel.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
Date: Sat, 10 Jan 2026 09:36:04 +1100
Message-ID: <ead4661f-5162-40e8-a821-647c05745de0@nvidia.com>
In-Reply-To: <DB92CD30-1C6A-4533-83C8-BE7091F706A9@nvidia.com>

On 1/10/26 08:14, Zi Yan wrote:
> On 9 Jan 2026, at 17:11, Balbir Singh wrote:
> 
>> On 1/10/26 07:43, Zi Yan wrote:
>>> On 9 Jan 2026, at 16:34, Balbir Singh wrote:
>>>
>>>> On 1/10/26 06:15, Zi Yan wrote:
>>>>> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
>>>>>
>>>>>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>>>>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>>>>>
>>>>>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>>>>>
>>>>>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>>>>>> pages.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>>>>>> Cc: linux-mm@kvack.org
>>>>>>>>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>>>>>  			break;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +		folio_split_unref(folio);
>>>>>>>>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>>>>>  		break;
>>>>>>>>>>>>
>>>>>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>>>>>>> which checks the folio order and acts upon that.
>>>>>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
>>>>>>>>>>
>>>>>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>>>>>>>>> by asking drivers to undo compound pages.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>>>>>> driver’s folio_free function.
>>>>>>>>>
>>>>>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree it is asymmetric and symmetric is likely better.
>>>>>>>>
>>>>>>>>> In addition, looking at nouveau’s implementation in
>>>>>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>>>>>>>>> which is never split, and passes it to zone_device_folio_init(). This
>>>>>>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
>>>>>>>>> again. The bug has not manifested because there are only order-9 large folios.
>>>>>>>>> Once mTHP support is added, how is nouveau going to allocate an order-4 folio
>>>>>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>>>>>
>>>>>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>>>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>>>>>> than tracking a free folio list and free page list. But this is not my
>>>>>>>> driver.
>>>>>>>>
>>>>>>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don’t disagree that this implementation is questionable.
>>>>>>>>
>>>>>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>>>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>>>>>> doesn’t seem right to me either.
>>>>>>>
>>>>>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>>>>>> to folio_free() makes sense to me, since after the split, the folio passed
>>>>>>
>>>>>> If this is the consensus / direction - I can do this, but it is a
>>>>>> tree-wide change.
>>>>>>
>>>>>> I do have another question for everyone here - do we think this splitting
>>>>>> implementation should be considered a Fixes so this can go into 6.19?
>>>>>
>>>>> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
>>>>> on a large folio. IIUC this seems to only affect nouveau now, so I will let
>>>>> them decide.
>>>>>
>>>>
>>>> Agreed, free_zone_device_folio() needs to split the folio on put.
>>>>
>>>>
>>>>>>
>>>>>>> to folio_free() contains no order information, but just the used-to-be
>>>>>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>>>>>> handle it without knowing folio order?
>>>>>>>
>>>>>>
>>>>>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>>>>>> reference-counted object in GPU SVM. When the object’s reference count
>>>>>> drops to zero, we callback into the driver layer to release the memory.
>>>>>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>>>>>> is then released. If it’s not clear, our original allocation size
>>>>>> determines the granularity at which we free the backing store.
>>>>>>
>>>>>>> Do we really need the order info in ->folio_free() if the folio is split
>>>>>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>>>>>> ->folio_free() 2^order times to free each individual page.
>>>>>>>
>>>>>>
>>>>>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>>>>>> reference to our GPU SVM object, so we can free the backing in a single
>>>>>> ->folio_free call.
>>>>>>
>>>>>> Now, if that folio gets split at some point into 4KB pages, then we’d
>>>>>> have 512 references to this object set up in the ->folio_split calls.
>>>>>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>>>>>> whatever reason, we can’t create a 2MB device page during a 2MB
>>>>>> migration and need to create 512 4KB pages, we'd have 512 references
>>>>>> to our GPU SVM object.
>>>>>
>>>>
>>>> I still don't follow why the folio_order does not capture the order of the folio.
>>>> If the folio is split, we should now have 512 split folios for THP
>>>
>>> folio_order() should return 0 after the folio is split.
>>>
>>> In terms of the number of after-split folios, it is 512 for current code base
>>> since THP is only 2MB in zone devices, but that is not future-proof if mTHP support
>>> is added. It also causes confusion in core MM, where folio can have
>>> all kinds of orders.
>>>
>>>
>>
>> I looked at folio_split_unref() and see that there is no driver
>> callback during the split. Patch 3 controls the order of
>>
>> +		folio_split_unref(folio);
>>  		pgmap->ops->folio_free(folio);
>>
>> @Matthew, is there a reason to do the split prior to free? pgmap->ops->folio_free(folio)
>> shouldn't impact the folio itself, the backing memory can be freed and then the
>> folio split?
> 
> Quoting Matthew from [1]:
> 
> ... this step must be done before calling folio_free and include a barrier,
> as the page can be immediately reallocated.
> 
> [1] https://lore.kernel.org/all/aV8TuK5255NXd2PS@lstrano-desk.jf.intel.com/
> 

Thanks, I am not a TTM/BO expert.

So that leaves us with:

1. Pass the order to folio_free()
2. Consider calling folio_free() callback for each split folio during folio_split_unref(),
   but that means the driver needs to consolidate all the relevant information

#1 works, but the information there is stale, in the sense that we are passing in the
old (pre-split) order. The order is still useful for the driver to know the size of its
backing allocation.
#2 should work too, but it means 1 << PMD_ORDER frees as opposed to 1.

Balbir
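
To make the trade-off between #1 and #2 concrete, here is a minimal userspace
sketch. It is not kernel code: every name in it (struct model_backing, the
model_* functions, MODEL_PAGE_SIZE, MODEL_PMD_ORDER) is hypothetical, and it
assumes 4KB base pages and an order-9 (2MB) folio. It only models how many
times a driver's free callback would run and how the driver learns the size
of its backing allocation under each option.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL /* assumed base page size */
#define MODEL_PMD_ORDER 9U     /* assumed: 2MB folio built from 4KB pages */

/* Hypothetical stand-in for a driver's backing allocation. */
struct model_backing {
	size_t bytes;             /* size of the original allocation */
	unsigned long refs;       /* per-page references (option 2 only) */
	unsigned long free_calls; /* how many times the free callback ran */
	size_t bytes_released;    /* bytes handed back to the allocator */
};

/* Option 1 callback: one call, size derived from the order passed in. */
static void model_folio_free_order(struct model_backing *b, unsigned int order)
{
	b->free_calls++;
	b->bytes_released += MODEL_PAGE_SIZE << order;
}

/* Option 2 callback: one call per order-0 page; the last one releases. */
static void model_folio_free_page(struct model_backing *b)
{
	assert(b->refs > 0);
	b->free_calls++;
	if (--b->refs == 0)
		b->bytes_released += b->bytes;
}

/* Option 1: the order is read before the split and is stale afterwards. */
static void model_free_option1(struct model_backing *b, unsigned int order)
{
	unsigned int old_order = order;

	/* The split would happen here; the folio would now be order 0. */
	model_folio_free_order(b, old_order);
}

/* Option 2: the driver consolidates 1 << order per-page frees itself. */
static void model_free_option2(struct model_backing *b, unsigned int order)
{
	unsigned long i, nr = 1UL << order;

	b->refs = nr; /* models one reference taken per split page */
	for (i = 0; i < nr; i++)
		model_folio_free_page(b);
}
```

With MODEL_PMD_ORDER, model_free_option1() runs the callback once and
model_free_option2() runs it 512 times; both end up releasing the same
2097152 bytes. The difference is where the bookkeeping lives: in the order
argument for #1, in driver-side reference counting for #2.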

Thread overview: 27+ messages
     [not found] <20260109085605.443316-1-francois.dugast@intel.com>
2026-01-09  8:54 ` [PATCH v3 1/7] mm: Add folio_split_unref helper Francois Dugast
2026-01-09 13:19   ` David Hildenbrand (Red Hat)
2026-01-09 13:26     ` David Hildenbrand (Red Hat)
2026-01-09 14:30       ` Zi Yan
2026-01-09 15:11         ` David Hildenbrand (Red Hat)
2026-01-09 18:38           ` Matthew Brost
2026-01-09 18:37     ` Andrew Morton
2026-01-09 18:41       ` Zi Yan
2026-01-09 18:54         ` Francois Dugast
2026-01-09 18:43       ` Matthew Brost
2026-01-09 19:22         ` Andrew Morton
2026-01-09 19:26           ` Liam R. Howlett
2026-01-09  8:54 ` [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing Francois Dugast
2026-01-09 11:09   ` Mika Penttilä
2026-01-09 17:28     ` Zi Yan
2026-01-09 18:26       ` Matthew Brost
2026-01-09 18:53         ` Zi Yan
2026-01-09 19:08           ` Matthew Brost
2026-01-09 19:23             ` Zi Yan
2026-01-09 20:03               ` Matthew Brost
2026-01-09 20:15                 ` Zi Yan
2026-01-09 21:34                   ` Balbir Singh
2026-01-09 21:43                     ` Zi Yan
2026-01-09 22:11                       ` Balbir Singh
2026-01-09 22:14                         ` Zi Yan
2026-01-09 22:36                           ` Balbir Singh [this message]
2026-01-09 23:15                             ` Matthew Brost
