Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing

linux-mm.kvack.org archive mirror

From: Balbir Singh <balbirs@nvidia.com>
To: Zi Yan <ziy@nvidia.com>
Cc: "Matthew Brost" <matthew.brost@intel.com>,
	"Mika Penttilä" <mpenttil@redhat.com>,
	"Francois Dugast" <francois.dugast@intel.com>,
	intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Alistair Popple" <apopple@nvidia.com>,
	"David Hildenbrand" <david@kernel.org>,
	"Oscar Salvador" <osalvador@suse.de>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing
Date: Sat, 10 Jan 2026 09:11:32 +1100
Message-ID: <5c9f17ce-7174-4e74-92d7-8249f309f756@nvidia.com>
In-Reply-To: <18E78790-4996-4151-821B-8A1D784A3F7D@nvidia.com>

On 1/10/26 07:43, Zi Yan wrote:
> On 9 Jan 2026, at 16:34, Balbir Singh wrote:
> 
>> On 1/10/26 06:15, Zi Yan wrote:
>>> On 9 Jan 2026, at 15:03, Matthew Brost wrote:
>>>
>>>> On Fri, Jan 09, 2026 at 02:23:49PM -0500, Zi Yan wrote:
>>>>> On 9 Jan 2026, at 14:08, Matthew Brost wrote:
>>>>>
>>>>>> On Fri, Jan 09, 2026 at 01:53:33PM -0500, Zi Yan wrote:
>>>>>>> On 9 Jan 2026, at 13:26, Matthew Brost wrote:
>>>>>>>
>>>>>>>> On Fri, Jan 09, 2026 at 12:28:22PM -0500, Zi Yan wrote:
>>>>>>>>> On 9 Jan 2026, at 6:09, Mika Penttilä wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 1/9/26 10:54, Francois Dugast wrote:
>>>>>>>>>>
>>>>>>>>>>> From: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>>
>>>>>>>>>>> Split device-private and coherent folios into individual pages before
>>>>>>>>>>> freeing so that any order folio can be formed upon the next use of the
>>>>>>>>>>> pages.
>>>>>>>>>>>
>>>>>>>>>>> Cc: Balbir Singh <balbirs@nvidia.com>
>>>>>>>>>>> Cc: Alistair Popple <apopple@nvidia.com>
>>>>>>>>>>> Cc: Zi Yan <ziy@nvidia.com>
>>>>>>>>>>> Cc: David Hildenbrand <david@kernel.org>
>>>>>>>>>>> Cc: Oscar Salvador <osalvador@suse.de>
>>>>>>>>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>>>>>>>>> Cc: linux-mm@kvack.org
>>>>>>>>>>> Cc: linux-cxl@vger.kernel.org
>>>>>>>>>>> Cc: linux-kernel@vger.kernel.org
>>>>>>>>>>> Signed-off-by: Matthew Brost <matthew.brost@intel.com>
>>>>>>>>>>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>>>>>>>>>>> ---
>>>>>>>>>>>  mm/memremap.c | 2 ++
>>>>>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/mm/memremap.c b/mm/memremap.c
>>>>>>>>>>> index 63c6ab4fdf08..7289cdd6862f 100644
>>>>>>>>>>> --- a/mm/memremap.c
>>>>>>>>>>> +++ b/mm/memremap.c
>>>>>>>>>>> @@ -453,6 +453,8 @@ void free_zone_device_folio(struct folio *folio)
>>>>>>>>>>>  	case MEMORY_DEVICE_COHERENT:
>>>>>>>>>>>  		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->folio_free))
>>>>>>>>>>>  			break;
>>>>>>>>>>> +
>>>>>>>>>>> +		folio_split_unref(folio);
>>>>>>>>>>>  		pgmap->ops->folio_free(folio);
>>>>>>>>>>>  		percpu_ref_put_many(&folio->pgmap->ref, nr);
>>>>>>>>>>>  		break;
>>>>>>>>>>
>>>>>>>>>> This breaks folio_free implementations like nouveau_dmem_folio_free
>>>>>>>>>> which checks the folio order and acts upon that.
>>>>>>>>>> Maybe add an order parameter to folio_free or let the driver handle the split?
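Mika's order-parameter suggestion can be sketched as a small userspace C model. It is not kernel code: struct sim_folio, struct sim_pgmap_ops, and the two-argument folio_free callback are hypothetical stand-ins, assuming the core records the order before splitting and then passes it to the driver explicitly.

```c
#include <assert.h>

/* Hypothetical stand-in for a folio: only the order matters here. */
struct sim_folio {
	unsigned int order;
};

/* Hypothetical ops table with an order-aware free callback. */
struct sim_pgmap_ops {
	void (*folio_free)(struct sim_folio *folio, unsigned int order);
};

/* Hypothetical driver state: number of freed chunks per order (0..9). */
static unsigned int freed_per_order[10];

/* Driver side: pick the free list from the order argument, not from the
 * folio itself, which the core has already split. */
static void sim_driver_folio_free(struct sim_folio *folio, unsigned int order)
{
	assert(folio->order == 0);
	assert(order < 10);
	freed_per_order[order]++;
}

/* Core side: remember the order, split, then call the driver. */
static void sim_free_zone_device_folio(struct sim_folio *folio,
				       const struct sim_pgmap_ops *ops)
{
	unsigned int order = folio->order;

	folio->order = 0; /* models the split into order-0 pages */
	ops->folio_free(folio, order);
}

static const struct sim_pgmap_ops sim_ops = {
	.folio_free = sim_driver_folio_free,
};
```

The trade-off is a tree-wide signature change for every folio_free implementation, in exchange for the core keeping sole ownership of the split.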
>>>>>>>>
>>>>>>>> 'let the driver handle the split?' - I had considered this as an option.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Passing an order parameter might be better to avoid exposing core MM internals
>>>>>>>>> by asking drivers to undo compound pages.
>>>>>>>>>
>>>>>>>>
>>>>>>>> It looks like Nouveau tracks free folios and free pages—something Xe’s
>>>>>>>> device memory allocator (DRM Buddy) cannot do. I guess this answers my
>>>>>>>> earlier question of how Nouveau avoids hitting the same bug as Xe / GPU
>>>>>>>> SVM with respect to reusing folios. It appears Nouveau prefers not to
>>>>>>>> split the folio, so I’m leaning toward moving this call into the
>>>>>>>> driver’s folio_free function.
>>>>>>>
>>>>>>> No, that creates asymmetric page handling and is error prone.
>>>>>>>
>>>>>>
>>>>>> I agree it is asymmetric and symmetric is likely better.
>>>>>>
>>>>>>> In addition, looking at nouveau’s implementation in
>>>>>>> nouveau_dmem_page_alloc_locked(), it gets a folio from drm->dmem->free_folios,
>>>>>>> which is never split, and passes it to zone_device_folio_init(). This
>>>>>>> is wrong, since if the folio is large, it will go through prep_compound_page()
>>>>>>> again. The bug has not manifested because there are only order-9 large folios.
>>>>>>> Once mTHP support is added, how is nouveau going to allocate an order-4 folio
>>>>>>> from a free order-9 folio? Maintain a per-order free folio list and
>>>>>>> reimplement a buddy allocator? Nevertheless, nouveau’s implementation
>>>>>>
>>>>>> The way Nouveau handles memory allocations here looks wrong to me—it
>>>>>> should probably use DRM Buddy and convert a block buddy to pages rather
>>>>>> than tracking a free folio list and free page list. But this is not my
>>>>>> driver.
>>>>>>
>>>>>>> is wrong by calling prep_compound_page() on a folio (already compound page).
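Carving an order-4 folio out of a free order-9 folio is essentially the split step of a buddy allocator. A minimal sketch, assuming hypothetical per-order free lists of page offsets (free_list, buddy_alloc and friends are illustrative names, not nouveau or DRM Buddy code):

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_ORDER 9
#define SIM_MAX_BLOCKS 64

/* Hypothetical per-order free lists holding the first page offset of each free block. */
static unsigned long free_list[SIM_MAX_ORDER + 1][SIM_MAX_BLOCKS];
static unsigned int free_count[SIM_MAX_ORDER + 1];

static void push_free(unsigned int order, unsigned long pfn)
{
	assert(free_count[order] < SIM_MAX_BLOCKS);
	free_list[order][free_count[order]++] = pfn;
}

static bool pop_free(unsigned int order, unsigned long *pfn)
{
	if (free_count[order] == 0)
		return false;
	*pfn = free_list[order][--free_count[order]];
	return true;
}

/* Allocate 2^order pages, splitting a larger free block if needed. Each
 * split keeps the lower half and puts the upper half (the buddy) on the
 * next-lower free list. */
static bool buddy_alloc(unsigned int order, unsigned long *pfn)
{
	unsigned int o = order;
	unsigned long block;

	while (o <= SIM_MAX_ORDER && free_count[o] == 0)
		o++;
	if (o > SIM_MAX_ORDER || !pop_free(o, &block))
		return false;

	while (o > order) {
		o--;
		push_free(o, block + (1UL << o));
	}
	*pfn = block;
	return true;
}
```

Freeing would do the reverse (merge a block with its buddy when both are free), which is omitted here; that per-order bookkeeping is what a driver keeping only a free-folio list would otherwise have to reimplement.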
>>>>>>>
>>>>>>
>>>>>> I don’t disagree that this implementation is questionable.
>>>>>>
>>>>>> So what’s the suggestion here—add folio order to folio_free just to
>>>>>> accommodate Nouveau’s rather odd memory allocation algorithm? That
>>>>>> doesn’t seem right to me either.
>>>>>
>>>>> Splitting the folio in free_zone_device_folio() and passing folio order
>>>>> to folio_free() makes sense to me, since after the split, the folio passed
>>>>
>>>> If this is the consensus / direction, I can do this, but it is a tree-wide
>>>> change.
>>>>
>>>> I do have another question for everyone here: do we think this splitting
>>>> implementation should be sent as a Fixes: patch so this can go into 6.19?
>>>
>>> IMHO, this should be a fix, since it is wrong to call prep_compound_page()
>>> on a large folio. IIUC this only affects nouveau now, so I will let
>>> them decide.
>>>
>>
>> Agreed, free_zone_device_folio() needs to split the folio on put.
>>
>>
>>>>
>>>>> to folio_free() contains no order information, but just the used-to-be
>>>>> head page and the remaining 511 pages are free. How does Intel Xe driver
>>>>> handle it without knowing folio order?
>>>>>
>>>>
>>>> It’s a bit convoluted, but folio/page->zone_device_data points to a
>>>> reference-counted object in GPU SVM. When the object’s reference count
>>>> drops to zero, we callback into the driver layer to release the memory.
>>>> In Xe, this is a TTM BO that resolves to a DRM Buddy allocation, which
>>>> is then released. If it’s not clear, our original allocation size
>>>> determines the granularity at which we free the backing store.
>>>>
>>>>> Do we really need the order info in ->folio_free() if the folio is split
>>>>> in free_zone_device_folio()? free_zone_device_folio() should just call
>>>>> ->folio_free() 2^order times to free the individual pages.
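The alternative of splitting in free_zone_device_folio() and then invoking ->folio_free() once per resulting page can be modelled as below. struct sim_page, sim_split_and_free() and count_free() are hypothetical; the point is that the number of callbacks is 1 << order, so it is 512 only for order 9 (2MB with 4KB pages) and differs for smaller mTHP orders.

```c
#include <assert.h>

/* Hypothetical page: order of the folio it heads (0 after a split). */
struct sim_page {
	unsigned int order;
	int freed;
};

typedef void (*sim_folio_free_fn)(struct sim_page *page);

static unsigned int free_calls;

/* Hypothetical driver callback that just counts invocations. */
static void count_free(struct sim_page *page)
{
	page->freed = 1;
	free_calls++;
}

/* Split a head page of the given order into order-0 pages, then free each. */
static void sim_split_and_free(struct sim_page *pages, sim_folio_free_fn folio_free)
{
	unsigned long nr = 1UL << pages[0].order; /* read before the split */
	unsigned long i;

	for (i = 0; i < nr; i++)
		pages[i].order = 0;
	for (i = 0; i < nr; i++)
		folio_free(&pages[i]);
}
```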
>>>>>
>>>>
>>>> No. If it’s a higher-order folio—let’s say a 2MB folio—we have one
>>>> reference to our GPU SVM object, so we can free the backing in a single
>>>> ->folio_free call.
>>>>
>>>> Now, if that folio gets split at some point into 4KB pages, then we’d
>>>> have 512 references to this object set up in the ->folio_split calls.
>>>> We’d then expect 512 ->folio_free() calls. Same case here: if, for
>>>> whatever reason, we can’t create a 2MB device page during a 2MB
>>>> migration and need to create 512 4KB pages, we'd have 512 references
>>>> to our GPU SVM object.
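The GPU SVM scheme described above amounts to reference counting the backing allocation: one reference per device folio, with the backing store released when the count drops to zero. A minimal sketch, assuming a hypothetical struct sim_backing in place of the GPU SVM object and a plain counter instead of a kernel kref:

```c
#include <assert.h>

/* Hypothetical stand-in for the refcounted GPU SVM object. */
struct sim_backing {
	unsigned int refs;
	int released; /* set once the backing store is freed */
};

/* Take n references, e.g. 1 for a 2MB folio or 512 for 4KB pages. */
static void sim_backing_get(struct sim_backing *b, unsigned int n)
{
	b->refs += n;
}

/* Models one ->folio_free() call: drop a reference, release on the last one. */
static void sim_backing_put(struct sim_backing *b)
{
	assert(b->refs > 0);
	if (--b->refs == 0)
		b->released = 1;
}
```

In both cases the backing is released exactly once; only the number of ->folio_free() calls needed to reach zero changes, which is why this scheme does not need the order in the callback.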
>>>
>>
>> I still don't follow why folio_order() does not capture the order of the folio.
>> If the folio is split, we should now have 512 split folios for THP.
> 
> folio_order() should return 0 after the folio is split.
> 
> In terms of the number of after-split folios, it is 512 for the current code base
> since THP is only 2MB in zone devices, but that is not future-proof if mTHP support
> is added. It also causes confusion in core MM, where folios can have
> all kinds of orders.
> 
> 

I looked at folio_split_unref() and see that there is no driver
callback during the split. Patch 3 fixes the ordering of the two calls:

+		folio_split_unref(folio);
 		pgmap->ops->folio_free(folio);

@Matthew, is there a reason to do the split prior to the free? pgmap->ops->folio_free(folio)
shouldn't impact the folio itself, so couldn't the backing memory be freed first and the
folio split afterwards?
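A toy model of the ordering question, assuming a hypothetical struct sim_folio whose order field stands in for folio_order() and a driver callback that records the order it observes (as an order-checking free path like nouveau's would): with split-then-free the driver sees order 0, while free-then-split lets it see the original order without a new parameter.

```c
#include <assert.h>

struct sim_folio {
	unsigned int order; /* stands in for folio_order() */
};

static unsigned int seen_order;

/* Hypothetical driver callback that inspects the folio order. */
static void sim_folio_free(struct sim_folio *folio)
{
	seen_order = folio->order;
}

static void sim_split(struct sim_folio *folio)
{
	folio->order = 0; /* after a split, folio_order() returns 0 */
}

/* Ordering used by the patch: split, then call the driver. */
static void split_then_free(struct sim_folio *folio)
{
	sim_split(folio);
	sim_folio_free(folio);
}

/* Alternative ordering: call the driver, then split. */
static void free_then_split(struct sim_folio *folio)
{
	sim_folio_free(folio);
	sim_split(folio);
}
```

This model ignores concurrency; whether a driver could hand the pages out again before the split completes is not something it can answer.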


Balbir

Thread overview: 27+ messages
     [not found] <20260109085605.443316-1-francois.dugast@intel.com>
2026-01-09  8:54 ` [PATCH v3 1/7] mm: Add folio_split_unref helper Francois Dugast
2026-01-09 13:19   ` David Hildenbrand (Red Hat)
2026-01-09 13:26     ` David Hildenbrand (Red Hat)
2026-01-09 14:30       ` Zi Yan
2026-01-09 15:11         ` David Hildenbrand (Red Hat)
2026-01-09 18:38           ` Matthew Brost
2026-01-09 18:37     ` Andrew Morton
2026-01-09 18:41       ` Zi Yan
2026-01-09 18:54         ` Francois Dugast
2026-01-09 18:43       ` Matthew Brost
2026-01-09 19:22         ` Andrew Morton
2026-01-09 19:26           ` Liam R. Howlett
2026-01-09  8:54 ` [PATCH v3 3/7] mm: Split device-private and coherent folios before freeing Francois Dugast
2026-01-09 11:09   ` Mika Penttilä
2026-01-09 17:28     ` Zi Yan
2026-01-09 18:26       ` Matthew Brost
2026-01-09 18:53         ` Zi Yan
2026-01-09 19:08           ` Matthew Brost
2026-01-09 19:23             ` Zi Yan
2026-01-09 20:03               ` Matthew Brost
2026-01-09 20:15                 ` Zi Yan
2026-01-09 21:34                   ` Balbir Singh
2026-01-09 21:43                     ` Zi Yan
2026-01-09 22:11                       ` Balbir Singh [this message]
2026-01-09 22:14                         ` Zi Yan
2026-01-09 22:36                           ` Balbir Singh
2026-01-09 23:15                             ` Matthew Brost
