From: David Hildenbrand <david@redhat.com>
To: Balbir Singh <balbirs@nvidia.com>, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, "Karol Herbst" <kherbst@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"David Airlie" <airlied@gmail.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Shuah Khan" <shuah@kernel.org>, "Barry Song" <baohua@kernel.org>,
	"Baolin Wang" <baolin.wang@linux.alibaba.com>,
	"Ryan Roberts" <ryan.roberts@arm.com>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Peter Xu" <peterx@redhat.com>, "Zi Yan" <ziy@nvidia.com>,
	"Kefeng Wang" <wangkefeng.wang@huawei.com>,
	"Jane Chu" <jane.chu@oracle.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Donet Tom" <donettom@linux.ibm.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"Mika Penttilä" <mpenttil@redhat.com>,
	"Matthew Brost" <matthew.brost@intel.com>,
	"Francois Dugast" <francois.dugast@intel.com>
Subject: Re: [v2 01/11] mm/zone_device: support large zone device private folios
Date: Wed, 30 Jul 2025 11:50:12 +0200	[thread overview]
Message-ID: <dbebbba0-3c59-4ee1-b32c-4b9f6ed90d92@redhat.com> (raw)
In-Reply-To: <20250730092139.3890844-2-balbirs@nvidia.com>

On 30.07.25 11:21, Balbir Singh wrote:
> Add routines to support allocation of large order zone device folios
> and helper functions for zone device folios, to check if a folio is
> device private and helpers for setting zone device data.
> 
> When large folios are used, the existing page_free() callback in
> pgmap is called when the folio is freed, this is true for both
> PAGE_SIZE and higher order pages.
> 
> Cc: Karol Herbst <kherbst@redhat.com>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Barry Song <baohua@kernel.org>
> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
> Cc: Jane Chu <jane.chu@oracle.com>
> Cc: Alistair Popple <apopple@nvidia.com>
> Cc: Donet Tom <donettom@linux.ibm.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Mika Penttilä <mpenttil@redhat.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Francois Dugast <francois.dugast@intel.com>
> 
> Signed-off-by: Balbir Singh <balbirs@nvidia.com>
> ---
>   include/linux/memremap.h | 10 ++++++++-
>   mm/memremap.c            | 48 +++++++++++++++++++++++++++++-----------
>   2 files changed, 44 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> index 4aa151914eab..a0723b35eeaa 100644
> --- a/include/linux/memremap.h
> +++ b/include/linux/memremap.h
> @@ -199,7 +199,7 @@ static inline bool folio_is_fsdax(const struct folio *folio)
>   }
>   
>   #ifdef CONFIG_ZONE_DEVICE
> -void zone_device_page_init(struct page *page);
> +void zone_device_folio_init(struct folio *folio, unsigned int order);
>   void *memremap_pages(struct dev_pagemap *pgmap, int nid);
>   void memunmap_pages(struct dev_pagemap *pgmap);
>   void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap);
> @@ -209,6 +209,14 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
>   bool pgmap_pfn_valid(struct dev_pagemap *pgmap, unsigned long pfn);
>   
>   unsigned long memremap_compat_align(void);
> +
> +static inline void zone_device_page_init(struct page *page)
> +{
> +	struct folio *folio = page_folio(page);
> +
> +	zone_device_folio_init(folio, 0);
> +}
> +
>   #else
>   static inline void *devm_memremap_pages(struct device *dev,
>   		struct dev_pagemap *pgmap)
> diff --git a/mm/memremap.c b/mm/memremap.c
> index b0ce0d8254bd..3ca136e7455e 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -427,20 +427,19 @@ EXPORT_SYMBOL_GPL(get_dev_pagemap);
>   void free_zone_device_folio(struct folio *folio)
>   {
>   	struct dev_pagemap *pgmap = folio->pgmap;
> +	unsigned int nr = folio_nr_pages(folio);
> +	int i;

"unsigned long" is to be future-proof.

(folio_nr_pages() returns long and probably soon unsigned long)

[ I'd probably call it "nr_pages" ]
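
Something like this, just to spell out the suggestion (untested):

	/* "nr_pages" rather than "nr"; unsigned long to match where folio_nr_pages() is heading */
	unsigned long nr_pages = folio_nr_pages(folio);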

>   
>   	if (WARN_ON_ONCE(!pgmap))
>   		return;
>   
>   	mem_cgroup_uncharge(folio);
>   
> -	/*
> -	 * Note: we don't expect anonymous compound pages yet. Once supported
> -	 * and we could PTE-map them similar to THP, we'd have to clear
> -	 * PG_anon_exclusive on all tail pages.
> -	 */
>   	if (folio_test_anon(folio)) {
> -		VM_BUG_ON_FOLIO(folio_test_large(folio), folio);
> -		__ClearPageAnonExclusive(folio_page(folio, 0));
> +		for (i = 0; i < nr; i++)
> +			__ClearPageAnonExclusive(folio_page(folio, i));
> +	} else {
> +		VM_WARN_ON_ONCE(folio_test_large(folio));
>   	}
>   
>   	/*
> @@ -464,11 +463,20 @@ void free_zone_device_folio(struct folio *folio)
>   
>   	switch (pgmap->type) {
>   	case MEMORY_DEVICE_PRIVATE:
> +		if (folio_test_large(folio)) {

Could do "nr > 1" if we already have that value around.

> +			folio_unqueue_deferred_split(folio);

I think I asked that already but maybe missed the reply: Should these 
folios ever be added to the deferred split queue, and is there any value 
in splitting them under memory pressure in the shrinker?

My gut feeling is "No", because the buddy cannot make use of these 
folios, but maybe there is an interesting case where we want that behavior?
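
If the answer is indeed "no", one option might be to never queue them in
the first place, e.g. by bailing out early in deferred_split_folio()
(untested, just to illustrate the idea):

	/* Splitting device-private folios won't give the buddy anything usable. */
	if (folio_is_device_private(folio))
		return;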

> +
> +			percpu_ref_put_many(&folio->pgmap->ref, nr - 1);
> +		}
> +		pgmap->ops->page_free(&folio->page);
> +		percpu_ref_put(&folio->pgmap->ref);

Could you simply do a

	percpu_ref_put_many(&folio->pgmap->ref, nr);

here, or would that be problematic?
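
I.e., the whole MEMORY_DEVICE_PRIVATE case could then end up looking
something like (untested):

	case MEMORY_DEVICE_PRIVATE:
		if (folio_test_large(folio))
			folio_unqueue_deferred_split(folio);
		pgmap->ops->page_free(&folio->page);
		/* Drop one pgmap reference per page in one go. */
		percpu_ref_put_many(&folio->pgmap->ref, nr);
		folio->page.mapping = NULL;
		break;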

> +		folio->page.mapping = NULL;
> +		break;
>   	case MEMORY_DEVICE_COHERENT:
>   		if (WARN_ON_ONCE(!pgmap->ops || !pgmap->ops->page_free))
>   			break;
> -		pgmap->ops->page_free(folio_page(folio, 0));
> -		put_dev_pagemap(pgmap);
> +		pgmap->ops->page_free(&folio->page);
> +		percpu_ref_put(&folio->pgmap->ref);
>   		break;
>   
>   	case MEMORY_DEVICE_GENERIC:
> @@ -491,14 +499,28 @@ void free_zone_device_folio(struct folio *folio)
>   	}
>   }
>   
> -void zone_device_page_init(struct page *page)
> +void zone_device_folio_init(struct folio *folio, unsigned int order)
>   {
> +	struct page *page = folio_page(folio, 0);
> +
> +	VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES);
> +
> +	/*
> +	 * Only PMD level migration is supported for THP migration
> +	 */

Talking about something that does not exist yet (and is very specific) 
sounds a bit weird.

Should this go into a different patch, or could we rephrase the comment 
to be a bit more generic?

In this patch here, nothing would really object to "order" being 
intermediate.

(also, this is a device_private limitation? shouldn't that check go 
somewhere where we can perform it as a device-private-specific check?)
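
If it stays here, maybe at least make it conditional on the folio
actually being device-private, e.g. (untested, assuming the pgmap type
is already set up by the time we get here):

	/* So far only PMD-sized device-private folios are supported. */
	if (folio_is_device_private(folio))
		VM_WARN_ON_ONCE(order && order != HPAGE_PMD_ORDER);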

> +	WARN_ON_ONCE(order && order != HPAGE_PMD_ORDER);
> +
>   	/*
>   	 * Drivers shouldn't be allocating pages after calling
>   	 * memunmap_pages().
>   	 */
> -	WARN_ON_ONCE(!percpu_ref_tryget_live(&page_pgmap(page)->ref));
> -	set_page_count(page, 1);
> +	WARN_ON_ONCE(!percpu_ref_tryget_many(&page_pgmap(page)->ref, 1 << order));
> +	folio_set_count(folio, 1);
>   	lock_page(page);
> +
> +	if (order > 1) {
> +		prep_compound_page(page, order);
> +		folio_set_large_rmappable(folio);
> +	}
>   }
> -EXPORT_SYMBOL_GPL(zone_device_page_init);
> +EXPORT_SYMBOL_GPL(zone_device_folio_init);


-- 
Cheers,

David / dhildenb



Thread overview: 71+ messages
2025-07-30  9:21 [v2 00/11] THP support for zone device page migration Balbir Singh
2025-07-30  9:21 ` [v2 01/11] mm/zone_device: support large zone device private folios Balbir Singh
2025-07-30  9:50   ` David Hildenbrand [this message]
2025-08-04 23:43     ` Balbir Singh
2025-08-05  4:22     ` Balbir Singh
2025-08-05 10:57       ` David Hildenbrand
2025-08-05 11:01         ` Balbir Singh
2025-08-05 12:58           ` David Hildenbrand
2025-08-05 21:15             ` Matthew Brost
2025-08-06 12:19               ` Balbir Singh
2025-07-30  9:21 ` [v2 02/11] mm/thp: zone_device awareness in THP handling code Balbir Singh
2025-07-30 11:16   ` Mika Penttilä
2025-07-30 11:27     ` Zi Yan
2025-07-30 11:30       ` Zi Yan
2025-07-30 11:42         ` Mika Penttilä
2025-07-30 12:08           ` Mika Penttilä
2025-07-30 12:25             ` Zi Yan
2025-07-30 12:49               ` Mika Penttilä
2025-07-30 15:10                 ` Zi Yan
2025-07-30 15:40                   ` Mika Penttilä
2025-07-30 15:58                     ` Zi Yan
2025-07-30 16:29                       ` Mika Penttilä
2025-07-31  7:15                         ` David Hildenbrand
2025-07-31  8:39                           ` Balbir Singh
2025-07-31 11:26                           ` Zi Yan
2025-07-31 12:32                             ` David Hildenbrand
2025-07-31 13:34                               ` Zi Yan
2025-07-31 19:09                                 ` David Hildenbrand
2025-08-01  0:49                             ` Balbir Singh
2025-08-01  1:09                               ` Zi Yan
2025-08-01  7:01                                 ` David Hildenbrand
2025-08-01  1:16                               ` Mika Penttilä
2025-08-01  4:44                                 ` Balbir Singh
2025-08-01  5:57                                   ` Balbir Singh
2025-08-01  6:01                                   ` Mika Penttilä
2025-08-01  7:04                                   ` David Hildenbrand
2025-08-01  8:01                                     ` Balbir Singh
2025-08-01  8:46                                       ` David Hildenbrand
2025-08-01 11:10                                         ` Zi Yan
2025-08-01 12:20                                           ` Mika Penttilä
2025-08-01 12:28                                             ` Zi Yan
2025-08-02  1:17                                               ` Balbir Singh
2025-08-02 10:37                                               ` Balbir Singh
2025-08-02 12:13                                                 ` Mika Penttilä
2025-08-04 22:46                                                   ` Balbir Singh
2025-08-04 23:26                                                     ` Mika Penttilä
2025-08-05  4:10                                                       ` Balbir Singh
2025-08-05  4:24                                                         ` Mika Penttilä
2025-08-05  5:19                                                           ` Mika Penttilä
2025-08-05 10:27                                                           ` Balbir Singh
2025-08-05 10:35                                                             ` Mika Penttilä
2025-08-05 10:36                                                               ` Balbir Singh
2025-08-05 10:46                                                                 ` Mika Penttilä
2025-07-30 20:05   ` kernel test robot
2025-07-30  9:21 ` [v2 03/11] mm/migrate_device: THP migration of zone device pages Balbir Singh
2025-07-31 16:19   ` kernel test robot
2025-07-30  9:21 ` [v2 04/11] mm/memory/fault: add support for zone device THP fault handling Balbir Singh
2025-07-30  9:21 ` [v2 05/11] lib/test_hmm: test cases and support for zone device private THP Balbir Singh
2025-07-31 11:17   ` kernel test robot
2025-07-30  9:21 ` [v2 06/11] mm/memremap: add folio_split support Balbir Singh
2025-07-30  9:21 ` [v2 07/11] mm/thp: add split during migration support Balbir Singh
2025-07-31 10:04   ` kernel test robot
2025-07-30  9:21 ` [v2 08/11] lib/test_hmm: add test case for split pages Balbir Singh
2025-07-30  9:21 ` [v2 09/11] selftests/mm/hmm-tests: new tests for zone device THP migration Balbir Singh
2025-07-30  9:21 ` [v2 10/11] gpu/drm/nouveau: add THP migration support Balbir Singh
2025-07-30  9:21 ` [v2 11/11] selftests/mm/hmm-tests: new throughput tests including THP Balbir Singh
2025-07-30 11:30 ` [v2 00/11] THP support for zone device page migration David Hildenbrand
2025-07-30 23:18   ` Alistair Popple
2025-07-31  8:41   ` Balbir Singh
2025-07-31  8:56     ` David Hildenbrand
2025-08-05 21:34 ` Matthew Brost
