* [PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
@ 2024-11-05 3:29 John Hubbard
2024-11-05 3:29 ` [PATCH v2 1/1] " John Hubbard
2024-11-05 8:42 ` [PATCH v2 0/1] " David Hildenbrand
From: John Hubbard @ 2024-11-05 3:29 UTC
To: Andrew Morton
Cc: LKML, linux-mm, John Hubbard, David Hildenbrand, Vivek Kasireddy,
Dave Airlie, Gerd Hoffmann, Matthew Wilcox, Christoph Hellwig,
Jason Gunthorpe, Peter Xu, Arnd Bergmann, Daniel Vetter,
Dongwon Kim, Hugh Dickins, Junxiao Chang, Mike Kravetz,
Oscar Salvador, linux-stable
This applies to today's mm-hotfixes-unstable (only). In order to test
this, my earlier patch is a prerequisite: commit 255231c75dcd ("mm/gup:
stop leaking pinned pages in low memory conditions").
Changes since v1 [1]:
1) Completely different implementation: instead of changing the
allocator from kmalloc() to kvmalloc(), just avoid allocations entirely.
Note that David's original suggestion [2] included something that I've
left out for now, mostly because it's a pre-existing question and
deserves its own patch. But also, I don't fully understand it yet.
[1] https://lore.kernel.org/20241030030116.670307-1-jhubbard@nvidia.com
[2] https://lore.kernel.org/8d9dc103-47c5-4719-971a-31efb091432a@redhat.com
thanks,
John Hubbard
Cc: David Hildenbrand <david@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: linux-stable@vger.kernel.org
John Hubbard (1):
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
mm/gup.c | 116 ++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 77 insertions(+), 39 deletions(-)
base-commit: 0012ab094cad019afcd07bdfc28e2946537e715c
--
2.47.0
* [PATCH v2 1/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
2024-11-05 3:29 [PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases John Hubbard
@ 2024-11-05 3:29 ` John Hubbard
2024-11-05 8:47 ` David Hildenbrand
2024-11-06 9:23 ` Oscar Salvador
2024-11-05 8:42 ` [PATCH v2 0/1] " David Hildenbrand
From: John Hubbard @ 2024-11-05 3:29 UTC
To: Andrew Morton
Cc: LKML, linux-mm, John Hubbard, David Hildenbrand, Vivek Kasireddy,
Dave Airlie, Gerd Hoffmann, Matthew Wilcox, Christoph Hellwig,
Jason Gunthorpe, Peter Xu, Arnd Bergmann, Daniel Vetter,
Dongwon Kim, Hugh Dickins, Junxiao Chang, Mike Kravetz,
Oscar Salvador, linux-stable
commit 53ba78de064b ("mm/gup: introduce
check_and_migrate_movable_folios()") created a new constraint on the
pin_user_pages*() API family: a potentially large internal allocation
must now occur for FOLL_LONGTERM cases.
A user-visible consequence has now appeared: user space can no longer
pin more than 2GB of memory on x86_64. That's because, on a 4KB
PAGE_SIZE system, when user space tries to pin 2GB (indirectly, via a
device driver that calls pin_user_pages()), this requires allocating a
folio pointers array of MAX_PAGE_ORDER size, which is the limit for
kmalloc().
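To make the arithmetic concrete, here is a quick back-of-the-envelope
sketch (plain userspace C, not part of the patch; it assumes the
default MAX_PAGE_ORDER value of 10 and 8-byte pointers):

#include <stdio.h>

int main(void)
{
	const unsigned long page_size = 4096;	/* 4KB PAGE_SIZE */
	const unsigned long ptr_size = 8;	/* folio pointer, x86_64 */
	/* kmalloc() tops out at one order-10 (MAX_PAGE_ORDER) block: */
	const unsigned long kmalloc_max = (1UL << 10) * page_size;
	/* The largest pin whose pointer array fits in one such block: */
	const unsigned long max_pin = (kmalloc_max / ptr_size) * page_size;

	printf("kmalloc() limit: %lu MB\n", kmalloc_max >> 20); /* 4 MB */
	printf("max pinnable:    %lu GB\n", max_pin >> 30);     /* 2 GB */
	return 0;
}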
In addition to the directly visible effect described above, there is
also the problem of adding an unnecessary allocation. The **pages array
argument has already been allocated, and there is no need for a
redundant **folios array allocation in this case.
Fix this by avoiding the new allocation entirely. This is done by
referring to either the original page[i] within **pages, or to the
associated folio. Thanks to David Hildenbrand for suggesting this
approach, and for providing the initial implementation (which I've
tested and adjusted slightly).
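In short (the full diff is below), each caller now wraps its existing,
already-allocated array in a small on-stack descriptor, so nothing new
is allocated. For example, the pages-based caller becomes:

	struct pages_or_folios pofs = {
		.pages = pages,
		.has_folios = false,
		.nr_entries = nr_pages,
	};

	return check_and_migrate_movable_pages_or_folios(&pofs);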
Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()")
Suggested-by: David Hildenbrand <david@redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: linux-stable@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
mm/gup.c | 116 ++++++++++++++++++++++++++++++++++++-------------------
1 file changed, 77 insertions(+), 39 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 4637dab7b54f..0f5121ad0b9f 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2273,20 +2273,57 @@ struct page *get_dump_page(unsigned long addr)
#endif /* CONFIG_ELF_CORE */
#ifdef CONFIG_MIGRATION
+
+/*
+ * An array of either pages or folios ("pofs"). Although it may seem tempting to
+ * avoid this complication by simply interpreting a list of folios as a list of
+ * pages, that approach won't work in the longer term, because eventually the
+ * layouts of struct page and struct folio will become completely different.
+ * Furthermore, this pofs approach avoids excessive page_folio() calls.
+ */
+struct pages_or_folios {
+ union {
+ struct page **pages;
+ struct folio **folios;
+ void **entries;
+ };
+ bool has_folios;
+ long nr_entries;
+};
+
+static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i)
+{
+ if (pofs->has_folios)
+ return pofs->folios[i];
+ return page_folio(pofs->pages[i]);
+}
+
+static void pofs_clear_entry(struct pages_or_folios *pofs, long i)
+{
+ pofs->entries[i] = NULL;
+}
+
+static void pofs_unpin(struct pages_or_folios *pofs)
+{
+ if (pofs->has_folios)
+ unpin_folios(pofs->folios, pofs->nr_entries);
+ else
+ unpin_user_pages(pofs->pages, pofs->nr_entries);
+}
+
/*
* Returns the number of collected folios. Return value is always >= 0.
*/
static unsigned long collect_longterm_unpinnable_folios(
- struct list_head *movable_folio_list,
- unsigned long nr_folios,
- struct folio **folios)
+ struct list_head *movable_folio_list,
+ struct pages_or_folios *pofs)
{
unsigned long i, collected = 0;
struct folio *prev_folio = NULL;
bool drain_allow = true;
- for (i = 0; i < nr_folios; i++) {
- struct folio *folio = folios[i];
+ for (i = 0; i < pofs->nr_entries; i++) {
+ struct folio *folio = pofs_get_folio(pofs, i);
if (folio == prev_folio)
continue;
@@ -2327,16 +2364,15 @@ static unsigned long collect_longterm_unpinnable_folios(
* Returns -EAGAIN if all folios were successfully migrated or -errno for
* failure (or partial success).
*/
-static int migrate_longterm_unpinnable_folios(
- struct list_head *movable_folio_list,
- unsigned long nr_folios,
- struct folio **folios)
+static int
+migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
+ struct pages_or_folios *pofs)
{
int ret;
unsigned long i;
- for (i = 0; i < nr_folios; i++) {
- struct folio *folio = folios[i];
+ for (i = 0; i < pofs->nr_entries; i++) {
+ struct folio *folio = pofs_get_folio(pofs, i);
if (folio_is_device_coherent(folio)) {
/*
@@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_folios(
* convert the pin on the source folio to a normal
* reference.
*/
- folios[i] = NULL;
+ pofs_clear_entry(pofs, i);
folio_get(folio);
gup_put_folio(folio, 1, FOLL_PIN);
@@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios(
* calling folio_isolate_lru() which takes a reference so the
* folio won't be freed if it's migrating.
*/
- unpin_folio(folios[i]);
- folios[i] = NULL;
+ unpin_folio(pofs_get_folio(pofs, i));
+ pofs_clear_entry(pofs, i);
}
if (!list_empty(movable_folio_list)) {
@@ -2387,12 +2423,26 @@ static int migrate_longterm_unpinnable_folios(
return -EAGAIN;
err:
- unpin_folios(folios, nr_folios);
+ pofs_unpin(pofs);
putback_movable_pages(movable_folio_list);
return ret;
}
+static long
+check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
+{
+ LIST_HEAD(movable_folio_list);
+ unsigned long collected;
+
+ collected =
+ collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+ if (!collected)
+ return 0;
+
+ return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs);
+}
+
/*
* Check whether all folios are *allowed* to be pinned indefinitely (long term).
* Rather confusingly, all folios in the range are required to be pinned via
@@ -2417,16 +2467,13 @@ static int migrate_longterm_unpinnable_folios(
static long check_and_migrate_movable_folios(unsigned long nr_folios,
struct folio **folios)
{
- unsigned long collected;
- LIST_HEAD(movable_folio_list);
+ struct pages_or_folios pofs = {
+ .folios = folios,
+ .has_folios = true,
+ .nr_entries = nr_folios,
+ };
- collected = collect_longterm_unpinnable_folios(&movable_folio_list,
- nr_folios, folios);
- if (!collected)
- return 0;
-
- return migrate_longterm_unpinnable_folios(&movable_folio_list,
- nr_folios, folios);
+ return check_and_migrate_movable_pages_or_folios(&pofs);
}
/*
@@ -2436,22 +2483,13 @@ static long check_and_migrate_movable_folios(unsigned long nr_folios,
static long check_and_migrate_movable_pages(unsigned long nr_pages,
struct page **pages)
{
- struct folio **folios;
- long i, ret;
+ struct pages_or_folios pofs = {
+ .pages = pages,
+ .has_folios = false,
+ .nr_entries = nr_pages,
+ };
- folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL);
- if (!folios) {
- unpin_user_pages(pages, nr_pages);
- return -ENOMEM;
- }
-
- for (i = 0; i < nr_pages; i++)
- folios[i] = page_folio(pages[i]);
-
- ret = check_and_migrate_movable_folios(nr_pages, folios);
-
- kfree(folios);
- return ret;
+ return check_and_migrate_movable_pages_or_folios(&pofs);
}
#else
static long check_and_migrate_movable_pages(unsigned long nr_pages,
--
2.47.0
* Re: [PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
2024-11-05 3:29 [PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases John Hubbard
2024-11-05 3:29 ` [PATCH v2 1/1] " John Hubbard
@ 2024-11-05 8:42 ` David Hildenbrand
2024-11-07 4:57 ` John Hubbard
From: David Hildenbrand @ 2024-11-05 8:42 UTC
To: John Hubbard, Andrew Morton
Cc: LKML, linux-mm, Vivek Kasireddy, Dave Airlie, Gerd Hoffmann,
Matthew Wilcox, Christoph Hellwig, Jason Gunthorpe, Peter Xu,
Arnd Bergmann, Daniel Vetter, Dongwon Kim, Hugh Dickins,
Junxiao Chang, Mike Kravetz, Oscar Salvador, linux-stable
On 05.11.24 04:29, John Hubbard wrote:
...
> Note that David's original suggestion [2] included something that I've
> left out for now, mostly because it's a pre-existing question and
> deserves its own patch. But also, I don't fully understand it yet.
Yeah, I was only adding it because I stumbled over it. It might not be a
problem, because we simply "skip" if we find a folio that was already
isolated (possibly by us). What might happen is that we unnecessarily
drain the LRU.
__collapse_huge_page_isolate() scans the compound_pagelist() list,
before try-locking and isolating. But it also just "fails" instead of
retrying forever.
Imagine the page tables looking like the following (e.g., COW in a
MAP_PRIVATE file mapping that supports large folios)
                    ------ F0P2 was replaced by a new (small) folio
                    |
[ F0P0 ] [ F0P1 ] [ F1P0 ] [ F0P3 ]
F0P0: Folio 0, page 0
Assume we try pinning that range and end up in
collect_longterm_unpinnable_folios() with:
F0, F0, F1, F0
Assume F0 and F1 are not long-term pinnable.
i = 0: We isolate F0
i = 1: We see that it is the same F0 and skip
i = 2: We isolate F1
i = 3: We see !folio_test_lru() and do a lru_add_drain_all() to then
fail folio_isolate_lru()
So the drain at i = 3 could be avoided by scanning the list for folios
we already isolated. This works better than I originally thought.
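For reference, the isolation step behind the i = 3 case looks roughly
like this (a simplified sketch of the existing collect loop, not an
exact quote of it):

	/*
	 * A folio that is not on the LRU may just be sitting in a
	 * per-CPU batch, so drain once (at most) before isolating.
	 */
	if (!folio_test_lru(folio) && drain_allow) {
		lru_add_drain_all();
		drain_allow = false;
	}
	if (!folio_isolate_lru(folio))
		continue;

	list_add_tail(&folio->lru, movable_folio_list);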
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
2024-11-05 3:29 ` [PATCH v2 1/1] " John Hubbard
@ 2024-11-05 8:47 ` David Hildenbrand
2024-11-05 21:31 ` John Hubbard
2024-11-06 9:23 ` Oscar Salvador
From: David Hildenbrand @ 2024-11-05 8:47 UTC
To: John Hubbard, Andrew Morton
Cc: LKML, linux-mm, Vivek Kasireddy, Dave Airlie, Gerd Hoffmann,
Matthew Wilcox, Christoph Hellwig, Jason Gunthorpe, Peter Xu,
Arnd Bergmann, Daniel Vetter, Dongwon Kim, Hugh Dickins,
Junxiao Chang, Mike Kravetz, Oscar Salvador, linux-stable
On 05.11.24 04:29, John Hubbard wrote:
...
> return ret;
> }
>
> +static long
> +check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> +{
> + LIST_HEAD(movable_folio_list);
> + unsigned long collected;
> +
> + collected =
> + collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
Nit: We're allowed to use more than 80 characters :)
(If we really want to split, I would prefer the old way, across more
lines; the style here is less common.)
Thanks!
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
2024-11-05 8:47 ` David Hildenbrand
@ 2024-11-05 21:31 ` John Hubbard
From: John Hubbard @ 2024-11-05 21:31 UTC
To: David Hildenbrand, Andrew Morton
Cc: LKML, linux-mm, Vivek Kasireddy, Dave Airlie, Gerd Hoffmann,
Matthew Wilcox, Christoph Hellwig, Jason Gunthorpe, Peter Xu,
Arnd Bergmann, Daniel Vetter, Dongwon Kim, Hugh Dickins,
Junxiao Chang, Mike Kravetz, Oscar Salvador, linux-stable
On 11/5/24 12:47 AM, David Hildenbrand wrote:
> On 05.11.24 04:29, John Hubbard wrote:
...
>> return ret;
>> }
>> +static long
>> +check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
>> +{
>> + LIST_HEAD(movable_folio_list);
>> + unsigned long collected;
>> +
>> + collected =
>> + collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
>
> Nit: We're allowed to use more than 80 characters :)
Yes, I'm aware. Most of gup.c stays nicely within 80 cols, so it's nice
to keep to that when it's not too awkward.
>
> (I would prefer the old way it was split across more lines if we really want to split; this way here is less common)
>
OK, if I need to respin I'll apply this to restore the old way. If not,
maybe Andrew can fix it up, please?
diff --git a/mm/gup.c b/mm/gup.c
index 0f5121ad0b9f..0a22f7def83c 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2435,8 +2435,8 @@ check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
LIST_HEAD(movable_folio_list);
unsigned long collected;
- collected =
- collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
+ collected = collect_longterm_unpinnable_folios(&movable_folio_list,
+ pofs);
if (!collected)
return 0;
>
> Thanks!
>
> Acked-by: David Hildenbrand <david@redhat.com>
>
Appreciate the ack, and of course the idea and implementation as well! :)
thanks,
--
John Hubbard
* Re: [PATCH v2 1/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
2024-11-05 3:29 ` [PATCH v2 1/1] " John Hubbard
2024-11-05 8:47 ` David Hildenbrand
@ 2024-11-06 9:23 ` Oscar Salvador
2024-11-07 4:20 ` John Hubbard
From: Oscar Salvador @ 2024-11-06 9:23 UTC
To: John Hubbard
Cc: Andrew Morton, LKML, linux-mm, David Hildenbrand,
Vivek Kasireddy, Dave Airlie, Gerd Hoffmann, Matthew Wilcox,
Christoph Hellwig, Jason Gunthorpe, Peter Xu, Arnd Bergmann,
Daniel Vetter, Dongwon Kim, Hugh Dickins, Junxiao Chang,
Mike Kravetz, linux-stable
On Mon, Nov 04, 2024 at 07:29:44PM -0800, John Hubbard wrote:
...
Hi John, thanks for doing this.
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Nit below:
> +static int
> +migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
> + struct pages_or_folios *pofs)
> {
> int ret;
> unsigned long i;
>
> - for (i = 0; i < nr_folios; i++) {
> - struct folio *folio = folios[i];
> + for (i = 0; i < pofs->nr_entries; i++) {
> + struct folio *folio = pofs_get_folio(pofs, i);
>
> if (folio_is_device_coherent(folio)) {
> /*
> @@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_folios(
> * convert the pin on the source folio to a normal
> * reference.
> */
> - folios[i] = NULL;
> + pofs_clear_entry(pofs, i);
> folio_get(folio);
> gup_put_folio(folio, 1, FOLL_PIN);
>
> @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios(
> * calling folio_isolate_lru() which takes a reference so the
> * folio won't be freed if it's migrating.
> */
> - unpin_folio(folios[i]);
> - folios[i] = NULL;
> + unpin_folio(pofs_get_folio(pofs, i));
We already retrieved the folio above; can't we just reuse it here
instead of calling pofs_get_folio() again?
--
Oscar Salvador
SUSE Labs
* Re: [PATCH v2 1/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
2024-11-06 9:23 ` Oscar Salvador
@ 2024-11-07 4:20 ` John Hubbard
From: John Hubbard @ 2024-11-07 4:20 UTC
To: Oscar Salvador
Cc: Andrew Morton, LKML, linux-mm, David Hildenbrand,
Vivek Kasireddy, Dave Airlie, Gerd Hoffmann, Matthew Wilcox,
Christoph Hellwig, Jason Gunthorpe, Peter Xu, Arnd Bergmann,
Daniel Vetter, Dongwon Kim, Hugh Dickins, Junxiao Chang,
Mike Kravetz, linux-stable
On 11/6/24 1:23 AM, Oscar Salvador wrote:
> On Mon, Nov 04, 2024 at 07:29:44PM -0800, John Hubbard wrote:
...
> Hi John, thanks for doing this.
>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
Thanks for the review!
>
> Nit below:
>
...
>> @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_folios(
>> * calling folio_isolate_lru() which takes a reference so the
>> * folio won't be freed if it's migrating.
>> */
>> - unpin_folio(folios[i]);
>> - folios[i] = NULL;
>> + unpin_folio(pofs_get_folio(pofs, i));
>
> We already retrieved the folio above; can't we just reuse it here
> instead of calling pofs_get_folio() again?
>
Yes, you are right. Andrew, can we please add this fixup on top of the
commits in today's mm-hotfixes-unstable (or let me know if you'd prefer
a v3 instead):
diff --git a/mm/gup.c b/mm/gup.c
index 0a22f7def83c..ad0c8922dac3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2399,7 +2399,7 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list,
* calling folio_isolate_lru() which takes a reference so the
* folio won't be freed if it's migrating.
*/
- unpin_folio(pofs_get_folio(pofs, i));
+ unpin_folio(folio);
pofs_clear_entry(pofs, i);
}
thanks,
--
John Hubbard
* Re: [PATCH v2 0/1] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
2024-11-05 8:42 ` [PATCH v2 0/1] " David Hildenbrand
@ 2024-11-07 4:57 ` John Hubbard
From: John Hubbard @ 2024-11-07 4:57 UTC
To: David Hildenbrand, Andrew Morton
Cc: LKML, linux-mm, Vivek Kasireddy, Dave Airlie, Gerd Hoffmann,
Matthew Wilcox, Christoph Hellwig, Jason Gunthorpe, Peter Xu,
Arnd Bergmann, Daniel Vetter, Dongwon Kim, Hugh Dickins,
Junxiao Chang, Mike Kravetz, Oscar Salvador, linux-stable
On 11/5/24 12:42 AM, David Hildenbrand wrote:
> On 05.11.24 04:29, John Hubbard wrote:
...
> Yeah, I was only adding it because I stumbled over it. It might not be a problem, because we simply "skip" if we find a folio that was already isolated (possibly by us). What might happen is that we unnecessarily drain the LRU.
>
> __collapse_huge_page_isolate() scans the compound_pagelist() list, before try-locking and isolating. But it also just "fails" instead of retrying forever.
>
> Imagine the page tables looking like the following (e.g., COW in a MAP_PRIVATE file mapping that supports large folios)
>
>                     ------ F0P2 was replaced by a new (small) folio
>                     |
> [ F0P0 ] [ F0P1 ] [ F1P0 ] [ F0P3 ]
>
> F0P0: Folio 0, page 0
>
> Assume we try pinning that range and end up in collect_longterm_unpinnable_folios() with:
>
> F0, F0, F1, F0
>
>
> Assume F0 and F1 are not long-term pinnable.
>
> i = 0: We isolate F0
> i = 1: We see that it is the same F0 and skip
> i = 2: We isolate F1
> i = 3: We see !folio_test_lru() and do a lru_add_drain_all() to then
> fail folio_isolate_lru()
>
> So the drain at i = 3 could be avoided by scanning the list for folios we already isolated. This works better than I originally thought.
Thanks for spelling out that case; I was having trouble visualizing it,
but now it's clear.
OK, so looking at this, I think it could be extended to more than just
"skip the drain". It seems like we should also avoid counting the folio
(the existing code seems wrong).
So I think this approach would be correct; does it seem accurate to you
as well? Here:
diff --git a/mm/gup.c b/mm/gup.c
index ad0c8922dac3..ab8e706b52f0 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2324,11 +2324,21 @@ static unsigned long collect_longterm_unpinnable_folios(
for (i = 0; i < pofs->nr_entries; i++) {
struct folio *folio = pofs_get_folio(pofs, i);
+		struct folio *tmp_folio;
+		bool already_collected = false;
 
+		/*
+		 * Two checks to see if this folio has already been collected.
+		 * The first check is quick, and the second check is thorough.
+		 */
 		if (folio == prev_folio)
 			continue;
 		prev_folio = folio;
 
+		/*
+		 * A bare "continue" inside list_for_each_entry() would only
+		 * advance the inner scan, so record a match and break out,
+		 * then skip the outer iteration.
+		 */
+		list_for_each_entry(tmp_folio, movable_folio_list, lru) {
+			if (folio == tmp_folio) {
+				already_collected = true;
+				break;
+			}
+		}
+		if (already_collected)
+			continue;
+
if (folio_is_longterm_pinnable(folio))
continue;
I need to test this more thoroughly, though, with a directed gup test (I'm not sure we
have one yet).
thanks,
--
John Hubbard