Re: [PATCH] mm/hmm: Fix a hmm_range_fault() livelock / starvation problem

linux-mm.kvack.org archive mirror

From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: John Hubbard <jhubbard@nvidia.com>,
	Matthew Brost <matthew.brost@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	intel-xe@lists.freedesktop.org,
	 Ralph Campbell <rcampbell@nvidia.com>,
	Christoph Hellwig <hch@lst.de>,
	Jason Gunthorpe <jgg@mellanox.com>,
	 Jason Gunthorpe	 <jgg@ziepe.ca>,
	Leon Romanovsky <leon@kernel.org>,
	linux-mm@kvack.org, 	stable@vger.kernel.org,
	dri-devel@lists.freedesktop.org
Subject: Re: [PATCH] mm/hmm: Fix a hmm_range_fault() livelock / starvation problem
Date: Tue, 03 Feb 2026 10:31:04 +0100
Message-ID: <9a9853a320a30802ff35803a574aab037aa2fd92.camel@linux.intel.com>
In-Reply-To: <a5b71dbc-9e3a-4098-8821-21a9a02ec235@nvidia.com>


On Mon, 2026-02-02 at 14:28 -0800, John Hubbard wrote:
> On 2/2/26 1:13 AM, Thomas Hellström wrote:
> > On Sat, 2026-01-31 at 13:42 -0800, John Hubbard wrote:
> > > On 1/31/26 11:00 AM, Matthew Brost wrote:
> > > > On Sat, Jan 31, 2026 at 01:57:21PM +0100, Thomas Hellström
> > > > wrote:
> > > > > On Fri, 2026-01-30 at 19:01 -0800, John Hubbard wrote:
> > > > > > On 1/30/26 10:00 AM, Andrew Morton wrote:
> > > > > > > On Fri, 30 Jan 2026 15:45:29 +0100 Thomas Hellström
> > > > > > > <thomas.hellstrom@linux.intel.com> wrote:
> > > > > > ...
> > > 
> > > > 
> > > > > I'm also not sure a folio refcount should block migration
> > > > > after
> > > > > the
> > > > > introduction of pinned (like in pin_user_pages) pages. Rather
> > > > > perhaps a
> > > > > folio pin-count should block migration and in that case
> > > > > do_swap_page()
> > > > > can definitely do a sleeping folio lock and the problem is
> > > > > gone.
> > > 
> > > A problem for that specific point is that pincount and refcount
> > > both
> > > mean, "the page is pinned" (which in turn literally means "not
> > > allowed
> > > to migrate/move").
> > 
> > Yeah this is what I actually want to challenge since this is what
> > blocks us from doing a clean robust solution here. From brief
> > reading
> > of the docs around the pin-count implementation, I understand it as
> > "If
> > you want to access the struct page metadata, get a refcount. If you
> > want to access the actual memory of a page, take a pin-count"
> > 
> > I guess that might still not be true for all old instances in the
> > kernel using get_user_pages() instead of pin_user_pages() for
> > things
> > like DMA, but perhaps we can set that in stone and document it at
> > least
> > for device-private pages for now which would be sufficient for the
> > do_swap_page() refcount not to block migration.
> > 
> 
> It's an interesting direction to go...
> 
> > 
> > > 
> > > (In fact, pincount is implemented in terms of refcount, in most
> > > configurations still.)
> > 
> > Yes but that's only a space optimization never intended to
> > conflict,
> > right? Meaning a pin-count will imply a refcount but a refcount
> > will
> > never imply a pin-count?
> > 
> Unfortunately, they are more tightly linked than that today, at least
> until
> someday when specialized folios are everywhere (at which point
> pincount
> gets its own field).
> 
> Until then, it's not just a "space optimization", it's "overload
> refcount
> to also do pincounting". And "let core mm continue to treat refcounts
> as
> meaning that the page is pinned".

So this is what I had in mind (diff below):

I'm fairly certain this would work regardless of whether pincount is
implemented by means of refcount with a bias or not, and AFAICT it's
also consistent with

https://docs.kernel.org/core-api/pin_user_pages.html

But it would not work if some part of core mm grabs a page refcount and
*expects* that to pin a page in the sense that it should not be
migrated. But you're suggesting that's actually the case?

Thanks,
Thomas

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index a101a187e6da..c07a79995128 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -534,33 +534,15 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
  * migrate_vma_check_page() - check if page is pinned or not
  * @page: struct page to check
  *
- * Pinned pages cannot be migrated. This is the same test as in
- * folio_migrate_mapping(), except that here we allow migration of a
- * ZONE_DEVICE page.
+ * Pinned pages cannot be migrated.
  */
 static bool migrate_vma_check_page(struct page *page, struct page *fault_page)
 {
        struct folio *folio = page_folio(page);
 
-       /*
-        * One extra ref because caller holds an extra reference, either from
-        * folio_isolate_lru() for a regular folio, or migrate_vma_collect() for
-        * a device folio.
-        */
-       int extra = 1 + (page == fault_page);
-
-       /* Page from ZONE_DEVICE have one extra reference */
-       if (folio_is_zone_device(folio))
-               extra++;
-
-       /* For file back page */
-       if (folio_mapping(folio))
-               extra += 1 + folio_has_private(folio);
-
-       if ((folio_ref_count(folio) - extra) > folio_mapcount(folio))
-               return false;
+       VM_WARN_ON_FOLIO(folio_test_lru(folio) || folio_mapped(folio), folio);
 
-       return true;
+       return !folio_maybe_dma_pinned(folio);
 }
 



> 
> 
> thanks,
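
To make the difference between the old and the new check in the diff
above concrete, here is a minimal userspace model, not kernel code. It
assumes a small (order-0) folio where pins are folded into the refcount
with a bias of 1024, matching the value of GUP_PIN_COUNTING_BIAS in
mainline. The struct and every model_* name are hypothetical and exist
only for this sketch; locking, large folios and atomics are ignored.

```c
/* Userspace model only. All model_* names are hypothetical. */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same value as the kernel's GUP_PIN_COUNTING_BIAS (1U << 10). */
#define MODEL_PIN_BIAS (1U << 10)

/* Simplified stand-in for a small folio; not the kernel's struct folio. */
struct model_folio {
	unsigned int refcount;	/* plain references plus biased pins */
	unsigned int mapcount;	/* number of page-table mappings */
	bool zone_device;	/* ZONE_DEVICE folios carry one extra ref */
	bool has_mapping;	/* file-backed */
	bool has_private;	/* fs-private data attached */
};

/* A plain reference, as taken by a folio_get()-style caller. */
static void model_get(struct model_folio *f)
{
	assert(f->refcount < UINT_MAX);
	f->refcount += 1;
}

/* A pin, as taken by a pin_user_pages()-style caller on a small folio. */
static void model_pin(struct model_folio *f)
{
	assert(f->refcount <= UINT_MAX - MODEL_PIN_BIAS);
	f->refcount += MODEL_PIN_BIAS;
}

/* "Maybe": 1024 plain references look exactly like one pin. */
static bool model_maybe_dma_pinned(const struct model_folio *f)
{
	return f->refcount >= MODEL_PIN_BIAS;
}

/* Old check: any reference beyond the expected ones blocks migration. */
static bool model_check_old(const struct model_folio *f, bool is_fault_page)
{
	int extra = 1 + (is_fault_page ? 1 : 0);

	if (f->zone_device)
		extra++;
	if (f->has_mapping)
		extra += 1 + (f->has_private ? 1 : 0);

	return !(((int)f->refcount - extra) > (int)f->mapcount);
}

/* Proposed check: only a (possible) pin blocks migration. */
static bool model_check_new(const struct model_folio *f)
{
	return !model_maybe_dma_pinned(f);
}
```

In this model, a folio holding only the caller's reference passes both
checks. After one model_get(), standing in for the reference that
do_swap_page() holds, only the old check refuses migration. After
model_pin(), both refuse.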

