From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>, Tony Luck <tony.luck@intel.com>,
	Miaohe Lin <linmiaohe@huawei.com>, <nao.horiguchi@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Muchun Song <muchun.song@linux.dev>,
	Benjamin LaHaise <bcrl@kvack.org>, <jglisse@redhat.com>,
	Zi Yan <ziy@nvidia.com>, Jiaqi Yan <jiaqiyan@google.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	Alistair Popple <apopple@nvidia.com>,
	Jane Chu <jane.chu@oracle.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: Re: [PATCH mm-unstable] mm: migrate: folio_ref_freeze() under xas_lock_irq()
Date: Tue, 25 Jun 2024 17:01:16 +0800	[thread overview]
Message-ID: <de1e4022-476c-44d4-aef2-e0bcb97e9c0a@huawei.com> (raw)
In-Reply-To: <07edaae7-ea5d-b6ae-3a10-f611946f9688@google.com>



On 2024/6/25 13:04, Hugh Dickins wrote:
> Commit "mm: migrate: split folio_migrate_mapping()" drew attention to
> "Note, the folio_ref_freeze() is moved out of xas_lock_irq(), Since the
> folio is already isolated and locked during migration, so suppose that
> there is no functional change."
> 
> That was a mistake. Freezing a folio's refcount to 0 is much like taking
> a spinlock: see how filemap_get_entry() takes rcu_read_lock() then spins
> around until the folio is unfrozen. If the task freezing is preempted (or
> calls cond_resched(), as folio_mc_copy() may do), then it risks deadlock:
> in my case, one CPU in zap_pte_range() with free_swap_and_cache_nr()
> trying to reclaim swap while PTL is held, all the other CPUs in reclaim
> spinning for that PTL.

Oh, thanks for pointing this out, I didn't take that into account.
> 
> I'm uncertain whether it's necessary for interrupts to be disabled as
> well as preemption, but since they have to be disabled for the page
> cache migration, it's much the best to do it all together as before.
> So revert to folio_ref_freeze() under xas_lock_irq(): but keep the
> preliminary folio_ref_count() check, which does make sense before
> trying to copy the folio's data.

This is what my RFC version[1] did: it added the same reference check to
avoid an unnecessary folio_mc_copy().

Should I resend the whole series, or should Andrew pick this one up directly?


Thanks.

[1] https://lore.kernel.org/linux-mm/20240129070934.3717659-7-wangkefeng.wang@huawei.com/


> 
> Use "expected_count" for the expected count throughout.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>   mm/migrate.c | 59 +++++++++++++++++++++++++---------------------------
>   1 file changed, 28 insertions(+), 31 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 27f070f64f27..8beedbb42a93 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -400,8 +400,8 @@ static int folio_expected_refs(struct address_space *mapping,
>    * 2 for folios with a mapping
>    * 3 for folios with a mapping and PagePrivate/PagePrivate2 set.
>    */
> -static void __folio_migrate_mapping(struct address_space *mapping,
> -		struct folio *newfolio, struct folio *folio, int expected_cnt)
> +static int __folio_migrate_mapping(struct address_space *mapping,
> +		struct folio *newfolio, struct folio *folio, int expected_count)

We can rename it back to folio_migrate_mapping().

>   {
>   	XA_STATE(xas, &mapping->i_pages, folio_index(folio));
>   	struct zone *oldzone, *newzone;
> @@ -415,13 +415,18 @@ static void __folio_migrate_mapping(struct address_space *mapping,
>   		newfolio->mapping = folio->mapping;
>   		if (folio_test_swapbacked(folio))
>   			__folio_set_swapbacked(newfolio);
> -		return;
> +		return MIGRATEPAGE_SUCCESS;
>   	}
>   
>   	oldzone = folio_zone(folio);
>   	newzone = folio_zone(newfolio);
>   
>   	xas_lock_irq(&xas);
> +	if (!folio_ref_freeze(folio, expected_count)) {
> +		xas_unlock_irq(&xas);
> +		return -EAGAIN;
> +	}
> +
>   	/* Now we know that no one else is looking at the folio */
>   	newfolio->index = folio->index;
>   	newfolio->mapping = folio->mapping;
> @@ -456,7 +461,7 @@ static void __folio_migrate_mapping(struct address_space *mapping,
>   	 * old folio by unfreezing to one less reference.
>   	 * We know this isn't the last reference.
>   	 */
> -	folio_ref_unfreeze(folio, expected_cnt - nr);
> +	folio_ref_unfreeze(folio, expected_count - nr);
>   
>   	xas_unlock(&xas);
>   	/* Leave irq disabled to prevent preemption while updating stats */
> @@ -504,23 +509,19 @@ static void __folio_migrate_mapping(struct address_space *mapping,
>   		}
>   	}
>   	local_irq_enable();
> +
> +	return MIGRATEPAGE_SUCCESS;
>   }
>   
>   int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio,
>   			  struct folio *folio, int extra_count)
>   {
> -	int expected_cnt = folio_expected_refs(mapping, folio) + extra_count;
> +	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
>   
> -	if (!mapping) {
> -		if (folio_ref_count(folio) != expected_cnt)
> -			return -EAGAIN;
> -	} else {
> -		if (!folio_ref_freeze(folio, expected_cnt))
> -			return -EAGAIN;
> -	}
> +	if (folio_ref_count(folio) != expected_count)
> +		return -EAGAIN;
>   
> -	__folio_migrate_mapping(mapping, newfolio, folio, expected_cnt);
> -	return MIGRATEPAGE_SUCCESS;
> +	return __folio_migrate_mapping(mapping, newfolio, folio, expected_count);
>   }
>   EXPORT_SYMBOL(folio_migrate_mapping);
>   
> @@ -534,16 +535,18 @@ int migrate_huge_page_move_mapping(struct address_space *mapping,
>   	XA_STATE(xas, &mapping->i_pages, folio_index(src));
>   	int ret, expected_count = folio_expected_refs(mapping, src);
>   
> -	if (!folio_ref_freeze(src, expected_count))
> +	if (folio_ref_count(src) != expected_count)
>   		return -EAGAIN;
>   
>   	ret = folio_mc_copy(dst, src);
> -	if (unlikely(ret)) {
> -		folio_ref_unfreeze(src, expected_count);
> +	if (unlikely(ret))
>   		return ret;
> -	}
>   
>   	xas_lock_irq(&xas);
> +	if (!folio_ref_freeze(src, expected_count)) {
> +		xas_unlock_irq(&xas);
> +		return -EAGAIN;
> +	}
>   
>   	dst->index = src->index;
>   	dst->mapping = src->mapping;
> @@ -660,24 +663,18 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
>   			   struct folio *src, void *src_private,
>   			   enum migrate_mode mode)
>   {
> -	int ret, expected_cnt = folio_expected_refs(mapping, src);
> +	int ret, expected_count = folio_expected_refs(mapping, src);
>   
> -	if (!mapping) {
> -		if (folio_ref_count(src) != expected_cnt)
> -			return -EAGAIN;
> -	} else {
> -		if (!folio_ref_freeze(src, expected_cnt))
> -			return -EAGAIN;
> -	}
> +	if (folio_ref_count(src) != expected_count)
> +		return -EAGAIN;
>   
>   	ret = folio_mc_copy(dst, src);
> -	if (unlikely(ret)) {
> -		if (mapping)
> -			folio_ref_unfreeze(src, expected_cnt);
> +	if (unlikely(ret))
>   		return ret;
> -	}
>   
> -	__folio_migrate_mapping(mapping, dst, src, expected_cnt);
> +	ret = __folio_migrate_mapping(mapping, dst, src, expected_count);
> +	if (ret != MIGRATEPAGE_SUCCESS)
> +		return ret;
>   
>   	if (src_private)
>   		folio_attach_private(dst, folio_detach_private(src));


Thread overview: 6+ messages
2024-06-25  5:04 Hugh Dickins
2024-06-25  9:01 ` Kefeng Wang [this message]
2024-06-25 19:23   ` Andrew Morton
2024-06-25 19:51   ` Hugh Dickins
2024-06-26  7:24     ` Kefeng Wang
2024-06-25 20:01 ` David Hildenbrand
