linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Donet Tom <donettom@linux.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Ritesh Harjani <ritesh.list@gmail.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@kernel.org>,
	Zi Yan <ziy@nvidia.com>, shuah Khan <shuah@kernel.org>,
	Dev Jain <dev.jain@arm.com>
Subject: Re: [PATCH] mm: migration :shared anonymous migration test is failing
Date: Thu, 19 Dec 2024 13:58:06 +0100	[thread overview]
Message-ID: <c65f7e9c-2e57-4fd5-973f-fc546c8c5827@redhat.com> (raw)
In-Reply-To: <20241219124717.4907-1-donettom@linux.ibm.com>

On 19.12.24 13:47, Donet Tom wrote:
> The migration selftest is currently failing for shared anonymous
> mappings due to a race condition.
> 
> During migration, the source folio's PTE is unmapped by nuking the
> PTE, flushing the TLB,and then marking the page for migration
> (by creating the swap entries). The issue arises when, immediately
> after the PTE is nuked and the TLB is flushed, but before the page
> is marked for migration, another thread accesses the page. This
> triggers a page fault, and the page fault handler invokes
> do_pte_missing() instead of do_swap_page(), as the page is not yet
> marked for migration.
> 
> In the fault handling path, do_pte_missing() calls __do_fault()
> ->shmem_fault() -> shmem_get_folio_gfp() -> filemap_get_entry().
> This eventually calls folio_try_get(), incrementing the reference
> count of the folio undergoing migration. The thread then blocks
> on folio_lock(), as the migration path holds the lock. This
> results in the migration failing in __migrate_folio(), which expects
> the folio's reference count to be 2. However, the reference count is
> incremented by the fault handler, leading to the failure.
> 
> The issue arises because, after nuking the PTE and before marking the
> page for migration, the page is accessed. To address this, we have
> updated the logic to first nuke the PTE, then mark the page for
> migration, and only then flush the TLB. With this patch, If the page is
> accessed immediately after nuking the PTE, the TLB entry is still
> valid, so no fault occurs.

But what about if the PTE is not in the TLB yet, and you get an access 
from another CPU just after clearing the PTE (but not flushing the TLB)? 
The other CPU will still observe PTE=none, trigger a fault etc.

So I don't think what you propose rules out all cases.

-- 
Cheers,

David / dhildenb



  parent reply	other threads:[~2024-12-19 12:58 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-19 12:47 Donet Tom
2024-12-19 12:55 ` David Hildenbrand
2024-12-20  2:16   ` Donet Tom
2024-12-19 12:58 ` David Hildenbrand [this message]
2024-12-20  2:55   ` Donet Tom
2024-12-20 10:11     ` David Hildenbrand
2024-12-23 12:08       ` Donet Tom
2024-12-20  2:31 ` Baolin Wang
2024-12-20  3:12   ` Donet Tom
2024-12-20  3:32     ` Baolin Wang
2024-12-20  4:30       ` Donet Tom
2024-12-20  4:37       ` Dev Jain
2024-12-23 12:02         ` Donet Tom
2024-12-20 10:05 ` kernel test robot
2024-12-20 10:17 ` kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c65f7e9c-2e57-4fd5-973f-fc546c8c5827@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@kernel.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=dev.jain@arm.com \
    --cc=donettom@linux.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=ritesh.list@gmail.com \
    --cc=shuah@kernel.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox