From: Donet Tom <donettom@linux.ibm.com>
To: Baolin Wang <baolin.wang@linux.alibaba.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Ritesh Harjani <ritesh.list@gmail.com>,
"Aneesh Kumar K . V" <aneesh.kumar@kernel.org>,
Zi Yan <ziy@nvidia.com>, David Hildenbrand <david@redhat.com>,
Shuah Khan <shuah@kernel.org>, Dev Jain <dev.jain@arm.com>
Subject: Re: [PATCH] mm: migration :shared anonymous migration test is failing
Date: Fri, 20 Dec 2024 08:42:45 +0530
Message-ID: <36f9ab13-e057-40a0-8d0b-9939df056fc6@linux.ibm.com>
In-Reply-To: <d428013e-6f68-416d-befa-a5a8bab0b566@linux.alibaba.com>
On 12/20/24 08:01, Baolin Wang wrote:
>
>
> On 2024/12/19 20:47, Donet Tom wrote:
>> The migration selftest is currently failing for shared anonymous
>> mappings due to a race condition.
>>
>> During migration, the source folio's PTE is unmapped by nuking the
>> PTE, flushing the TLB, and then marking the page for migration
>> (by creating the swap entries). The issue arises when, immediately
>> after the PTE is nuked and the TLB is flushed, but before the page
>> is marked for migration, another thread accesses the page. This
>> triggers a page fault, and the page fault handler invokes
>> do_pte_missing() instead of do_swap_page(), as the page is not yet
>> marked for migration.
>>
>> In the fault handling path, do_pte_missing() calls __do_fault()
>> ->shmem_fault() -> shmem_get_folio_gfp() -> filemap_get_entry().
>> This eventually calls folio_try_get(), incrementing the reference
>> count of the folio undergoing migration. The thread then blocks
>> on folio_lock(), as the migration path holds the lock. This
>> results in the migration failing in __migrate_folio(), which expects
>> the folio's reference count to be 2. However, the reference count is
>> incremented by the fault handler, leading to the failure.
>>
>> The issue arises because, after nuking the PTE and before marking the
>> page for migration, the page is accessed. To address this, we have
>> updated the logic to first nuke the PTE, then mark the page for
>> migration, and only then flush the TLB. With this patch, if the page is
>> accessed immediately after nuking the PTE, the TLB entry is still
>> valid, so no fault occurs. After marking the page for migration,
>
> IMO, I don't think this assumption is correct. At this point, the TLB
> entry might also be evicted, so a page fault could still occur. It's
> just a matter of probability.
In this patch, we mark the page for migration before flushing the TLB.
This ensures that if another thread accesses the page after the TLB
flush, it takes a page fault and the fault handler waits for the
migration to complete, so the migration does not fail.

Without this patch, if another thread accesses the page after the TLB
flush but before the page is marked for migration, the migration fails.
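
To make the ordering argument concrete, here is a toy Python model. It is
not kernel code. The names (simulate, access, tlb_evicted, the step
strings) are made up for illustration. The model assumes a single
concurrent access, a baseline reference count of 2 as stated in the commit
message, and one extra reference taken on the do_pte_missing() path. It
also covers the early TLB eviction case raised above.

```python
from enum import Enum


class Pte(Enum):
    PRESENT = "present"
    NONE = "none"
    MIGRATION = "migration entry"


# Per the commit message, __migrate_folio() expects a reference count of 2.
EXPECTED_REFS = 2

# Hypothetical step names for the two orderings under discussion.
OLD_ORDER = ("clear_pte", "flush_tlb", "set_migration_entry")
NEW_ORDER = ("clear_pte", "set_migration_entry", "flush_tlb")


def access(pte, tlb_valid):
    """Return the extra folio references taken by one concurrent access."""
    if tlb_valid:
        return 0  # translation served from the TLB, no fault
    if pte is Pte.MIGRATION:
        return 0  # do_swap_page() path: waits for migration to finish
    if pte is Pte.NONE:
        return 1  # do_pte_missing() path: takes a ref, blocks on the folio lock
    return 0      # PTE still present: the TLB is simply refilled


def simulate(order, fault_after, tlb_evicted=False):
    """Replay the unmap steps; another thread touches the page right after
    the step named `fault_after`. Returns True if migration would succeed
    in this toy model. `tlb_evicted=True` models the TLB entry having been
    evicted before any explicit flush."""
    pte = Pte.PRESENT
    tlb_valid = not tlb_evicted
    refs = EXPECTED_REFS
    for step in order:
        if step == "clear_pte":
            pte = Pte.NONE
        elif step == "flush_tlb":
            tlb_valid = False
        elif step == "set_migration_entry":
            pte = Pte.MIGRATION
        else:
            raise ValueError(f"unknown step: {step}")
        if step == fault_after:
            refs += access(pte, tlb_valid)
    return refs == EXPECTED_REFS


if __name__ == "__main__":
    print("old order, access after flush:", simulate(OLD_ORDER, "flush_tlb"))
    print("new order, access after flush:", simulate(NEW_ORDER, "flush_tlb"))
    print("new order, early eviction:",
          simulate(NEW_ORDER, "clear_pte", tlb_evicted=True))
```

In this model the old order fails when the access lands between the TLB
flush and the migration entry, and the new order closes that window. An
access that faults right after the PTE is cleared, because the TLB entry
was already evicted, still takes the extra reference in either order.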
>
> Additionally, IIUC, if another thread is accessing the shmem folio
> causing the migration to fail, I think this is expected, and migration
> failure is not a vital issue?
>
In my case, the shmem migration test always fails, even after retries.
Would it be correct to consider this expected behavior?