From: Baolin Wang
Date: Fri, 20 Dec 2024 10:31:39 +0800
Subject: Re: [PATCH] mm: migration :shared anonymous migration test is failing
To: Donet Tom, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Ritesh Harjani, "Aneesh Kumar K . V", Zi Yan, David Hildenbrand, shuah Khan, Dev Jain
References: <20241219124717.4907-1-donettom@linux.ibm.com>
In-Reply-To: <20241219124717.4907-1-donettom@linux.ibm.com>

On 2024/12/19 20:47, Donet Tom wrote:
> The migration selftest is currently failing for shared anonymous
> mappings due to a race condition.
>
> During migration, the source folio's PTE is unmapped by nuking the
> PTE, flushing the TLB, and then marking the page for migration
> (by creating the swap entries). The issue arises when, immediately
> after the PTE is nuked and the TLB is flushed, but before the page
> is marked for migration, another thread accesses the page. This
> triggers a page fault, and the page fault handler invokes
> do_pte_missing() instead of do_swap_page(), as the page is not yet
> marked for migration.
>
> In the fault handling path, do_pte_missing() calls __do_fault()
> ->shmem_fault() -> shmem_get_folio_gfp() -> filemap_get_entry().
> This eventually calls folio_try_get(), incrementing the reference
> count of the folio undergoing migration. The thread then blocks
> on folio_lock(), as the migration path holds the lock. This
> results in the migration failing in __migrate_folio(), which expects
> the folio's reference count to be 2. However, the reference count is
> incremented by the fault handler, leading to the failure.
>
> The issue arises because, after nuking the PTE and before marking the
> page for migration, the page is accessed. To address this, we have
> updated the logic to first nuke the PTE, then mark the page for
> migration, and only then flush the TLB. With this patch, if the page is
> accessed immediately after nuking the PTE, the TLB entry is still
> valid, so no fault occurs. After marking the page for migration,

IMO, I don't think this assumption is correct. At this point, the TLB
entry might also be evicted, so a page fault could still occur. It's
just a matter of probability.

Additionally, IIUC, if another thread accessing the shmem folio causes
the migration to fail, I think that is expected, and a migration
failure is not a vital issue?
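
To make the refcount argument concrete, here is a toy userspace model of
the check described in the quoted text. It is only a sketch, not kernel
code: struct toy_folio, toy_fault_get(), toy_fault_put() and
toy_migrate() are hypothetical names, and returning -EAGAIN on a
mismatch is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
	int refcount;	/* references currently held on the folio */
};

/* Models the fault path taking an extra reference (cf. folio_try_get()). */
static void toy_fault_get(struct toy_folio *f)
{
	f->refcount++;
}

/* Models the fault path dropping its reference again. */
static void toy_fault_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/*
 * Models the refcount check in the migration path: the attempt only
 * proceeds when the count matches what the migration code expects.
 * The -EAGAIN return value is an illustrative assumption.
 */
static int toy_migrate(const struct toy_folio *f, int expected)
{
	if (f->refcount != expected)
		return -EAGAIN;
	return 0;
}
```

In this model, a folio with refcount 2 passes toy_migrate(&f, 2). After
one toy_fault_get() the same call returns -EAGAIN, and it passes again
once toy_fault_put() has dropped the extra reference. Whether the extra
reference is taken depends only on whether a fault happens inside the
window, which is why reordering the TLB flush changes the probability
rather than removing the failure.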