From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4ac9f502-ecdf-440c-9797-a6318c92b882@linux.ibm.com>
Date: Fri, 20 Dec 2024 07:46:20 +0530
Subject: Re: [PATCH] mm: migration :shared anonymous migration test is failing
To: David Hildenbrand, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Ritesh Harjani, Baolin Wang, "Aneesh Kumar K . V", Zi Yan, shuah Khan, Dev Jain
References: <20241219124717.4907-1-donettom@linux.ibm.com> <3c1665df-9367-4d43-8aa1-6726fbb59640@redhat.com>
From: Donet Tom
In-Reply-To: <3c1665df-9367-4d43-8aa1-6726fbb59640@redhat.com>

On 12/19/24 18:25, David Hildenbrand wrote:
> On 19.12.24 13:47, Donet Tom wrote:
>> The migration selftest is currently failing for shared anonymous
>> mappings due to a race condition.
>>
>> During migration, the source folio's PTE is unmapped by nuking the
>> PTE, flushing the TLB, and then marking the page for migration
>> (by creating the swap entries). The issue arises when, immediately
>> after the PTE is nuked and the TLB is flushed, but before the page
>> is marked for migration, another thread accesses the page. This
>> triggers a page fault, and the page fault handler invokes
>> do_pte_missing() instead of do_swap_page(), as the page is not yet
>> marked for migration.
>>
>> In the fault handling path, do_pte_missing() calls __do_fault()
>> -> shmem_fault() -> shmem_get_folio_gfp() -> filemap_get_entry().
>> This eventually calls folio_try_get(), incrementing the reference
>> count of the folio undergoing migration. The thread then blocks
>> on folio_lock(), as the migration path holds the lock. This
>> results in the migration failing in __migrate_folio(), which expects
>> the folio's reference count to be 2. However, the reference count is
>> incremented by the fault handler, leading to the failure.
>>
>> The issue arises because, after nuking the PTE and before marking the
>> page for migration, the page is accessed. To address this, we have
>> updated the logic to first nuke the PTE, then mark the page for
>> migration, and only then flush the TLB. With this patch, if the page is
>> accessed immediately after nuking the PTE, the TLB entry is still
>> valid, so no fault occurs. After marking the page for migration,
>> flushing the TLB ensures that the next page fault correctly triggers
>> do_swap_page() and waits for the migration to complete.
>>
>
> Does this reproduce with
>
> commit 536ab838a5b37b6ae3f8d53552560b7c51daeb41
> Author: Dev Jain
> Date:   Fri Aug 30 10:46:09 2024 +0530
>
>     selftests/mm: relax test to fail after 100 migration failures
>
>     It was recently observed at [1] that during the folio unmapping stage of
>     migration, when the PTEs are cleared, a racing thread faulting on that
>     folio may increase the refcount of the folio, sleep on the folio lock (the
>     migration path has the lock), and migration ultimately fails when
>     asserting the actual refcount against the expected.  Thereby, the
>     migration selftest fails on shared-anon mappings.  The above enforces the
>     fact that migration is a best-effort service, therefore, it is wrong to
>     fail the test for just a single failure; hence, fail the test after 100
>     consecutive failures (where 100 is still a subjective choice).  Note that,
>     this has no effect on the execution time of the test since that is
>     controlled by a timeout.
>
>     [1] https://lore.kernel.org/all/20240801081657.1386743-1-dev.jain@arm.com/
>
> part of 6.12?
>
>
> As part of that discussion, we discussed alternatives, such as
> retrying migration more often internally.
>

I tested with this patch and was able to reproduce the issue on PowerPC.
I also tried increasing the retry count, but the test still failed.
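
For anyone following the thread, the ordering problem described in the
quoted patch description can be modeled with a small user-space sketch.
This is only a toy model under stated assumptions, not kernel code: the
names State, apply_step(), racing_access() and simulate() are
hypothetical, the three unmap steps are treated as atomic, and exactly
one racing access is injected after a chosen step. The expected
refcount of 2 is taken from the quoted description.

```python
from dataclasses import dataclass

# Refcount the quoted description says __migrate_folio() expects.
EXPECTED_REFS = 2

# Step orders from the quoted description (step names are hypothetical).
OLD_ORDER = ("clear_pte", "flush_tlb", "mark_migration")
NEW_ORDER = ("clear_pte", "mark_migration", "flush_tlb")


@dataclass
class State:
    pte: str = "present"      # "present", "none" or "migration"
    tlb_valid: bool = True    # a cached translation is still usable
    refcount: int = EXPECTED_REFS


def apply_step(state: State, step: str) -> None:
    """Apply one step of the (modeled) unmap sequence."""
    if step == "clear_pte":
        state.pte = "none"
    elif step == "flush_tlb":
        state.tlb_valid = False
    elif step == "mark_migration":
        state.pte = "migration"
    else:
        raise ValueError(f"unknown step: {step}")


def racing_access(state: State) -> str:
    """Model another thread touching the page at this instant."""
    if state.tlb_valid:
        return "tlb-hit"                # no fault is taken
    if state.pte == "migration":
        return "wait-for-migration"     # do_swap_page()-like path, no extra ref
    if state.pte == "none":
        state.refcount += 1             # do_pte_missing()-like path takes a ref
        return "extra-ref"
    return "tlb-refill"                 # PTE still present, nothing special


def simulate(order, race_after: int):
    """Run the unmap steps, injecting one racing access after step `race_after`.

    Returns (what the racing access did, whether the refcount check passes).
    """
    state = State()
    outcome = None
    for index, step in enumerate(order):
        apply_step(state, step)
        if index == race_after:
            outcome = racing_access(state)
    return outcome, state.refcount == EXPECTED_REFS


if __name__ == "__main__":
    for name, order in (("old", OLD_ORDER), ("new", NEW_ORDER)):
        for race_after in range(len(order)):
            print(name, order[race_after], simulate(order, race_after))
```

In this model the only failing case is the old order with the access
landing between the TLB flush and the migration marking, which is the
window the quoted description points at. It says nothing about other
sources of transient references or about architecture-specific TLB
behavior.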