From: Donet Tom
Date: Mon, 23 Dec 2024 17:38:01 +0530
Subject: Re: [PATCH] mm: migration :shared anonymous migration test is failing
To: David Hildenbrand, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Ritesh Harjani, Baolin Wang, "Aneesh Kumar K . V", Zi Yan, shuah Khan, Dev Jain
Message-ID: <47e99079-5e81-4dcb-a9a1-5e25b0f40155@linux.ibm.com>
In-Reply-To: <454f7c85-a6d9-4635-94ff-7fab69060a82@redhat.com>
References: <20241219124717.4907-1-donettom@linux.ibm.com> <9bc000ab-1982-41f0-9ca0-2a4ead9aa982@linux.ibm.com> <454f7c85-a6d9-4635-94ff-7fab69060a82@redhat.com>

On 12/20/24 15:41, David Hildenbrand wrote:
> On 20.12.24 03:55, Donet Tom wrote:
>>
>> On 12/19/24 18:28, David Hildenbrand wrote:
>>> On 19.12.24 13:47, Donet Tom wrote:
>>>> The migration selftest is currently failing for shared anonymous
>>>> mappings due to a race condition.
>>>>
>>>> During migration, the source folio's PTE is unmapped by nuking the
>>>> PTE, flushing the TLB, and then marking the page for migration
>>>> (by creating the swap entries).
>>>> The issue arises when, immediately
>>>> after the PTE is nuked and the TLB is flushed, but before the page
>>>> is marked for migration, another thread accesses the page. This
>>>> triggers a page fault, and the page fault handler invokes
>>>> do_pte_missing() instead of do_swap_page(), as the page is not yet
>>>> marked for migration.
>>>>
>>>> In the fault handling path, do_pte_missing() calls __do_fault()
>>>> -> shmem_fault() -> shmem_get_folio_gfp() -> filemap_get_entry().
>>>> This eventually calls folio_try_get(), incrementing the reference
>>>> count of the folio undergoing migration. The thread then blocks
>>>> on folio_lock(), as the migration path holds the lock. This
>>>> results in the migration failing in __migrate_folio(), which expects
>>>> the folio's reference count to be 2. However, the reference count is
>>>> incremented by the fault handler, leading to the failure.
>>>>
>>>> The issue arises because, after nuking the PTE and before marking the
>>>> page for migration, the page is accessed. To address this, we have
>>>> updated the logic to first nuke the PTE, then mark the page for
>>>> migration, and only then flush the TLB. With this patch, if the
>>>> page is accessed immediately after nuking the PTE, the TLB entry is
>>>> still valid, so no fault occurs.
>>>
>>> But what about if the PTE is not in the TLB yet, and you get an access
>>> from another CPU just after clearing the PTE (but not flushing the
>>> TLB)? The other CPU will still observe PTE=none, trigger a fault etc.
>>>
>> Yes, in this scenario, the migration will fail. Do you think the
>> migration test failure, even after a retry, should be considered a
>> major issue that must be fixed?
>
> I think it is something we should definitely improve, but I think our
> page migration should handle this in a better way, not the unmap
> logic. I recall we discussed with Dev some ideas on how to improve that?
>
>
> I'm pretty sure one can trigger a similar case using a tmpfs file and
> using read/write in a loop instead of memory access -> page faults. So
> where racing with page faults is completely out of the picture.

Thank you, David. I will try this scenario as well and come back with
some ideas for improvement.
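
For reference, the ordering problem discussed in this thread can be written down as a small toy model. This is only a sketch and not kernel code: the step names, the `migrate()` helper and the single `refs` counter are hypothetical, the expected count of 2 is taken from the patch description, and the model assumes that a fault which finds a migration entry waits without taking an extra reference.

```python
# Toy model of the unmap ordering race. Not kernel code; every name here is
# hypothetical and exists only to illustrate the orderings discussed above.

EXPECTED_REFS = 2  # count the patch description says __migrate_folio() expects

# Unmap step order before the patch and with the patch applied.
ORIGINAL = ("clear_pte", "flush_tlb", "install_migration_entry")
PATCHED = ("clear_pte", "install_migration_entry", "flush_tlb")


def migrate(order, access_after=None, tlb_cached=True):
    """Run the unmap steps in `order` and report whether migration succeeds.

    access_after: step after which another thread touches the page, or None.
    tlb_cached:   whether the accessing CPU holds a TLB entry for the page
                  before the unmap starts.
    """
    pte = "present"
    tlb_valid = tlb_cached
    refs = EXPECTED_REFS

    for step in order:
        if step == "clear_pte":
            pte = "none"
        elif step == "flush_tlb":
            tlb_valid = False
        elif step == "install_migration_entry":
            pte = "migration"

        if step == access_after:
            if tlb_valid:
                # Access is served from the TLB, so no fault is taken.
                pass
            elif pte == "none":
                # do_pte_missing() path: the fault handler takes an extra
                # folio reference and then blocks on the folio lock.
                refs += 1
            # pte == "migration": do_swap_page() path. Assumed in this model
            # to wait for migration without holding an extra reference.

    return refs == EXPECTED_REFS


if __name__ == "__main__":
    print("original order, access after flush:",
          migrate(ORIGINAL, "flush_tlb", tlb_cached=True))
    print("patched order, TLB entry cached:",
          migrate(PATCHED, "clear_pte", tlb_cached=True))
    print("patched order, no TLB entry:",
          migrate(PATCHED, "clear_pte", tlb_cached=False))
```

In this model the patched order only helps when the accessing CPU already has the translation cached. The `tlb_cached=False` case is the one David raised, and there the reordering does not prevent the extra reference.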