Date: Thu, 5 Dec 2024 15:40:08 +0530
From: Dev Jain
Subject: Re: [QUESTION] anon_vma lock in khugepaged
To: ryan.roberts@arm.com, david@redhat.com, kirill.shutemov@linux.intel.com,
    willy@infradead.org, ziy@nvidia.com, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org
References: <20241128062641.59096-1-dev.jain@arm.com>
In-Reply-To: <20241128062641.59096-1-dev.jain@arm.com>

On 28/11/24 11:56 am, Dev Jain wrote:
> Hi, I was looking at khugepaged code and I cannot figure out what will the problem be
> if we take the mmap lock in read mode. Shouldn't just taking the PMD lock, then PTL,
> then unlocking PTL, then unlocking PMD, solve any races with page table walkers?
>
>

Similar questions:

1. Why do we need anon_vma_lock_write() in collapse_huge_page()?
   AFAIK we need to walk anon_vmas either when we are forking or when we are
   unmapping a folio and need to find all VMAs mapping it. The former path
   takes mmap_write_lock(), so we have no problem there. For the latter, if we
   just took anon_vma_lock_read(), it may happen that kswapd isolates the folio
   from the LRU, traverses the rmap and swaps the folio out, and khugepaged
   then fails in folio_isolate_lru(). But that is not a fatal problem, just a
   performance degradation due to a race (and the entire code is racy anyway).
   What am I missing?

2. In which scenarios does rmap come into play? Fork, swapping out, any others
   I am missing?

3. Please confirm the correctness of the following: in stark contrast to page
   migration, we do not need to do an rmap walk and nuke all PTEs referencing
   the folio, because for anon non-shmem folios the only way the folio can be
   shared is through forking. If that is the case, folio_put() will not release
   the folio in
   __collapse_huge_page_copy_succeeded() -> free_page_and_swap_cache(),
   so the old folio is still there and child processes can read from it. Page
   migration, on the other hand, requires that we are able to deallocate the
   old folios.
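
To make the refcount argument in question 3 concrete, here is a toy userspace
sketch of what I think happens. It is not kernel code: struct toy_folio,
toy_get(), toy_put() and toy_collapse_copy() are hypothetical names made up
for illustration, and the model assumes one reference per mapping while
ignoring mapcount, the swap cache and LRU references.

```c
/* Toy userspace model only; every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

struct toy_folio {
	int refcount;             /* assumption: one reference per mapping */
	bool freed;               /* set once the last reference is dropped */
	char data[TOY_PAGE_SIZE]; /* stands in for the page contents */
};

/* Take a reference, e.g. when fork() makes the child map the same folio. */
static void toy_get(struct toy_folio *f)
{
	f->refcount++;
}

/* Drop a reference; returns true only if this released the folio. */
static bool toy_put(struct toy_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		f->freed = true;
		return true;
	}
	return false;
}

/*
 * Models the copy step of collapse for one small folio: copy its contents
 * into the new huge folio's buffer, then drop the reference that the
 * collapsing process's mapping held. Returns true if the old folio was
 * released as a result.
 */
static bool toy_collapse_copy(struct toy_folio *old, char *dst)
{
	memcpy(dst, old->data, TOY_PAGE_SIZE);
	return toy_put(old);
}
```

In this model, if the folio is private to the collapsing process (refcount 1),
toy_collapse_copy() releases it. If a fork happened first (toy_get() raised
the refcount to 2), the put inside toy_collapse_copy() leaves the old folio
alive with its contents intact, which is the behaviour I am assuming for the
child in question 3.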