From: Dev Jain <dev.jain@arm.com>
Date: Fri, 3 Jan 2025 13:47:15 +0530
Subject: Re: [RFC PATCH 09/12] khugepaged: Introduce vma_collapse_anon_folio()
To: David Hildenbrand, akpm@linux-foundation.org, willy@infradead.org,
 kirill.shutemov@linux.intel.com
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com,
 cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com,
 dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz,
 srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com,
 aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com,
 ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com,
 jglisse@google.com, surenb@google.com, vishal.moola@gmail.com,
 zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com,
 21cnbao@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-ID: <7e89702d-c52c-4716-9cd6-33aebade1c71@arm.com>
References: <20241216165105.56185-1-dev.jain@arm.com>
 <20241216165105.56185-10-dev.jain@arm.com>
 <2215dd8e-233a-427b-b15c-a2ffbce8f46d@redhat.com>
 <28013908-65d8-462e-b975-cd0f63d226b1@arm.com>
 <0368f4f2-cb0f-4633-a86d-5c3f75839b4e@redhat.com>
 <8d752d25-b9b2-4bf9-9a81-254aeb3ab0f6@arm.com>

On
02/01/25 5:03 pm, David Hildenbrand wrote:
>>>>
>>>> When having to back-off (restore original PTEs), or for copying,
>>>> you'll likely need access to the original PTEs, which were already
>>>> cleared. So likely you need a temporary copy of the original PTEs
>>>> somehow.
>>>>
>>>> That's why temporarily clearing the PMD under mmap write lock is easier
>>>> to implement, at the cost of requiring the mmap lock in write mode
>>>> like PMD collapse.
>>
>> Why do I need to clear the PMD if I am taking the mmap_write_lock() and
>> operating only on the PTE?
>
> One approach I proposed to Nico (and I think he has a prototype) is:
>
> a) Take all locks like we do today (mmap in write, vma in write, rmap
> in write)
>
> After this step, no "ordinary" page table walkers can run anymore
>
> b) Clear the PMD entry and flush the TLB like we do today
>
> After this step, neither the CPU can read/write folios nor GUP-fast
> can run. The PTE table is completely isolated.
>
> c) Now we can work on the (temporarily cleared) PTE table as we
> please: isolate folios, lock them, ... without clearing the PTE
> entries, just like we do today.
>
> d) Allocate the new folios (we don't have to hold any spinlocks), copy
> + replace the affected PTE entries in the isolated PTE table. Similar
> to what we do today, except that we don't clear PTEs but instead
> clear+reset.
>
> e) Unlock+un-isolate + unref the collapsed folios like we do today.
>
> f) Re-map the PTE-table, like we do today when collapse would have
> failed.
>
>
> Of course, after taking all locks we have to re-verify that there is
> something to collapse (e.g., in d) we also have to check for
> unexpected folio references). The backup path is easy: remap the PTE
> table as no PTE entries were touched just yet.
>
> Observe that many things are "like we do today".
>
>
> As soon as we go to read locks + PTE locks, it all gets more
> complicated to get it right. Not that it cannot be done, but the above
> is IMHO a lot simpler to get right.

Thanks for the reply. I'll go ahead with the write lock algorithm then.
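
For reference, the ordering of steps a) to f) quoted above can be sketched
as a small user-space model. This is only an illustration of the sequence
and of the back-off path, not kernel code: toy_folio, toy_mm, toy_collapse()
and TOY_NR_PTES are hypothetical names made up for this sketch, the three
locks are collapsed into one flag, the TLB flush and the data copy are
elided, and "unexpected reference" is assumed to mean a refcount other
than 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PTES 8 /* toy table size, not the kernel's PTRS_PER_PTE */

/* Hypothetical stand-in for a folio: only a reference count. */
struct toy_folio {
	int refcount;
};

/* Hypothetical stand-in for the state a collapse touches. */
struct toy_mm {
	bool write_locked;                   /* mmap + vma + rmap, all in write mode */
	struct toy_folio **pmd;              /* PMD entry: the PTE table, or NULL while cleared */
	struct toy_folio *ptes[TOY_NR_PTES]; /* the PTE table itself */
};

/*
 * Try to replace ptes[start .. start + nr - 1] with new_folio.
 * Returns true on success, false if it backed off with every PTE untouched.
 */
static bool toy_collapse(struct toy_mm *mm, int start, int nr,
			 struct toy_folio *new_folio)
{
	struct toy_folio **table;
	bool ok = true;
	int i;

	assert(mm->pmd != NULL);
	assert(start >= 0 && nr > 0 && start + nr <= TOY_NR_PTES);

	mm->write_locked = true;  /* a) take all locks */

	table = mm->pmd;          /* b) clear the PMD entry; real code would */
	mm->pmd = NULL;           /*    flush the TLB at this point */

	/*
	 * c) The table is isolated and the PTEs are still intact, so
	 * re-verify: an empty slot or an extra reference means back off.
	 */
	for (i = start; i < start + nr; i++)
		if (!table[i] || table[i]->refcount != 1)
			ok = false;

	/* d) copy (elided) and clear+reset each PTE; e) unref the old folio */
	for (i = start; ok && i < start + nr; i++) {
		table[i]->refcount--;
		table[i] = new_folio;
		new_folio->refcount++;
	}

	mm->pmd = table;          /* f) re-map the PTE table; same path on back-off */
	mm->write_locked = false;
	return ok;
}
```

The point the model tries to capture is that the failure path needs no
saved copy of the PTEs: nothing in the table is modified until the
re-verification has passed, so backing off is just step f).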