Date: Wed, 7 Feb 2024 12:11:25 +0000
From: Will Deacon
To: Matthew Wilcox
Cc: Nanyong Sun, Catalin Marinas, mike.kravetz@oracle.com,
 muchun.song@linux.dev, akpm@linux-foundation.org,
 anshuman.khandual@arm.com, wangkefeng.wang@huawei.com,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
Message-ID: <20240207121125.GA22234@willie-the-truck>
References: <20240113094436.2506396-1-sunnanyong@huawei.com>
 <20240207111252.GA22167@willie-the-truck>

On Wed, Feb 07, 2024 at 11:21:17AM +0000, Matthew Wilcox wrote:
> On Wed, Feb 07, 2024 at 11:12:52AM +0000, Will Deacon wrote:
> > On Sat, Jan 27,
> > 2024 at 01:04:15PM +0800, Nanyong Sun wrote:
> > >
> > > On 2024/1/26 2:06, Catalin Marinas wrote:
> > > > On Sat, Jan 13, 2024 at 05:44:33PM +0800, Nanyong Sun wrote:
> > > > > HVO was previously disabled on arm64 [1] due to the lack of necessary
> > > > > BBM(break-before-make) logic when changing page tables.
> > > > > This set of patches fix this by adding necessary BBM sequence when
> > > > > changing page table, and supporting vmemmap page fault handling to
> > > > > fixup kernel address translation fault if vmemmap is concurrently accessed.
> > > > I'm not keen on this approach. I'm not even sure it's safe. In the
> > > > second patch, you take the init_mm.page_table_lock on the fault path but
> > > > are we sure this is unlocked when the fault was taken?
> > > I think this situation is impossible. In the implementation of the second
> > > patch, when the page table is being corrupted
> > > (the time window when a page fault may occur), vmemmap_update_pte() already
> > > holds the init_mm.page_table_lock,
> > > and unlock it until page table update is done.Another thread could not hold
> > > the init_mm.page_table_lock and
> > > also trigger a page fault at the same time.
> > > If I have missed any points in my thinking, please correct me. Thank you.
> >
> > It still strikes me as incredibly fragile to handle the fault and trying
> > to reason about all the users of 'struct page' is impossible. For example,
> > can the fault happen from irq context?
>
> The pte lock cannot be taken in irq context (which I think is what
> you're asking?) While it is not possible to reason about all users of
> struct page, we are somewhat relieved of that work by noting that this is
> only for hugetlbfs, so we don't need to reason about slab, page tables,
> netmem or zsmalloc.

My concern is that an interrupt handler tries to access a 'struct page'
which faults due to another core splitting a pmd mapping for the vmemmap.
In this case, I think we'll end up trying to resolve the fault from irq
context, which will try to take the spinlock.

Avoiding the fault would make this considerably more robust and the
architecture has introduced features to avoid break-before-make in some
circumstances (see FEAT_BBM and its levels), so having this optimisation
conditional on that would seem to be a better approach in my opinion.

Will
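
As a rough illustration of what "conditional on FEAT_BBM" could look like,
the sketch below decodes the BBM field (bits [55:52]) of a raw
ID_AA64MMFR2_EL1 value and only permits the optimisation when the reported
level is high enough. This is a userspace sketch, not kernel code: the
helper names bbm_level() and hvo_allowed() are hypothetical, and the choice
of level 2 as the threshold is an assumption, since the thread does not say
which level would be sufficient.

```c
#include <stdbool.h>
#include <stdint.h>

/* ID_AA64MMFR2_EL1.BBM occupies bits [55:52] of the register. */
#define MMFR2_BBM_SHIFT	52
#define MMFR2_BBM_MASK	0xfULL

/* Extract the FEAT_BBM support level from a raw ID_AA64MMFR2_EL1 value. */
unsigned int bbm_level(uint64_t mmfr2)
{
	return (unsigned int)((mmfr2 >> MMFR2_BBM_SHIFT) & MMFR2_BBM_MASK);
}

/*
 * Hypothetical policy helper (not an existing kernel interface): allow the
 * vmemmap optimisation only if the CPU reports FEAT_BBM level 2 or above.
 * The threshold is an assumption made for this sketch.
 */
bool hvo_allowed(uint64_t mmfr2)
{
	return bbm_level(mmfr2) >= 2;
}
```

With this policy, a register value whose BBM field reads 0 or 1 leaves the
optimisation disabled, so the vmemmap mappings are never split at runtime
and the fault path discussed above is never needed on those CPUs.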