Date: Wed, 7 Feb 2024 12:20:21 +0000
From: Catalin Marinas
To: Matthew Wilcox
Cc: Will Deacon, Nanyong Sun, mike.kravetz@oracle.com, muchun.song@linux.dev,
	akpm@linux-foundation.org, anshuman.khandual@arm.com,
	wangkefeng.wang@huawei.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
References: <20240113094436.2506396-1-sunnanyong@huawei.com>
 <20240207111252.GA22167@willie-the-truck>

On Wed, Feb 07, 2024 at 11:21:17AM +0000, Matthew Wilcox wrote:
> On Wed, Feb 07, 2024 at 11:12:52AM +0000, Will Deacon wrote:
> > On Sat, Jan 27, 2024 at 01:04:15PM +0800, Nanyong Sun wrote:
> > > On 2024/1/26 2:06, Catalin Marinas wrote:
> > > > On Sat, Jan 13, 2024 at 05:44:33PM +0800, Nanyong Sun wrote:
> > > > > HVO was previously disabled on arm64 [1] due to the lack of the
> > > > > necessary BBM (break-before-make) logic when changing page tables.
> > > > > This set of patches fixes that by adding the necessary BBM sequence
> > > > > when changing page tables, and by supporting vmemmap page fault
> > > > > handling to fix up kernel address translation faults if the vmemmap
> > > > > is accessed concurrently.
> > > > I'm not keen on this approach. I'm not even sure it's safe. In the
> > > > second patch, you take the init_mm.page_table_lock on the fault path
> > > > but are we sure this is unlocked when the fault was taken?
> > > I think this situation is impossible. In the implementation of the
> > > second patch, when the page table is being modified (the time window
> > > in which a page fault may occur), vmemmap_update_pte() already holds
> > > the init_mm.page_table_lock and does not unlock it until the page
> > > table update is done. Another thread could not hold the
> > > init_mm.page_table_lock and also trigger a page fault at the same
> > > time. If I have missed any points in my thinking, please correct me.
> > > Thank you.
> > It still strikes me as incredibly fragile to handle the fault, and
> > trying to reason about all the users of 'struct page' is impossible.
> > For example, can the fault happen from irq context?
> The pte lock cannot be taken in irq context (which I think is what
> you're asking?)
With this patchset, I think it can: IRQ -> interrupt handler accesses
the vmemmap -> faults -> the fault handler in patch 2 takes the
init_mm.page_table_lock and waits for the vmemmap rewrite to complete.
Maybe it works if the hugetlb code disabled the IRQs but, as Will said,
such a fault in any kernel context looks fragile.
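To make the window concrete, here is a rough sketch of the two sides as
I read the series. Only vmemmap_update_pte() and init_mm.page_table_lock
are names taken from the thread; the signature, the fault-handler helper
and the vmemmap_pte_is_valid() re-check are hypothetical glue, not the
actual patch:

#include <linux/mm.h>
#include <linux/pgtable.h>
#include <asm/tlbflush.h>

/* Writer side, as patch 2 is described above: BBM with the lock held. */
static void vmemmap_update_pte(unsigned long addr, pte_t *ptep, pte_t pte)
{
	spin_lock(&init_mm.page_table_lock);
	pte_clear(&init_mm, addr, ptep);	/* break: vmemmap unmapped */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
	set_pte_at(&init_mm, addr, ptep, pte);	/* make: new mapping live */
	spin_unlock(&init_mm.page_table_lock);
}

/*
 * Fault side: serialize on the same lock so the faulting access can be
 * retried once the rewrite is finished.
 */
static bool vmemmap_handle_page_fault(unsigned long addr)
{
	spin_lock(&init_mm.page_table_lock);	/* spins while BBM is in flight */
	spin_unlock(&init_mm.page_table_lock);
	return vmemmap_pte_is_valid(addr);	/* made-up re-check helper */
}

If the IRQ lands on the CPU sitting inside the break/make window, the
handler touches a struct page, faults on the unmapped vmemmap, and then
spins in vmemmap_handle_page_fault() on a lock its own interrupted
context already holds. Hence "maybe it works if the hugetlb code
disabled the IRQs".

--
Catalin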