From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
To: Catalin Marinas, Matthew Wilcox
CC: Will Deacon
References: <20240113094436.2506396-1-sunnanyong@huawei.com> <20240207111252.GA22167@willie-the-truck>
From: Nanyong Sun
Message-ID: <44075bc2-ac5f-ffcd-0d2f-4093351a6151@huawei.com>
Date: Thu, 8 Feb 2024 17:44:48 +0800
Sender: owner-linux-mm@kvack.org

On 2024/2/7 20:20, Catalin Marinas wrote:
> On Wed, Feb 07, 2024 at 11:21:17AM +0000, Matthew Wilcox wrote:
>> On Wed, Feb 07, 2024 at 11:12:52AM +0000, Will Deacon wrote:
>>> On Sat, Jan 27, 2024 at 01:04:15PM +0800, Nanyong Sun wrote:
>>>> On 2024/1/26 2:06, Catalin Marinas wrote:
>>>>> On Sat, Jan 13, 2024 at 05:44:33PM +0800, Nanyong Sun wrote:
>>>>>> HVO was previously disabled on arm64 [1] due to the lack of necessary
>>>>>> BBM (break-before-make) logic when changing page tables.
>>>>>> This set of patches fixes this by adding the necessary BBM sequence when
>>>>>> changing page tables, and by supporting vmemmap page fault handling to
>>>>>> fix up kernel address translation faults if vmemmap is concurrently accessed.
>>>>> I'm not keen on this approach. I'm not even sure it's safe. In the
>>>>> second patch, you take the init_mm.page_table_lock on the fault path but
>>>>> are we sure this is unlocked when the fault was taken?
>>>> I think this situation is impossible. In the implementation of the second
>>>> patch, while the page table is being changed (the time window in which a
>>>> page fault may occur), vmemmap_update_pte() already holds the
>>>> init_mm.page_table_lock and does not unlock it until the page table update
>>>> is done. Another thread could not hold the init_mm.page_table_lock and
>>>> also trigger a page fault at the same time.
>>>> If I have missed any points in my thinking, please correct me. Thank you.
>>> It still strikes me as incredibly fragile to handle the fault and trying
>>> to reason about all the users of 'struct page' is impossible. For example,
>>> can the fault happen from irq context?
>> The pte lock cannot be taken in irq context (which I think is what
>> you're asking?)
> With this patchset, I think it can: IRQ -> interrupt handler accesses
> vmemmap -> faults -> fault handler in patch 2 takes the
> init_mm.page_table_lock to wait for the vmemmap rewriting to complete.
> Maybe it works if the hugetlb code disabled the IRQs but, as Will said,
> such fault in any kernel context looks fragile.

How about taking a new lock with IRQs disabled during the BBM sequence, like:

+void vmemmap_update_pte(unsigned long addr, pte_t *ptep, pte_t pte)
+{
+    spin_lock_irq(NEW_LOCK);
+    pte_clear(&init_mm, addr, ptep);
+    flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
+    set_pte_at(&init_mm, addr, ptep, pte);
+    spin_unlock_irq(NEW_LOCK);
+}
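
To make the same-CPU case concrete, here is a toy single-CPU model in plain userspace C. It is not kernel code and not part of the patch set: struct toy_cpu, toy_irq_handler(), toy_update_pte() and toy_run() are all hypothetical names, a zero value stands in for an invalid PTE, and the interrupt is forced to arrive inside the break-before-make window as a worst case. It only compares what happens when the updater leaves local interrupts enabled with what happens when it disables them, which is the part spin_lock_irq() adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy single-CPU model. Every name here is hypothetical. */
struct toy_cpu {
    uint64_t pte;       /* 0 models an invalid (cleared) PTE */
    bool lock_held;     /* models the lock held across the BBM sequence */
    bool irqs_enabled;  /* models the local interrupt mask */
    bool irq_pending;   /* an interrupt whose handler reads through the PTE */
    bool deadlocked;    /* handler needed a lock its own CPU already holds */
    unsigned faults;    /* translation faults taken by the handler */
};

/* The handler touches memory mapped by the PTE. If the PTE is invalid it
 * faults, and the modelled fault handler must take the lock to wait. */
static void toy_irq_handler(struct toy_cpu *cpu)
{
    cpu->irq_pending = false;
    if (cpu->pte != 0)
        return;                  /* mapping valid: no fault */
    cpu->faults++;
    if (cpu->lock_held)
        cpu->deadlocked = true;  /* the interrupted code can never unlock */
}

/* Deliver a pending interrupt only if interrupts are enabled. */
static void toy_irq_window(struct toy_cpu *cpu)
{
    if (cpu->irqs_enabled && cpu->irq_pending)
        toy_irq_handler(cpu);
}

/* Break-before-make update, with or without local interrupts disabled. */
static void toy_update_pte(struct toy_cpu *cpu, uint64_t new_pte,
                           bool disable_irqs)
{
    assert(new_pte != 0 && cpu->irqs_enabled);

    if (disable_irqs)
        cpu->irqs_enabled = false;  /* models the irq part of spin_lock_irq() */
    cpu->lock_held = true;

    cpu->pte = 0;                   /* break: clear the old entry */
    cpu->irq_pending = true;        /* worst case: interrupt arrives now */
    toy_irq_window(cpu);
    /* a real implementation would invalidate the TLB at this point */
    cpu->pte = new_pte;             /* make: install the new entry */

    cpu->lock_held = false;
    if (disable_irqs)
        cpu->irqs_enabled = true;   /* models spin_unlock_irq() */
    toy_irq_window(cpu);            /* a deferred interrupt runs here */
}

/* Run one update starting from a valid PTE and return the final state. */
static struct toy_cpu toy_run(bool disable_irqs)
{
    struct toy_cpu cpu = { .pte = 1, .irqs_enabled = true };

    toy_update_pte(&cpu, 2, disable_irqs);
    return cpu;
}
```

Calling toy_run(false) models taking the lock without masking interrupts: the handler faults inside the window and needs a lock its own CPU holds. Calling toy_run(true) models the proposal above: the interrupt is deferred until after the unlock, when the entry is valid again. The model says nothing about accesses from other CPUs, which would still fault during the window and have to wait for the lock, so it does not address the broader concern about faulting in arbitrary kernel context.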