From: Nanyong Sun <sunnanyong@huawei.com>
Date: Sat, 27 Jan 2024 13:04:15 +0800
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
To: Catalin Marinas
References: <20240113094436.2506396-1-sunnanyong@huawei.com>
On 2024/1/26 2:06, Catalin Marinas wrote:
> On Sat, Jan 13, 2024 at 05:44:33PM +0800, Nanyong Sun wrote:
>> HVO was previously disabled on arm64 [1] due to the lack of necessary
>> BBM (break-before-make) logic when changing page tables.
>> This set of patches fixes this by adding the necessary BBM sequence
>> when changing page tables, and by supporting vmemmap page fault
>> handling to fix up kernel address translation faults if the vmemmap is
>> concurrently accessed.
> I'm not keen on this approach. I'm not even sure it's safe. In the
> second patch, you take the init_mm.page_table_lock on the fault path but
> are we sure this is unlocked when the fault was taken?

I think this situation is impossible. In the implementation of the
second patch, during the window in which the page table entry is invalid
(the only time a vmemmap page fault can occur), vmemmap_update_pte()
already holds init_mm.page_table_lock and does not release it until the
page table update is complete. Another thread therefore cannot be
holding init_mm.page_table_lock while also triggering such a page fault
(see the simplified sketch below). If I have missed anything here,
please correct me. Thank you.

> Basically you can
> get a fault anywhere something accesses a struct page.
>
> How often is this code path called? I wonder whether a stop_machine()
> approach would be simpler.

Every time hugetlb pages are allocated or released. We cannot restrict
users to allocating or releasing hugetlb pages only at boot time, or
only when no workload is running on the other CPUs. With stop_machine(),
the machine would be stopped 8 times for every 2MB hugetlb page and 4096
times for every 1GB hugetlb page (one call per vmemmap PTE update: with
4KB base pages, the vmemmap of a 2MB huge page spans 8 pages, and that
of a 1GB huge page spans 4096), which is probably too expensive.
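To make the locking argument above concrete, here is a minimal sketch of
the scheme in the second patch. It is simplified, not the literal patch:
vmemmap_update_pte() is the name used in the series, but the fault-handler
shape and the is_vmemmap_address() range check are illustrative only.

#include <linux/mm_types.h>	/* init_mm */
#include <linux/pgtable.h>	/* pte_clear(), set_pte_at() */
#include <asm/tlbflush.h>	/* flush_tlb_kernel_range() */

static void vmemmap_update_pte(unsigned long addr, pte_t *ptep, pte_t pte)
{
	/*
	 * Hold init_mm.page_table_lock across the whole break-before-make
	 * (BBM) window, i.e. the only interval in which the PTE is invalid
	 * and a vmemmap access can fault.
	 */
	spin_lock(&init_mm.page_table_lock);

	/* Break: invalidate the old mapping before installing the new one. */
	pte_clear(&init_mm, addr, ptep);
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE);

	/* Make: install the new mapping. */
	set_pte_at(&init_mm, addr, ptep, pte);

	spin_unlock(&init_mm.page_table_lock);
}

/* Called from the kernel fault path for a fault on a vmemmap address. */
static bool vmemmap_handle_page_fault(unsigned long addr)
{
	if (!is_vmemmap_address(addr))	/* hypothetical range check */
		return false;

	/*
	 * Serialize against vmemmap_update_pte(): once we acquire the
	 * lock, the BBM sequence has completed and the translation is
	 * valid again, so the faulting access can simply be retried.
	 */
	spin_lock(&init_mm.page_table_lock);
	spin_unlock(&init_mm.page_table_lock);

	return true;
}

So the fault handler never returns while the PTE is still in its
intermediate (cleared) state; it can only block on the lock until the
update finishes.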
I have also seen that on x86, optimizations such as batched TLB flushing
have been made to improve the performance of this path, which suggests
that some users do care about the speed of hugetlb allocation:
https://lwn.net/ml/linux-kernel/20230905214412.89152-1-mike.kravetz@oracle.com/

> Andrew, I'd suggest we drop these patches from the mm tree for the time
> being. They haven't received much review from the arm64 folk. Thanks.
>