Date: Wed, 10 Jul 2024 17:51:24 +0100
From: Catalin Marinas
To: Yu Zhao
Cc: Nanyong Sun, will@kernel.org, mike.kravetz@oracle.com,
	muchun.song@linux.dev, akpm@linux-foundation.org,
	anshuman.khandual@arm.com, willy@infradead.org,
	wangkefeng.wang@huawei.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 0/3] A Solution to Re-enable hugetlb vmemmap optimize
References: <20240113094436.2506396-1-sunnanyong@huawei.com>
On Fri, Jul 05, 2024 at 11:41:34AM -0600, Yu Zhao wrote:
> On Fri, Jul 5, 2024 at 9:49 AM Catalin Marinas wrote:
> > If I did the maths right, for a 2MB hugetlb page, we have about 8
> > vmemmap pages (32K). Once we split a 2MB vmemmap range,
>
> Correct.
>
> > whatever else
> > needs to be touched in this range won't require a stop_machine().
> There might be some misunderstandings here.
>
> To do HVO:
> 1. we split a PMD into 512 PTEs;
> 2. for every 8 PTEs:
>    2a. we allocate an order-0 page for PTE #0;
>    2b. we remap PTE #0 *RW* to this page;
>    2c. we remap PTEs #1-7 *RO* to this page;
>    2d. we free the original order-3 page.

Thanks. I now remember why we reverted such support in 060a2c92d1b6
("arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP"). The main
problem is that point 2c also changes the output address of the PTE (and
the content of the page slightly). The architecture requires a
break-before-make in such a scenario, though it would have been nice if
it were more specific about what could go wrong.

We can do point 1 safely if we have FEAT_BBM level 2. For point 2, I
assume these 8 vmemmap pages may be accessed, and that's why we can't do
a break-before-make safely. I was wondering whether we could make the
PTEs RO first and then change the output address, but we have another
rule that the content of the page should be the same. I don't think
entries 1-7 are identical to entry 0 (though we could ask the architects
for clarification here).

Also, can we guarantee that nothing writes to entry 0 while we do such a
remapping? We know entries 1-7 won't be written as we mapped them as RO,
but entry 0 contains the head page. Maybe it's ok to map it RO
temporarily until the newly allocated hugetlb page is returned.

If we could get the above to work, it would be a lot simpler than
thinking of stop_machine() or other locks to wait for such remapping.

> To do de-HVO:
> 1. for every 8 PTEs:
>    1a. we allocate 7 order-0 pages.
>    1b. we remap PTEs #1-7 *RW* to those pages, respectively.

Similar problem in 1b, changing the output address. Here we could force
the content to be the same and remap PTEs 1-7 RO first to the new pages,
turn them RW afterwards, and it's all compliant with the architecture
(even without FEAT_BBM).
> > What I meant is that we can leave the vmemmap alias in place and just
> > reuse those pages via the linear map etc. The kernel shouldn't touch
> > those struct pages to corrupt the data. The only problem would be if
> > we physically unplug those pages but I don't think that's the case
> > here.
>
> Setting the repercussions of memory corruption aside, we still can't do
> this because PTEs #1-7 need to map meaningful data, hence step 2c
> above.

Yeah, I missed this one.

-- 
Catalin