From: Yu Zhao <yuzhao@google.com>
Date: Mon, 25 Nov 2024 15:22:47 -0700
Subject: Re: [PATCH v2 0/6] mm/arm64: re-enable HVO
To: Will Deacon
Cc: Andrew Morton, Catalin Marinas, Marc Zyngier, Muchun Song, Thomas Gleixner, Douglas Anderson, Mark Rutland, Nanyong Sun, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <20241125152203.GA954@willie-the-truck>
References: <20241107202033.2721681-1-yuzhao@google.com> <20241125152203.GA954@willie-the-truck>
On Mon, Nov 25, 2024 at 8:22 AM Will Deacon wrote:
>
> Hi Yu Zhao,
>
> On Thu, Nov 07, 2024 at 01:20:27PM -0700, Yu Zhao wrote:
> > HVO was disabled by commit 060a2c92d1b6 ("arm64: mm: hugetlb: Disable
> > HUGETLB_PAGE_OPTIMIZE_VMEMMAP") due to the following reason:
> >
> >   This is deemed UNPREDICTABLE by the Arm architecture without a
> >   break-before-make sequence (make the PTE invalid, TLBI, write the
> >   new valid PTE). However, such a sequence is not possible since the
> >   vmemmap may be concurrently accessed by the kernel.
> >
> > This series presents one of the previously discussed approaches to
> > re-enable HugeTLB Vmemmap Optimization (HVO) on arm64.
>
> Before jumping into the new mechanisms here, I'd really like to
> understand how the current code is intended to work in the relatively
> simple case where the vmemmap is page-mapped to start with (i.e. when we
> don't need to worry about block-splitting).
>
> In that case, who are the concurrent users of the vmemmap that we need
> to worry about?

Any speculative PFN walkers, who either only read `struct page[]` or
attempt to increment page->_refcount if it's not zero.

> Is it solely speculative references via
> page_ref_add_unless() or are there others?

page_ref_add_unless() needs to succeed before writes can follow;
speculative reads are always allowed.

> Looking at page_ref_add_unless(), what serialises that against
> __hugetlb_vmemmap_restore_folio()? I see there's a synchronize_rcu()
> call in the latter, but what prevents an RCU reader coming in
> immediately after that?

In page_ref_add_unless(), the condition `!page_is_fake_head(page) &&
page_ref_count(page)` returns false before a PTE becomes RO.

For HVO, i.e., a PTE being switched from RW to RO, page_ref_count() is
frozen (remains zero), followed by synchronize_rcu(). After the switch,
page_is_fake_head() is true, and it appears before page_ref_count() is
unfrozen (becomes non-zero), so the condition remains false.

For de-HVO, i.e., a PTE being switched from RO to RW, page_ref_count()
again is frozen, followed by synchronize_rcu(). Only this time
page_is_fake_head() is false after the switch, and again it appears
before page_ref_count() is unfrozen.

To answer your question: readers coming in immediately after that won't
be able to see a non-zero page_ref_count() before they see
page_is_fake_head() being false. IOW, regarding whether the mapping is
RW, the condition can be a false negative but never a false positive.
> Even if we resolve the BBM issues, we still need to get the
> synchronisation right so that we don't e.g. attempt a cmpxchg() to a
> read-only mapping, as the CAS instruction requires write permission on
> arm64 even if the comparison ultimately fails.

Correct. This applies to x86 as well, i.e., CAS on RO memory crashes the
kernel even if the CAS would otherwise fail.

> So please help me to understand the basics of HVO before we get bogged
> down by the block-splitting on arm64.

Gladly. Please let me know if anything from the core MM side is unclear.