From: Ryan Roberts <ryan.roberts@arm.com>
Date: Mon, 2 Jun 2025 11:31:05 +0100
Subject: Re: [RFC PATCH v1 0/6] Lazy mmu mode fixes and improvements
Message-ID: <23bd2cdf-768f-4053-9839-a0613a25de51@arm.com>
To: Mike Rapoport
Cc: Lorenzo Stoakes, Catalin Marinas, Will Deacon, Madhavan Srinivasan,
 Michael Ellerman, Nicholas Piggin, Christophe Leroy, "David S. Miller",
 Andreas Larsson, Juergen Gross, Ajay Kaher, Alexey Makhalov,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Boris Ostrovsky, "Aneesh Kumar K.V", Andrew Morton,
 Peter Zijlstra, Arnd Bergmann, David Hildenbrand, "Liam R. Howlett",
 Vlastimil Babka, Suren Baghdasaryan, Michal Hocko, Alexei Starovoitov,
 Andrey Ryabinin, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 sparclinux@vger.kernel.org, virtualization@lists.linux.dev,
 xen-devel@lists.xenproject.org, linux-mm@kvack.org, Jann Horn
References: <20250530140446.2387131-1-ryan.roberts@arm.com>
 <5b5d6352-9018-4658-b8fe-6eadaad46881@lucifer.local>

On 31/05/2025 08:46, Mike Rapoport wrote:
> Hi Ryan,
> 
> On Fri, May 30, 2025 at 04:55:36PM +0100, Ryan Roberts wrote:
>> On 30/05/2025 15:47, Lorenzo Stoakes wrote:
>>> +cc Jann who is a specialist in all things page table-y and especially scary
>>> edge cases :)
>>>
>>> On Fri, May 30, 2025 at 03:04:38PM +0100, Ryan Roberts wrote:
>>>> Hi All,
>>>>
>>>> I recently added support for lazy mmu mode on arm64. The series is now in
>>>> Linus's tree so should be in v6.16-rc1. But during testing in linux-next we
>>>> found some ugly corners (unexpected nesting). I was able to fix those issues
>>>> by making the arm64 implementation more permissive (like the other arches).
>>>> But this is quite fragile IMHO. So I'd rather fix the root cause and ensure
>>>> that lazy mmu mode never nests, and more importantly, that code never makes
>>>> pgtable modifications expecting them to be immediate, not knowing that it's
>>>> actually in lazy mmu mode so the changes get deferred.
>>>
>>> When you say fragile, are you confident it _works_ but perhaps not quite as
>>> well as you want? Or are you concerned this might be broken upstream in any
>>> way?
>>
>> I'm confident that it _works_ for arm64 as it is, upstream. But if Dev's
>> series were to go in _without_ the lazy_mmu bracketing in some manner, then it
>> would be broken if the config includes CONFIG_DEBUG_PAGEALLOC.
>>
>> There's a lot more explanation in the later patches as to how it can be
>> broken, but for arm64, the situation is currently like this, because our
>> implementation of __change_memory_common() uses apply_to_page_range() which
>> implicitly starts an inner lazy_mmu_mode. We enter multiple times, but we exit
>> on the first call to exit.
>> Everything works correctly but it's not optimal because C is no longer
>> deferred:
>>
>> arch_enter_lazy_mmu_mode()              << outer lazy mmu region
>>   <pgtable updates: A>
>>   alloc_pages()
>>     debug_pagealloc_map_pages()
>>       __kernel_map_pages()
>>         __change_memory_common()
>>           arch_enter_lazy_mmu_mode()    << inner lazy mmu region
>>           <pgtable updates: B>
>>           arch_leave_lazy_mmu_mode()    << exit; complete A + B
>>   clear_page()
>>   <pgtable updates: C>                  << no longer in lazy mode
>> arch_leave_lazy_mmu_mode()              << nop
>>
>> An alternative implementation would not add the nested lazy mmu mode, so we
>> end up with this:
>>
>> arch_enter_lazy_mmu_mode()              << outer lazy mmu region
>>   <pgtable updates: A>
>>   alloc_pages()
>>     debug_pagealloc_map_pages()
>>       __kernel_map_pages()
>>         __change_memory_common()
>>           <pgtable updates: B>          << deferred due to lazy mmu
>>   clear_page()                          << BANG! B has not been actioned
>>
>> arch_leave_lazy_mmu_mode()
>>
>> This is clearly a much worse outcome. It's not happening today but it could in
>> future. That's why I'm claiming it's fragile. It's much better (IMHO) to
>> disallow calling the page allocator when in lazy mmu mode.
> 
> First, I think it should be handled completely inside arch/arm64. Page
> allocation worked in lazy mmu mode on other architectures; there is no reason
> it should be changed because of the way arm64 implements lazy mmu.
> 
> Second, DEBUG_PAGEALLOC already implies that performance is bad: for it to be
> useful the kernel should be mapped with base pages and there's a map/unmap for
> every page allocation, so optimizing a few pte changes (C in your example)
> won't matter much.
> 
> If there's a potential correctness issue with Dev's patches, it should be
> dealt with as part of those patches, with the necessary updates to how lazy
> mmu is implemented on arm64 and used in pageattr.c.
> 
> And it seems to me that adding something along the lines below to
> __kernel_map_pages() would solve the DEBUG_PAGEALLOC issue:
> 
> void __kernel_map_pages(struct page *page, int numpages, int enable)
> {
> 	unsigned long flags;
> 	bool lazy_mmu = false;
> 
> 	if (!can_set_direct_map())
> 		return;
> 
> 	flags = read_thread_flags();
> 	if (flags & BIT(TIF_LAZY_MMU))
> 		lazy_mmu = true;
> 
> 	set_memory_valid((unsigned long)page_address(page), numpages, enable);
> 
> 	if (lazy_mmu)
> 		set_thread_flag(TIF_LAZY_MMU);
> }

Hi Mike,

I've thought about this for a bit, and concluded that you are totally right.
This is a much smaller, arm64-contained patch. Sorry for the noise here, and
thanks for the suggestion.

Thanks,
Ryan

> 
>> Thanks,
>> Ryan
> 