Message-ID: <08f2e23f-f67e-4d41-ae0b-143df6731977@arm.com>
Date: Mon, 12 May 2025 15:14:20 +0100
Subject: Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts
From: Ryan Roberts
To: Catalin Marinas
Cc: Will Deacon, Pasha Tatashin, Andrew Morton, Uladzislau Rezki,
 Christoph Hellwig, David Hildenbrand, "Matthew Wilcox (Oracle)",
 Mark Rutland, Anshuman Khandual, Alexandre Ghiti, Kevin Brodsky,
 linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org,
 syzbot+5c0d9392e042f41d45c5@syzkaller.appspotmail.com
References: <20250512102242.4156463-1-ryan.roberts@arm.com>
 <7852c047-e8da-44d4-8220-68c2ebca5206@arm.com>
In-Reply-To: <7852c047-e8da-44d4-8220-68c2ebca5206@arm.com>

On 12/05/2025 14:53, Ryan Roberts wrote:
> On 12/05/2025 14:14, Catalin Marinas wrote:
>> On Mon, May 12, 2025 at 11:22:40AM +0100, Ryan Roberts wrote:
>>> @@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
>>>  #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
>>>  static inline void arch_enter_lazy_mmu_mode(void)
>>>  {
>>> -	VM_WARN_ON(in_interrupt());
>>> +	if (in_interrupt())
>>> +		return;
>>> +
>>>  	VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
>>
>> I still get this warning trigger with some debugging enabled (more
>> specifically, CONFIG_DEBUG_PAGEALLOC). Patch applied on top of the arm64
>> for-kernelci.
>
> Thanks for the report...
>
> I'll admit I didn't explicitly test CONFIG_DEBUG_PAGEALLOC since I thought we
> concluded when talking that the failure mode was the same as KFENCE in that it
> was due to pte manipulations in the interrupt context.
>
> But that's not what this trace shows...
>
> The warning is basically saying we have nested lazy mmu mode regions, both in
> task context, which is completely illegal as far as lazy mmu is concerned.
>
> Looks like the first nest is zap_pte_range(), which is batching with mmu_gather
> and that allocates memory in tlb_next_batch(). And when CONFIG_DEBUG_PAGEALLOC
> is enabled, it calls into the arch to make the allocated page valid in the
> linear map. arm64 does that with apply_to_page_range(), which does a second lazy
> mmu nest.
>
> I need to have a think about what the right fix is. Will get back to you shortly.
>
> Thanks,
> Ryan
>
>>
>> Is it because the unmap code uses arch_enter_lazy_mmu_mode() already and
>> __apply_to_page_range() via __kernel_map_pages() is attempting another
>> nested call? I think it's still safe, we just drop the optimisation in
>> the outer code and issue the barriers immediately. So maybe drop this
>> warning as well but add a comment on how nesting works.

Sorry Catalin, I completely missed this proposal on first read!

Yes that's what is happening, yes it is still safe, and yes we just drop the
optimization. But it's an ugly solution IMHO.

The real problem is that arm64 has chosen to use apply_to_page_range() as part
of its implementation of __kernel_map_pages(), and apply_to_page_range() expects
to be able to use lazy mmu.

lazy mmu spec says "Nesting is not permitted and the mode cannot be used in
interrupt context."
Clearly neither of those things are true :)

Looking at the powerpc lazy mmu implementation, I think it will do exactly what
you are suggesting we do here, which is just terminate the optimization early.
So I guess that's the most pragmatic approach.

I'll re-send this patch. Sigh.

Thanks,
Ryan

>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 6 PID: 1 at arch/arm64/include/asm/pgtable.h:89 __apply_to_page_range+0x85c/0x9f8
>> Modules linked in: ip_tables x_tables ipv6
>> CPU: 6 UID: 0 PID: 1 Comm: systemd Not tainted 6.15.0-rc5-00075-g676795fe9cf6 #1 PREEMPT
>> Hardware name: QEMU KVM Virtual Machine, BIOS 2024.08-4 10/25/2024
>> pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : __apply_to_page_range+0x85c/0x9f8
>> lr : __apply_to_page_range+0x2b4/0x9f8
>> sp : ffff80008009b3c0
>> x29: ffff80008009b460 x28: ffff0000c43a3000 x27: ffff0001ff62b108
>> x26: ffff0000c43a4000 x25: 0000000000000001 x24: 0010000000000001
>> x23: ffffbf24c9c209c0 x22: ffff80008009b4d0 x21: ffffbf24c74a3b20
>> x20: ffff0000c43a3000 x19: ffff0001ff609d18 x18: 0000000000000001
>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000003
>> x14: 0000000000000028 x13: ffffbf24c97c1000 x12: ffff0000c43a3fff
>> x11: ffffbf24cacc9a70 x10: ffff0000c43a3fff x9 : ffff0001fffff018
>> x8 : 0000000000000012 x7 : ffff0000c43a4000 x6 : ffff0000c43a4000
>> x5 : ffffbf24c9c209c0 x4 : ffff0000c43a3fff x3 : ffff0001ff609000
>> x2 : 0000000000000d18 x1 : ffff0000c03e8000 x0 : 0000000080000000
>> Call trace:
>>  __apply_to_page_range+0x85c/0x9f8 (P)
>>  apply_to_page_range+0x14/0x20
>>  set_memory_valid+0x5c/0xd8
>>  __kernel_map_pages+0x84/0xc0
>>  get_page_from_freelist+0x1110/0x1340
>>  __alloc_frozen_pages_noprof+0x114/0x1178
>>  alloc_pages_mpol+0xb8/0x1d0
>>  alloc_frozen_pages_noprof+0x48/0xc0
>>  alloc_pages_noprof+0x10/0x60
>>  get_free_pages_noprof+0x14/0x90
>>  __tlb_remove_folio_pages_size.isra.0+0xe4/0x140
>>  __tlb_remove_folio_pages+0x10/0x20
>>  unmap_page_range+0xa1c/0x14c0
>>  unmap_single_vma.isra.0+0x48/0x90
>>  unmap_vmas+0xe0/0x200
>>  vms_clear_ptes+0xf4/0x140
>>  vms_complete_munmap_vmas+0x7c/0x208
>>  do_vmi_align_munmap+0x180/0x1a8
>>  do_vmi_munmap+0xac/0x188
>>  __vm_munmap+0xe0/0x1e0
>>  __arm64_sys_munmap+0x20/0x38
>>  invoke_syscall+0x48/0x104
>>  el0_svc_common.constprop.0+0x40/0xe0
>>  do_el0_svc+0x1c/0x28
>>  el0_svc+0x4c/0x16c
>>  el0t_64_sync_handler+0x10c/0x140
>>  el0t_64_sync+0x198/0x19c
>> irq event stamp: 281312
>> hardirqs last enabled at (281311): [] bad_range+0x164/0x1c0
>> hardirqs last disabled at (281312): [] el1_dbg+0x24/0x98
>> softirqs last enabled at (281054): [] handle_softirqs+0x4cc/0x518
>> softirqs last disabled at (281019): [] __do_softirq+0x14/0x20
>> ---[ end trace 0000000000000000 ]---
>>
>
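
The nesting behaviour settled on above (an inner lazy mmu region ends the
batching early, and later pte updates in the outer region issue their barriers
immediately) can be modelled in user space. This is only a sketch under stated
assumptions: struct lazy_state, model_enter(), model_leave(), model_set_pte()
and model_nested_unmap() are hypothetical names invented for illustration, not
kernel APIs, and the two booleans only loosely mirror TIF_LAZY_MMU and a
"barriers pending" marker. It is not the patch that was re-sent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct lazy_state {
	bool active;	/* loosely models TIF_LAZY_MMU */
	bool pending;	/* a pte was written and its barriers were deferred */
	int barriers;	/* number of barrier sequences issued so far */
};

/* Entering in interrupt context is a no-op; a nested enter is tolerated. */
static void model_enter(struct lazy_state *s, bool in_irq)
{
	if (in_irq)
		return;
	s->active = true;
}

/*
 * Leaving flushes any deferred barriers and clears the mode. When called
 * for an inner (nested) region this also ends batching for the outer one.
 */
static void model_leave(struct lazy_state *s, bool in_irq)
{
	if (in_irq)
		return;
	if (s->pending) {
		s->barriers++;
		s->pending = false;
	}
	s->active = false;
}

/* A pte write defers its barriers only in task context with the mode on. */
static void model_set_pte(struct lazy_state *s, bool in_irq)
{
	if (!in_irq && s->active)
		s->pending = true;
	else
		s->barriers++;
}

/*
 * Replays the shape of the trace: an outer unmap region, with a nested
 * region entered from the page allocator while the outer one is active.
 */
static int model_nested_unmap(void)
{
	struct lazy_state s = { 0 };

	model_enter(&s, false);		/* outer region: unmap path */
	model_set_pte(&s, false);	/* deferred */
	model_enter(&s, false);		/* inner region: linear map update */
	model_set_pte(&s, false);	/* deferred */
	model_leave(&s, false);		/* inner leave: flush, batching ends */
	model_set_pte(&s, false);	/* outer region again: immediate */
	model_set_pte(&s, false);	/* immediate */
	model_leave(&s, false);		/* nothing left to flush */
	return s.barriers;
}
```

In this model model_nested_unmap() returns 3: one flush at the inner leave plus
two immediate barrier sequences for the remaining outer updates. Without the
nested region the same four pte writes would cost a single flush, which is the
optimisation that gets dropped.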