From mboxrd@z Thu Jan 1 00:00:00 1970
From: Maxwell Bland <mbland@motorola.com>
Date: Tue, 13 Feb 2024 14:18:15 -0600
Subject: Re: [PATCH] arm64: allow post-init vmalloc PXNTable
To: Greg KH
Cc: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, catalin.marinas@arm.com, will@kernel.org,
	dennis@kernel.org, tj@kernel.org, cl@linux.com,
	akpm@linux-foundation.org, shikemeng@huaweicloud.com,
	david@redhat.com, rppt@kernel.org, anshuman.khandual@arm.com,
	willy@infradead.org, ryan.roberts@arm.com,
	rick.p.edgecombe@intel.com, pcc@google.com, mark.rutland@arm.com,
	rmk+kernel@armlinux.org.uk, tglx@linutronix.de, gshan@redhat.com,
	Jonathan.Cameron@huawei.com, james.morse@arm.com, Andrew Wheeler
In-Reply-To: <2024021321-scooter-citrus-233c@gregkh>
Message-Id: <4TZDhL5C5Hz3p8NW@ilclmmrp02.lenovo.com>

Ensures that PXNTable can be set on all table descriptors allocated
through vmalloc. Normally, PXNTable is set only during the initial
memory mapping and is not applied afterwards, making it possible for
attackers to target writable PTEs allocated post-init as a staging
region for injecting their code into the kernel. At present these
attacks cannot be prevented efficiently because the vmalloc region
overlaps with _text, e.g.:

  VMALLOC_START   ffff800080000000
  VMALLOC_END     fffffbfff0000000
  _text           ffffb6c0c1400000
  _end            ffffb6c0c3e40000

Setting VMALLOC_END to _text during init would resolve this, at the
cost of a sizeable reduction in the available vmalloc space
(~70,000 GB) due to the randomness required by ASLR. However, PXNTable
must also be enforced dynamically for the static assignment of this
flag during mm initialization to be effective.

Because the address of _text is ASLR-sensitive and this patch ties it
to VMALLOC_END, the use of VMALLOC_END in a print statement in
mm/percpu.c is removed. In crash_core.c only the format string is
updated, since we are dead at that point regardless. VMALLOC_END is
updated in kernel/setup.c to keep the feature closely associated with
the ASLR and region-allocation code. bpf_jit_comp.c must also be
updated to use the module_alloc rather than the vmalloc interface, so
that BPF allocations are placed in the same region as kernel text.

Signed-off-by: Maxwell Bland <mbland@motorola.com>
---
This is an attempt to get Moto's SMTP server to send the patch without
ruining the formatting. Based on Mark R.'s comments, though, it sounds
like:

1) I need to figure out a way to reduce the loss of vmalloc virtual
   memory.
2) I need to actually enforce PXNTable dynamically, to make it clear
   this is a real upstream issue.
3) I need some testing and quantification to make sure this does not
   ruin BPF and module allocations.

(Rough sketches of what 2) and 3) might look like are appended after
the patch.)

https://lore.kernel.org/all/ZcurbvkUR-BoGTxu@FVFF77S0Q05N.cambridge.arm.com/

Regardless, here's the original patch against the current GitHub Linux
main.

 arch/arm64/Kconfig                   | 13 +++++++++++++
 arch/arm64/include/asm/pgtable.h     |  6 ++++++
 arch/arm64/include/asm/vmalloc-pxn.h |  9 +++++++++
 arch/arm64/kernel/crash_core.c       |  2 +-
 arch/arm64/kernel/setup.c            |  9 +++++++++
 arch/arm64/net/bpf_jit_comp.c        |  5 +++--
 mm/percpu.c                          |  4 ++--
 7 files changed, 43 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm64/include/asm/vmalloc-pxn.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index aa7c1d435139..5f1e75d70e14 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2165,6 +2165,19 @@ config ARM64_DEBUG_PRIORITY_MASKING
	  If unsure, say N
 endif # ARM64_PSEUDO_NMI
 
+config ARM64_VMALLOC_PXN
+	bool "Ensures table descriptors pointing to kernel data are PXNTable"
+	help
+	  Reduces the range of the kernel data vmalloc region to remove any
+	  overlap with kernel code, making it possible to enable the PXNTable
+	  bit on table descriptors allocated after the kernel's initial memory
+	  mapping.
+
+	  This increases the performance of security monitors which protect
+	  against malicious updates to page table entries.
+
+	  If unsure, say N.
+
 config RELOCATABLE
	bool "Build a relocatable kernel image" if EXPERT
	select ARCH_HAS_RELR
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 79ce70fbb751..49f64ea77c81 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -22,7 +22,9 @@
  * and fixed mappings
  */
 #define VMALLOC_START		(MODULES_END)
+#ifndef CONFIG_ARM64_VMALLOC_PXN
 #define VMALLOC_END		(VMEMMAP_START - SZ_256M)
+#endif
 
 #define vmemmap ((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
 
@@ -35,6 +37,10 @@
 #include
 #include
 
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+#include <asm/vmalloc-pxn.h>
+#endif
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
diff --git a/arch/arm64/include/asm/vmalloc-pxn.h b/arch/arm64/include/asm/vmalloc-pxn.h
new file mode 100644
index 000000000000..d054427e2804
--- /dev/null
+++ b/arch/arm64/include/asm/vmalloc-pxn.h
@@ -0,0 +1,9 @@
+#ifndef _ASM_ARM64_VMALLOC_PXN_H
+#define _ASM_ARM64_VMALLOC_PXN_H
+
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+extern u64 __vmalloc_end __ro_after_init;
+#define VMALLOC_END (__vmalloc_end)
+#endif /* CONFIG_ARM64_VMALLOC_PXN */
+
+#endif /* _ASM_ARM64_VMALLOC_PXN_H */
diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c
index 66cde752cd74..39dccae11a40 100644
--- a/arch/arm64/kernel/crash_core.c
+++ b/arch/arm64/kernel/crash_core.c
@@ -24,7 +24,7 @@ void arch_crash_save_vmcoreinfo(void)
	vmcoreinfo_append_str("NUMBER(MODULES_VADDR)=0x%lx\n", MODULES_VADDR);
	vmcoreinfo_append_str("NUMBER(MODULES_END)=0x%lx\n", MODULES_END);
	vmcoreinfo_append_str("NUMBER(VMALLOC_START)=0x%lx\n", VMALLOC_START);
-	vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%lx\n", VMALLOC_END);
+	vmcoreinfo_append_str("NUMBER(VMALLOC_END)=0x%llx\n", VMALLOC_END);
	vmcoreinfo_append_str("NUMBER(VMEMMAP_START)=0x%lx\n", VMEMMAP_START);
	vmcoreinfo_append_str("NUMBER(VMEMMAP_END)=0x%lx\n", VMEMMAP_END);
	vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 42c690bb2d60..b7ccee672743 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -54,6 +54,11 @@
 #include
 #include
 
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+u64 __vmalloc_end __ro_after_init = VMEMMAP_START - SZ_256M;
+EXPORT_SYMBOL(__vmalloc_end);
+#endif /* CONFIG_ARM64_VMALLOC_PXN */
+
 static int num_standard_resources;
 static struct resource *standard_resources;
@@ -298,6 +303,10 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p)
 
	kaslr_init();
 
+#ifdef CONFIG_ARM64_VMALLOC_PXN
+	__vmalloc_end = ALIGN_DOWN((u64) _text, PMD_SIZE);
+#endif
+
	/*
	 * If know now we are going to need KPTI then use non-global
	 * mappings from the start, avoiding the cost of rewriting
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8955da5c47cf..1fe0d637792c 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include <linux/moduleloader.h>
 #include
 #include
@@ -1690,12 +1691,12 @@ u64 bpf_jit_alloc_exec_limit(void)
 void *bpf_jit_alloc_exec(unsigned long size)
 {
	/* Memory is intended to be executable, reset the pointer tag. */
-	return kasan_reset_tag(vmalloc(size));
+	return kasan_reset_tag(module_alloc(size));
 }
 
 void bpf_jit_free_exec(void *addr)
 {
-	return vfree(addr);
+	return module_memfree(addr);
 }
 
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
diff --git a/mm/percpu.c b/mm/percpu.c
index 4e11fc1e6def..a902500ebfa0 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3128,8 +3128,8 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
 
	/* warn if maximum distance is further than 75% of vmalloc space */
	if (max_distance > VMALLOC_TOTAL * 3 / 4) {
-		pr_warn("max_distance=0x%lx too large for vmalloc space 0x%lx\n",
-			max_distance, VMALLOC_TOTAL);
+		pr_warn("max_distance=0x%lx too large for vmalloc space\n",
+			max_distance);
 #ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
		/* and fail if we have fallback */
		rc = -EINVAL;
-- 
2.39.2
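
P.S. To make point 2) above more concrete, here is a rough, untested
sketch of what a runtime check for the PXNTable bit could look like.
The helper name vmalloc_pxn_table_set() is made up for illustration
and is not part of this patch; it assumes a 4K-granule stage-1 layout
in which the PUD entry covering a vmalloc address is a table
descriptor, with PXNTable at bit 59 of that descriptor:

#include <linux/bits.h>
#include <linux/mm.h>
#include <asm/pgtable.h>

/*
 * Illustrative sketch only: walk to the PUD-level table descriptor
 * covering @addr and report whether its PXNTable bit (bit 59) is set.
 */
static bool vmalloc_pxn_table_set(unsigned long addr)
{
	pgd_t *pgdp = pgd_offset_k(addr);
	p4d_t *p4dp = p4d_offset(pgdp, addr);
	pud_t *pudp;
	pud_t pud;

	if (p4d_none(READ_ONCE(*p4dp)))
		return false;

	pudp = pud_offset(p4dp, addr);
	pud = READ_ONCE(*pudp);

	/* Only table descriptors carry PXNTable; skip empty/leaf entries. */
	if (pud_none(pud) || pud_leaf(pud))
		return false;

	return !!(pud_val(pud) & BIT(59));
}

With dynamic enforcement in place, the expectation would be that this
returns true for any address handed out by vmalloc() once
CONFIG_ARM64_VMALLOC_PXN is enabled.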
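
And for point 3), the first smoke test I have in mind is simply to
confirm where JIT buffers land after the switch to module_alloc().
Again only a sketch: check_bpf_jit_region() is a hypothetical debug
helper, and it assumes the MODULES_VADDR/MODULES_END bounds from
asm/memory.h:

#include <linux/filter.h>
#include <linux/printk.h>
#include <asm/memory.h>

/*
 * Illustrative sketch only: allocate one JIT buffer, report whether it
 * falls inside the module region, then free it again.
 */
static void check_bpf_jit_region(void)
{
	unsigned long addr;
	void *buf = bpf_jit_alloc_exec(PAGE_SIZE);

	if (!buf)
		return;

	addr = (unsigned long)buf;
	pr_info("bpf jit buffer %px is %s the module region\n", buf,
		(addr >= MODULES_VADDR && addr < MODULES_END) ?
		"inside" : "outside");

	bpf_jit_free_exec(buf);
}

The idea is just to quantify whether moving BPF allocations next to
kernel text puts measurable pressure on the module region.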