From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [PATCH RFC 08/12] x86: make kernel text patching aware about replicas
Date: Thu, 28 Dec 2023 21:10:52 +0800
Message-ID: <20231228131056.602411-9-artem.kuzin@huawei.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231228131056.602411-1-artem.kuzin@huawei.com>
References: <20231228131056.602411-1-artem.kuzin@huawei.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Sender: owner-linux-mm@kvack.org
Precedence: bulk

From: Artem Kuzin

Co-developed-by: Nikita Panov
Signed-off-by: Nikita Panov
Co-developed-by: Alexander Grubnikov
Signed-off-by: Alexander Grubnikov
Signed-off-by: Artem Kuzin
---
 arch/x86/kernel/alternative.c | 116 ++++++++++++++++++----------------
 1 file changed, 62 insertions(+), 54 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 44843a492e69..b0abd60bcafe 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -18,6 +18,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1659,6 +1660,7 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 				      size_t len)
 {
 	unsigned long flags;
+	int nid;
 
 	if (boot_cpu_has(X86_FEATURE_NX) &&
 	    is_module_text_address((unsigned long)addr)) {
@@ -1669,8 +1671,18 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 		 */
 		memcpy(addr, opcode, len);
 	} else {
+		unsigned long iaddr = (unsigned long)addr;
+
 		local_irq_save(flags);
-		memcpy(addr, opcode, len);
+		if (is_text_replicated() && is_kernel_text(iaddr)) {
+			for_each_replica(nid) {
+				void *vaddr = numa_addr_in_replica(addr, nid);
+
+				memcpy(vaddr, opcode, len);
+			}
+		} else {
+			memcpy(addr, opcode, len);
+		}
 		local_irq_restore(flags);
 		sync_core();
 
@@ -1764,36 +1776,21 @@ typedef void text_poke_f(void *dst, const void *src, size_t len);
 
 static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t len)
 {
+	int nid;
 	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
 	struct page *pages[2] = {NULL};
 	temp_mm_state_t prev;
 	unsigned long flags;
+	int size_in_poking_mm = PAGE_SIZE;
 	pte_t pte, *ptep;
 	spinlock_t *ptl;
 	pgprot_t pgprot;
-
+	bool has_replica = numa_addr_has_replica(addr);
 	/*
 	 * While boot memory allocator is running we cannot use struct pages as
 	 * they are not yet initialized. There is no way to recover.
 	 */
 	BUG_ON(!after_bootmem);
-
-	if (!core_kernel_text((unsigned long)addr)) {
-		pages[0] = vmalloc_to_page(addr);
-		if (cross_page_boundary)
-			pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
-	} else {
-		pages[0] = virt_to_page(addr);
-		WARN_ON(!PageReserved(pages[0]));
-		if (cross_page_boundary)
-			pages[1] = virt_to_page(addr + PAGE_SIZE);
-	}
-	/*
-	 * If something went wrong, crash and burn since recovery paths are not
-	 * implemented.
-	 */
-	BUG_ON(!pages[0] || (cross_page_boundary && !pages[1]));
-
 	/*
 	 * Map the page without the global bit, as TLB flushing is done with
 	 * flush_tlb_mm_range(), which is intended for non-global PTEs.
@@ -1812,48 +1809,59 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 
 	local_irq_save(flags);
 
-	pte = mk_pte(pages[0], pgprot);
-	set_pte_at(poking_mm, poking_addr, ptep, pte);
+	for_each_replica(nid) {
+		prev = use_temporary_mm(poking_mm);
 
-	if (cross_page_boundary) {
-		pte = mk_pte(pages[1], pgprot);
-		set_pte_at(poking_mm, poking_addr + PAGE_SIZE, ptep + 1, pte);
-	}
+		pages[0] = walk_to_page_node(nid, addr);
+		if (cross_page_boundary)
+			pages[1] = walk_to_page_node(nid, addr + PAGE_SIZE);
 
-	/*
-	 * Loading the temporary mm behaves as a compiler barrier, which
-	 * guarantees that the PTE will be set at the time memcpy() is done.
-	 */
-	prev = use_temporary_mm(poking_mm);
+		BUG_ON(!pages[0] || (cross_page_boundary && !pages[1]));
 
-	kasan_disable_current();
-	func((u8 *)poking_addr + offset_in_page(addr), src, len);
-	kasan_enable_current();
+		pte = mk_pte(pages[0], pgprot);
+		set_pte_at(poking_mm, poking_addr, ptep, pte);
 
-	/*
-	 * Ensure that the PTE is only cleared after the instructions of memcpy
-	 * were issued by using a compiler barrier.
-	 */
-	barrier();
+		if (cross_page_boundary) {
+			pte = mk_pte(pages[1], pgprot);
+			set_pte_at(poking_mm, poking_addr + PAGE_SIZE, ptep + 1, pte);
+		}
+		/*
+		 * Compiler barrier to ensure that PTE is set before func()
+		 */
+		barrier();
 
-	pte_clear(poking_mm, poking_addr, ptep);
-	if (cross_page_boundary)
-		pte_clear(poking_mm, poking_addr + PAGE_SIZE, ptep + 1);
+		kasan_disable_current();
+		func((u8 *)poking_addr + offset_in_page(addr), src, len);
+		kasan_enable_current();
 
-	/*
-	 * Loading the previous page-table hierarchy requires a serializing
-	 * instruction that already allows the core to see the updated version.
-	 * Xen-PV is assumed to serialize execution in a similar manner.
-	 */
-	unuse_temporary_mm(prev);
+		/*
+		 * Ensure that the PTE is only cleared after the instructions of memcpy
+		 * were issued by using a compiler barrier.
+		 */
+		barrier();
 
-	/*
-	 * Flushing the TLB might involve IPIs, which would require enabled
-	 * IRQs, but not if the mm is not used, as it is in this point.
-	 */
-	flush_tlb_mm_range(poking_mm, poking_addr, poking_addr +
-			   (cross_page_boundary ? 2 : 1) * PAGE_SIZE,
-			   PAGE_SHIFT, false);
+		pte_clear(poking_mm, poking_addr, ptep);
+		if (cross_page_boundary)
+			pte_clear(poking_mm, poking_addr + PAGE_SIZE, ptep + 1);
+
+		/*
+		 * Loading the previous page-table hierarchy requires a serializing
+		 * instruction that already allows the core to see the updated version.
+		 * Xen-PV is assumed to serialize execution in a similar manner.
+		 */
+		unuse_temporary_mm(prev);
+
+		/*
+		 * Flushing the TLB might involve IPIs, which would require enabled
+		 * IRQs, but not if the mm is not used, as it is in this point.
+		 */
+
+		flush_tlb_mm_range(poking_mm, poking_addr, poking_addr +
+				   (cross_page_boundary ? 2 : 1) * size_in_poking_mm,
+				   PAGE_SHIFT, false);
+		if (!has_replica)
+			break;
+	}
 
 	if (func == text_poke_memcpy) {
 		/*
-- 
2.34.1