Subject: Re: [PATCH v10 6/6] arm64: introduce copy_mc_to_kernel() implementation
Date: Tue, 30 Jan 2024 21:56:46 +0800
From: Tong Tiangen <tongtiangen@huawei.com>
To: Mark Rutland
CC: Catalin Marinas, Will Deacon, James Morse,
    Robin Murphy, Andrey Ryabinin, Alexander Potapenko, Alexander Viro,
    Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Andrew Morton,
    Michael Ellerman, Nicholas Piggin, Christophe Leroy, Aneesh Kumar K.V,
    "Naveen N. Rao", Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    Dave Hansen, "H. Peter Anvin", Guohanjun
References: <20240129134652.4004931-1-tongtiangen@huawei.com> <20240129134652.4004931-7-tongtiangen@huawei.com>

On 2024/1/30 18:20, Mark Rutland wrote:
> On Mon, Jan 29, 2024 at 09:46:52PM +0800, Tong Tiangen wrote:
>> The copy_mc_to_kernel() helper is a memory copy implementation that
>> handles source exceptions. It can be used in memory copy scenarios that
>> tolerate hardware memory errors (e.g. pmem_read/dax_copy_to_iter).
>>
>> Currently, only x86 and ppc support this helper. Now that arm64 supports
>> the machine check safe framework, we introduce a copy_mc_to_kernel()
>> implementation.
>>
>> Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
>> ---
>>  arch/arm64/include/asm/string.h  |   5 +
>>  arch/arm64/include/asm/uaccess.h |  21 +++
>>  arch/arm64/lib/Makefile          |   2 +-
>>  arch/arm64/lib/memcpy_mc.S       | 257 +++++++++++++++++++++++++++++++
>>  mm/kasan/shadow.c                |  12 ++
>>  5 files changed, 296 insertions(+), 1 deletion(-)
>>  create mode 100644 arch/arm64/lib/memcpy_mc.S
>
> Looking at the diffstat and code, this duplicates arch/arm64/lib/memcpy.S with
> a few annotations. Duplicating that code is not maintainable, and so we cannot
> take this as-is.
>
> If you want a version that can handle faults, it *must* be written such that
> the code is shared with the regular memcpy. That could be done by using macros
> to instantiate two copies (one with fault handling, the other without).
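
Right. For reference, the uaccess routines already share a single body this
way: arch/arm64/lib/copy_template.S only ever uses ldrb1/ldp1/stp1-style
wrapper macros, and copy_from_user.S / copy_to_user.S each define those
wrappers before including the template. A rough sketch of the same
arrangement here (the shared file name "memcpy_template.S" is hypothetical;
CPY_MC is the annotation macro from this series). The two wrapper sets live
in separate files, of course:

	/* memcpy.S instantiation: plain accessors, no fixup. */
	.macro ldp1 reg1, reg2, ptr, offset
	ldp	\reg1, \reg2, [\ptr, \offset]
	.endm
	.macro stp1 reg1, reg2, ptr, offset
	stp	\reg1, \reg2, [\ptr, \offset]
	.endm

	/* memcpy_mcs.S instantiation: every access gets a fixup at 9998. */
	.macro ldp1 reg1, reg2, ptr, offset
	CPY_MC(9998f, ldp \reg1, \reg2, [\ptr, \offset])
	.endm
	.macro stp1 reg1, reg2, ptr, offset
	CPY_MC(9998f, stp \reg1, \reg2, [\ptr, \offset])
	.endm

	/* Each file then emits its own function around the shared body: */
	SYM_FUNC_START(__pi_memcpy_mcs)
	#include "memcpy_template.S"	/* hypothetical shared body */
	mov	x0, #0			/* success */
	ret
9998:	mov	x0, #-EFAULT		/* consumed memory error */
	ret
	SYM_FUNC_END(__pi_memcpy_mcs)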
>
> It would also be very helpful to see *any* indication that this has been
> tested, which is sorely lacking in the series as-is.
>
> Mark.

OK, that is exactly what I want to solve right now: the large amount of
duplicate code. I am going to think about how to deal with it. Thank you
very much for your good advice :)

Tong.
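
(A note for readers of the quoted patch below: each CPY_MC(9998f, insn)
wraps a single load or store with an exception-table entry, so a consumed
memory error branches to the local 9998 fixup label instead of panicking
the kernel. As a sketch of the shape only, modeled on the existing USER()
annotation in asm-uaccess.h, with the generic _asm_extable standing in for
whatever machine-check-specific extable type the series actually registers:

#define CPY_MC(l, x...)			\
9999:	x;				\
	_asm_extable	9999b, l

The fixup label then reports failure, as in the "9998: mov x0, #-EFAULT"
tail at the end of the quoted file.)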
>
>> diff --git a/arch/arm64/include/asm/string.h b/arch/arm64/include/asm/string.h
>> index 3a3264ff47b9..995b63c26e99 100644
>> --- a/arch/arm64/include/asm/string.h
>> +++ b/arch/arm64/include/asm/string.h
>> @@ -35,6 +35,10 @@ extern void *memchr(const void *, int, __kernel_size_t);
>>  extern void *memcpy(void *, const void *, __kernel_size_t);
>>  extern void *__memcpy(void *, const void *, __kernel_size_t);
>>
>> +#define __HAVE_ARCH_MEMCPY_MC
>> +extern int memcpy_mcs(void *, const void *, __kernel_size_t);
>> +extern int __memcpy_mcs(void *, const void *, __kernel_size_t);
>> +
>>  #define __HAVE_ARCH_MEMMOVE
>>  extern void *memmove(void *, const void *, __kernel_size_t);
>>  extern void *__memmove(void *, const void *, __kernel_size_t);
>> @@ -57,6 +61,7 @@ void memcpy_flushcache(void *dst, const void *src, size_t cnt);
>>   */
>>
>>  #define memcpy(dst, src, len) __memcpy(dst, src, len)
>> +#define memcpy_mcs(dst, src, len) __memcpy_mcs(dst, src, len)
>>  #define memmove(dst, src, len) __memmove(dst, src, len)
>>  #define memset(s, c, n) __memset(s, c, n)
>>
>> diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
>> index 14be5000c5a0..61e28ef2112a 100644
>> --- a/arch/arm64/include/asm/uaccess.h
>> +++ b/arch/arm64/include/asm/uaccess.h
>> @@ -425,4 +425,25 @@ static inline size_t probe_subpage_writeable(const char __user *uaddr,
>>
>>  #endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
>>
>> +#ifdef CONFIG_ARCH_HAS_COPY_MC
>> +/**
>> + * copy_mc_to_kernel - memory copy that handles source exceptions
>> + *
>> + * @dst: destination address
>> + * @src: source address
>> + * @len: number of bytes to copy
>> + *
>> + * Return 0 for success, or #size if there was an exception.
>> + */
>> +static inline unsigned long __must_check
>> +copy_mc_to_kernel(void *to, const void *from, unsigned long size)
>> +{
>> +	int ret;
>> +
>> +	ret = memcpy_mcs(to, from, size);
>> +	return (ret == -EFAULT) ? size : 0;
>> +}
>> +#define copy_mc_to_kernel copy_mc_to_kernel
>> +#endif
>> +
>>  #endif /* __ASM_UACCESS_H */
>> diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
>> index a2fd865b816d..899d6ae9698c 100644
>> --- a/arch/arm64/lib/Makefile
>> +++ b/arch/arm64/lib/Makefile
>> @@ -3,7 +3,7 @@ lib-y		:= clear_user.o delay.o copy_from_user.o	\
>>  		   copy_to_user.o copy_page.o			\
>>  		   clear_page.o csum.o insn.o memchr.o memcpy.o	\
>>  		   memset.o memcmp.o strcmp.o strncmp.o strlen.o	\
>> -		   strnlen.o strchr.o strrchr.o tishift.o
>> +		   strnlen.o strchr.o strrchr.o tishift.o memcpy_mc.o
>>
>>  ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
>>  obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
>> diff --git a/arch/arm64/lib/memcpy_mc.S b/arch/arm64/lib/memcpy_mc.S
>> new file mode 100644
>> index 000000000000..7076b500d154
>> --- /dev/null
>> +++ b/arch/arm64/lib/memcpy_mc.S
>> @@ -0,0 +1,257 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (c) 2012-2021, Arm Limited.
>> + *
>> + * Adapted from the original at:
>> + * https://github.com/ARM-software/optimized-routines/blob/afd6244a1f8d9229/string/aarch64/memcpy.S
>> + */
>> +
>> +#include
>> +#include
>> +
>> +/* Assumptions:
>> + *
>> + * ARMv8-a, AArch64, unaligned accesses.
>> + *
>> + */
>> +
>> +#define L(label) .L ## label
>> +
>> +#define dstin	x0
>> +#define src	x1
>> +#define count	x2
>> +#define dst	x3
>> +#define srcend	x4
>> +#define dstend	x5
>> +#define A_l	x6
>> +#define A_lw	w6
>> +#define A_h	x7
>> +#define B_l	x8
>> +#define B_lw	w8
>> +#define B_h	x9
>> +#define C_l	x10
>> +#define C_lw	w10
>> +#define C_h	x11
>> +#define D_l	x12
>> +#define D_h	x13
>> +#define E_l	x14
>> +#define E_h	x15
>> +#define F_l	x16
>> +#define F_h	x17
>> +#define G_l	count
>> +#define G_h	dst
>> +#define H_l	src
>> +#define H_h	srcend
>> +#define tmp1	x14
>> +
>> +/* This implementation handles overlaps and supports both memcpy and memmove
>> +   from a single entry point. It uses unaligned accesses and branchless
>> +   sequences to keep the code small, simple and improve performance.
>> +
>> +   Copies are split into 3 main cases: small copies of up to 32 bytes, medium
>> +   copies of up to 128 bytes, and large copies. The overhead of the overlap
>> +   check is negligible since it is only required for large copies.
>> +
>> +   Large copies use a software pipelined loop processing 64 bytes per
>> +   iteration. The destination pointer is 16-byte aligned to minimize
>> +   unaligned accesses. The loop tail is handled by always copying 64 bytes
>> +   from the end.
>> +*/
>> +
>> +SYM_FUNC_START(__pi_memcpy_mcs)
>> +	add	srcend, src, count
>> +	add	dstend, dstin, count
>> +	cmp	count, 128
>> +	b.hi	L(copy_long)
>> +	cmp	count, 32
>> +	b.hi	L(copy32_128)
>> +
>> +	/* Small copies: 0..32 bytes. */
>> +	cmp	count, 16
>> +	b.lo	L(copy16)
>> +	CPY_MC(9998f, ldp A_l, A_h, [src])
>> +	CPY_MC(9998f, ldp D_l, D_h, [srcend, -16])
>> +	CPY_MC(9998f, stp A_l, A_h, [dstin])
>> +	CPY_MC(9998f, stp D_l, D_h, [dstend, -16])
>> +	mov	x0, #0
>> +	ret
>> +
>> +	/* Copy 8-15 bytes. */
>> +L(copy16):
>> +	tbz	count, 3, L(copy8)
>> +	CPY_MC(9998f, ldr A_l, [src])
>> +	CPY_MC(9998f, ldr A_h, [srcend, -8])
>> +	CPY_MC(9998f, str A_l, [dstin])
>> +	CPY_MC(9998f, str A_h, [dstend, -8])
>> +	mov	x0, #0
>> +	ret
>> +
>> +	.p2align 3
>> +	/* Copy 4-7 bytes. */
>> +L(copy8):
>> +	tbz	count, 2, L(copy4)
>> +	CPY_MC(9998f, ldr A_lw, [src])
>> +	CPY_MC(9998f, ldr B_lw, [srcend, -4])
>> +	CPY_MC(9998f, str A_lw, [dstin])
>> +	CPY_MC(9998f, str B_lw, [dstend, -4])
>> +	mov	x0, #0
>> +	ret
>> +
>> +	/* Copy 0..3 bytes using a branchless sequence. */
>> +L(copy4):
>> +	cbz	count, L(copy0)
>> +	lsr	tmp1, count, 1
>> +	CPY_MC(9998f, ldrb A_lw, [src])
>> +	CPY_MC(9998f, ldrb C_lw, [srcend, -1])
>> +	CPY_MC(9998f, ldrb B_lw, [src, tmp1])
>> +	CPY_MC(9998f, strb A_lw, [dstin])
>> +	CPY_MC(9998f, strb B_lw, [dstin, tmp1])
>> +	CPY_MC(9998f, strb C_lw, [dstend, -1])
>> +L(copy0):
>> +	mov	x0, #0
>> +	ret
>> +
>> +	.p2align 4
>> +	/* Medium copies: 33..128 bytes. */
>> +L(copy32_128):
>> +	CPY_MC(9998f, ldp A_l, A_h, [src])
>> +	CPY_MC(9998f, ldp B_l, B_h, [src, 16])
>> +	CPY_MC(9998f, ldp C_l, C_h, [srcend, -32])
>> +	CPY_MC(9998f, ldp D_l, D_h, [srcend, -16])
>> +	cmp	count, 64
>> +	b.hi	L(copy128)
>> +	CPY_MC(9998f, stp A_l, A_h, [dstin])
>> +	CPY_MC(9998f, stp B_l, B_h, [dstin, 16])
>> +	CPY_MC(9998f, stp C_l, C_h, [dstend, -32])
>> +	CPY_MC(9998f, stp D_l, D_h, [dstend, -16])
>> +	mov	x0, #0
>> +	ret
>> +
>> +	.p2align 4
>> +	/* Copy 65..128 bytes. */
>> +L(copy128):
>> +	CPY_MC(9998f, ldp E_l, E_h, [src, 32])
>> +	CPY_MC(9998f, ldp F_l, F_h, [src, 48])
>> +	cmp	count, 96
>> +	b.ls	L(copy96)
>> +	CPY_MC(9998f, ldp G_l, G_h, [srcend, -64])
>> +	CPY_MC(9998f, ldp H_l, H_h, [srcend, -48])
>> +	CPY_MC(9998f, stp G_l, G_h, [dstend, -64])
>> +	CPY_MC(9998f, stp H_l, H_h, [dstend, -48])
>> +L(copy96):
>> +	CPY_MC(9998f, stp A_l, A_h, [dstin])
>> +	CPY_MC(9998f, stp B_l, B_h, [dstin, 16])
>> +	CPY_MC(9998f, stp E_l, E_h, [dstin, 32])
>> +	CPY_MC(9998f, stp F_l, F_h, [dstin, 48])
>> +	CPY_MC(9998f, stp C_l, C_h, [dstend, -32])
>> +	CPY_MC(9998f, stp D_l, D_h, [dstend, -16])
>> +	mov	x0, #0
>> +	ret
>> +
>> +	.p2align 4
>> +	/* Copy more than 128 bytes. */
>> +L(copy_long):
>> +	/* Use backwards copy if there is an overlap. */
>> +	sub	tmp1, dstin, src
>> +	cbz	tmp1, L(copy0)
>> +	cmp	tmp1, count
>> +	b.lo	L(copy_long_backwards)
>> +
>> +	/* Copy 16 bytes and then align dst to 16-byte alignment. */
>> +
>> +	CPY_MC(9998f, ldp D_l, D_h, [src])
>> +	and	tmp1, dstin, 15
>> +	bic	dst, dstin, 15
>> +	sub	src, src, tmp1
>> +	add	count, count, tmp1	/* Count is now 16 too large. */
>> +	CPY_MC(9998f, ldp A_l, A_h, [src, 16])
>> +	CPY_MC(9998f, stp D_l, D_h, [dstin])
>> +	CPY_MC(9998f, ldp B_l, B_h, [src, 32])
>> +	CPY_MC(9998f, ldp C_l, C_h, [src, 48])
>> +	CPY_MC(9998f, ldp D_l, D_h, [src, 64]!)
>> +	subs	count, count, 128 + 16	/* Test and readjust count. */
>> +	b.ls	L(copy64_from_end)
>> +
>> +L(loop64):
>> +	CPY_MC(9998f, stp A_l, A_h, [dst, 16])
>> +	CPY_MC(9998f, ldp A_l, A_h, [src, 16])
>> +	CPY_MC(9998f, stp B_l, B_h, [dst, 32])
>> +	CPY_MC(9998f, ldp B_l, B_h, [src, 32])
>> +	CPY_MC(9998f, stp C_l, C_h, [dst, 48])
>> +	CPY_MC(9998f, ldp C_l, C_h, [src, 48])
>> +	CPY_MC(9998f, stp D_l, D_h, [dst, 64]!)
>> +	CPY_MC(9998f, ldp D_l, D_h, [src, 64]!)
>> +	subs	count, count, 64
>> +	b.hi	L(loop64)
>> +
>> +	/* Write the last iteration and copy 64 bytes from the end. */
>> +L(copy64_from_end):
>> +	CPY_MC(9998f, ldp E_l, E_h, [srcend, -64])
>> +	CPY_MC(9998f, stp A_l, A_h, [dst, 16])
>> +	CPY_MC(9998f, ldp A_l, A_h, [srcend, -48])
>> +	CPY_MC(9998f, stp B_l, B_h, [dst, 32])
>> +	CPY_MC(9998f, ldp B_l, B_h, [srcend, -32])
>> +	CPY_MC(9998f, stp C_l, C_h, [dst, 48])
>> +	CPY_MC(9998f, ldp C_l, C_h, [srcend, -16])
>> +	CPY_MC(9998f, stp D_l, D_h, [dst, 64])
>> +	CPY_MC(9998f, stp E_l, E_h, [dstend, -64])
>> +	CPY_MC(9998f, stp A_l, A_h, [dstend, -48])
>> +	CPY_MC(9998f, stp B_l, B_h, [dstend, -32])
>> +	CPY_MC(9998f, stp C_l, C_h, [dstend, -16])
>> +	mov	x0, #0
>> +	ret
>> +
>> +	.p2align 4
>> +
>> +	/* Large backwards copy for overlapping copies.
>> +	   Copy 16 bytes and then align dst to 16-byte alignment. */
>> +L(copy_long_backwards):
>> +	CPY_MC(9998f, ldp D_l, D_h, [srcend, -16])
>> +	and	tmp1, dstend, 15
>> +	sub	srcend, srcend, tmp1
>> +	sub	count, count, tmp1
>> +	CPY_MC(9998f, ldp A_l, A_h, [srcend, -16])
>> +	CPY_MC(9998f, stp D_l, D_h, [dstend, -16])
>> +	CPY_MC(9998f, ldp B_l, B_h, [srcend, -32])
>> +	CPY_MC(9998f, ldp C_l, C_h, [srcend, -48])
>> +	CPY_MC(9998f, ldp D_l, D_h, [srcend, -64]!)
>> +	sub	dstend, dstend, tmp1
>> +	subs	count, count, 128
>> +	b.ls	L(copy64_from_start)
>> +
>> +L(loop64_backwards):
>> +	CPY_MC(9998f, stp A_l, A_h, [dstend, -16])
>> +	CPY_MC(9998f, ldp A_l, A_h, [srcend, -16])
>> +	CPY_MC(9998f, stp B_l, B_h, [dstend, -32])
>> +	CPY_MC(9998f, ldp B_l, B_h, [srcend, -32])
>> +	CPY_MC(9998f, stp C_l, C_h, [dstend, -48])
>> +	CPY_MC(9998f, ldp C_l, C_h, [srcend, -48])
>> +	CPY_MC(9998f, stp D_l, D_h, [dstend, -64]!)
>> +	CPY_MC(9998f, ldp D_l, D_h, [srcend, -64]!)
>> +	subs	count, count, 64
>> +	b.hi	L(loop64_backwards)
>> +
>> +	/* Write the last iteration and copy 64 bytes from the start. */
>> +L(copy64_from_start):
>> +	CPY_MC(9998f, ldp G_l, G_h, [src, 48])
>> +	CPY_MC(9998f, stp A_l, A_h, [dstend, -16])
>> +	CPY_MC(9998f, ldp A_l, A_h, [src, 32])
>> +	CPY_MC(9998f, stp B_l, B_h, [dstend, -32])
>> +	CPY_MC(9998f, ldp B_l, B_h, [src, 16])
>> +	CPY_MC(9998f, stp C_l, C_h, [dstend, -48])
>> +	CPY_MC(9998f, ldp C_l, C_h, [src])
>> +	CPY_MC(9998f, stp D_l, D_h, [dstend, -64])
>> +	CPY_MC(9998f, stp G_l, G_h, [dstin, 48])
>> +	CPY_MC(9998f, stp A_l, A_h, [dstin, 32])
>> +	CPY_MC(9998f, stp B_l, B_h, [dstin, 16])
>> +	CPY_MC(9998f, stp C_l, C_h, [dstin])
>> +	mov	x0, #0
>> +	ret
>> +
>> +9998:	mov	x0, #-EFAULT
>> +	ret
>> +SYM_FUNC_END(__pi_memcpy_mcs)
>> +
>> +SYM_FUNC_ALIAS(__memcpy_mcs, __pi_memcpy_mcs)
>> +EXPORT_SYMBOL(__memcpy_mcs)
>> +SYM_FUNC_ALIAS_WEAK(memcpy_mcs, __memcpy_mcs)
>> +EXPORT_SYMBOL(memcpy_mcs)
>> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
>> index 9ef84f31833f..e6519fd329b2 100644
>> --- a/mm/kasan/shadow.c
>> +++ b/mm/kasan/shadow.c
>> @@ -79,6 +79,18 @@ void *memcpy(void *dest, const void *src, size_t len)
>>  }
>>  #endif
>>
>> +#ifdef __HAVE_ARCH_MEMCPY_MC
>> +#undef memcpy_mcs
>> +int memcpy_mcs(void *dest, const void *src, size_t len)
>> +{
>> +	if (!check_memory_region((unsigned long)src, len, false, _RET_IP_) ||
>> +	    !check_memory_region((unsigned long)dest, len, true, _RET_IP_))
>> +		return (unsigned long)len;
>> +
>> +	return __memcpy_mcs(dest, src, len);
>> +}
>> +#endif
>> +
>>  void *__asan_memset(void *addr, int c, ssize_t len)
>>  {
>>  	if (!kasan_check_range(addr, len, true, _RET_IP_))
>> --
>> 2.25.1
>>
> .
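
P.S. For context on how the helper is meant to be consumed: below is a
minimal, hypothetical caller in the pmem_read style mentioned in the commit
message. The function name is illustrative; only the documented return
convention (0 on success, the residue on an exception) is assumed.

#include <linux/uaccess.h>	/* copy_mc_to_kernel() */
#include <linux/errno.h>

/* Read from possibly-poisoned persistent memory: a consumed hardware
 * memory error makes copy_mc_to_kernel() return a nonzero residue,
 * which the caller surfaces as -EIO instead of taking the machine down.
 */
static int read_pmem_buf(void *dst, const void *pmem_addr, size_t len)
{
	unsigned long rem = copy_mc_to_kernel(dst, pmem_addr, len);

	return rem ? -EIO : 0;
}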