Date: Tue, 8 Apr 2025 16:07:19 +0100
From: Catalin Marinas
To: Petr Tesarik
Cc: Vlastimil Babka, Feng Tang, Harry Yoo, Peng Fan,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>, David Rientjes,
	Christoph Lameter, "linux-mm@kvack.org"
Subject: Re: slub - extended kmalloc redzone and dma alignment
References: <20250404131239.2a987e58@mordecai>
	<20250404155303.2e0cdd27@mordecai>
	<39657cf9-e24d-4b85-9773-45fe26dd16ae@suse.cz>
	<20250408072732.32db7809@mordecai>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20250408072732.32db7809@mordecai>

On Tue, Apr 08, 2025 at 07:27:32AM +0200, Petr Tesarik wrote:
> On Mon, 7 Apr 2025 18:12:09 +0100
> Catalin Marinas wrote:
> > Thanks for looping me in. I'm just catching up with this thread.
> >
> > On Mon, Apr 07, 2025 at 09:54:41AM +0200, Vlastimil Babka wrote:
> > > On 4/7/25 09:21, Feng Tang wrote:
> > > > On Sun, Apr 06, 2025 at 10:02:40PM +0800, Feng Tang wrote:
> > > > [...]
> > > >> > I can remember this series, as well as my confusion why
> > > >> > 192-byte kmalloc caches were missing on arm64.
> > > >> >
> > > >> > Nevertheless, I believe ARCH_DMA_MINALIGN is required to avoid
> > > >> > putting a DMA buffer on the same cache line as some other data
> > > >> > that might be _written_ by the CPU while the corresponding
> > > >> > main memory is modified by another bus-mastering device.
> > > >> >
> > > >> > Consider this layout:
> > > >> >
> > > >> > ... | DMA buffer | other data | ...
> > > >> >     ^                         ^
> > > >> >     +-------------------------+-- cache line boundaries
> > > >> >
> > > >> > When you prepare for DMA, you make sure that the DMA buffer is
> > > >> > not cached by the CPU, so you flush the cache line (from all
> > > >> > levels). Then you tell the device to write into the DMA
> > > >> > buffer. However, before the device finishes the DMA
> > > >> > transaction, the CPU accesses "other data", loading this cache
> > > >> > line from main memory with partial results. Worse, if the CPU
> > > >> > writes to "other data", it may write the cache line back into
> > > >> > main memory, racing with the device writing to the DMA buffer,
> > > >> > and you end up with corrupted data in the DMA buffer.
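To make the race concrete, here's a rough sketch of the pattern that
goes wrong (hypothetical driver code, not from any real driver,
assuming 64-byte cache lines and a kmalloc() minimum alignment below
the cache line size):

#include <linux/slab.h>
#include <linux/dma-mapping.h>

/* Illustrative only: the two small objects below may end up sharing
 * one cache line. */
static void racy_dma_example(struct device *dev)
{
	char *buf = kmalloc(32, GFP_KERNEL);	/* DMA buffer   */
	char *other = kmalloc(32, GFP_KERNEL);	/* 'other data' */
	dma_addr_t dma;

	if (!buf || !other)
		goto out;

	/* for_device: the cache line covering buf is flushed */
	dma = dma_map_single(dev, buf, 32, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma))
		goto out;

	/*
	 * While the device writes buf in RAM, this store pulls the
	 * shared cache line back in and dirties it; a later eviction
	 * overwrites the device's data, and the unmap-time
	 * invalidation throws away the update to 'other'.
	 */
	other[0] = 1;

	dma_unmap_single(dev, dma, 32, DMA_FROM_DEVICE);
out:
	kfree(other);
	kfree(buf);
}

This is the class of small buffer that the swiotlb bouncing discussed
below is meant to catch.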
> >
> > Yes, cache evictions from 'other data' can overwrite the DMA.
> > Another problem: when the DMA has completed, the kernel does a cache
> > invalidation to remove any speculatively loaded cache lines from the
> > DMA buffer, but that would also invalidate 'other data', potentially
> > corrupting it if it was dirty.
> >
> > So it's not safe to have DMA into buffers less than ARCH_DMA_MINALIGN
> > (and unaligned).
>
> It's not safe to DMA into buffers that share a CPU cache line with
> other data, which could be before or after the DMA buffer, of course.

Was the original problem reported for an arm64 platform? It wasn't
clear to me from the thread.

For arm64, the only problem is if the other data is being modified
_while_ the transfer is taking place. Otherwise, when mapping the
buffer for device, the kernel cleans the caches and writes the other
data to RAM; see arch_sync_dma_for_device(). This is non-destructive
w.r.t. the data in both the DMA buffer and the red zone. After the
transfer (FROM_DEVICE), arch_sync_dma_for_cpu() invalidates the caches,
including the other data, but since they were previously written to RAM
in the for_device case, they'd be read back into the cache on access
without any corruption.

Of course, this assumes that the device keeps within the limits and
does not write beyond the DMA buffer into the red zone. If it does, the
buffer overflow warning is valid.

While I think we are OK for arm64, other architectures may invalidate
the caches in arch_sync_dma_for_device(), which could discard the red
zone data. A quick grep for arch_sync_dma_for_device() shows several
architectures invalidating the caches in the FROM_DEVICE case.

> > What I did with reducing the minimum kmalloc() alignment was to
> > force bouncing via swiotlb if the size passed to the DMA API is
> > small. It may end up bouncing buffers that did not originate from
> > kmalloc() or have proper alignment (with padding) but that's some
> > heuristics we were willing to accept to be able to use small
> > kmalloc() caches on arm64 - see dma_kmalloc_needs_bounce().
> >
> > Does redzoning apply to kmalloc() or kmem_cache_create() (or both)?
> > I haven't checked yet but if the red zone is within ARCH_DMA_MINALIGN
> > (or rather dma_get_cache_alignment()), we could have issues with
> > either corrupting the DMA buffer or the red zone.
[...]
> I'm sorry if I'm being thick, but IIUC the red zone does not have to
> be protected. Yes, we might miss red zone corruption if it happens to
> race with a DMA transaction, but I have assumed that this is
> permissible. I regard the red zone as a useful debugging tool, not a
> safety measure that is guaranteed to detect any write beyond the
> buffer end.

Yeah, it's debugging, but too many false positives make the feature
pretty useless.

> > > > I'm not familiar with DMA stuff, but Vlastimil's idea does make
> > > > it easier for driver developers to write a driver to be used on
> > > > different ARCHs, which have different DMA alignment requirements.
> > > > Say the minimal safe size is 8 bytes: the driver can just request
> > > > 8 bytes and ARCH_DMA_MINALIGN will automatically choose the right
> > > > size for it, which can save memory for ARCHs with a smaller
> > > > alignment requirement. Meanwhile, it does sacrifice part of the
> > > > redzone check ability, so I don't have a preference here :)
> > >
> > > Let's clarify first who's expected to ensure the word alignment
> > > for DMA, if it's not kmalloc() then I'd rather resist moving it
> > > there :)
> >
> > In theory, the DMA API should handle the alignment as I tried to
> > remove it from the kmalloc() code.
>
> Are we talking about the alignment of the starting address, or buffer
> size, or both?

The DMA API bouncing logic only checks the buffer size and assumes
start/end would be aligned to kmalloc_size_roundup(), with no valid
data between them (FROM_DEVICE).

> > With kmem_cache_create() (or kmalloc() as well), if the object size
> > is not cacheline-aligned, is there a risk of redzoning around the
> > object without any alignment restrictions? The logic in
> > dma_kmalloc_size_aligned() would fail for sufficiently large buffers
> > but with an unaligned red zone around the object.
>
> This red zone is the extra memory that is normally wasted by kmalloc()
> rounding up the requested size to the bucket size.
> But dma_kmalloc_size_aligned() already uses kmalloc_size_roundup(size),
> so it seems to be covered.

Assuming I got kmalloc redzoning right, I think there's still a
potential issue. Let's say we have a system that requires 128-byte DMA
alignment (the largest cache line size). We do a kmalloc(104) and
kmalloc_size_roundup() returns 128, so all seems good to the DMA code.
However, kmalloc() redzones from 104 to 128 as it tracks the original
size. The DMA bouncing doesn't spot it since kmalloc_size_roundup(104)
is aligned to 128.

The above is a problem even for architectures that don't select
DMA_BOUNCE_UNALIGNED_KMALLOC but have non-coherent DMA (well, selecting
it may have a better chance of working if the buffers are small). So I
think commit 946fa0dbf2d8 ("mm/slub: extend redzone check to extra
allocated kmalloc space than requested") is broken on most
architectures that select ARCH_HAS_SYNC_DMA_FOR_DEVICE (arm64 is OK as
it does a write-back in arch_sync_dma_for_device() irrespective of
direction).

We can hide this extended kmalloc() redzoning behind an arch select
but, as it is, I'd only do redzoning from an
ALIGN(orig_size, dma_get_cache_alignment()) offset.
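To spell out the layout in the kmalloc(104) example (the diagram and
the helper below are illustrative sketches, not current slub code):

#include <linux/align.h>
#include <linux/dma-mapping.h>

/*
 * kmalloc(104) on a system with dma_get_cache_alignment() == 128:
 *
 *   offset:  0                       104             128
 *            |<------- object ------->|<- red zone -->|
 *            |<-------- one 128-byte cache line ------>|
 *
 * kmalloc_size_roundup(104) == 128, so dma_kmalloc_size_aligned() is
 * satisfied and the buffer is not bounced. On an architecture that
 * invalidates in arch_sync_dma_for_device() for DMA_FROM_DEVICE, the
 * red zone pattern in bytes 104..127 is discarded along with the rest
 * of the cache line, and whatever the device leaves there is later
 * reported as red zone corruption: a false positive.
 */

/* Hypothetical helper for the workaround above: start redzoning only
 * beyond the next DMA-aligned boundary. */
static unsigned int redzone_start(unsigned int orig_size)
{
	return ALIGN(orig_size, dma_get_cache_alignment());
}

/* For orig_size == 104 this returns 128, i.e. the end of the
 * rounded-up allocation, so no red zone is left inside the DMA'd
 * cache line. */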
Is the combination of SLAB_HWCACHE_ALIGN and SLAB_RED_ZONE similarly
affected? At least that's an explicit opt-in and people shouldn't pass
it if they intend the objects to be used for DMA.

-- 
Catalin