From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 7 Apr 2025 18:12:09 +0100
From: Catalin Marinas
To: Vlastimil Babka
Cc: Feng Tang, Petr Tesarik, Harry Yoo, Peng Fan,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>, David Rientjes,
	Christoph Lameter, linux-mm@kvack.org
Subject: Re: slub - extended kmalloc redzone and dma alignment
References: <20250404131239.2a987e58@mordecai>
	<20250404155303.2e0cdd27@mordecai>
	<39657cf9-e24d-4b85-9773-45fe26dd16ae@suse.cz>
In-Reply-To: <39657cf9-e24d-4b85-9773-45fe26dd16ae@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Hi Vlastimil, Feng,

Thanks for looping me in. I'm just catching up with this thread.

On Mon, Apr 07, 2025 at 09:54:41AM +0200, Vlastimil Babka wrote:
> On 4/7/25 09:21, Feng Tang wrote:
> > On Sun, Apr 06, 2025 at 10:02:40PM +0800, Feng Tang wrote:
> > [...]
> >> > I can remember this series, as well as my confusion why 192-byte
> >> > kmalloc caches were missing on arm64.
> >> >
> >> > Nevertheless, I believe ARCH_DMA_MINALIGN is required to avoid putting
> >> > a DMA buffer on the same cache line as some other data that might be
> >> > _written_ by the CPU while the corresponding main memory is modified by
> >> > another bus-mastering device.
> >> >
> >> > Consider this layout:
> >> >
> >> > ... | DMA buffer | other data | ...
> >> >     ^                         ^
> >> >     +-------------------------+-- cache line boundaries
> >> >
> >> > When you prepare for DMA, you make sure that the DMA buffer is not
> >> > cached by the CPU, so you flush the cache line (from all levels). Then
> >> > you tell the device to write into the DMA buffer. However, before the
> >> > device finishes the DMA transaction, the CPU accesses "other data",
> >> > loading this cache line from main memory with partial results. Worse,
> >> > if the CPU writes to "other data", it may write the cache line back
> >> > into main memory, racing with the device writing to the DMA buffer,
> >> > and you end up with corrupted data in the DMA buffer.

Yes, cache evictions from 'other data' can overwrite the DMA buffer.
Another problem: when the DMA has completed, the kernel does a cache
invalidation to remove any speculatively loaded cache lines from the
DMA buffer, but that would also invalidate 'other data', potentially
corrupting it if it was dirty.
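To make the failure mode concrete, here is a rough sketch of the
streaming DMA pattern involved (a hypothetical driver fragment, names
made up and error paths trimmed; not from any real driver):

#include <linux/dma-mapping.h>
#include <linux/slab.h>

/*
 * Hypothetical example: receive 'len' bytes from a device into a
 * kmalloc() buffer. If the allocation is smaller than the cache line
 * size and not ARCH_DMA_MINALIGN aligned, whatever shares the cache
 * line with 'buf' is at risk, exactly as in the diagram above.
 */
static int example_rx(struct device *dev, size_t len)
{
	void *buf = kmalloc(len, GFP_KERNEL);
	dma_addr_t dma;

	if (!buf)
		return -ENOMEM;

	/* cleans/invalidates the cache lines covering buf */
	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma)) {
		kfree(buf);
		return -EIO;
	}

	/* ... program the device to write 'len' bytes at 'dma', wait ... */

	/* invalidates again so the CPU sees the device's writes */
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);

	/* ... consume buf ... */
	kfree(buf);
	return 0;
}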
So it's not safe to have DMA into buffers smaller than ARCH_DMA_MINALIGN
(and unaligned). What I did when reducing the minimum kmalloc()
alignment was to force bouncing via swiotlb if the size passed to the
DMA API is small. It may end up bouncing buffers that did not originate
from kmalloc() or that have proper alignment (with padding), but that's
a heuristic we were willing to accept in order to use small kmalloc()
caches on arm64 - see dma_kmalloc_needs_bounce().

Does redzoning apply to kmalloc() or kmem_cache_create() (or both)? I
haven't checked yet, but if the red zone is within ARCH_DMA_MINALIGN
(or rather dma_get_cache_alignment()), we could have issues with either
corrupting the DMA buffer or the red zone. Note that this only applies
to DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.

> >> > But redzone poisoning should happen long before the DMA buffer cache
> >> > line is flushed. The device will not overwrite it unless it was given
> >> > the wrong buffer length for the transaction, but then that would be a
> >> > bug that I'd rather detect.
> >>
> >> I also tend to think it's better for slub to detect this kind of DMA
> >> 'overflow'. We've added a slub kunit test case for this in commit
> >> 6cd6d33ca41f ("mm/slub, kunit: Add a test case for kmalloc redzone
> >> check"), which was inspired by a similar DMA related bug as described
> >> in commit 120ee599b5bf ("staging: octeon-usb: prevent memory
> >> corruption")
>
> OK, so besides Petr's explanation, which was about cache (in)coherency
> and is AFAIK tied to ARCH_DMA_MINALIGN, there is the possibility of
> DMA that will really write garbage beyond a buffer that's not word
> aligned. Can we assume that this was really a bug in the usage, and
> that ensuring word alignment (not ARCH_DMA_MINALIGN alignment) is
> required from a different layer than kmalloc() itself?

In that case it would be best to keep the reporting as it is.
dma_direct_map_page() for example bounces the DMA if it detects a small
size, just in case it came from kmalloc(). Similarly in the iommu
code - see dev_use_sg_swiotlb().

> > I'm not familiar with DMA stuff, but Vlastimil's idea does make it
> > easier for driver developers to write a driver to be used on
> > different ARCHs, which have different DMA alignment requirements.
> > Say if the minimal safe size is 8 bytes, the driver can just request
> > 8 bytes and ARCH_DMA_MINALIGN will automatically choose the right
> > size for it, which can save memory for ARCHs with smaller alignment
> > requirements. Meanwhile it does sacrifice part of the redzone check
> > ability, so I don't have a preference here :)
>
> Let's clarify first who's expected to ensure the word alignment for
> DMA; if it's not kmalloc() then I'd rather resist moving it there :)

In theory, the DMA API should handle the alignment; that's why I tried
to remove it from the kmalloc() code.

With kmem_cache_create() (or kmalloc() as well), if the object size is
not cacheline-aligned, is there a risk of the red zone being placed
around the object without any alignment restrictions? The logic in
dma_kmalloc_size_aligned() would fail for sufficiently large buffers
that have an unaligned red zone around the object. Not sure how to fix
this in the DMA API, though. At least for kmem_cache_create() one can
pass SLAB_HWCACHE_ALIGN and it will force the alignment. I need to
check what redzoning does in this case (I won't have time until
tomorrow; I just got back from holiday and have lots of emails to go
through).
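For reference, the bounce heuristic amounts to roughly the following (a
simplified sketch, not the exact mainline code; the real checks live in
dma_kmalloc_needs_bounce() and dma_kmalloc_size_aligned()):

#include <linux/align.h>
#include <linux/dma-map-ops.h>
#include <linux/dma-mapping.h>

/*
 * Simplified sketch of the bounce decision. Only device writes on
 * non-coherent devices are dangerous; for those, a size that is not a
 * multiple of the cache line size may have come from a small kmalloc()
 * cache and share a cache line with other data, so bounce via swiotlb.
 */
static bool kmalloc_may_need_bounce(struct device *dev, size_t size,
				    enum dma_data_direction dir)
{
	if (dir == DMA_TO_DEVICE)
		return false;
	if (dev_is_dma_coherent(dev))
		return false;
	return !IS_ALIGNED(size, dma_get_cache_alignment());
}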
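And the kmem_cache_create() route mentioned above, for reference
(hypothetical cache name and object size, just to show the flag):

#include <linux/init.h>
#include <linux/slab.h>

/*
 * Hypothetical example: SLAB_HWCACHE_ALIGN rounds the object layout up
 * to the CPU cache line size, so a DMA transfer into one object cannot
 * share a cache line with a neighbouring object. How this interacts
 * with an extended red zone is what I still need to check.
 */
static struct kmem_cache *example_cache;

static int __init example_cache_init(void)
{
	example_cache = kmem_cache_create("example-dma", 100, 0,
					  SLAB_HWCACHE_ALIGN, NULL);
	return example_cache ? 0 : -ENOMEM;
}

-- 
Catalin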