From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DAC3CC36002 for ; Wed, 9 Apr 2025 08:39:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3CB196B0083; Wed, 9 Apr 2025 04:39:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 39EBE6B008A; Wed, 9 Apr 2025 04:39:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 23FCD6B0092; Wed, 9 Apr 2025 04:39:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 047576B0083 for ; Wed, 9 Apr 2025 04:39:09 -0400 (EDT) Received: from smtpin19.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 9AE9D1CE718 for ; Wed, 9 Apr 2025 08:39:11 +0000 (UTC) X-FDA: 83313855702.19.FE8B3A1 Received: from mail-wr1-f47.google.com (mail-wr1-f47.google.com [209.85.221.47]) by imf25.hostedemail.com (Postfix) with ESMTP id 94AA5A0010 for ; Wed, 9 Apr 2025 08:39:09 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=suse.com header.s=google header.b=CwcjLtv0; spf=pass (imf25.hostedemail.com: domain of ptesarik@suse.com designates 209.85.221.47 as permitted sender) smtp.mailfrom=ptesarik@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1744187949; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=UHw3gqf2/NeXsFnuKebtz7v6oLj6IeK8CSXXz/DkYY8=; b=4I8CVqdHNJq4LHskh5NSgXKqtcNN5PfROSanpHtYERdWkfQiOPHVmhEHVoJzLPD4002XAq rJU7HM+7p6i46sPqgRjXa5B5fAt8pCjm+0xO4B2Yl9ph9r4Cfm8QJ74vlrJ4YAdN/RM1z8 XvkS7EkPZuhw7InfiwtzMzO8Tx5Y21Y= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=suse.com header.s=google header.b=CwcjLtv0; spf=pass (imf25.hostedemail.com: domain of ptesarik@suse.com designates 209.85.221.47 as permitted sender) smtp.mailfrom=ptesarik@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1744187949; a=rsa-sha256; cv=none; b=X6/i+sCGqZOYLRzlx08I0NEQOVEiEIt814kGuXcq00ZwU0f/m+NiV9ao5xu44YxgnmplYm +cZIkUgouDq4pH5jxK2a9s/B11i1jGzwfFmYV0lj14gCKmY2DvLUJYdyiUeznTxb1EppmF 5OEi3N2RU7h4ABY0+ndSCi4/O5hNUPo= Received: by mail-wr1-f47.google.com with SMTP id ffacd0b85a97d-39123ad8a9fso524474f8f.2 for ; Wed, 09 Apr 2025 01:39:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=google; t=1744187948; x=1744792748; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=UHw3gqf2/NeXsFnuKebtz7v6oLj6IeK8CSXXz/DkYY8=; b=CwcjLtv0HzM8d9Eb4NN93yje6mUSWd9DlV4QN8lJF9DlJI73t+9lKhtZ0BsM0fxqXh 1hTArIRle5/7xDlVSeKrhX/b9V0EcfyLS3QZAkq7gWFXb+SwD6ZJbrTPCgF1MyR50rBK k6lmynGKjnrsu4jpSnhw+ubD0u61XJ6XCBs2RmltvG2F2tGwwWWkoHnUwK8hzTf9b0SH FiBiebYI6GqqrDNBp3SkEfmPnkk4cqEc0Xoq5HI/Z0zLDoYTKn9ax9Kz8fQ40OYi0nxy uOgUy017cqMcj+gRwBITWriAEDNhDgDkzE1bVJeuC+Be0GNGgwyX6ZQp/qR0rM/RkRaH /pNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744187948; x=1744792748; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=UHw3gqf2/NeXsFnuKebtz7v6oLj6IeK8CSXXz/DkYY8=; b=ZE0fzcBWEtLkFuSLVxrhg/K1rnN2YUaqw48SV/wYoZNmPyUNTK4F5frv9GODcwbCdG gunIvOrF/RPlgyx3ACwBeK+Ib6XbKlXG4nkAG7mLZwyprlM47kBPeShVtIfcJlKypnUu Ql3fM5SOD68O0JD+AC5V1DXCaZ590NMf3BuJNxyWske4gKmpUWj2IIPcIDuyfSCG4El/ LjlmSDEDDCt9I98x/HeRnaP6R4dxZ/1bQHN3Ar1fq6f7vK/5UfrG6jg1yPQzwyHibyWK H7YExhKsht7zSoY4CefRMR5C1VG5ny3LlYTcw/l/2SBbcF1fIu9PcKEsSDwKNVoNhbcd UCCg== X-Forwarded-Encrypted: i=1; AJvYcCV8W6JqbLI9/a2th1O/ywr7/Kw7a/UVL1NP5pR2CQsrw740h9hPp/jvegV9iw8w1iipLqB6vMDWcg==@kvack.org X-Gm-Message-State: AOJu0YyFxJdPa8XCio6Pd2JGjpg0u1dQ1JcaBTcCnaJRRBPq1fP3LO11 0nYfArGm1Lou7MUOtivWyOvI6hwWmp9LKh+BKqDP16ys0g5tE8U5mjpg5zlUMec= X-Gm-Gg: ASbGncvWM/LNNWgAhxbqU8N94CkkLWVoGhf0CBciDZNh+lgVTaRQocLHXbn4UmJ+DCl bIsc54MJ7qZOiYsVAqPDB2z04xpEDt66RufJJnmkDeahtrDIPqrdJ8Iw3lPOUiWe29VoafFWFmJ 7jLpA7PLM0er6ZntLuExqorTDzWFB4H9+Fr004CmC4nrqw++gGE8EisgNmCZBo2eIRnhDmTI9dj FniQq+hde5c8YerZ8x0V6ELRZ1CCvFkALGbsYOX5NgvWdsoqMyIVvsgoybqynuHQhme/6x7+V0B fbJK+Hak9KDAdmwi48GL43bWakXtC6sB2kMkqMZJmPu40I0FETf/M82NN8UzgMsO3wcEj5naonB QI2p9mEjFvAdluA/wSPE1r+HC2hhRvg== X-Google-Smtp-Source: AGHT+IHtcWywJlvRTLedWYjuIemVjtMfwoGAMd7Z1WMPVEzwV3KuR/csTfalsP6lbr89BmGo9PcxlA== X-Received: by 2002:a05:6000:290d:b0:38d:ba08:dd64 with SMTP id ffacd0b85a97d-39d87a69a94mr713004f8f.0.1744187947777; Wed, 09 Apr 2025 01:39:07 -0700 (PDT) Received: from mordecai (dynamic-2a00-1028-83b8-1e7a-b223-ac12-b926-9872.ipv6.o2.cz. [2a00:1028:83b8:1e7a:b223:ac12:b926:9872]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-39d8938a69asm955852f8f.50.2025.04.09.01.39.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 09 Apr 2025 01:39:07 -0700 (PDT) Date: Wed, 9 Apr 2025 10:39:04 +0200 From: Petr Tesarik To: Catalin Marinas Cc: Vlastimil Babka , Feng Tang , Harry Yoo , Peng Fan , Hyeonggon Yoo <42.hyeyoo@gmail.com>, David Rientjes , Christoph Lameter , "linux-mm@kvack.org" , Robin Murphy , Sean Christopherson , Halil Pasic Subject: Re: slub - extended kmalloc redzone and dma alignment Message-ID: <20250409103904.54a19faa@mordecai> In-Reply-To: References: <20250404131239.2a987e58@mordecai> <20250404155303.2e0cdd27@mordecai> <39657cf9-e24d-4b85-9773-45fe26dd16ae@suse.cz> <20250408072732.32db7809@mordecai> X-Mailer: Claws Mail 4.3.1 (GTK 3.24.48; x86_64-suse-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 94AA5A0010 X-Stat-Signature: 78er6o9gmnwfxoaybsi7dw1wrh6yaf7m X-HE-Tag: 1744187949-369203 X-HE-Meta: U2FsdGVkX190gUb5lwaIxM34DVrGBB6OLakfQ2nsmtJrFxlRgzusR8W2gAh5ywZ/WBFWzoV7EY9BPL2/jNvdTb63+4RWeQYQtmQ7QRHPGdgvdB8lzWGF6sBGXu21gtYITBxvF0UxTA8FC6tBaTQ1+nE+SfYAYW5sSvUQ54dzjaXUSqkL0+sm4WnY1ow+0ZzY/9iWc9Ox9Ohb38O3WGiBVd1SuOBeOLT8rRRXk4HWkLw4zEjDJ8FKj7crL/WLzDqubq5E+si0uHrbPSIS4LpA0M0WwnT6dYBmq7onb15A175xM2H7eYM+z51K1yjprUzjQacaj442K/lW2qub4ovbivh0ciH5q2D15/jAkPqAdmK/tbneoNrAvjmyvKJ9uZCSdnvPlZ1X4FfVLveVQozUidkvVJb21p4b6H9DWv1nMzeeXfHRm6lhMLzpLasEYVU01qVTkQITgCmpwx9hbRW2KNF6AsOEqGUr0MmUGXxlmPKZpNuP/xlKeZIT8SZFQ+UrnBiGaS3OnjA/IcS0daNFoII65e1wb82K+Cw4zNE1r3hVFr7402R1DokYMxSG3l7LrOCVedX/7/cSAKMH1/ySTvjEbUPyiPww68EeUprvvVu0Xp0P7YkNg+RxGULO6xnfAE9RFpYBH9SnFnRQDDebGaNyKMKoFxrx39ouvtKry3bIrUWmeImJhBKYtcywZp5vrv49L2wkolKz3BffwMYpy5bj6G58gj8gkr94aT0ShEZm2nG+dylAf0/o9ZwvRGNN0dZ46beHG3finjyqXRnxWkZZZw1wYTkMeTeg0EDL0cGOEnoInSRHIdgksQ6SWRGJU+iOZXs2bWzgNWOu9uWllWUX46I2qUCKc0Z2PouPNB2d/KleXwv+ICCRdCPGAGWrzdRn0awkJk92Gsz6ef1LlgcnxwTHfaCg23VA5n4i7JfNTcWLCoVD0hW9yeitpEkekcaXa2ZJL+fPD8YW0lk 9UoGBs/5 xNfYdZ91nxNkHnbtuI3MEKTq5i40PHh8kLD4TT/R0TNx+6wTO9R/pQbdZqFFrmxKvcNdCqKaODWgGDKCOUZ52u5P3O7p3dOC0CLc3j3o23zHQw5ZVdwInYH0xqlsCjtoO++oB2gT+iB5nBca6yT5tjKrRjbt/fjrB0i14U3jaclKnFS2YOUtJvys9GnDz363DX4v2sb88pP9fN7c7NBI0ubEPVc36cedBRIJ0AYlLclPebvCNsrH7TDRMO5gbUb5Xypd6I5UNnTrEmzlj0oOtaMXgk5rH3PV7fir3Bk6ILHDEdRS3CksUjuRVjoStg6CTFTYuMPbmb2lDFz3Gvu6WG6ntj0iqONBzkcVbbbTRvCu/7c+O3GmtwrOx1wkz6NnzgzbebF3b38bx2mx/2n067Ho8XqZgrEtxy609hEdG2Vyx46E= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, 8 Apr 2025 16:07:19 +0100 Catalin Marinas wrote: > On Tue, Apr 08, 2025 at 07:27:32AM +0200, Petr Tesarik wrote: > > On Mon, 7 Apr 2025 18:12:09 +0100 > > Catalin Marinas wrote: > > > Thanks for looping me in. I'm just catching up with this thread. > > > > > > On Mon, Apr 07, 2025 at 09:54:41AM +0200, Vlastimil Babka wrote: > > > > On 4/7/25 09:21, Feng Tang wrote: > > > > > On Sun, Apr 06, 2025 at 10:02:40PM +0800, Feng Tang wrote: > > > > > [...] > > > > >> > I can remember this series, as well as my confusion why > > > > >> > 192-byte kmalloc caches were missing on arm64. > > > > >> > > > > > >> > Nevertheless, I believe ARCH_DMA_MINALIGN is required to avoid > > > > >> > putting a DMA buffer on the same cache line as some other data > > > > >> > that might be _written_ by the CPU while the corresponding > > > > >> > main memory is modified by another bus-mastering device. > > > > >> > > > > > >> > Consider this layout: > > > > >> > > > > > >> > ... | DMA buffer | other data | ... > > > > >> > ^ ^ > > > > >> > +-------------------------+-- cache line boundaries > > > > >> > > > > > >> > When you prepare for DMA, you make sure that the DMA buffer is > > > > >> > not cached by the CPU, so you flush the cache line (from all > > > > >> > levels). Then you tell the device to write into the DMA > > > > >> > buffer. However, before the device finishes the DMA > > > > >> > transaction, the CPU accesses "other data", loading this cache > > > > >> > line from main memory with partial results. Worse, if the CPU > > > > >> > writes to "other data", it may write the cache line back into > > > > >> > main memory, racing with the device writing to DMA buffer, and > > > > >> > you end up with corrupted data in DMA buffer. > > > > > > Yes, cache evictions from 'other data; can override the DMA. Another > > > problem, when the DMA completed, the kernel does a cache invalidation > > > to remove any speculatively loaded cache lines from the DMA buffer > > > but that would also invalidate 'other data', potentially corrupting > > > it if it was dirty. > > > > > > So it's not safe to have DMA into buffers less than ARCH_DMA_MINALIGN > > > (and unaligned). > > > > It's not safe to DMA into buffers that share a CPU cache line with other > > data, which could be before or after the DMA buffer, of course. >[...] > While I think we are ok for arm64, other architectures may invalidate > the caches in the arch_sync_dma_for_device() which could discard the red > zone data. A quick grep for arch_sync_dma_for_device() shows several > architectures invalidating the caches in the FROM_DEVICE case. Wait. This sounds broken for partial writes into the DMA buffer, i.e. where only part of the buffer is updated by a bus-mastering device. When I worked on swiotlb, I was told that such partial updates must be supported, and that's why the initial swiotlb_bounce() cannot be removed from swiotlb_tbl_map_single(). In fact, the comment says: /* * When the device is writing memory, i.e. dir == DMA_FROM_DEVICE, copy * the original buffer to the TLB buffer before initiating DMA in order * to preserve the original's data if the device does a partial write, * i.e. if the device doesn't overwrite the entire buffer. Preserving * the original data, even if it's garbage, is necessary to match * hardware behavior. Use of swiotlb is supposed to be transparent, * i.e. swiotlb must not corrupt memory by clobbering unwritten bytes. */ You may want to check commit ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE"), commit aa6f8dcbab47 ("swiotlb: rework "fix info leak with DMA_FROM_DEVICE") and commit 1132a1dc053e ("swiotlb: rewrite comment explaining why the source is preserved on DMA_FROM_DEVICE"). I believe there is potential for a nasty race condition, and maybe even info leak. Consider this: 1. DMA buffer is allocated by kmalloc(). The memory area previously contained sensitive information, which had been written to main memory. 2. The DMA buffer is initialized with zeroes, but this new content stays in a CPU cache (because this is kernel memory with a write behind cache policy). 3. DMA is set up, but nothing is written to main memory by the bus-mastering device. 4. The CPU cache line is now discarded in arch_sync_dma_for_cpu(). IIUC the zeroes were never written to main memory, and previous content can now be read by the CPU through the DMA buffer. I haven't checked if any architecture is affected, but I strongly believe that the CPU cache MUST be flushed both before and after the DMA transfer. Any architecture which does not do it that way should be fixed. Or did I miss a crucial detail (again)? > > > What I did with reducing the minimum kmalloc() > > > alignment was to force bouncing via swiotlb if the size passed to > > > the DMA API is small. It may end up bouncing buffers that did not > > > originate from kmalloc() or have proper alignment (with padding) > > > but that's some heuristics we were willing to accept to be able > > > to use small kmalloc() caches on arm64 - see > > > dma_kmalloc_needs_bounce(). > > > > > > Does redzoning apply to kmalloc() or kmem_cache_create() (or > > > both)? I haven't checked yet but if the red zone is within > > > ARCH_DMA_MINALIGN (or rather dma_get_cache_alignment()), we could > > > have issues with either corrupting the DMA buffer or the red > > > zone. [...] > > > > I'm sorry if I'm being thick, but IIUC the red zone does not have > > to be protected. Yes, we might miss red zone corruption if it > > happens to race with a DMA transaction, but I have assumed that > > this is permissible. I regard the red zone as a useful debugging > > tool, not a safety measure that is guaranteed to detect any write > > beyond the buffer end. > > Yeah, it's debugging, but too many false positives make the feature > pretty useless. Agreed. I was defending only sporadic false negatives. If there is room for false positives, that's a no-go. > > > > > I'm not familiar with DMA stuff, but Vlastimil's idea does make it > > > > > easier for driver developer to write a driver to be used on > > > > > different ARCHs, which have different DMA alignment requirement. > > > > > Say if the minimal safe size is 8 bytes, the driver can just > > > > > request 8 bytes and ARCH_DMA_MINALIGN will automatically chose > > > > > the right size for it, which can save memory for ARCHs with > > > > > smaller alignment requirement. Meanwhile it does sacrifice part > > > > > of the redzone check ability, so I don't have preference here :) > > > > > > > > Let's clarify first who's expected to ensure the word alignment for > > > > DMA, if it's not kmalloc() then I'd rather resist moving it there > > > > :) > > > > > > In theory, the DMA API should handle the alignment as I tried to > > > remove it from the kmalloc() code. > > > > Are we talking about the alignment of the starting address, or buffer > > size, or both? > > The DMA API bouncing logic only checks the buffer size and assumes > start/end would be aligned to kmalloc_size_roundup() with no valid data > between them (FROM_DEVICE). If that's correct behavior, then it must be documented, because other places seem to disagree on the assumptions about prior content of a DMA buffer (FROM_DEVICE). See the swiotlb comment above. Petr T