From: Petr Tesarik <ptesarik@suse.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Feng Tang <feng.tang@linux.alibaba.com>,
Harry Yoo <harry.yoo@oracle.com>, Peng Fan <peng.fan@nxp.com>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
David Rientjes <rientjes@google.com>,
Christoph Lameter <cl@linux.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Robin Murphy <robin.murphy@arm.com>,
Sean Christopherson <seanjc@google.com>,
Halil Pasic <pasic@linux.ibm.com>
Subject: Re: slub - extended kmalloc redzone and dma alignment
Date: Wed, 9 Apr 2025 11:05:29 +0200 [thread overview]
Message-ID: <20250409110529.3ad65b3c@mordecai> (raw)
In-Reply-To: <20250409103904.54a19faa@mordecai>
On Wed, 9 Apr 2025 10:39:04 +0200
Petr Tesarik <ptesarik@suse.com> wrote:
> On Tue, 8 Apr 2025 16:07:19 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> > On Tue, Apr 08, 2025 at 07:27:32AM +0200, Petr Tesarik wrote:
> > > On Mon, 7 Apr 2025 18:12:09 +0100
> > > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > > Thanks for looping me in. I'm just catching up with this thread.
> > > >
> > > > On Mon, Apr 07, 2025 at 09:54:41AM +0200, Vlastimil Babka wrote:
> > > > > On 4/7/25 09:21, Feng Tang wrote:
> > > > > > On Sun, Apr 06, 2025 at 10:02:40PM +0800, Feng Tang wrote:
> > > > > > [...]
> > > > > >> > I can remember this series, as well as my confusion why
> > > > > >> > 192-byte kmalloc caches were missing on arm64.
> > > > > >> >
> > > > > >> > Nevertheless, I believe ARCH_DMA_MINALIGN is required to avoid
> > > > > >> > putting a DMA buffer on the same cache line as some other data
> > > > > >> > that might be _written_ by the CPU while the corresponding
> > > > > >> > main memory is modified by another bus-mastering device.
> > > > > >> >
> > > > > >> > Consider this layout:
> > > > > >> >
> > > > > >> > ... | DMA buffer | other data | ...
> > > > > >> > ^ ^
> > > > > >> > +-------------------------+-- cache line boundaries
> > > > > >> >
> > > > > >> > When you prepare for DMA, you make sure that the DMA buffer is
> > > > > >> > not cached by the CPU, so you flush the cache line (from all
> > > > > >> > levels). Then you tell the device to write into the DMA
> > > > > >> > buffer. However, before the device finishes the DMA
> > > > > >> > transaction, the CPU accesses "other data", loading this cache
> > > > > >> > line from main memory with partial results. Worse, if the CPU
> > > > > >> > writes to "other data", it may write the cache line back into
> > > > > >> > main memory, racing with the device writing to DMA buffer, and
> > > > > >> > you end up with corrupted data in DMA buffer.
> > > >
> > > > Yes, cache evictions from 'other data; can override the DMA. Another
> > > > problem, when the DMA completed, the kernel does a cache invalidation
> > > > to remove any speculatively loaded cache lines from the DMA buffer
> > > > but that would also invalidate 'other data', potentially corrupting
> > > > it if it was dirty.
> > > >
> > > > So it's not safe to have DMA into buffers less than ARCH_DMA_MINALIGN
> > > > (and unaligned).
> > >
> > > It's not safe to DMA into buffers that share a CPU cache line with other
> > > data, which could be before or after the DMA buffer, of course.
> >[...]
> > While I think we are ok for arm64, other architectures may invalidate
> > the caches in the arch_sync_dma_for_device() which could discard the red
> > zone data. A quick grep for arch_sync_dma_for_device() shows several
> > architectures invalidating the caches in the FROM_DEVICE case.
>
> Wait. This sounds broken for partial writes into the DMA buffer, i.e.
> where only part of the buffer is updated by a bus-mastering device.
> When I worked on swiotlb, I was told that such partial updates must
> be supported, and that's why the initial swiotlb_bounce() cannot be
> removed from swiotlb_tbl_map_single(). In fact, the comment says:
>
> /*
> * When the device is writing memory, i.e. dir == DMA_FROM_DEVICE, copy
> * the original buffer to the TLB buffer before initiating DMA in order
> * to preserve the original's data if the device does a partial write,
> * i.e. if the device doesn't overwrite the entire buffer. Preserving
> * the original data, even if it's garbage, is necessary to match
> * hardware behavior. Use of swiotlb is supposed to be transparent,
> * i.e. swiotlb must not corrupt memory by clobbering unwritten bytes.
> */
>
> You may want to check commit ddbd89deb7d3 ("swiotlb: fix info leak with
> DMA_FROM_DEVICE"), commit aa6f8dcbab47 ("swiotlb: rework "fix info leak
> with DMA_FROM_DEVICE") and commit 1132a1dc053e ("swiotlb: rewrite
> comment explaining why the source is preserved on DMA_FROM_DEVICE").
>
> I believe there is potential for a nasty race condition, and maybe even
> info leak. Consider this:
>
> 1. DMA buffer is allocated by kmalloc(). The memory area previously
> contained sensitive information, which had been written to main
> memory.
> 2. The DMA buffer is initialized with zeroes, but this new content
> stays in a CPU cache (because this is kernel memory with a write
> behind cache policy).
> 3. DMA is set up, but nothing is written to main memory by the
> bus-mastering device.
> 4. The CPU cache line is now discarded in arch_sync_dma_for_cpu().
>
> IIUC the zeroes were never written to main memory, and previous content
> can now be read by the CPU through the DMA buffer.
>
> I haven't checked if any architecture is affected, but I strongly
> believe that the CPU cache MUST be flushed both before and after the
> DMA transfer. Any architecture which does not do it that way should be
> fixed.
>
> Or did I miss a crucial detail (again)?
Just after sending this, I realized I did. :(
There is a step between 2 and 3:
2a. arch_sync_dma_for_device() invalidates the CPU cache line.
Architectures which do not write previous content to main memory
effectively undo the zeroing here.
AFAICS the consequence is still the same: race condition and/or info
leak on partial (or zero) DMA write.
Petr T
next prev parent reply other threads:[~2025-04-09 9:05 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-04 9:30 Vlastimil Babka
2025-04-04 10:30 ` Harry Yoo
2025-04-04 11:12 ` Petr Tesarik
2025-04-04 12:45 ` Vlastimil Babka
2025-04-04 13:53 ` Petr Tesarik
2025-04-06 14:02 ` Feng Tang
2025-04-07 7:21 ` Feng Tang
2025-04-07 7:54 ` Vlastimil Babka
2025-04-07 9:50 ` Petr Tesarik
2025-04-07 17:12 ` Catalin Marinas
2025-04-08 5:27 ` Petr Tesarik
2025-04-08 15:07 ` Catalin Marinas
2025-04-09 8:39 ` Petr Tesarik
2025-04-09 9:05 ` Petr Tesarik [this message]
2025-04-09 9:47 ` Catalin Marinas
2025-04-09 12:18 ` Petr Tesarik
2025-04-09 12:49 ` Catalin Marinas
2025-04-09 13:41 ` Petr Tesarik
2025-04-09 8:51 ` Vlastimil Babka
2025-04-09 11:11 ` Catalin Marinas
2025-04-09 12:22 ` Vlastimil Babka
2025-04-09 14:30 ` Catalin Marinas
2025-04-10 1:54 ` Feng Tang
2025-04-07 7:45 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250409110529.3ad65b3c@mordecai \
--to=ptesarik@suse.com \
--cc=42.hyeyoo@gmail.com \
--cc=catalin.marinas@arm.com \
--cc=cl@linux.com \
--cc=feng.tang@linux.alibaba.com \
--cc=harry.yoo@oracle.com \
--cc=linux-mm@kvack.org \
--cc=pasic@linux.ibm.com \
--cc=peng.fan@nxp.com \
--cc=rientjes@google.com \
--cc=robin.murphy@arm.com \
--cc=seanjc@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox