linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Petr Tesarik <ptesarik@suse.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Feng Tang <feng.tang@linux.alibaba.com>,
	Harry Yoo <harry.yoo@oracle.com>, Peng Fan <peng.fan@nxp.com>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>,
	David Rientjes <rientjes@google.com>,
	Christoph Lameter <cl@linux.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Sean Christopherson <seanjc@google.com>,
	Halil Pasic <pasic@linux.ibm.com>
Subject: Re: slub - extended kmalloc redzone and dma alignment
Date: Wed, 9 Apr 2025 11:05:29 +0200	[thread overview]
Message-ID: <20250409110529.3ad65b3c@mordecai> (raw)
In-Reply-To: <20250409103904.54a19faa@mordecai>

On Wed, 9 Apr 2025 10:39:04 +0200
Petr Tesarik <ptesarik@suse.com> wrote:

> On Tue, 8 Apr 2025 16:07:19 +0100
> Catalin Marinas <catalin.marinas@arm.com> wrote:
> 
> > On Tue, Apr 08, 2025 at 07:27:32AM +0200, Petr Tesarik wrote:  
> > > On Mon, 7 Apr 2025 18:12:09 +0100
> > > Catalin Marinas <catalin.marinas@arm.com> wrote:    
> > > > Thanks for looping me in. I'm just catching up with this thread.
> > > > 
> > > > On Mon, Apr 07, 2025 at 09:54:41AM +0200, Vlastimil Babka wrote:    
> > > > > On 4/7/25 09:21, Feng Tang wrote:      
> > > > > > On Sun, Apr 06, 2025 at 10:02:40PM +0800, Feng Tang wrote:
> > > > > > [...]      
> > > > > >> > I can remember this series, as well as my confusion why
> > > > > >> > 192-byte kmalloc caches were missing on arm64.
> > > > > >> > 
> > > > > >> > Nevertheless, I believe ARCH_DMA_MINALIGN is required to avoid
> > > > > >> > putting a DMA buffer on the same cache line as some other data
> > > > > >> > that might be _written_ by the CPU while the corresponding
> > > > > >> > main memory is modified by another bus-mastering device.
> > > > > >> > 
> > > > > >> > Consider this layout:
> > > > > >> > 
> > > > > >> >  ... | DMA buffer | other data | ...
> > > > > >> >      ^                         ^
> > > > > >> >      +-------------------------+-- cache line boundaries
> > > > > >> > 
> > > > > >> > When you prepare for DMA, you make sure that the DMA buffer is
> > > > > >> > not cached by the CPU, so you flush the cache line (from all
> > > > > >> > levels). Then you tell the device to write into the DMA
> > > > > >> > buffer. However, before the device finishes the DMA
> > > > > >> > transaction, the CPU accesses "other data", loading this cache
> > > > > >> > line from main memory with partial results. Worse, if the CPU
> > > > > >> > writes to "other data", it may write the cache line back into
> > > > > >> > main memory, racing with the device writing to DMA buffer, and
> > > > > >> > you end up with corrupted data in DMA buffer.      
> > > > 
> > > > Yes, cache evictions from 'other data; can override the DMA. Another
> > > > problem, when the DMA completed, the kernel does a cache invalidation
> > > > to remove any speculatively loaded cache lines from the DMA buffer
> > > > but that would also invalidate 'other data', potentially corrupting
> > > > it if it was dirty.
> > > > 
> > > > So it's not safe to have DMA into buffers less than ARCH_DMA_MINALIGN
> > > > (and unaligned).    
> > > 
> > > It's not safe to DMA into buffers that share a CPU cache line with other
> > > data, which could be before or after the DMA buffer, of course.    
> >[...]
> > While I think we are ok for arm64, other architectures may invalidate
> > the caches in the arch_sync_dma_for_device() which could discard the red
> > zone data. A quick grep for arch_sync_dma_for_device() shows several
> > architectures invalidating the caches in the FROM_DEVICE case.  
> 
> Wait. This sounds broken for partial writes into the DMA buffer, i.e.
> where only part of the buffer is updated by a bus-mastering device.
> When I worked on swiotlb, I was told that such partial updates must
> be supported, and that's why the initial swiotlb_bounce() cannot be
> removed from swiotlb_tbl_map_single(). In fact, the comment says:
> 
> 	/*
> 	 * When the device is writing memory, i.e. dir == DMA_FROM_DEVICE, copy
> 	 * the original buffer to the TLB buffer before initiating DMA in order
> 	 * to preserve the original's data if the device does a partial write,
> 	 * i.e. if the device doesn't overwrite the entire buffer.  Preserving
> 	 * the original data, even if it's garbage, is necessary to match
> 	 * hardware behavior.  Use of swiotlb is supposed to be transparent,
> 	 * i.e. swiotlb must not corrupt memory by clobbering unwritten bytes.
> 	 */
> 
> You may want to check commit ddbd89deb7d3 ("swiotlb: fix info leak with
> DMA_FROM_DEVICE"), commit aa6f8dcbab47 ("swiotlb: rework "fix info leak
> with DMA_FROM_DEVICE") and commit 1132a1dc053e ("swiotlb: rewrite
> comment explaining why the source is preserved on DMA_FROM_DEVICE").
> 
> I believe there is potential for a nasty race condition, and maybe even
> info leak. Consider this:
> 
> 1. DMA buffer is allocated by kmalloc(). The memory area previously
>    contained sensitive information, which had been written to main
>    memory.
> 2. The DMA buffer is initialized with zeroes, but this new content
>    stays in a CPU cache (because this is kernel memory with a write
>    behind cache policy).
> 3. DMA is set up, but nothing is written to main memory by the
>    bus-mastering device.
> 4. The CPU cache line is now discarded in arch_sync_dma_for_cpu().
> 
> IIUC the zeroes were never written to main memory, and previous content
> can now be read by the CPU through the DMA buffer.
> 
> I haven't checked if any architecture is affected, but I strongly
> believe that the CPU cache MUST be flushed both before and after the
> DMA transfer. Any architecture which does not do it that way should be
> fixed.
> 
> Or did I miss a crucial detail (again)?

Just after sending this, I realized I did. :(

There is a step between 2 and 3:

2a. arch_sync_dma_for_device() invalidates the CPU cache line.
    Architectures which do not write previous content to main memory
    effectively undo the zeroing here.

AFAICS the consequence is still the same: race condition and/or info
leak on partial (or zero) DMA write.

Petr T


  reply	other threads:[~2025-04-09  9:05 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-04  9:30 Vlastimil Babka
2025-04-04 10:30 ` Harry Yoo
2025-04-04 11:12   ` Petr Tesarik
2025-04-04 12:45     ` Vlastimil Babka
2025-04-04 13:53       ` Petr Tesarik
2025-04-06 14:02         ` Feng Tang
2025-04-07  7:21           ` Feng Tang
2025-04-07  7:54             ` Vlastimil Babka
2025-04-07  9:50               ` Petr Tesarik
2025-04-07 17:12               ` Catalin Marinas
2025-04-08  5:27                 ` Petr Tesarik
2025-04-08 15:07                   ` Catalin Marinas
2025-04-09  8:39                     ` Petr Tesarik
2025-04-09  9:05                       ` Petr Tesarik [this message]
2025-04-09  9:47                         ` Catalin Marinas
2025-04-09 12:18                           ` Petr Tesarik
2025-04-09 12:49                             ` Catalin Marinas
2025-04-09 13:41                               ` Petr Tesarik
2025-04-09  8:51                     ` Vlastimil Babka
2025-04-09 11:11                       ` Catalin Marinas
2025-04-09 12:22                         ` Vlastimil Babka
2025-04-09 14:30                           ` Catalin Marinas
2025-04-10  1:54                             ` Feng Tang
2025-04-07  7:45         ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250409110529.3ad65b3c@mordecai \
    --to=ptesarik@suse.com \
    --cc=42.hyeyoo@gmail.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=feng.tang@linux.alibaba.com \
    --cc=harry.yoo@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=pasic@linux.ibm.com \
    --cc=peng.fan@nxp.com \
    --cc=rientjes@google.com \
    --cc=robin.murphy@arm.com \
    --cc=seanjc@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox