From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 21 Apr 2022 12:06:58 +0100
From: Catalin Marinas
To: Christoph Hellwig
Cc: Arnd Bergmann, Ard Biesheuvel, Herbert Xu, Will Deacon, Marc Zyngier,
	Greg Kroah-Hartman, Andrew Morton, Linus Torvalds,
	Linux Memory Management List, Linux ARM,
	Linux Kernel Mailing List, "David S. Miller"
Subject: Re: [PATCH 07/10] crypto: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN

On Thu, Apr 21, 2022 at 12:20:22AM -0700, Christoph Hellwig wrote:
> Btw, there is another option: Most real systems already require having
> swiotlb to bounce buffer in some cases. We could simply force bounce
> buffering in the dma mapping code for too small or not properly aligned
> transfers and just decrease the dma alignment.

We can force bounce if size is small but checking the alignment is
trickier. Normally the beginning of the buffer is aligned but the end is
at some sizeof() distance. We need to know whether the end is in a
kmalloc-128 cache and that requires reaching out to the slab internals.
That's doable and not expensive but it needs to be done for every small
size getting to the DMA API, something like (for mm/slub.c):

	folio = virt_to_folio(x);
	slab = folio_slab(folio);
	if (slab->slab_cache->align < ARCH_DMA_MINALIGN)
		... bounce ...

(and a bit different for mm/slab.c)

If we scrap ARCH_DMA_MINALIGN altogether from arm64, we can check the
alignment against cache_line_size(), though I'd rather keep it for code
that wants to avoid bouncing and goes for this compile-time alignment.
I think we are down to four options (1 and 2 can be combined):

1. ARCH_DMA_MINALIGN == 128, dynamic arch_kmalloc_minalign() to reduce
   kmalloc() alignment to 64 on most arm64 SoCs - this series.

2. ARCH_DMA_MINALIGN == 128, ARCH_KMALLOC_MINALIGN == 128, add explicit
   __GFP_PACKED for small allocations. It can be combined with (1) so
   that allocations without __GFP_PACKED can still get 64-byte
   alignment.

3. ARCH_DMA_MINALIGN == 128, ARCH_KMALLOC_MINALIGN == 8, swiotlb
   bounce.

4. undef ARCH_DMA_MINALIGN, ARCH_KMALLOC_MINALIGN == 8, swiotlb bounce.

(3) and (4) don't require histogram analysis. Between them, I have a
preference for (3) as it gives drivers a chance to avoid the bounce.

If (2) is feasible, we don't need to bother with any bouncing or
structure alignments; it's an opt-in by the driver/subsystem. However,
it may be tedious to analyse the hot spots. While there are a few
obvious places (kstrdup), I don't have access to the multitude of
devices that may exercise the drivers and subsystems.

With (3) the risk is someone complaining about performance or even
running out of swiotlb space on some SoCs (I guess the fall-back can be
another kmalloc() with an appropriate size).

I guess we can limit the choice to either (2) or (3). I have (2)
already (needs some more testing). I can attempt (3) and try to run it
on some real hardware to see the perf impact.

-- 
Catalin