Date: Mon, 16 Mar 2026 17:54:13 +0100
Subject: Re: [PATCH v3] iomap: add allocation cache for iomap_dio
To: changfengnan, Dave Chinner, Harry Yoo, Hao Li
Cc: guzebing, brauner@kernel.org, djwong@kernel.org, hch@infradead.org,
 linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, guzebing@bytedance.com,
 syzbot@syzkaller.appspotmail.com, linux-mm@kvack.org
References: <20260115021108.1913695-1-guzebing1612@gmail.com>
From: "Vlastimil Babka (SUSE)"
Content-Type: text/plain; charset=UTF-8

+CC Harry and Hao

On 3/16/26 12:22, changfengnan wrote:
> 
>> From: "Dave Chinner"
>> Date: Thu, Jan 15, 2026, 13:02
>> Subject: Re: [PATCH v3] iomap: add allocation cache for iomap_dio
>> To: "guzebing"
>> Cc: "Fengnan Chang", "Vlastimil Babka"
>> 
>> [cc linux-mm]
>> 
>> On Thu, Jan 15, 2026 at 10:11:08AM +0800, guzebing wrote:
>> > As implemented by the bio structure, we do the same thing on the
>> > iomap-dio structure. Add a per-cpu cache for iomap_dio allocations,
>> > enabling us to quickly recycle them instead of going through the slab
>> > allocator.
>> > 
>> > By making such changes, we can reduce memory allocation on the direct
>> > IO path, so that direct IO will not block due to insufficient system
>> > memory. In addition, for direct IO, the read performance of io_uring
>> > is improved by about 2.6%.
>> 
>> Honestly, this just feels wrong.
>> 
>> If heap memory allocation has performance issues, then the right
>> solution is to fix the memory allocator.
>> 
>> Oh, wait, you're copy-pasting the hacky per-cpu bio allocator cache
>> lists into the iomap DIO code.
>> 
>> IMO, this really should be part of the generic memory allocation
>> APIs, not repeatedly tacked on the outside of specific individual
>> object allocations.
>> 
>> Huh. per-cpu free lists is the traditional SLAB allocator
>> architecture. That was removed a while back because SLUB performs
>> better in most cases....
>> 
>> ISTR somebody was already working to optimise the SLUB allocator to
>> address these corner case shortcomings w.r.t. traditional SLABs.
>> 
>> Yup:
>> 
>> commit 2d517aa09bbc4203f10cdee7e1d42f3bbdc1b1cd
>> Author: Vlastimil Babka
>> Date:   Wed Sep 3 14:59:45 2025 +0200
>> 
>>     slab: add opt-in caching layer of percpu sheaves
>> 
>>     Specifying a non-zero value for a new struct kmem_cache_args field
>>     sheaf_capacity will setup a caching layer of percpu arrays called
>>     sheaves of given capacity for the created cache.
>> 
>>     Allocations from the cache will allocate via the percpu sheaves (main or
>>     spare) as long as they have no NUMA node preference. Frees will also
>>     put the object back into one of the sheaves.
>> 
>>     When both percpu sheaves are found empty during an allocation, an empty
>>     sheaf may be replaced with a full one from the per-node barn. If none
>>     are available and the allocation is allowed to block, an empty sheaf is
>>     refilled from slab(s) by an internal bulk alloc operation. When both
>>     percpu sheaves are full during freeing, the barn can replace a full one
>>     with an empty one, unless over a full sheaves limit. In that case a
>>     sheaf is flushed to slab(s) by an internal bulk free operation. Flushing
>>     sheaves and barns is also wired to the existing cpu flushing and cache
>>     shrinking operations.
>> 
>>     The sheaves do not distinguish NUMA locality of the cached objects. If
>>     an allocation is requested with kmem_cache_alloc_node() (or a mempolicy
>>     with strict_numa mode enabled) with a specific node (not NUMA_NO_NODE),
>>     the sheaves are bypassed.
>> 
>>     The bulk operations exposed to slab users also try to utilize the
>>     sheaves as long as the necessary (full or empty) sheaves are available
>>     on the cpu or in the barn. Once depleted, they will fallback to bulk
>>     alloc/free to slabs directly to avoid double copying.
>> 
>>     The sheaf_capacity value is exported in sysfs for observability.
>> 
>>     Sysfs CONFIG_SLUB_STATS counters alloc_cpu_sheaf and free_cpu_sheaf
>>     count objects allocated or freed using the sheaves (and thus not
>>     counting towards the other alloc/free path counters). Counters
>>     sheaf_refill and sheaf_flush count objects filled or flushed from or to
>>     slab pages, and can be used to assess how effective the caching is. The
>>     refill and flush operations will also count towards the usual
>>     alloc_fastpath/slowpath, free_fastpath/slowpath and other counters for
>>     the backing slabs. For barn operations, barn_get and barn_put count how
>>     many full sheaves were get from or put to the barn, the _fail variants
>>     count how many such requests could not be satisfied mainly because the
>>     barn was either empty or full. While the barn also holds empty sheaves
>>     to make some operations easier, these are not as critical to mandate own
>>     counters. Finally, there are sheaf_alloc/sheaf_free counters.
>> 
>>     Access to the percpu sheaves is protected by local_trylock() when
>>     potential callers include irq context, and local_lock() otherwise (such
>>     as when we already know the gfp flags allow blocking). The trylock
>>     failures should be rare and we can easily fallback. Each per-NUMA-node
>>     barn has a spin_lock.
>> 
>>     When slub_debug is enabled for a cache with sheaf_capacity also
>>     specified, the latter is ignored so that allocations and frees reach the
>>     slow path where debugging hooks are processed. Similarly, we ignore it
>>     with CONFIG_SLUB_TINY which prefers low memory usage to performance.
>> 
>>     [boot failure: https://lore.kernel.org/all/583eacf5-c971-451a-9f76-fed0e341b815@linux.ibm.com/ ]
>> 
>>     Reported-and-tested-by: Venkat Rao Bagalkote
>>     Reviewed-by: Harry Yoo
>>     Reviewed-by: Suren Baghdasaryan
>>     Signed-off-by: Vlastimil Babka
>> 
>> Yeah, recent code, functionality is not enabled by default yet. So,
>> kmem_cache_alloc() with:
>> 
>> struct kmem_cache_args {
>> .....
>>         /**
>>          * @sheaf_capacity: Enable sheaves of given capacity for the cache.
>>          *
>>          * With a non-zero value, allocations from the cache go through caching
>>          * arrays called sheaves. Each cpu has a main sheaf that's always
>>          * present, and a spare sheaf that may be not present. When both become
>>          * empty, there's an attempt to replace an empty sheaf with a full sheaf
>>          * from the per-node barn.
>>          *
>>          * When no full sheaf is available, and gfp flags allow blocking, a
>>          * sheaf is allocated and filled from slab(s) using bulk allocation.
>>          * Otherwise the allocation falls back to the normal operation
>>          * allocating a single object from a slab.
>>          *
>>          * Analogically when freeing and both percpu sheaves are full, the barn
>>          * may replace it with an empty sheaf, unless it's over capacity. In
>>          * that case a sheaf is bulk freed to slab pages.
>>          *
>>          * The sheaves do not enforce NUMA placement of objects, so allocations
>>          * via kmem_cache_alloc_node() with a node specified other than
>>          * NUMA_NO_NODE will bypass them.
>>          *
>>          * Bulk allocation and free operations also try to use the cpu sheaves
>>          * and barn, but fallback to using slab pages directly.
>>          *
>>          * When slub_debug is enabled for the cache, the sheaf_capacity argument
>>          * is ignored.
>>          *
>>          * %0 means no sheaves will be created.
>>          */
>>         unsigned int sheaf_capacity;
>> }
>> 
>> set to the value required is all we need. i.e. something like this
>> in iomap_dio_init():
>> 
>>         struct kmem_cache_args kmem_args = {
>>                 .sheaf_capacity = 256,
>>         };
>> 
>>         dio_kmem_cache = kmem_cache_create("iomap_dio", sizeof(struct iomap_dio),
>>                         &kmem_args, SLAB_PANIC | SLAB_ACCOUNT);
>> 
>> And changing the allocation to kmem_cache_alloc(dio_kmem_cache,
>> GFP_KERNEL) should provide the same sort of performance improvement
>> as this patch does.
>> 
>> Can you test this, please?

> Hi Dave:
> Sorry it took so long to respond. Guzebing was busy with something else, so I
> did this test.
> I tested sheaf_capacity on 7.0-rc3; it doesn't show any performance
> improvement.

7.0-rc3 already has sheaves in every cache, with the old caching scheme
removed. An explicit sheaf_capacity can now only be used to increase the
automatically calculated one; the effective value can be observed in
/sys/kernel/slab/$cache/sheaf_capacity.

> Besides, I wrote a simple kernel module to test the performance difference by
> creating a normal kmem cache and one with sheaf_capacity, and measuring the
> time taken to allocate 32 objects and then free 32 objects, which resulted in
> a roughly 10% improvement in time spent.

That suggests that in that test you used a larger capacity than the
automatically calculated one.

> I'm thinking that maybe these improvements may not be significant enough to
> see the effect in the io flow.
> Using a simple list seems to be the most efficient approach.
I think the question is: what improvement do you now see with your added
pcpu cache vs kmalloc() when 7.0-rc4 is used as the baseline?

Thanks,
Vlastimil

> Thanks.
> Fengnan.
> 
>> 
>> If it doesn't provide any performance improvement, then I suspect
>> that Vlastimil will be interested to find out why....
>> 
>> Also, if it does work, it is likely the bioset mempools (which are
>> slab based) can be initialised similarly, removing the need for
>> custom per-cpu free lists in the block layer, too.
>> 
>> -Dave.
>> 
>> > 
>> > v3:
>> > kmalloc now is called outside the get_cpu/put_cpu code section.
>> > 
>> > v2:
>> > Factor percpu cache into common code and the iomap module uses it.
>> > 
>> > v1:
>> > https://lore.kernel.org/all/20251121090052.384823-1-guzebing1612@gmail.com/
>> > 
>> > Tested-by: syzbot@syzkaller.appspotmail.com
>> > 
>> > Suggested-by: Fengnan Chang
>> > Signed-off-by: guzebing
>> > ---
>> >  fs/iomap/direct-io.c | 133 ++++++++++++++++++++++++++++++++++++++++++-
>> >  1 file changed, 130 insertions(+), 3 deletions(-)
>> > 
>> > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>> > index 5d5d63efbd57..4421e4ad3a8f 100644
>> > --- a/fs/iomap/direct-io.c
>> > +++ b/fs/iomap/direct-io.c
>> > @@ -56,6 +56,130 @@ struct iomap_dio {
>> >          };
>> >  };
>> >  
>> > +#define PCPU_CACHE_IRQ_THRESHOLD        16
>> > +#define PCPU_CACHE_ELEMENT_SIZE(pcpu_cache_list) \
>> > +        (sizeof(struct pcpu_cache_element) + pcpu_cache_list->element_size)
>> > +#define PCPU_CACHE_ELEMENT_GET_HEAD_FROM_PAYLOAD(payload) \
>> > +        ((struct pcpu_cache_element *)((unsigned long)(payload) - \
>> > +                                       sizeof(struct pcpu_cache_element)))
>> > +#define PCPU_CACHE_ELEMENT_GET_PAYLOAD_FROM_HEAD(head) \
>> > +        ((void *)((unsigned long)(head) + sizeof(struct pcpu_cache_element)))
>> > +
>> > +struct pcpu_cache_element {
>> > +        struct pcpu_cache_element        *next;
>> > +        char        payload[];
>> > +};
>> > +struct pcpu_cache {
>> > +        struct pcpu_cache_element        *free_list;
>> > +        struct pcpu_cache_element        *free_list_irq;
>> > +        int                nr;
>> > +        int                nr_irq;
>> > +};
>> > +struct pcpu_cache_list {
>> > +        struct pcpu_cache __percpu *cache;
>> > +        size_t element_size;
>> > +        int max_nr;
>> > +};
>> > +
>> > +static struct pcpu_cache_list *pcpu_cache_list_create(int max_nr, size_t size)
>> > +{
>> > +        struct pcpu_cache_list *pcpu_cache_list;
>> > +
>> > +        pcpu_cache_list = kmalloc(sizeof(struct pcpu_cache_list), GFP_KERNEL);
>> > +        if (!pcpu_cache_list)
>> > +                return NULL;
>> > +
>> > +        pcpu_cache_list->element_size = size;
>> > +        pcpu_cache_list->max_nr = max_nr;
>> > +        pcpu_cache_list->cache = alloc_percpu(struct pcpu_cache);
>> > +        if (!pcpu_cache_list->cache) {
>> > +                kfree(pcpu_cache_list);
>> > +                return NULL;
>> > +        }
>> > +        return pcpu_cache_list;
>> > +}
>> > +
>> > +static void pcpu_cache_list_destroy(struct pcpu_cache_list *pcpu_cache_list)
>> > +{
>> > +        free_percpu(pcpu_cache_list->cache);
>> > +        kfree(pcpu_cache_list);
>> > +}
>> > +
>> > +static void irq_cache_splice(struct pcpu_cache *cache)
>> > +{
>> > +        unsigned long flags;
>> > +
>> > +        /* cache->free_list must be empty */
>> > +        if (WARN_ON_ONCE(cache->free_list))
>> > +                return;
>> > +
>> > +        local_irq_save(flags);
>> > +        cache->free_list = cache->free_list_irq;
>> > +        cache->free_list_irq = NULL;
>> > +        cache->nr += cache->nr_irq;
>> > +        cache->nr_irq = 0;
>> > +        local_irq_restore(flags);
>> > +}
>> > +
>> > +static void *pcpu_cache_list_alloc(struct pcpu_cache_list *pcpu_cache_list)
>> > +{
>> > +        struct pcpu_cache *cache;
>> > +        struct pcpu_cache_element *cache_element;
>> > +
>> > +        cache = per_cpu_ptr(pcpu_cache_list->cache, get_cpu());
>> > +        if (!cache->free_list) {
>> > +                if (READ_ONCE(cache->nr_irq) >= PCPU_CACHE_IRQ_THRESHOLD)
>> > +                        irq_cache_splice(cache);
>> > +                if (!cache->free_list) {
>> > +                        put_cpu();
>> > +                        cache_element = kmalloc(PCPU_CACHE_ELEMENT_SIZE(pcpu_cache_list),
>> > +                                                GFP_KERNEL);
>> > +                        if (!cache_element)
>> > +                                return NULL;
>> > +                        return PCPU_CACHE_ELEMENT_GET_PAYLOAD_FROM_HEAD(cache_element);
>> > +                }
>> > +        }
>> > +
>> > +        cache_element = cache->free_list;
>> > +        cache->free_list = cache_element->next;
>> > +        cache->nr--;
>> > +        put_cpu();
>> > +        return PCPU_CACHE_ELEMENT_GET_PAYLOAD_FROM_HEAD(cache_element);
>> > +}
>> > +
>> > +static void pcpu_cache_list_free(void *payload, struct pcpu_cache_list *pcpu_cache_list)
>> > +{
>> > +        struct pcpu_cache *cache;
>> > +        struct pcpu_cache_element *cache_element;
>> > +
>> > +        cache_element = PCPU_CACHE_ELEMENT_GET_HEAD_FROM_PAYLOAD(payload);
>> > +
>> > +        cache = per_cpu_ptr(pcpu_cache_list->cache, get_cpu());
>> > +        if (READ_ONCE(cache->nr_irq) + cache->nr >= pcpu_cache_list->max_nr)
>> > +                goto out_free;
>> > +
>> > +        if (in_task()) {
>> > +                cache_element->next = cache->free_list;
>> > +                cache->free_list = cache_element;
>> > +                cache->nr++;
>> > +        } else if (in_hardirq()) {
>> > +                lockdep_assert_irqs_disabled();
>> > +                cache_element->next = cache->free_list_irq;
>> > +                cache->free_list_irq = cache_element;
>> > +                cache->nr_irq++;
>> > +        } else {
>> > +                goto out_free;
>> > +        }
>> > +        put_cpu();
>> > +        return;
>> > +out_free:
>> > +        put_cpu();
>> > +        kfree(cache_element);
>> > +}
>> > +
>> > +#define DIO_ALLOC_CACHE_MAX                256
>> > +static struct pcpu_cache_list *dio_pcpu_cache_list;
>> > +
>> >  static struct bio *iomap_dio_alloc_bio(const struct iomap_iter *iter,
>> >                  struct iomap_dio *dio, unsigned short nr_vecs, blk_opf_t opf)
>> >  {
>> > @@ -135,7 +259,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
>> >                          ret += dio->done_before;
>> >          }
>> >          trace_iomap_dio_complete(iocb, dio->error, ret);
>> > -        kfree(dio);
>> > +        pcpu_cache_list_free(dio, dio_pcpu_cache_list);
>> >          return ret;
>> >  }
>> >  EXPORT_SYMBOL_GPL(iomap_dio_complete);
>> > @@ -620,7 +744,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>> >          if (!iomi.len)
>> >                  return NULL;
>> >  
>> > -        dio = kmalloc(sizeof(*dio), GFP_KERNEL);
>> > +        dio = pcpu_cache_list_alloc(dio_pcpu_cache_list);
>> >          if (!dio)
>> >                  return ERR_PTR(-ENOMEM);
>> >  
>> > @@ -804,7 +928,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>> >          return dio;
>> >  
>> >  out_free_dio:
>> > -        kfree(dio);
>> > +        pcpu_cache_list_free(dio, dio_pcpu_cache_list);
>> >          if (ret)
>> >                  return ERR_PTR(ret);
>> >          return NULL;
>> > @@ -834,6 +958,9 @@ static int __init iomap_dio_init(void)
>> >          if (!zero_page)
>> >                  return -ENOMEM;
>> >  
>> > +        dio_pcpu_cache_list = pcpu_cache_list_create(DIO_ALLOC_CACHE_MAX, sizeof(struct iomap_dio));
>> > +        if (!dio_pcpu_cache_list)
>> > +                return -ENOMEM;
>> >          return 0;
>> >  }
>> >  fs_initcall(iomap_dio_init);
>> > -- 
>> > 2.20.1
>> 
>> -- 
>> Dave Chinner
>> david@fromorbit.com