From: Jens Axboe
Date: Wed, 13 May 2020 14:09:14 -0600
Subject: Re: [PATCH RFC} io_uring: io_kiocb alloc cache
To: Pekka Enberg, Jann Horn
Cc: io-uring, Xiaoguang Wang, joseph qi, Jiufei Xue, Pavel Begunkov,
 Christoph Lameter, Pekka Enberg, David Rientjes, Joonsoo Kim,
 Andrew Morton, Linux-MM
References: <492bb956-a670-8730-a35f-1d878c27175f@kernel.dk>
 <20200513191919.GA10975@nero>
In-Reply-To: <20200513191919.GA10975@nero>

On 5/13/20 1:20 PM, Pekka Enberg wrote:
>
> Hi,
>
> On Wed, May 13, 2020 at 6:30 PM Jens Axboe wrote:
>>> I turned the quick'n dirty from the other day into something a bit
>>> more done. Would be great if someone else could run some
>>> performance testing with this, I get about a 10% boost on the pure
>>> NOP benchmark with this. But that's just on my laptop in qemu, so
>>> some real iron testing would be awesome.
>
> On 5/13/20 8:42 PM, Jann Horn wrote:
>> +slab allocator people
>> 10% boost compared to which allocator? Are you using CONFIG_SLUB?
>
> On Wed, May 13, 2020 at 6:30 PM Jens Axboe wrote:
>>> The idea here is to have a percpu alloc cache. There's two sets of
>>> state:
>>>
>>> 1) Requests that have IRQ completion. preempt disable is not
>>>    enough there, we need to disable local irqs. This is a lot slower
>>>    in certain setups, so we keep this separate.
>>>
>>> 2) No IRQ completion, we can get by with just disabling preempt.
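
For readers following along, here is a minimal userspace sketch of the
bounded alloc-cache idea quoted above. It is not the code from the patch:
struct obj_cache, cache_alloc(), cache_free() and cache_drain() are
hypothetical names, the single struct stands in for one CPU's cache, and
malloc()/free() stand in for the slab allocator. Only the 240-byte object
size and the limit of 256 come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define OBJ_SIZE  240   /* size of struct io_kiocb as quoted in the thread */
#define CACHE_MAX 256   /* mirrors IO_KIOCB_CACHE_MAX from the quoted patch */

/* Hypothetical stand-in for ONE CPU's cache; not the patch's data structure. */
struct obj_cache {
    void  *slots[CACHE_MAX]; /* LIFO stack of free objects */
    size_t nr;               /* number of objects currently cached */
    size_t hits;             /* allocations served from the cache */
    size_t misses;           /* allocations that fell through to malloc() */
};

/* Pop a cached object if one exists, else fall back to the allocator. */
void *cache_alloc(struct obj_cache *c)
{
    /*
     * Kernel code would have to disable preemption here (or local IRQs
     * for requests with IRQ completion). This userspace sketch is
     * single-threaded, so it has no equivalent of that step.
     */
    if (c->nr > 0) {
        c->hits++;
        return c->slots[--c->nr];
    }
    c->misses++;
    return malloc(OBJ_SIZE);
}

/* Push the object back if there is room, else return it to the allocator. */
void cache_free(struct obj_cache *c, void *obj)
{
    if (c->nr < CACHE_MAX)
        c->slots[c->nr++] = obj;
    else
        free(obj);
}

/* Release everything still held by the cache. */
void cache_drain(struct obj_cache *c)
{
    while (c->nr > 0)
        free(c->slots[--c->nr]);
}
```

In this sketch, a loop that allocates and frees one object per iteration
misses only on the first allocation and hits the cache afterwards. That is
roughly the pattern a pure NOP benchmark exercises.
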
>
> On 5/13/20 8:42 PM, Jann Horn wrote:
>> +slab allocator people
>> The SLUB allocator has percpu caching, too, and as long as you don't
>> enable any SLUB debugging or ASAN or such, and you're not hitting
>> any slowpath processing, it doesn't even have to disable interrupts,
>> it gets away with cmpxchg_double.
>
> The struct io_kiocb is 240 bytes. I don't see a dedicated slab for it in
> /proc/slabinfo on my machine, so it likely got merged to the kmalloc-256
> cache. This means that there's 32 objects in the per-CPU cache. Jens, on
> the other hand, made the cache much bigger:

Right, it gets merged with kmalloc-256 (and 5 others) in my testing.

> +#define IO_KIOCB_CACHE_MAX 256
>
> So I assume if someone does "perf record", they will see significant
> reduction in page allocator activity with Jens' patch. One possible way
> around that is forcing the page allocation order to be much higher. IOW,
> something like the following completely untested patch:

Now tested, I gave it a shot. This seems to bring performance to
basically what the io_uring patch does, so that's great! Again, just in
the microbenchmark test case, so freshly booted and just running the
case.

Will this patch introduce latencies or non-deterministic behavior for a
fragmented system?

-- 
Jens Axboe