From: Vlastimil Babka
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Lameter, David Rientjes, Pekka Enberg, Joonsoo Kim
Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Mel Gorman, Jesper Dangaard Brouer, Peter Zijlstra, Jann Horn, Vlastimil Babka
Subject: [RFC 00/26] SLUB: use local_lock for kmem_cache_cpu protection and reduce disabling irqs
Date: Tue, 25 May 2021 01:39:20 +0200
Message-Id: <20210524233946.20352-1-vbabka@suse.cz>
This series was inspired by Mel's pcplist local_lock rewrite, and also by an interest to better understand SLUB's locking, the new locking primitives, and their RT variants and implications. It should make SLUB more preemption-friendly, especially for RT, hopefully without noticeable regressions, as the fast paths are not affected.

The series is based on 5.13-rc3 and also available as a git branch:
https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux.git/log/?h=slub-local-lock-v1r9

It received some light stability testing and also basic performance screening (thanks Mel) that didn't show major regressions. But I'm interested in e.g. Jesper's tests of whether the bulk allocator regressed.

Before the series, SLUB is lockless in both the allocation and free fast paths, but elsewhere it disables irqs for considerable periods of time - especially in the allocation slowpath and the bulk allocation, where IRQs are re-enabled only when a new page from the page allocator is needed and the context allows blocking. The irq disabled sections can then include deactivate_slab(), which walks a full freelist and frees the slab back to the page allocator, or unfreeze_partials(), which goes through a list of percpu partial slabs. The RT tree currently has some patches mitigating these, but we can do much better in mainline too.

Patches 1-2 are straightforward optimizations removing unnecessary usages of object_map_lock.

Patch 3 is a cleanup of an obviously unnecessary local_irq_save/restore instance.

Patch 4 simplifies the fast paths on systems with preemption, based on the (hopefully correct) observation that the current loops to verify tid are unnecessary.

Patches 5-18 focus on the allocation slowpath. Patches 5-8 are preparatory code refactoring.

Patch 9 moves the disabling of irqs into ___slab_alloc() from its callers, which are the allocation slowpath and the bulk allocation. Instead, these callers only disable migration to stabilize the cpu. The following patches then gradually reduce the scope of disabled irqs in ___slab_alloc() and the functions called from there. As of patch 12, the re-enabling of irqs based on gfp flags before calling the page allocator is removed from allocate_slab(). As of patch 15, it's possible to reach the page allocator (in case existing slabs are depleted) without disabling and re-enabling irqs even once.

Patches 19-24 reduce the scope of disabled irqs in the remaining functions. Patch 25 replaces a preempt_disable() with migrate_disable() in put_cpu_partial().

Patch 26 replaces the remaining explicitly irq-disabled sections that protect percpu variables with a local_lock, and updates the locking documentation in the file's comment. (A rough sketch of the resulting pattern follows below.)

The result is that irq disabling is only done for the minimum amount of time needed, and as part of spin lock or local lock operations to make them irq-safe, except one case around slab_lock, which is a bit spinlock.
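To make the end state concrete, here's a rough sketch of the local_lock pattern that patch 26 converts to, in mm/slub.c context; the exact lock placement in kmem_cache_cpu and the simplified freelist pop are illustrative assumptions of this sketch, not the literal patch:

#include <linux/local_lock.h>
#include <linux/percpu.h>

/* Sketch only: field layout is illustrative, see patch 26 for the real code. */
struct kmem_cache_cpu {
	local_lock_t lock;	/* protects the fields below in slow paths */
	void **freelist;
	unsigned long tid;
	struct page *page;
};

static void *slowpath_piece(struct kmem_cache *s)
{
	struct kmem_cache_cpu *c;
	unsigned long flags;
	void *object;

	/*
	 * On !PREEMPT_RT this disables irqs, like the explicit
	 * local_irq_save() it replaces. On PREEMPT_RT it's a percpu
	 * spinlock and the section stays preemptible.
	 */
	local_lock_irqsave(&s->cpu_slab->lock, flags);
	c = this_cpu_ptr(s->cpu_slab);
	object = c->freelist;
	if (object)
		c->freelist = *(void **)object; /* assumes freepointer at offset 0 */
	local_unlock_irqrestore(&s->cpu_slab->lock, flags);

	return object;
}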
This should have obvious implications for better preemption, especially on RT.

Also, some RT patches should now be unnecessary, IIUC:

mm: slub: Enable irqs for __GFP_WAIT [1] becomes unnecessary as of patch 12.

The following two become unnecessary once the IPI flush_slab() handler is dealt with, as discussed later:

mm: sl[au]b: Change list_lock to raw_spinlock_t [2] - the SLAB part can be dropped, as a different patch restricts RT to SLUB anyway, and after this series the list_lock in SLUB is never taken with irqs already disabled.

mm: slub: Move discard_slab() invocations out of IRQ-off sections [3] should be unnecessary, as this series does move these invocations outside irq disabled sections.

Some caveats will probably have to be solved on PREEMPT_RT - I'm just not sure enough from reading Documentation/locking/locktypes.rst how some things work there. Advice welcome.

* There are paths such as:

get_partial_node() - does spin_lock_irqsave(&n->list_lock)
  acquire_slab()
    __cmpxchg_double_slab()
      slab_lock() - a bit spinlock without explicit irqsave

On !PREEMPT_RT this is fine, as spin_lock_irqsave() disables irqs, so slab_lock() doesn't need to and is still irq-safe. I assume there are no such guarantees on PREEMPT_RT, where spin_lock_irqsave() is just a mutex with disabled migration? So RT will have to make sure all paths to slab_lock() go through an explicit irqsave? (The first sketch below shows one possible shape for that.)

* There is this path involving an IPI:

flush_all()
  on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1)
    IPI with interrupts disabled (is that still true on RT?)
      flush_cpu_slab()
        flush_slab()
          manipulate kmem_cache_cpu variables
          deactivate_slab()

The problem here is that flush_slab() manipulates variables normally protected by the local_lock. On !PREEMPT_RT we don't need the local_lock here, because local_lock_irqsave() just disables irqs and we already got them disabled from the IPI. On PREEMPT_RT, IIUC, we actually can't even take the local_lock, because irqs are already disabled. So that's a problem.

Another issue is that deactivate_slab() above will take the node's list_lock spinlock, so with irqs disabled it would still have to be a raw spinlock, as patch [2] does. It will also call discard_slab(), which should likewise be called without irqs disabled.

So for these reasons, the RT patch "mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context" [4], converting the IPIs to workqueues, will still be needed. Then the work handler can use the local_lock normally, which should solve the issues with flush_all() and hopefully allow ditching patch [2]. (The second sketch below illustrates this direction.)

Or is there perhaps a simpler way to make this flush IPI not disable irqs on PREEMPT_RT?
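Regarding the first caveat, one possible shape for making the slab_lock() paths explicitly irq-safe would be _irqsave variants of the bit-spinlock helpers. The helper names below are made up for this sketch and don't exist in mm/slub.c; the bit_spin_lock() on PG_locked is how slab_lock() works today:

#include <linux/bit_spinlock.h>
#include <linux/page-flags.h>

/* Sketch: hypothetical helpers, not existing mm/slub.c functions. */
static __always_inline void slab_lock_irqsave(struct page *page,
					      unsigned long *flags)
{
	/*
	 * Disable irqs ourselves instead of relying on an outer
	 * spin_lock_irqsave(), which doesn't disable irqs on PREEMPT_RT.
	 */
	local_irq_save(*flags);
	bit_spin_lock(PG_locked, &page->flags);
}

static __always_inline void slab_unlock_irqrestore(struct page *page,
						   unsigned long flags)
{
	__bit_spin_unlock(PG_locked, &page->flags);
	local_irq_restore(flags);
}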
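And for the second caveat, a rough sketch of the IPI-to-workqueue direction of patch [4], again in mm/slub.c context; struct slub_flush_work and the function names here are made up for illustration, while flush_slab() is the existing function:

#include <linux/workqueue.h>
#include <linux/percpu.h>
#include <linux/cpu.h>

struct slub_flush_work {
	struct work_struct work;
	struct kmem_cache *s;
};

static DEFINE_PER_CPU(struct slub_flush_work, slub_flush);

static void flush_cpu_slab_workfn(struct work_struct *w)
{
	struct slub_flush_work *sfw = container_of(w, struct slub_flush_work, work);

	/*
	 * Runs in process context on the target cpu, so on PREEMPT_RT the
	 * handler can take the local_lock normally instead of running with
	 * irqs disabled as in the IPI case.
	 */
	flush_slab(sfw->s, this_cpu_ptr(sfw->s->cpu_slab));
}

static void flush_all_via_workqueues(struct kmem_cache *s)
{
	unsigned int cpu;

	cpus_read_lock();
	for_each_online_cpu(cpu) {
		struct slub_flush_work *sfw = &per_cpu(slub_flush, cpu);

		/* A real version would skip cpus where has_cpu_slab() is false. */
		INIT_WORK(&sfw->work, flush_cpu_slab_workfn);
		sfw->s = s;
		schedule_work_on(cpu, &sfw->work);
	}
	for_each_online_cpu(cpu)
		flush_work(&per_cpu(slub_flush, cpu).work);
	cpus_read_unlock();
}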
[1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0003-mm-slub-Enable-irqs-for-__GFP_WAIT.patch?h=linux-5.12.y-rt-patches
[2] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0001-mm-sl-au-b-Change-list_lock-to-raw_spinlock_t.patch?h=linux-5.12.y-rt-patches
[3] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0004-mm-slub-Move-discard_slab-invocations-out-of-IRQ-off.patch?h=linux-5.12.y-rt-patches
[4] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0005-mm-slub-Move-flush_cpu_slab-invocations-__free_slab-.patch?h=linux-5.12.y-rt-patches

Vlastimil Babka (26):
  mm, slub: allocate private object map for sysfs listings
  mm, slub: allocate private object map for validate_slab_cache()
  mm, slub: don't disable irq for debug_check_no_locks_freed()
  mm, slub: simplify kmem_cache_cpu and tid setup
  mm, slub: extract get_partial() from new_slab_objects()
  mm, slub: dissolve new_slab_objects() into ___slab_alloc()
  mm, slub: return slab page from get_partial() and set c->page afterwards
  mm, slub: restructure new page checks in ___slab_alloc()
  mm, slub: move disabling/enabling irqs to ___slab_alloc()
  mm, slub: do initial checks in ___slab_alloc() with irqs enabled
  mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc()
  mm, slub: restore irqs around calling new_slab()
  mm, slub: validate partial and newly allocated slabs before loading them
  mm, slub: check new pages with restored irqs
  mm, slub: stop disabling irqs around get_partial()
  mm, slub: move reset of c->page and freelist out of deactivate_slab()
  mm, slub: make locking in deactivate_slab() irq-safe
  mm, slub: call deactivate_slab() without disabling irqs
  mm, slub: move irq control into unfreeze_partials()
  mm, slub: discard slabs in unfreeze_partials() without irqs disabled
  mm, slub: detach whole partial list at once in unfreeze_partials()
  mm, slub: detach percpu partial list in unfreeze_partials() using this_cpu_cmpxchg()
  mm, slub: only disable irq with spin_lock in __unfreeze_partials()
  mm, slub: don't disable irqs in slub_cpu_dead()
  mm, slub: use migrate_disable() in put_cpu_partial()
  mm, slub: convert kmem_cpu_slab protection to local_lock

 include/linux/slub_def.h |   2 +
 mm/slub.c                | 496 ++++++++++++++++++++++++---------------
 2 files changed, 314 insertions(+), 184 deletions(-)

-- 
2.31.1