From: Suren Baghdasaryan
Date: Sun, 23 Feb 2025 17:36:27 -0800
Subject: Re: [PATCH RFC v2 00/10] SLUB percpu sheaves
To: Kent Overstreet
Cc: Vlastimil Babka, "Liam R. Howlett", Christoph Lameter, David Rientjes,
    Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Uladzislau Rezki,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
    maple-tree@lists.infradead.org, Sebastian Andrzej Siewior,
    Alexei Starovoitov
References: <20250214-slub-percpu-caches-v2-0-88592ee0966a@suse.cz>

On Sat, Feb 22, 2025 at 8:44 PM Suren Baghdasaryan wrote:
>
> On Sat, Feb 22, 2025 at 4:19 PM Kent Overstreet wrote:
> >
> > On Fri, Feb 14, 2025 at 05:27:36PM +0100, Vlastimil Babka wrote:
> > > - Cheaper fast paths. For allocations, instead of local double cmpxchg,
> > >   after Patch 5 it's preempt_disable() and no atomic operations. Same for
> > >   freeing, which is normally a local double cmpxchg only for a short
> > >   term allocations (so the same slab is still active on the same cpu when
> > >   freeing the object) and a more costly locked double cmpxchg otherwise.
> > >   The downside is the lack of NUMA locality guarantees for the allocated
> > >   objects.
> >
> > Is that really cheaper than a local non locked double cmpxchg?
>
> Don't know about this particular part but testing sheaves with maple
> node cache and stress testing mmap/munmap syscalls shows performance
> benefits as long as there is some delay to let kfree_rcu() do its job.
> I'm still gathering results and will most likely post them tomorrow.

Here are the promised test results:

First I ran an Android app cycle test comparing the baseline against
sheaves used for maple tree nodes (as this patchset implements). I
measured about a 3% improvement in app launch times, which suggests
improved mmap syscall performance.

Next I ran an mmap stress test which maps five 1-page readable
file-backed areas, faults them in and finally unmaps them, timing the
mmap syscalls. It repeats that for 200000 cycles and reports the total
time. The average of 10 such runs is used as the final result.

3 configurations were tested:

1. Sheaves used for maple tree nodes only (this patchset).

2. Sheaves used for maple tree nodes with the vm_lock to vm_refcnt
   conversion [1]. That patchset avoids allocating an additional
   vm_lock structure on each mmap syscall and uses TYPESAFE_BY_RCU for
   the vm_area_struct cache.

3. Sheaves used for maple tree nodes and for the vm_area_struct cache,
   with the vm_lock to vm_refcnt conversion [1]. For the vm_area_struct
   cache I had to replace TYPESAFE_BY_RCU with sheaves, as we can't use
   both for the same cache.

The values represent the total time it took to perform the mmap
syscalls; lower is better.

(1)             baseline    control
Little core     7.58327     6.614939 (-12.77%)
Medium core     2.125315    1.428702 (-32.78%)
Big core        0.514673    0.422948 (-17.82%)

(2)             baseline    control
Little core     7.58327     5.141478 (-32.20%)
Medium core     2.125315    0.427692 (-79.88%)
Big core        0.514673    0.046642 (-90.94%)

(3)             baseline    control
Little core     7.58327     4.779624 (-36.97%)
Medium core     2.125315    0.450368 (-78.81%)
Big core        0.514673    0.037776 (-92.66%)

Results in (3) vs (2) indicate that using sheaves for vm_area_struct
yields slightly better averages, and I noticed that this was mostly due
to the sheaves results missing the occasional spikes that worsened the
TYPESAFE_BY_RCU averages (the results seemed more stable with sheaves).

[1] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/

>
> >
> > Especially if you now have to use pushf/popf...
> >
> > > - kfree_rcu() batching and recycling. kfree_rcu() will put objects to a
> > >   separate percpu sheaf and only submit the whole sheaf to call_rcu()
> > >   when full. After the grace period, the sheaf can be used for
> > >   allocations, which is more efficient than freeing and reallocating
> > >   individual slab objects (even with the batching done by kfree_rcu()
> > >   implementation itself). In case only some cpus are allowed to handle rcu
> > >   callbacks, the sheaf can still be made available to other cpus on the
> > >   same node via the shared barn. The maple_node cache uses kfree_rcu() and
> > >   thus can benefit from this.
> >
> > Have you looked at fs/bcachefs/rcu_pending.c?
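
For anyone who wants to try something similar, a stress test of the kind
described above (five 1-page read-only file-backed mappings, faulted in
and then unmapped, with only the mmap() calls timed) could be sketched
roughly as below. This is not the program that produced the numbers in
this mail. The helper names make_backing_file() and mmap_stress(), the
temporary file path, and the choice of MAP_PRIVATE and CLOCK_MONOTONIC
are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NR_AREAS 5	/* areas mapped per cycle, as in the description */

static volatile char sink;	/* keeps the fault-in reads from being optimized out */

static double now_sec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Hypothetical helper: create an unlinked one-page temporary file. */
static int make_backing_file(void)
{
	char path[] = "/tmp/mmap_stress_XXXXXX";	/* assumed location */
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, sysconf(_SC_PAGESIZE)) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Hypothetical helper: run `cycles` iterations of map, fault in, unmap.
 * Returns the total seconds spent in mmap() calls, or -1.0 on failure.
 */
static double mmap_stress(int fd, long cycles)
{
	long page = sysconf(_SC_PAGESIZE);
	void *area[NR_AREAS];
	double total = 0.0;

	for (long c = 0; c < cycles; c++) {
		for (int i = 0; i < NR_AREAS; i++) {
			double t0 = now_sec();

			area[i] = mmap(NULL, page, PROT_READ, MAP_PRIVATE, fd, 0);
			total += now_sec() - t0;
			if (area[i] == MAP_FAILED) {
				/* Undo this cycle's earlier mappings before bailing out. */
				while (i-- > 0)
					munmap(area[i], page);
				return -1.0;
			}
		}
		for (int i = 0; i < NR_AREAS; i++)
			sink = *(volatile char *)area[i];	/* fault the page in */
		for (int i = 0; i < NR_AREAS; i++)
			munmap(area[i], page);
	}
	return total;
}
```

A caller would open the file once with make_backing_file(), call
mmap_stress(fd, 200000) ten times, and average the returned totals. The
per-call clock_gettime() overhead is included in the total, so only
relative comparisons between kernels are meaningful.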