From: Suren Baghdasaryan
Date: Mon, 24 Feb 2025 13:12:35 -0800
Subject: Re: [PATCH RFC v2 00/10] SLUB percpu sheaves
To: Vlastimil Babka
Cc: Kent Overstreet, "Liam R. Howlett", Christoph Lameter, David Rientjes,
 Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Uladzislau Rezki,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org,
 maple-tree@lists.infradead.org, Sebastian Andrzej Siewior, Alexei Starovoitov
References: <20250214-slub-percpu-caches-v2-0-88592ee0966a@suse.cz> <173d4dbe-399d-4330-944c-9689588f18e8@suse.cz>
In-Reply-To: <173d4dbe-399d-4330-944c-9689588f18e8@suse.cz>

On Mon, Feb 24, 2025 at 12:53 PM Vlastimil Babka wrote:
>
> On 2/24/25 02:36, Suren Baghdasaryan wrote:
> > On Sat, Feb 22, 2025 at 8:44 PM Suren Baghdasaryan wrote:
> >>
> >> Don't know about this particular part but testing sheaves with maple
> >> node cache and stress testing mmap/munmap syscalls shows performance
> >> benefits as long as there is some delay to let kfree_rcu() do its job.
> >> I'm still gathering results and will most likely post them tomorrow.
>
> Without such delay, the perf is same or worse?

The perf is about the same if there is no delay.

>
> > Here are the promised test results:
> >
> > First I ran an Android app cycle test comparing the baseline against sheaves
> > used for maple tree nodes (as this patchset implements). I registered about
> > 3% improvement in app launch times, indicating improvement in mmap syscall
> > performance.
>
> There was no artificial 500us delay added for this test, right?

Correct. No artificial changes in this test.

>
> > Next I ran an mmap stress test which maps 5 1-page readable file-backed
> > areas, faults them in and finally unmaps them, timing mmap syscalls.
> > Repeats that 200000 cycles and reports the total time. Average of 10 such
> > runs is used as the final result.
> > 3 configurations were tested:
> >
> > 1. Sheaves used for maple tree nodes only (this patchset).
> >
> > 2. Sheaves used for maple tree nodes with vm_lock to vm_refcnt conversion [1].
> > This patchset avoids allocating additional vm_lock structure on each mmap
> > syscall and uses TYPESAFE_BY_RCU for vm_area_struct cache.
> >
> > 3. Sheaves used for maple tree nodes and for vm_area_struct cache with vm_lock
> > to vm_refcnt conversion [1]. For the vm_area_struct cache I had to replace
> > TYPESAFE_BY_RCU with sheaves, as we can't use both for the same cache.
>
> Hm why we can't use both? I don't think any kmem_cache_create check makes
> them exclusive? TYPESAFE_BY_RCU only affects how slab pages are freed, it
> doesn't e.g. delay reuse of individual objects, and caching in a sheaf
> doesn't write to the object. Am I missing something?

Ah, I was under the impression that to use sheaves I would have to
ensure the freeing happens via the kfree_rcu()->kfree_rcu_sheaf() path,
but now that you mention it, I guess I could keep using
kmem_cache_free() and that would use free_to_pcs() internally... When
the time comes to free the page, TYPESAFE_BY_RCU will free it after the
grace period.
I can try that combination as well and see if anything breaks.

>
> > The values represent the total time it took to perform mmap syscalls, less is
> > better.
> >
> > (1)             baseline    control
> > Little core     7.58327     6.614939 (-12.77%)
> > Medium core     2.125315    1.428702 (-32.78%)
> > Big core        0.514673    0.422948 (-17.82%)
> >
> > (2)             baseline    control
> > Little core     7.58327     5.141478 (-32.20%)
> > Medium core     2.125315    0.427692 (-79.88%)
> > Big core        0.514673    0.046642 (-90.94%)
> >
> > (3)             baseline    control
> > Little core     7.58327     4.779624 (-36.97%)
> > Medium core     2.125315    0.450368 (-78.81%)
> > Big core        0.514673    0.037776 (-92.66%)
> >
> > Results in (3) vs (2) indicate that using sheaves for vm_area_struct
> > yields slightly better averages and I noticed that this was mostly due
> > to sheaves results missing occasional spikes that worsened
> > TYPESAFE_BY_RCU averages (the results seemed more stable with
> > sheaves).
>
> Thanks a lot, that looks promising!

Indeed, that looks better than I expected :)
Cheers!

>
> > [1] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/
> >
>
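
For anyone wanting to try something similar, the mmap stress test
described above (map 5 one-page readable file-backed areas, fault them
in, unmap them, time only the mmap calls) could look roughly like the
sketch below. This is not the program that produced the numbers in this
thread. The function names, the /tmp backing file, the MAX_AREAS limit
and the use of CLOCK_MONOTONIC are all assumptions made for
illustration, and the sketch does nothing to stop the kernel from
merging adjacent mappings.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define MAX_AREAS 16	/* hypothetical upper bound for this sketch */

/* Create an unlinked temporary file of npages pages to back the mappings. */
int make_backing_file(int npages)
{
	char path[] = "/tmp/mmap-stress-XXXXXX";	/* assumed location */
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (ftruncate(fd, (off_t)npages * sysconf(_SC_PAGESIZE)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Each cycle maps `areas` one-page read-only file-backed areas, faults
 * each one in with a single read, then unmaps them all. Only the time
 * spent inside mmap() is accumulated. Returns seconds, or -1.0 on error.
 */
double time_mmap_cycles(int fd, int areas, long cycles)
{
	long page = sysconf(_SC_PAGESIZE);
	void *addr[MAX_AREAS];
	struct timespec t0, t1;
	double total = 0.0;

	assert(areas > 0 && areas <= MAX_AREAS);

	for (long c = 0; c < cycles; c++) {
		for (int i = 0; i < areas; i++) {
			clock_gettime(CLOCK_MONOTONIC, &t0);
			addr[i] = mmap(NULL, (size_t)page, PROT_READ,
				       MAP_PRIVATE, fd, (off_t)i * page);
			clock_gettime(CLOCK_MONOTONIC, &t1);
			if (addr[i] == MAP_FAILED) {
				/* Undo the mappings made so far in this cycle. */
				while (i-- > 0)
					munmap(addr[i], (size_t)page);
				return -1.0;
			}
			total += elapsed_sec(&t0, &t1);
		}
		/* Fault every page in by reading one byte from it. */
		for (int i = 0; i < areas; i++)
			(void)*(volatile const char *)addr[i];
		for (int i = 0; i < areas; i++)
			munmap(addr[i], (size_t)page);
	}
	return total;
}
```

A caller would create the file with make_backing_file(5), call
time_mmap_cycles(fd, 5, 200000) ten times and average the returned
totals, matching the procedure quoted above. Pinning the process to a
little, medium or big core (for example with taskset) is left to the
caller.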