From: Suren Baghdasaryan
Date: Tue, 25 Feb 2025 12:26:59 -0800
Subject: Re: [PATCH RFC v2 00/10] SLUB percpu sheaves
To: Vlastimil Babka
Cc: Kent Overstreet, "Liam R. Howlett", Christoph Lameter, David Rientjes, Roman Gushchin, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Uladzislau Rezki, linux-mm@kvack.org, linux-kernel@vger.kernel.org, rcu@vger.kernel.org, maple-tree@lists.infradead.org, Sebastian Andrzej Siewior, Alexei Starovoitov
References: <20250214-slub-percpu-caches-v2-0-88592ee0966a@suse.cz> <173d4dbe-399d-4330-944c-9689588f18e8@suse.cz>

On Mon, Feb 24, 2025 at 1:12 PM Suren Baghdasaryan wrote:
>
> On Mon, Feb 24, 2025 at 12:53 PM Vlastimil Babka wrote:
> >
> > On 2/24/25 02:36, Suren Baghdasaryan wrote:
> > > On Sat, Feb 22, 2025 at 8:44 PM Suren Baghdasaryan wrote:
> > >>
> > >> Don't know about this particular part but testing sheaves with maple
> > >> node cache and stress testing mmap/munmap syscalls shows performance
> > >> benefits as long as there is some delay to let kfree_rcu() do its job.
> > >> I'm still gathering results and will most likely post them tomorrow.
> >
> > Without such delay, the perf is same or worse?
>
> The perf is about the same if there is no delay.
>
> >
> > > Here are the promised test results:
> > >
> > > First I ran an Android app cycle test comparing the baseline against sheaves
> > > used for maple tree nodes (as this patchset implements). I registered about
> > > 3% improvement in app launch times, indicating improvement in mmap syscall
> > > performance.
> >
> > There was no artificial 500us delay added for this test, right?
>
> Correct. No artificial changes in this test.
>
> >
> > > Next I ran an mmap stress test which maps 5 1-page readable file-backed
> > > areas, faults them in and finally unmaps them, timing mmap syscalls.
> > > Repeats that 200000 cycles and reports the total time. Average of 10 such
> > > runs is used as the final result.
> > > 3 configurations were tested:
> > >
> > > 1. Sheaves used for maple tree nodes only (this patchset).
> > >
> > > 2. Sheaves used for maple tree nodes with vm_lock to vm_refcnt conversion [1].
> > > This patchset avoids allocating additional vm_lock structure on each mmap
> > > syscall and uses TYPESAFE_BY_RCU for vm_area_struct cache.
> > >
> > > 3. Sheaves used for maple tree nodes and for vm_area_struct cache with vm_lock
> > > to vm_refcnt conversion [1]. For the vm_area_struct cache I had to replace
> > > TYPESAFE_BY_RCU with sheaves, as we can't use both for the same cache.
> >
> > Hm why we can't use both? I don't think any kmem_cache_create check makes
> > them exclusive? TYPESAFE_BY_RCU only affects how slab pages are freed, it
> > doesn't e.g. delay reuse of individual objects, and caching in a sheaf
> > doesn't write to the object. Am I missing something?
>
> Ah, I was under impression that to use sheaves I would have to ensure
> the freeing happens via kfree_rcu()->kfree_rcu_sheaf() path but now
> that you mentioned that, I guess I could keep using kmem_cache_free()
> and that would use free_to_pcs() internally... When time comes to free
> the page, TYPESAFE_BY_RCU will free it after the grace period.
> I can try that combination as well and see if anything breaks.

This seems to be working fine. The new configuration is:

4. Sheaves used for maple tree nodes and for vm_area_struct cache with
vm_lock to vm_refcnt conversion [1]. vm_area_struct cache uses both
TYPESAFE_BY_RCU and sheaves (but obviously not kfree_rcu_sheaf()).

>
> >
> > > The values represent the total time it took to perform mmap syscalls, less is
> > > better.
> > >
> > > (1)             baseline     control
> > > Little core     7.58327      6.614939 (-12.77%)
> > > Medium core     2.125315     1.428702 (-32.78%)
> > > Big core        0.514673     0.422948 (-17.82%)
> > >
> > > (2)             baseline     control
> > > Little core     7.58327      5.141478 (-32.20%)
> > > Medium core     2.125315     0.427692 (-79.88%)
> > > Big core        0.514673     0.046642 (-90.94%)
> > >
> > > (3)             baseline     control
> > > Little core     7.58327      4.779624 (-36.97%)
> > > Medium core     2.125315     0.450368 (-78.81%)
> > > Big core        0.514673     0.037776 (-92.66%)

(4)             baseline     control
Little core     7.58327      4.642977 (-38.77%)
Medium core     2.125315     0.373692 (-82.42%)
Big core        0.514673     0.043613 (-91.53%)

I think the difference between (3) and (4) is noise.
Thanks,
Suren.

> > >
> > > Results in (3) vs (2) indicate that using sheaves for vm_area_struct
> > > yields slightly better averages and I noticed that this was mostly due
> > > to sheaves results missing occasional spikes that worsened
> > > TYPESAFE_BY_RCU averages (the results seemed more stable with
> > > sheaves).
> >
> > Thanks a lot, that looks promising!
>
> Indeed, that looks better than I expected :)
> Cheers!
>
> >
> > > [1] https://lore.kernel.org/all/20250213224655.1680278-1-surenb@google.com/
> > >
> >
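
For anyone who wants to try something similar, the stress loop described
in the quoted results could be sketched in user space roughly as below.
This is only an illustration under assumptions, not the test that
produced the numbers above. The helper names make_one_page_file() and
time_mmap_cycles(), the temporary file under /tmp and the
CLOCK_MONOTONIC timing are all hypothetical choices. Only the shape
(5 one-page readable file-backed areas, fault them in, unmap them, time
the mmap calls) comes from the description.

```c
/* Hypothetical sketch of the mmap stress loop; not the original test. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NR_AREAS 5	/* five 1-page areas per cycle, as described */

/* Create an unlinked one-page temporary file; returns an fd or -1. */
static int make_one_page_file(void)
{
	char path[] = "/tmp/mmap-stress-XXXXXX";	/* assumed location */
	long page = sysconf(_SC_PAGESIZE);
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);
	if (page <= 0 || ftruncate(fd, page) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

static double elapsed_sec(const struct timespec *a, const struct timespec *b)
{
	return (double)(b->tv_sec - a->tv_sec) +
	       (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Run `cycles` iterations of: map NR_AREAS read-only file-backed pages,
 * read one byte from each to fault it in, then unmap them all.
 * Only the time spent inside mmap() is accumulated.
 * Returns the total in seconds, or -1.0 if a mapping fails.
 */
static double time_mmap_cycles(int fd, long cycles)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char sink = 0;
	struct timespec t0, t1;
	double total = 0.0;

	assert(page > 0);
	for (long c = 0; c < cycles; c++) {
		void *area[NR_AREAS];

		for (int i = 0; i < NR_AREAS; i++) {
			clock_gettime(CLOCK_MONOTONIC, &t0);
			area[i] = mmap(NULL, (size_t)page, PROT_READ,
				       MAP_PRIVATE, fd, 0);
			clock_gettime(CLOCK_MONOTONIC, &t1);
			if (area[i] == MAP_FAILED) {
				/* Undo the mappings made so far in this cycle. */
				while (i-- > 0)
					munmap(area[i], (size_t)page);
				return -1.0;
			}
			total += elapsed_sec(&t0, &t1);
		}
		for (int i = 0; i < NR_AREAS; i++)
			sink += *(volatile unsigned char *)area[i];
		for (int i = 0; i < NR_AREAS; i++)
			munmap(area[i], (size_t)page);
	}
	(void)sink;
	return total;
}
```

Calling time_mmap_cycles(fd, 200000) on an fd from make_one_page_file(),
ten times, and averaging the returned totals would mirror the procedure
described. Pinning the process to a little, medium or big core (e.g.
with taskset) is left out of the sketch.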