From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Date: Sat, 14 Sep 2024 22:45:14 +0900
Subject: Re: [PATCH] mm, slub: prefetch freelist in ___slab_alloc()
To: Yongqiang Liu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, zhangxiaoxu5@huawei.com,
	cl@linux.com, wangkefeng.wang@huawei.com, penberg@kernel.org,
	rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org,
	vbabka@suse.cz, roman.gushchin@linux.dev
In-Reply-To: <6e744d2b-bbb3-4e1f-bd61-e0e971f974db@huawei.com>
References: <20240819070204.753179-1-liuyongqiang13@huawei.com>
	<6e744d2b-bbb3-4e1f-bd61-e0e971f974db@huawei.com>
On Wed, Aug 21, 2024 at 3:58 PM Yongqiang Liu wrote:
>
> On 2024/8/19 17:33, Hyeonggon Yoo wrote:
> > On Mon, Aug 19, 2024 at 4:02 PM Yongqiang Liu wrote:
> >> commit 0ad9500e16fe ("slub: prefetch next freelist pointer in
> >> slab_alloc()") introduced prefetch_freepointer() for fastpath
> >> allocation. Using it at the first freelist load can give a small
> >> improvement in some workloads. Here are hackbench results on an
> >> arm64 machine (about 3.8%):
> >>
> >> Before:
> >> average time cost of 'hackbench -g 100 -l 1000': 17.068
> >>
> >> After:
> >> average time cost of 'hackbench -g 100 -l 1000': 16.416
> >>
> >> There is also about a 5% improvement on an x86_64 machine for
> >> hackbench.
> >
> > I think adding more prefetch might not be a good idea unless we have
> > more real-world data supporting it, because prefetch might help when
> > slab is frequently used, but it will end up unnecessarily using more
> > cache lines when slab is not frequently used.
>
> Hi, sorry for the late reply.

Thanks for explaining how it impacts hackbench, even when prefetch is
added in the slow path. However, I still think the main issue is that
hackbench is too synthetic to make a strong argument that prefetch in
the slow path would help in most real-world scenarios.

> Yes, prefetching unnecessary objects is a bad idea. But I think that
> when the slab allocator enters the slow path, it is more likely to
> need more objects.

The fast path is hit when an object can be allocated from the CPU slab
without much work, and the slow path is hit when it can't. That alone
gives no indication about future allocations. To be honest, I'm not
even sure prefetching in the fast path really helps when slab is not
called frequently. Just because an allocation hits the fast path or
the slow path doesn't necessarily mean more objects will be needed in
the future.

I also don't think "prefetch some data because we might need it in the
future" is a good argument, because if we don't need it, it just
wastes a cache line. Something that helps in some cases but hurts in
others is not a net gain.
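To make the tradeoff concrete, here is roughly where the two prefetch
sites sit (heavily simplified from my reading of mm/slub.c; locking,
debug and error paths omitted, so don't take it as the exact code):

  static void prefetch_freepointer(const struct kmem_cache *s, void *object)
  {
          /* The free pointer lives at s->offset inside the object, so
           * each call pulls in exactly one extra cache line. */
          prefetchw(object + s->offset);
  }

  /* Fast path, where commit 0ad9500e16fe added it: we just handed out
   * 'object', and next_object is what the next allocation from this
   * CPU slab will return, so warming it up pays off when allocations
   * come in bursts. */
          void *next_object = get_freepointer_safe(s, object);

          if (unlikely(!__update_cpu_freelist_fast(s, object, next_object, tid))) {
                  note_cmpxchg_failure("slab_alloc", s, tid);
                  goto redo;
          }
          prefetch_freepointer(s, next_object);
          stat(s, ALLOC_FASTPATH);

  /* Slow path (___slab_alloc), where this patch adds a second call.
   * Here the new c->freelist may or may not be consumed soon; if it
   * is not, the prefetched line is simply wasted. */
  load_freelist:
          VM_BUG_ON(!c->slab->frozen);
          c->freelist = get_freepointer(s, freelist);
          c->tid = next_tid(c->tid);
          prefetch_freepointer(s, c->freelist);   /* the proposed change */
          local_unlock_irqrestore(&s->cpu_slab->lock, flags);
          return freelist;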
I might be wrong. If I am, please prove me wrong and convince me and
others.

Best,
Hyeonggon

> I've tested the cases from commit 0ad9500e16fe ("slub: prefetch next
> freelist pointer in slab_alloc()"). Here is the result:
>
> Before:
>
>  Performance counter stats for './hackbench 50 process 4000' (32 runs):
>
>        2545.28 msec task-clock        #    6.938 CPUs utilized      ( +- 1.75% )
>           6166      context-switches  #    0.002 M/sec              ( +- 1.58% )
>           1129      cpu-migrations    #    0.444 K/sec              ( +- 2.16% )
>          13298      page-faults       #    0.005 M/sec              ( +- 0.38% )
>     4435113150      cycles            #    1.742 GHz                ( +- 1.22% )
>     2259717630      instructions      #    0.51  insn per cycle     ( +- 0.05% )
>      385847392      branches          #  151.593 M/sec              ( +- 0.06% )
>        6205369      branch-misses     #    1.61% of all branches    ( +- 0.56% )
>
>        0.36688 +- 0.00595 seconds time elapsed  ( +- 1.62% )
>
> After:
>
>  Performance counter stats for './hackbench 50 process 4000' (32 runs):
>
>        2277.61 msec task-clock        #    6.855 CPUs utilized      ( +- 0.98% )
>           5653      context-switches  #    0.002 M/sec              ( +- 1.62% )
>           1081      cpu-migrations    #    0.475 K/sec              ( +- 1.89% )
>          13217      page-faults       #    0.006 M/sec              ( +- 0.48% )
>     3751509945      cycles            #    1.647 GHz                ( +- 1.14% )
>     2253177626      instructions      #    0.60  insn per cycle     ( +- 0.06% )
>      384509166      branches          #  168.821 M/sec              ( +- 0.07% )
>        6045031      branch-misses     #    1.57% of all branches    ( +- 0.58% )
>
>        0.33225 +- 0.00321 seconds time elapsed  ( +- 0.97% )
>
> > Also I don't understand how adding prefetch in the slow path affects
> > the performance, because most allocs/frees should be done in the
> > fast path. Could you please explain?
>
> By adding some debug info to count the slow path for hackbench:
>
> 'hackbench -g 100 -l 1000' slab alloc total: 80416886, and the slowpath:
> 7184236.
>
> About 9% of total allocations take the slow path. The perf stats on
> arm64 are as follows:
>
> Before:
>  Performance counter stats for './hackbench -g 100 -l 1000' (32 runs):
>
>    34766611220      branches                                        ( +- 0.01% )
>      382593804      branch-misses          # 1.10% of all branches  ( +- 0.14% )
>     1120091414      cache-misses                                    ( +- 0.08% )
>    76810485402      L1-dcache-loads                                 ( +- 0.03% )
>     1120091414      L1-dcache-load-misses  # 1.46% of all L1-dcache hits  ( +- 0.08% )
>
>        23.8854 +- 0.0804 seconds time elapsed  ( +- 0.34% )
>
> After:
>  Performance counter stats for './hackbench -g 100 -l 1000' (32 runs):
>
>    34812735277      branches                                        ( +- 0.01% )
>      393449644      branch-misses          # 1.13% of all branches  ( +- 0.15% )
>     1095185949      cache-misses                                    ( +- 0.15% )
>    76995789602      L1-dcache-loads                                 ( +- 0.03% )
>     1095185949      L1-dcache-load-misses  # 1.42% of all L1-dcache hits  ( +- 0.15% )
>
>        23.341 +- 0.104 seconds time elapsed  ( +- 0.45% )
>
> It seems to have fewer L1-dcache-load-misses.
>
> >> Signed-off-by: Yongqiang Liu
> >> ---
> >>  mm/slub.c | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index c9d8a2497fd6..f9daaff10c6a 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -3630,6 +3630,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>                 VM_BUG_ON(!c->slab->frozen);
> >>                 c->freelist = get_freepointer(s, freelist);
> >>                 c->tid = next_tid(c->tid);
> >> +               prefetch_freepointer(s, c->freelist);
> >>                 local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> >>                 return freelist;
> >>
> >> --
> >> 2.25.1
> >>
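P.S. For anyone who wants to reproduce the comparison above, a perf
invocation along these lines should collect the same counters (my
guess at the command, not taken from the patch; -r 32 matches the
"32 runs" and the "( +- ... )" variance columns):

  perf stat -r 32 \
      -e branches,branch-misses,cache-misses,L1-dcache-loads,L1-dcache-load-misses \
      ./hackbench -g 100 -l 1000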