From mboxrd@z Thu Jan  1 00:00:00 1970
From: Yosry Ahmed <yosryahmed@google.com>
Date: Wed, 20 Dec 2023 16:19:14 -0800
Subject: Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
To: Chengming Zhou
Cc: Nhat Pham, Chris Li, Seth Jennings, Vitaly Wool, Dan Streetman,
 Johannes Weiner, Andrew Morton, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
In-Reply-To: <2a22e9b5-dc8c-4c4d-81c2-2f4e1850cf3d@bytedance.com>
References: <20231213-zswap-dstmem-v3-0-4eac09b94ece@bytedance.com>
 <20231213-zswap-dstmem-v3-6-4eac09b94ece@bytedance.com>
 <2a22e9b5-dc8c-4c4d-81c2-2f4e1850cf3d@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Wed, Dec 20, 2023 at 4:20 AM Chengming Zhou wrote:
>
> On 2023/12/20 05:39, Yosry Ahmed wrote:
> > On Tue, Dec 19, 2023 at 10:43 AM Nhat Pham wrote:
> >>
> >> On Tue, Dec 19, 2023 at 5:29 AM Chris Li wrote:
> >>>
> >>> Hi Chengming and Yosry,
> >>>
> >>> On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou wrote:
> >>>>
> >>>> Since the introduction of reusing the dstmem in the load path, it
> >>>> seems confusing that we are now using acomp_ctx->dstmem and
> >>>> acomp_ctx->mutex for purposes other than what the naming suggests.
> >>>>
> >>>> Yosry suggested removing these two fields from acomp_ctx, directly
> >>>> using zswap_dstmem and zswap_mutex in both the load and store
> >>>> paths, renaming them, and adding proper comments above their
> >>>> definitions that they are for generic percpu buffering on the load
> >>>> and store paths.
> >>>>
> >>>> So this patch removes dstmem and mutex from acomp_ctx, renames
> >>>> zswap_dstmem to zswap_buffer, and uses the percpu mutex and buffer
> >>>> on the load and store paths.
> >>>
> >>> Sorry for joining this discussion late.
> >>>
> >>> I get the rename of "dstmem" to "buffer", because the buffer is
> >>> used for both load and store. What I don't get is why we move it
> >>> out of the acomp_ctx struct. Now we have three per-cpu entries:
> >>> buffer, mutex and acomp_ctx. I think we should do the reverse: fold
> >>> these three per-cpu entries into one struct, the acomp_ctx. Each
> >>> per_cpu load() does a sequence of dances around the cpu id,
> >>> preemption disabling, etc., while each struct member load is just a
> >>> plain memory load. It seems to me it would be more optimal to
> >>> combine these three per-cpu entries into acomp_ctx, and do the
> >>> per-cpu lookup for acomp_ctx just once.
> >>
> >> I agree with Chris. From a practicality POV, what Chris says here
> >> makes sense. From a semantic POV, this buffer is only used in
> >> (de)compression contexts - be it in store, load, or writeback - so
> >> it belonging to the original struct still makes sense to me. Why
> >> separate it out, without any benefits? Just rename the old field to
> >> buffer or zswap_buffer and call it a day. It will be a smaller patch
> >> too!
> >>
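To make the cost difference Chris describes concrete, here is a rough
sketch of the two access patterns being contrasted. This is a fragment
only; the names are borrowed from the current code and the RFC below,
not an exact quote of either:

    /*
     * Today: three independent per-cpu lookups, each resolving the
     * cpu id separately.
     */
    mutex = per_cpu(zswap_mutex, cpu);        /* static per-cpu var */
    dst = per_cpu(zswap_dstmem, cpu);         /* static per-cpu var */
    acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);

    /*
     * Folded: one per-cpu lookup; the rest are plain member loads.
     */
    acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
    mutex = acomp_ctx->mutex;
    dst = acomp_ctx->dstmem;
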
> > My main concern is that the struct name is specific to the crypto
> > acomp stuff, but the buffer and mutex are not.
> > How about we keep it in the struct, but refactor the struct as
> > follows:
> >
> > struct zswap_ctx {
> >         struct {
> >                 struct crypto_acomp *acomp;
> >                 struct acomp_req *req;
> >                 struct crypto_wait wait;
> >         } acomp_ctx;
> >         u8 *dstmem;
> >         struct mutex *mutex;
> > };
> >
> > , and then rename zswap_pool.acomp_ctx to zswap_pool.ctx?
>
> I think there are two viewpoints here; both work for me.
>
> The first is ownership or lifetime: the percpu mutex and dstmem are
> shared between all pools, so no single pool owns them, nor does the
> percpu acomp_ctx in each pool.
>
> The second is usage: the percpu mutex and dstmem are used by the
> percpu acomp_ctx in each pool, so it's convenient to put pointers to
> them in the percpu acomp_ctx.
>
> Actually I think it's simpler to let the percpu acomp_ctx have its own
> mutex and dstmem, which in fact are the necessary parts when it uses
> the acomp interfaces.
>
> This way, we could delete the percpu mutex and dstmem and their
> hotplug callbacks, so they are no longer shared between all pools.
> Maybe we will have many pools at the same time in the future, like
> different compression algorithms or different zpools for different
> memcg workloads. Who knows? :-)
>
> So how about this patch below? Just RFC.

The general approach looks fine to me, although I still prefer we
reorganize the struct as Chris and I discussed: rename crypto_acomp_ctx
to a generic name, add an (anonymous) struct for the crypto_acomp
stuff, and rename dstmem to buffer or so. I think we can also make the
mutex a static part of the struct; is there any advantage to
dynamically allocating it?
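Roughly something like the following; this is a sketch only, and the
field names (buffer, the grouped acomp members) are illustrative:

    struct zswap_ctx {
            struct {
                    struct crypto_acomp *acomp;
                    struct acomp_req *req;
                    struct crypto_wait wait;
            } acomp_ctx;            /* the crypto_acomp pieces */
            u8 *buffer;             /* PAGE_SIZE scratch, kmalloc'd per cpu */
            struct mutex mutex;     /* embedded: mutex_init() only, no kfree() */
    };
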
> Subject: [PATCH] mm/zswap: make each acomp_ctx have its own mutex and dstmem
>
> Signed-off-by: Chengming Zhou
> ---
>  include/linux/cpuhotplug.h |  1 -
>  mm/zswap.c                 | 86 ++++++++++++--------------------------
>  2 files changed, 26 insertions(+), 61 deletions(-)
>
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index efc0c0b07efb..c3e06e21766a 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -124,7 +124,6 @@ enum cpuhp_state {
>  	CPUHP_ARM_BL_PREPARE,
>  	CPUHP_TRACE_RB_PREPARE,
>  	CPUHP_MM_ZS_PREPARE,
> -	CPUHP_MM_ZSWP_MEM_PREPARE,
>  	CPUHP_MM_ZSWP_POOL_PREPARE,
>  	CPUHP_KVM_PPC_BOOK3S_PREPARE,
>  	CPUHP_ZCOMP_PREPARE,
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 2c349fd88904..37301f1a80a9 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -694,63 +694,31 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
>  /*********************************
>  * per-cpu code
>  **********************************/
> -static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> -/*
> - * If users dynamically change the zpool type and compressor at runtime, i.e.
> - * zswap is running, zswap can have more than one zpool on one cpu, but they
> - * are sharing dtsmem. So we need this mutex to be per-cpu.
> - */
> -static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
> -
> -static int zswap_dstmem_prepare(unsigned int cpu)
> -{
> -	struct mutex *mutex;
> -	u8 *dst;
> -
> -	dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> -	if (!dst)
> -		return -ENOMEM;
> -
> -	mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
> -	if (!mutex) {
> -		kfree(dst);
> -		return -ENOMEM;
> -	}
> -
> -	mutex_init(mutex);
> -	per_cpu(zswap_dstmem, cpu) = dst;
> -	per_cpu(zswap_mutex, cpu) = mutex;
> -	return 0;
> -}
> -
> -static int zswap_dstmem_dead(unsigned int cpu)
> -{
> -	struct mutex *mutex;
> -	u8 *dst;
> -
> -	mutex = per_cpu(zswap_mutex, cpu);
> -	kfree(mutex);
> -	per_cpu(zswap_mutex, cpu) = NULL;
> -
> -	dst = per_cpu(zswap_dstmem, cpu);
> -	kfree(dst);
> -	per_cpu(zswap_dstmem, cpu) = NULL;
> -
> -	return 0;
> -}
> -
>  static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  {
>  	struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
>  	struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
>  	struct crypto_acomp *acomp;
>  	struct acomp_req *req;
> +	int ret = 0;
> +
> +	acomp_ctx->dstmem = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> +	if (!acomp_ctx->dstmem)
> +		return -ENOMEM;
> +
> +	acomp_ctx->mutex = kmalloc_node(sizeof(struct mutex), GFP_KERNEL, cpu_to_node(cpu));
> +	if (!acomp_ctx->mutex) {
> +		ret = -ENOMEM;
> +		goto mutex_fail;
> +	}
> +	mutex_init(acomp_ctx->mutex);
>
>  	acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
>  	if (IS_ERR(acomp)) {
>  		pr_err("could not alloc crypto acomp %s : %ld\n",
>  				pool->tfm_name, PTR_ERR(acomp));
> -		return PTR_ERR(acomp);
> +		ret = PTR_ERR(acomp);
> +		goto acomp_fail;
>  	}
>  	acomp_ctx->acomp = acomp;
>
> @@ -758,8 +726,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  	if (!req) {
>  		pr_err("could not alloc crypto acomp_request %s\n",
>  		       pool->tfm_name);
> -		crypto_free_acomp(acomp_ctx->acomp);
> -		return -ENOMEM;
> +		ret = -ENOMEM;
> +		goto req_fail;
>  	}
>  	acomp_ctx->req = req;
>
> @@ -772,10 +740,15 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
>  				   crypto_req_done, &acomp_ctx->wait);
>
> -	acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> -	acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
> -
>  	return 0;
> +req_fail:
> +	crypto_free_acomp(acomp_ctx->acomp);
> +acomp_fail:
> +	kfree(acomp_ctx->mutex);
> +mutex_fail:
> +	kfree(acomp_ctx->dstmem);
> +
> +	return ret;
>  }
>
>  static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
> @@ -788,6 +761,8 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
>  		acomp_request_free(acomp_ctx->req);
>  	if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
>  		crypto_free_acomp(acomp_ctx->acomp);
> +	kfree(acomp_ctx->mutex);
> +	kfree(acomp_ctx->dstmem);
>  	}
>
>  	return 0;
> @@ -1901,13 +1876,6 @@ static int zswap_setup(void)
>  		goto cache_fail;
>  	}
>
> -	ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
> -				zswap_dstmem_prepare, zswap_dstmem_dead);
> -	if (ret) {
> -		pr_err("dstmem alloc failed\n");
> -		goto dstmem_fail;
> -	}
> -
>  	ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
>  				      "mm/zswap_pool:prepare",
>  				      zswap_cpu_comp_prepare,
> @@ -1939,8 +1907,6 @@ static int zswap_setup(void)
>  	if (pool)
>  		zswap_pool_destroy(pool);
>  hp_fail:
> -	cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
> -dstmem_fail:
>  	kmem_cache_destroy(zswap_entry_cache);
>  cache_fail:
>  	/* if built-in, we aren't unloaded on failure; don't allow use */
> --
> 2.20.1
>
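FWIW, if the mutex does become an embedded member, the prepare callback
in this RFC also gets simpler: one fewer allocation and one fewer
failure label. A rough, untested sketch, keeping the existing struct
name and assuming a renamed "buffer" field:

    static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
    {
            struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
            struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);

            acomp_ctx->buffer = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
            if (!acomp_ctx->buffer)
                    return -ENOMEM;

            /* Embedded mutex: initialized in place, nothing to free on error. */
            mutex_init(&acomp_ctx->mutex);

            /*
             * crypto_alloc_acomp_node() and acomp_request_alloc() as in the
             * RFC above, with only acomp_ctx->buffer to free on failure.
             */
            return 0;
    }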