From: Chris Li <chriscli@google.com>
Date: Fri, 22 Dec 2023 09:37:50 -0800
Subject: Re: [PATCH v3 6/6] mm/zswap: directly use percpu mutex and buffer in load/store
To: Chengming Zhou
Cc: Yosry Ahmed, Nhat Pham, Seth Jennings, Vitaly Wool, Dan Streetman,
 Johannes Weiner, Andrew Morton, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
In-Reply-To: <2a22e9b5-dc8c-4c4d-81c2-2f4e1850cf3d@bytedance.com>
Hi Chengming,

The patch looks good to me.

Acked-by: Chris Li (Google)

On Wed, Dec 20, 2023 at 4:21 AM Chengming Zhou wrote:
>
> On 2023/12/20 05:39, Yosry Ahmed wrote:
> > On Tue, Dec 19, 2023 at 10:43 AM Nhat Pham wrote:
> >>
> >> On Tue, Dec 19, 2023 at 5:29 AM Chris Li wrote:
> >>>
> >>> Hi Chengming and Yosry,
> >>>
> >>> On Mon, Dec 18, 2023 at 3:50 AM Chengming Zhou wrote:
> >>>>
> >>>> Since the introduction of dstmem reuse in the load path, it is
> >>>> confusing that we now use acomp_ctx->dstmem and acomp_ctx->mutex
> >>>> for purposes other than what the naming suggests.
> >>>>
> >>>> Yosry suggested removing these two fields from acomp_ctx, directly
> >>>> using zswap_dstmem and zswap_mutex in both the load and store paths,
> >>>> renaming them, and adding proper comments above their definitions
> >>>> noting that they are for generic percpu buffering on the load and
> >>>> store paths.
> >>>>
> >>>> So this patch removes dstmem and mutex from acomp_ctx, renames
> >>>> zswap_dstmem to zswap_buffer, and uses the percpu mutex and buffer
> >>>> on the load and store paths.
> >>>
> >>> Sorry for joining this discussion late.
> >>>
> >>> I get the rename of "dstmem" to "buffer", because the buffer is used
> >>> for both load and store. What I don't get is why we move it out of
> >>> the acomp_ctx struct. Now we have three per-CPU entries: buffer,
> >>> mutex and acomp_ctx. I think we should do the reverse: fold these
> >>> three per-CPU entries into one struct, the acomp_ctx. Each per-CPU
> >>> load() does a dance around the CPU id, disabling preemption and so
> >>> on, while each struct member load is just a plain memory load. It
> >>> seems to me it would be more optimal to combine the three per-CPU
> >>> entries into acomp_ctx and do the per-CPU lookup for acomp_ctx just
> >>> once.
> >>
> >> I agree with Chris. From a practicality POV, what Chris says here
> >> makes sense. From a semantic POV, this buffer is only used in
> >> (de)compression contexts - be it in store, load, or writeback - so it
> >> belonging to the original struct still makes sense to me. Why separate
> >> it out without any benefit? Just rename the old field to buffer or
> >> zswap_buffer and call it a day? It will be a smaller patch too!
> >>
> >
> > My main concern is that the struct name is specific to the crypto
> > acomp stuff, but the buffer and mutex are not.
> > How about we keep it in the struct, but refactor the struct as follows:
> >
> > struct zswap_ctx {
> >         struct {
> >                 struct crypto_acomp *acomp;
> >                 struct acomp_req *req;
> >                 struct crypto_wait wait;
> >         } acomp_ctx;
> >         u8 *dstmem;
> >         struct mutex *mutex;
> > };
> >
> > , and then rename zswap_pool.acomp_ctx to zswap_pool.ctx?
>
> I think there are two viewpoints here; both work OK for me.
>
> The first is ownership, or lifetime: the percpu mutex and dstmem are
> shared between all pools, so no single pool owns the mutex and dstmem,
> nor does the percpu acomp_ctx in each pool.
>
> The second is usage: the percpu mutex and dstmem are used by the percpu
> acomp_ctx in each pool, so it is convenient to put pointers to them in
> the percpu acomp_ctx.
>
> Actually I think it's simpler to let the percpu acomp_ctx have its own
> mutex and dstmem, which are in fact the necessary parts when it uses
> the acomp interfaces.

Agree, that is why I prefer to keep the struct together. I am fine with
what Yosry suggested and the anonymous struct; I just consider it not
critically necessary.
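For illustration, a minimal sketch of the folded-together layout being
discussed: one per-CPU pointer lookup, then plain member loads. The
buffer/mutex field names and the raw_cpu_ptr() usage here are my
assumptions for the example, not taken from any posted patch:

struct crypto_acomp_ctx {
        struct crypto_acomp *acomp;
        struct acomp_req *req;
        struct crypto_wait wait;
        u8 *buffer;             /* replaces the separate per-CPU zswap_dstmem */
        struct mutex *mutex;    /* replaces the separate per-CPU zswap_mutex */
};

static void zswap_use_ctx(struct zswap_pool *pool)
{
        /* one per-CPU lookup; everything after is a plain member load */
        struct crypto_acomp_ctx *ctx = raw_cpu_ptr(pool->acomp_ctx);

        mutex_lock(ctx->mutex);
        /* compress or decompress into ctx->buffer via ctx->req/ctx->wait */
        mutex_unlock(ctx->mutex);
}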
>
> This way, we could delete the percpu mutex and dstmem, and their
> hotplug callbacks,

That is the real value of this patch. Thanks for doing that.

> and they would not be shared anymore between all pools. Maybe we would
> have many pools at the same time in the future, like different
> compression algorithms or different zpools for different memcg
> workloads. Who knows? :-)

As long as zswap is not re-entrant, i.e. we never get a nested page
fault that causes a zswap_load() within another zswap_load(), I think we
are fine having more than one pool share the buffer. In fact, if we did
trigger a nested zswap load, I expect the kernel to deadlock on the
nested mutex acquire, because the mutex is already taken. We would see a
kernel hang or panic rather than silent memory corruption.
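To make that failure mode concrete, a hypothetical sketch (function
names invented for the example): kernel mutexes are not recursive, so
the nested acquire blocks forever, loudly, instead of silently reusing
the buffer mid-operation:

static DEFINE_MUTEX(buf_lock);          /* stands in for the per-CPU zswap mutex */

static void nested_zswap_load(void)
{
        mutex_lock(&buf_lock);          /* already held by the outer load: deadlocks here */
        mutex_unlock(&buf_lock);
}

static void zswap_load_outer(void)
{
        mutex_lock(&buf_lock);          /* outer load takes the lock */
        nested_zswap_load();            /* a nested fault into zswap never returns */
        mutex_unlock(&buf_lock);
}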
>
> So how about this patch below? Just RFC.
>
> Subject: [PATCH] mm/zswap: make each acomp_ctx have its own mutex and dstmem

Thank you for doing that; you can consider submitting it as a real
patch instead of an RFC. I see real value in this patch: it removes
some per-CPU fields.

>
> Signed-off-by: Chengming Zhou
> ---
>  include/linux/cpuhotplug.h |  1 -
>  mm/zswap.c                 | 86 ++++++++++++--------------------------
>  2 files changed, 26 insertions(+), 61 deletions(-)
>
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index efc0c0b07efb..c3e06e21766a 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -124,7 +124,6 @@ enum cpuhp_state {
>         CPUHP_ARM_BL_PREPARE,
>         CPUHP_TRACE_RB_PREPARE,
>         CPUHP_MM_ZS_PREPARE,
> -       CPUHP_MM_ZSWP_MEM_PREPARE,
>         CPUHP_MM_ZSWP_POOL_PREPARE,
>         CPUHP_KVM_PPC_BOOK3S_PREPARE,
>         CPUHP_ZCOMP_PREPARE,
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 2c349fd88904..37301f1a80a9 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -694,63 +694,31 @@ static void zswap_alloc_shrinker(struct zswap_pool *pool)
>  /*********************************
>  * per-cpu code
>  **********************************/
> -static DEFINE_PER_CPU(u8 *, zswap_dstmem);
> -/*
> - * If users dynamically change the zpool type and compressor at runtime, i.e.
> - * zswap is running, zswap can have more than one zpool on one cpu, but they
> - * are sharing dtsmem. So we need this mutex to be per-cpu.
> - */
> -static DEFINE_PER_CPU(struct mutex *, zswap_mutex);
> -
> -static int zswap_dstmem_prepare(unsigned int cpu)

Nice!

Chris

> -{
> -       struct mutex *mutex;
> -       u8 *dst;
> -
> -       dst = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> -       if (!dst)
> -               return -ENOMEM;
> -
> -       mutex = kmalloc_node(sizeof(*mutex), GFP_KERNEL, cpu_to_node(cpu));
> -       if (!mutex) {
> -               kfree(dst);
> -               return -ENOMEM;
> -       }
> -
> -       mutex_init(mutex);
> -       per_cpu(zswap_dstmem, cpu) = dst;
> -       per_cpu(zswap_mutex, cpu) = mutex;
> -       return 0;
> -}
> -
> -static int zswap_dstmem_dead(unsigned int cpu)
> -{
> -       struct mutex *mutex;
> -       u8 *dst;
> -
> -       mutex = per_cpu(zswap_mutex, cpu);
> -       kfree(mutex);
> -       per_cpu(zswap_mutex, cpu) = NULL;
> -
> -       dst = per_cpu(zswap_dstmem, cpu);
> -       kfree(dst);
> -       per_cpu(zswap_dstmem, cpu) = NULL;
> -
> -       return 0;
> -}
> -
>  static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>  {
>         struct zswap_pool *pool = hlist_entry(node, struct zswap_pool, node);
>         struct crypto_acomp_ctx *acomp_ctx = per_cpu_ptr(pool->acomp_ctx, cpu);
>         struct crypto_acomp *acomp;
>         struct acomp_req *req;
> +       int ret = 0;
> +
> +       acomp_ctx->dstmem = kmalloc_node(PAGE_SIZE, GFP_KERNEL, cpu_to_node(cpu));
> +       if (!acomp_ctx->dstmem)
> +               return -ENOMEM;
> +
> +       acomp_ctx->mutex = kmalloc_node(sizeof(struct mutex), GFP_KERNEL, cpu_to_node(cpu));
> +       if (!acomp_ctx->mutex) {
> +               ret = -ENOMEM;
> +               goto mutex_fail;
> +       }
> +       mutex_init(acomp_ctx->mutex);
>
>         acomp = crypto_alloc_acomp_node(pool->tfm_name, 0, 0, cpu_to_node(cpu));
>         if (IS_ERR(acomp)) {
>                 pr_err("could not alloc crypto acomp %s : %ld\n",
>                                 pool->tfm_name, PTR_ERR(acomp));
> -               return PTR_ERR(acomp);
> +               ret = PTR_ERR(acomp);
> +               goto acomp_fail;
>         }
>         acomp_ctx->acomp = acomp;
>
> @@ -758,8 +726,8 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>         if (!req) {
>                 pr_err("could not alloc crypto acomp_request %s\n",
>                        pool->tfm_name);
> -               crypto_free_acomp(acomp_ctx->acomp);
> -               return -ENOMEM;
> +               ret = -ENOMEM;
> +               goto req_fail;
>         }
>         acomp_ctx->req = req;
>
> @@ -772,10 +740,15 @@ static int zswap_cpu_comp_prepare(unsigned int cpu, struct hlist_node *node)
>         acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
>                                    crypto_req_done, &acomp_ctx->wait);
>
> -       acomp_ctx->mutex = per_cpu(zswap_mutex, cpu);
> -       acomp_ctx->dstmem = per_cpu(zswap_dstmem, cpu);
> -
>         return 0;
> +req_fail:
> +       crypto_free_acomp(acomp_ctx->acomp);
> +acomp_fail:
> +       kfree(acomp_ctx->mutex);
> +mutex_fail:
> +       kfree(acomp_ctx->dstmem);
> +
> +       return ret;
>  }
>
>  static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
> @@ -788,6 +761,8 @@ static int zswap_cpu_comp_dead(unsigned int cpu, struct hlist_node *node)
>                 acomp_request_free(acomp_ctx->req);
>         if (!IS_ERR_OR_NULL(acomp_ctx->acomp))
>                 crypto_free_acomp(acomp_ctx->acomp);
> +       kfree(acomp_ctx->mutex);
> +       kfree(acomp_ctx->dstmem);
>         }
>
>         return 0;
> @@ -1901,13 +1876,6 @@ static int zswap_setup(void)
>                 goto cache_fail;
>         }
>
> -       ret = cpuhp_setup_state(CPUHP_MM_ZSWP_MEM_PREPARE, "mm/zswap:prepare",
> -                               zswap_dstmem_prepare, zswap_dstmem_dead);
> -       if (ret) {
> -               pr_err("dstmem alloc failed\n");
> -               goto dstmem_fail;
> -       }
> -
>         ret = cpuhp_setup_state_multi(CPUHP_MM_ZSWP_POOL_PREPARE,
>                                       "mm/zswap_pool:prepare",
>                                       zswap_cpu_comp_prepare,
> @@ -1939,8 +1907,6 @@ static int zswap_setup(void)
>         if (pool)
>                 zswap_pool_destroy(pool);
>  hp_fail:
> -       cpuhp_remove_state(CPUHP_MM_ZSWP_MEM_PREPARE);
> -dstmem_fail:
>         kmem_cache_destroy(zswap_entry_cache);
>  cache_fail:
>         /* if built-in, we aren't unloaded on failure; don't allow use */
> --
> 2.20.1