From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5FA7CC27C6E for ; Fri, 14 Jun 2024 04:46:43 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C82E26B0098; Fri, 14 Jun 2024 00:42:24 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C31E36B00A2; Fri, 14 Jun 2024 00:42:24 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AB7796B0099; Fri, 14 Jun 2024 00:42:24 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id A870C6B00BC for ; Fri, 14 Jun 2024 00:40:12 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 1B5B11A047E for ; Fri, 14 Jun 2024 04:40:11 +0000 (UTC) X-FDA: 82228242222.05.E3DD0B8 Received: from mail-yb1-f175.google.com (mail-yb1-f175.google.com [209.85.219.175]) by imf12.hostedemail.com (Postfix) with ESMTP id 472A040005 for ; Fri, 14 Jun 2024 04:40:09 +0000 (UTC) Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Ab65Zynr; spf=pass (imf12.hostedemail.com: domain of flintglass@gmail.com designates 209.85.219.175 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1718340006; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=lbtquG4p0hVvqKFk1U/Le3nDHhfbk+WM+GokVxGfK54=; b=vkPRQ89r16bQCNtvk80TLUKCLRDzSSnNLFc9ZPrcuN4KgcD6+hBS/Zl86vGkun7seGqS84 fRlU/d/Y/tszFeX7+La6PeokqrF6gymDhiV5umO+AX/aCiT1EYd5POahHS7Luy81bXvAn5 kJTHtUAdWEe0+oLps5T4POshurvfpJo= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=Ab65Zynr; spf=pass (imf12.hostedemail.com: domain of flintglass@gmail.com designates 209.85.219.175 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1718340006; a=rsa-sha256; cv=none; b=UrGTCcn/fpWzm1zMO0XbV1fIyh586g5bY/dcn0dZD5iEqxL2U0X0QQmXCO0qX3Hy9mxemF QPag+pGoVcuOwD0ttR0mk+YJqW7JE1h86v9B1g5zvxg5vBQZ9+h4VGKIeNO43P/6Vly71c 6cIKxf3w0sSTnQy6tTPM8g2yntVXi28= Received: by mail-yb1-f175.google.com with SMTP id 3f1490d57ef6-dfa7fafa0c0so2329018276.0 for ; Thu, 13 Jun 2024 21:40:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1718340008; x=1718944808; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=lbtquG4p0hVvqKFk1U/Le3nDHhfbk+WM+GokVxGfK54=; b=Ab65Zynr435ZsQsk3thq3SSWybr+JVL5/QsKyAZc5FjkO+LPLdaUGehSALnZM+yZTH Te7e0+7w88hiRE6eUHbrkowv31B5hK929X77Neh+TPJCzOiPkfS2MnQgnlLkHk4vI/YR sbsvIM4H7ai/8k8nlLO9aTeNeAsx1svtF0q0JsF3EMWbc/zsOvopjNYi0l2w1PpiddGz l9Ots5vW3EFJDk0mIn6fNclzYGLp/ZxO8gcBUZtcxwwJDWPmSeilZll9W5TcYTBmRIK3 sYmWWzgPqfO+APBdoNanAXBy4VVKDka3AxZhIxu3l81o8h+xSXHojcg9D3CIJY5/LgUU H/nQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1718340008; x=1718944808; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=lbtquG4p0hVvqKFk1U/Le3nDHhfbk+WM+GokVxGfK54=; b=bdS1pGyUIcCOOcaPYcGZT+zqSUYpGuVvXEYK1i2hqgVaTeY982Bnb85pVpdERhQnlC sg6v4nLpGpblg2FADuhVQ2ZDiDzoNJXDzXkMyjEzi5pqi5IN/ygtExlYdn1M/6GTCTb0 I7LWKmdHJfoK3rz2bN6t3aCn1YtOY0ZQokAllj1ePxhtXWj0vlKbwSfAVuB/Vbxey40l zT3aMgJ0hmQjI/Z55WceK5VdOS6LhTAY77Lecud+K0k2Jh7ylaBwq7cne66nM9yFoBHF dkUDsBN8gBnQKbxuxCSNgcTAAeFH3/jaGzmIc4KCUVLyUIalce0q4hyvKJCV7paKyNUo 4nSg== X-Forwarded-Encrypted: i=1; AJvYcCWzQp2dGLY3wLavlsJ7M42qdouaf4jdql0x2cKk4zXonsipZb911D4H2JdVlq5cUp0Vy49IZZ7xzN7/tAAPqH1Kzi4= X-Gm-Message-State: AOJu0Yx844+1NBSfOOkae5TCRSqnmdn0SVE9H2SYNFXRMvkG9j5awCpV A5BhPVPlERBaC9Er38TCu0AWzy1tKCh5p7qx2THDtJfOTVuRjWY2ZyQx/br95c7rCjcPqzE2Kzu 4+bO/STMBiX1SjrMSRZB9ny1HZ5c= X-Google-Smtp-Source: AGHT+IHBsx4z2qtUMxZdNHJ7rhea/P9NlQ3RdnormcjNH+YUFkeA2RwUCdIJnmNFgrDo65EOWOo2cUHVGvpZXlOKNQU= X-Received: by 2002:a25:abb2:0:b0:dfb:f8c:39f2 with SMTP id 3f1490d57ef6-dfefebc9ca9mr2619063276.5.1718340008223; Thu, 13 Jun 2024 21:40:08 -0700 (PDT) MIME-Version: 1.0 References: <20240608155316.451600-1-flintglass@gmail.com> <20240608155316.451600-2-flintglass@gmail.com> In-Reply-To: From: Takero Funaki Date: Fri, 14 Jun 2024 13:39:57 +0900 Message-ID: Subject: Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration To: Shakeel Butt Cc: Nhat Pham , Yosry Ahmed , Johannes Weiner , Chengming Zhou , Jonathan Corbet , Andrew Morton , Domenico Cerasuolo , linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 472A040005 X-Stat-Signature: dchr6ae4dfn7ksmn3mzkfw8p4jo4jfd8 X-Rspam-User: X-HE-Tag: 1718340009-491083 X-HE-Meta: U2FsdGVkX1+MgE1ibsZ3o92RF2pbMALePLzB8EqVmmYmtAFaZqVXjMArUOSe7p+PE4ysrxUUs3qNOFfeNDLloPg4k45IfAA5puzRzbMNJVhPKZflsCcZOtXrWsqUdY6M7bfq5Drp9Urbmu2qXtBnaJ0+O7dMVF3FB0FQM5x545U0e8R6IEkhZmKaMO+xCcnkgMsw8W9Z3jmPBo7BuEuq6YHAaTYcFa+HBkzqD4dtSqWSd3YVj6d+tmqOD4o7xo7JYcYMPkC0yKwehoTBh17kZU4bioo6vr4/CjjNtzUdc4dBC1Zb5oq7vKOJvwiKOLkAwxuc2fMFxtO2MZUsnfzWw99zsZKrVQfc+ufIL5NDCPRWUHGpntq/n2qJj3VyWpsIpH+a7QGZd4wCkY/jj6Y94A4SPND1pqX38IKBrs4is9gyY82lgCLxEfWEOnkOkusLC6KCDEpqGWbcLp3qBcE5ikjVqNCDkTNGyioFx5xN/mXBjldE0H/qDsBJ+1/Z+LDd+EoFF6ijTWFSW3NoTi2rGn9WFE8pbCJ1jzbUCm/kc2on+O61V8XJmIsFtTbv/oqP+4U1If2/mws6P2xZHoo0sPWs1RjsQLq2NsUnKx2usnKElmLm7m6dxgj/s9LQUSgvlDy4CUADLxJLqJOIAUncTQRDQUufm4gVsmBmQKLqtkDBsWwrxg4DeFMeX545mgTa6x7Awwthe9U2wCtGRcgSyiLvE0EXjruG8+JMeBnPKcYMEaPkUi1kfbEdyEJm/lbrHhQaPHcEILxKQ9tJ3QAl9tO+W8KYgVEKAoK4ZduW+ncBtoGbWWObmaUvoX3xUDucg7XfFkQQ0oUr793OQxtCz51yT+hRSL8FO1rBwjwssU2E+6IfX3Bv9zeel8G1VAS1U6WQe2eXN9RaaAV9MQfvB4PumYg3eysuxGfm7ShcLdUWGNJVRoKIJaFT9hYa76OUrDIw50jBur7m99F+Tmk YC6uQKok tcQe9lO858jXwwUlvTRMTt3u0ArZTSFrFfPtgxhnoPzswUYKuQN/9YAm3XcU1q0vElGD+Fc6LfpRSYnbig3gvZZy52vP2Ho+tiGzhvwDBEVYBTuuN1sjV82co2X9vsGPQHVH7EnR8ZkXRMKFQUBE0vTn1ARnTMBwYFLgzI2BTNhQ9pyd484LcSyXddLqUjKdjfs5xf5x80KnNKq7myzTcCjTrnOKmsy5jnnJ29fhy9ALK2JzLsREgXTwfoOqUbqxW2JSbO1uv44ck3ga0WfCAJRmyT/YFHpI6NoIlQAL6evY4w+Cb762wEdC3vj6u1DUrfwDAwM00aS8436UJnCX1q34ujPo3PAD2yWzPfkttIlLmlTwnlscgI8lczTAWu/2oW1nD X-Bogosity: Ham, tests=bogofilter, spamicity=0.000335, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: 2024=E5=B9=B46=E6=9C=8814=E6=97=A5(=E9=87=91) 1:49 Shakeel Butt : > > On Thu, Jun 13, 2024 at 08:04:39AM GMT, Nhat Pham wrote: > [...] > > > > > > > > > > Is the idea here to avoid moving the iterator to another offline = memcg > > > > > that zswap_memcg_offline_cleanup() was already called for, to avo= id > > > > > holding a ref on that memcg until the next run of zswap shrinking= ? > > > > > > > > > > If yes, I think it's probably worth doing. But why do we need to > > > > > release and reacquire the lock in the loop above? > > > > > > > > Yes, the existing cleaner might leave the offline, already-cleaned = memcg. > > > > > > > > The reacquiring lock is to not loop inside the critical section. > > > > In shrink_worker of v0 patch, the loop was restarted on offline mem= cg > > > > without releasing the lock. Nhat pointed out that we should drop th= e > > > > lock after every mem_cgroup_iter() call. v1 was changed to reacquir= e > > > > once per iteration like the cleaner code above. > > > > > > I am not sure how often we'll run into a situation where we'll be > > > holding the lock for too long tbh. It should be unlikely to keep > > > encountering offline memcgs for a long time. > > > > > > Nhat, do you think this could cause a problem in practice? > > > > I don't remember prescribing anything to be honest :) I think I was > > just asking why can't we just drop the lock, then "continue;". This is > > mostly for simplicity's sake. > > > > https://lore.kernel.org/linux-mm/CAKEwX=3DMwrRc43iM2050v5u-TPUK4Yn+a4G7= +h6ieKhpQ7WtQ=3DA@mail.gmail.com/ I apologize for misinterpreting your comments. Removing release/reacquire. > > > > But I think as Takero pointed out, it would still skip over the memcg > > that was (concurrently) updated to zswap_next_shrink by the memcg > > offline callback. > > What's the issue with keep traversing until an online memcg is found? > Something like the following: > > > spin_lock(&zswap_shrink_lock); > do { > zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_sh= rink, NULL); > } while (zswap_next_shrink && !mem_cgroup_online(zswap_next_shrin= k)); > > if (!zswap_next_shrink) > zswap_next_shrink =3D mem_cgroup_iter(NULL, NULL, NULL); > .... > > Is the concern that there can a lot of offlined memcgs which may cause > need resched warnings? To avoid using the goto-based loop, here's the new version, including Shakeel's suggestion: ```c do { spin_lock(&zswap_shrink_lock); /* * Advance the cursor to start shrinking from the next memcg * after zswap_next_shrink. One memcg might be skipped from * shrinking if the cleaner also advanced the cursor, but it * will happen at most once per offlining memcg. */ do { zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_shrink, NULL); memcg =3D zswap_next_shrink; } while (memcg && !mem_cgroup_tryget_online(memcg)); if (!memcg) { spin_unlock(&zswap_shrink_lock); ``` We can add or remove `spin_unlock();spin_lock();` just after mem_cgroup_iter(), if needed. I believe the behavior is identical to v1 except for the starting point of iteration. For Naht's comment, 2. No skipping over zswap_next_shrink updated by the memcg offline cleaner. While this was true for v1, I'm moved to accept this skipping as it's negligibly rare. As Yorsy commented, v1 retried the last memcg from the last shrink_worker() run. There are several options for shrink_worker where to start with: 1. Starting from the next memcg after zswap_next_shrink: It might skip one memcg, but this is quite rare. It is the current behavior before patch. 2. Starting from zswap_next_shrink: It might shrink one more page from the memcg in addition to the one by the last shrink_worker() run. This should also be rare, but probably more frequent than option 1. This is the v0 patch behavior. 3. Addressing both: Save the last memcg as well. The worker checks if it has been modified by the cleaner and advances only if it hasn't. Like this: ```c do { if (zswap_last_shrink =3D=3D zswap_next_shrink) { zswap_next_shrink =3D mem_cgroup_iter(NULL, zswap_next_shrink, NULL); } memcg =3D zswap_next_shrink; } while (memcg && !mem_cgroup_tryget_online(memcg)); zswap_last_shrink =3D memcg; ``` Which one would be better? or any other idea?