From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1F4E6C38150 for ; Sat, 6 Jul 2024 02:25:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A0E106B0098; Fri, 5 Jul 2024 22:25:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9B1F46B0096; Fri, 5 Jul 2024 22:25:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 82D136B0098; Fri, 5 Jul 2024 22:25:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 5E7E46B0092 for ; Fri, 5 Jul 2024 22:25:40 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id D2B99404C5 for ; Sat, 6 Jul 2024 02:25:39 +0000 (UTC) X-FDA: 82307736798.10.60982C3 Received: from mail-vs1-f54.google.com (mail-vs1-f54.google.com [209.85.217.54]) by imf07.hostedemail.com (Postfix) with ESMTP id 0E52B4000E for ; Sat, 6 Jul 2024 02:25:37 +0000 (UTC) Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=hT2Vmll2; spf=pass (imf07.hostedemail.com: domain of flintglass@gmail.com designates 209.85.217.54 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720232717; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=gbIMyqv9tB/GsH5Iw1jm24no2JCeDavbkHBCIGxkGw8=; b=h20H/+czCapxGOCfBg6pEMSpqd1OcNJwgy9md/HcR2lWAJPAwssA+qpEelBsyfNsRH+LVL m2McGgsDGIFbHXeQ7FUrYTQ4513tnZvm8nSLlIq13UNtEZJgDXTrskCoLuyvzD5dgSwv8i 64sP2k76z6eKiI/pya1AZuitQAUhND8= ARC-Authentication-Results: i=1; imf07.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=hT2Vmll2; spf=pass (imf07.hostedemail.com: domain of flintglass@gmail.com designates 209.85.217.54 as permitted sender) smtp.mailfrom=flintglass@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720232717; a=rsa-sha256; cv=none; b=u1naUkmzIK95Lw4fNmzt2UTFxo2+tzPl9n8ZXvXh9PZMCw33EYLWW5PTp9T14X0/LQURT4 cc4aGxFEWoEZZ4m/ArQ9xoUjCSDnoTia8ftd7+isaQukcVqHNu8OE0R2SOp9rUT2AAo1Xq MKylJ9XlGEvvYeXYEoH6CUfhzMu9ViY= Received: by mail-vs1-f54.google.com with SMTP id ada2fe7eead31-48febcc8819so819818137.3 for ; Fri, 05 Jul 2024 19:25:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720232737; x=1720837537; darn=kvack.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=gbIMyqv9tB/GsH5Iw1jm24no2JCeDavbkHBCIGxkGw8=; b=hT2Vmll2lezF163FIJD2eizxsNXoJv0hNVckCfZBBbA3ujmn9Z6A8NrHh2zIdR5aps cLghMr+a/OhnCebb4EMNtEg9qxrYAXeYJ2lr4cwCsDIHNJ0sV6v+PBs0sydvshv670Km H+ydH5Ytu0WkHXsH3N2TDfggJj7HcS792IcntMrl0hWBns/+5XUcS9rJ+k2xWMaQXEmO fNCewIp3IVkWJLOD/p3u2ZSi4UaMJjWY8nRbc5/nbA4b4u/b+W8spDNq07ZlXgtYDgpK kwpCHpEBQHyUrRP/VYEqybqOUTsRKaYBoOirIndXEA4hyQLszHHhtqP462HenThpvT16 idvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720232737; x=1720837537; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gbIMyqv9tB/GsH5Iw1jm24no2JCeDavbkHBCIGxkGw8=; b=ZW+tGcKOHI7KH377bIycq8KjXXFENCf0hismKzxIvApWjVxVaRKHASg3svHpID19WE oWFFPqw88sjs/9BozryWfYU7s/0VLPWOY1WPWrfcVelcvcA9bZ9QebQO+H14GHOYHCS8 WhYXSQ81nUmFbcfbDdYXC/FZyRlVgLamodTcW1UtHOWs3KAeN62R5YgxVw8uPspyX3xU K3SsPiJml6n6BIss+cJcwHumFvahniBkfOQHr9ma5B6LyvKu5IOWHTbz05uMwuiT95e8 sAT87RsAbO9jetzp6JYDFhwJj4nLmH5TcxaExnr9WoLvHrwMsTPZK2hON95SQnCff6i8 +QKQ== X-Forwarded-Encrypted: i=1; AJvYcCUmWIvdM7jH78QyrqpuZd+DcdeTvwqak47QLr0Ux3GItEUiLQCpgfE9RubhPN9kRdCJD7NCxL+MpdfSgOevOlxsmRs= X-Gm-Message-State: AOJu0YykAORCBlRDunFP832LoLh7ddIGcCd/87fzYVb3BTHk86JNSSOu SHk77Bh+k1ja1chjEHeD1ojJPgd1glTDcZsXHv43PQQ/7HQ9jZ05 X-Google-Smtp-Source: AGHT+IE3TI13f2MUhqpsXQi2pAAWlKDBboY+QlYd3pKVMH50xN1fnf49vI6E9VU4VCM40PvN8Tuofg== X-Received: by 2002:a67:e98a:0:b0:48f:eda3:2f80 with SMTP id ada2fe7eead31-48fee6255c2mr7281458137.5.1720232736977; Fri, 05 Jul 2024 19:25:36 -0700 (PDT) Received: from cbuild.incus (h101-111-009-128.hikari.itscom.jp. [101.111.9.128]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-70b15417a7bsm971274b3a.205.2024.07.05.19.25.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jul 2024 19:25:36 -0700 (PDT) From: Takero Funaki To: Johannes Weiner , Yosry Ahmed , Nhat Pham , Chengming Zhou , Jonathan Corbet , Andrew Morton , Domenico Cerasuolo Cc: Takero Funaki , linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2 1/6] mm: zswap: fix global shrinker memcg iteration Date: Sat, 6 Jul 2024 02:25:17 +0000 Message-ID: <20240706022523.1104080-2-flintglass@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240706022523.1104080-1-flintglass@gmail.com> References: <20240706022523.1104080-1-flintglass@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Stat-Signature: 6ytzcge3xidhsci5aibphrfsp4e6k3ha X-Rspam-User: X-Rspamd-Queue-Id: 0E52B4000E X-Rspamd-Server: rspam02 X-HE-Tag: 1720232737-414326 X-HE-Meta: U2FsdGVkX1/4r1RBDRByWd1sI+s8JcRl4tJ5ddvQfQZkw0erm2BxF0UcBZarBMwTDv0TFsEukFXWqLon6ikNPI9dAee/to9bqN83z0w735cnzSYVMEc3HspglpntER1a4mnuV6mkfADmkQt/iZWGbBLhp1TiGojd0NGhS537lP/CkT88hdq2tdR7897gSkFI3VhCszu3NRuzZErsSBaQrq+QIbTECcYynGEgpZ1kzn6om5IEgArknLcOC+c7Vaoc8hCvu9bX9jseulGi23ahhhHE1BJ+JjqcTiZQNNOO0hc45VtdjifGSr2b5DMKU+y5klKJo8uWPes5QZ2SgVN7BL7Rz5BGpl/IVY3mSMqkkVp7MfinSlLw+GTHhV+tKdW8/U2dM9IINxbQTK0j3ZNeJkA2ktFv7LqYfpx2WLTrhVvgtOwHJZrqZ2YCvVvakv38eQHMhFQXOxq99G4SoVpOwt//pdVKz/rJvVD9Xw+Vx9ep2OvNWmfktHvRg6NKRafWCIBF5nEFQ051NURv7c3yxVTiVzEYgimGSJegI7HS6SciQPJKUeXfNKIHIJxwAhNAE4AbcksdX19AHOnUzLFQGvkl5m2zuVTCUF8KHNuJZ5tRO6gcBc5aAJQPojwo60kS/3UfNWkveyNoYYyrXt1MDRugeCJdcWeKifUt/z6T46z2NhfHqS7f3KtJ2QZetH/Dvxn1xUyk/eTFMreGIeSrjfByZEpkJ9IQ3vu7pY7PQwMQ+t24QowGENvE9ButsNzI5Gf8lFC5OzH8y8scX7WGyLPBKAOaLBVwe9LuT0JMGj7V1BZmmsgQ33s+yX5/4soAT6XOyAYqfQ2K0bVFB6M9Nl17GTpds9WpjbvMqqA4a2vslMT3jfVZ5pVdyzvva7xI3hTrlqQs5sQteCwrrRUsp8qmGLwUJx5+axFQqJyd2+fiVBX8mceWW99eS5fnOKnvhg5i7/E4QSYg1+veY6J uqLJ1OGf dIvK/otJUDZoAEq0ToqrfRzKIb9nxACBN002mVnUg9kjUqfjkrk+rWnZvvPm61O9299apgOB5q9uSsbjhdVfSLbyiHXuuUC5karOJurYTm5rKF3G/lB2FWOXfyIkRzvl0WW9n8Jsbaa/83N6hbcbCtSTbte2zjBBp//Out/8wu7Ush2TpsoDXYwj6HeCSFBg4c7S5np0JkYODAiHDAzjHApTG6Ug7varSWoonIZKALmYv8Hbl8ERcKW5pt1pSSE6qQwwU433treQH3LfoEGyHG0D1YvI5HUqJ8V3Vb+f2FXjg6FQPg1OaczMqFhGEdKFVBAltrW4iEcKTo6rkU0CUIuIPdSgWVK6L+lF4 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This patch fixes an issue where the zswap global shrinker stopped iterating through the memcg tree. The problem was that shrink_worker() would stop iterating when a memcg was being offlined and restart from the tree root. Now, it properly handles the offlie memcg and continues shrinking with the next memcg. Note that, to avoid a refcount leak of offline memcg encountered during the memcg tree walking, shrink_worker() must continue iterating to find the next online memcg. The following minor issues in the existing code are also resolved by the change in the iteration logic: - A rare temporary refcount leak in the offline memcg cleaner, where the next memcg of the offlined memcg is also offline. The leaked memcg cannot be freed until the next shrink_worker() releases the reference. - One memcg was skipped from shrinking when the offline memcg cleaner advanced the cursor of memcg tree. It is addressed by a flag to indicate that the cursor has already been advanced. Fixes: a65b0e7607cc ("zswap: make shrinking memcg-aware") Signed-off-by: Takero Funaki --- mm/zswap.c | 94 ++++++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 73 insertions(+), 21 deletions(-) diff --git a/mm/zswap.c b/mm/zswap.c index a50e2986cd2f..29944d8145af 100644 --- a/mm/zswap.c +++ b/mm/zswap.c @@ -171,6 +171,7 @@ static struct list_lru zswap_list_lru; /* The lock protects zswap_next_shrink updates. */ static DEFINE_SPINLOCK(zswap_shrink_lock); static struct mem_cgroup *zswap_next_shrink; +static bool zswap_next_shrink_changed; static struct work_struct zswap_shrink_work; static struct shrinker *zswap_shrinker; @@ -775,12 +776,39 @@ void zswap_folio_swapin(struct folio *folio) } } +/* + * This function should be called when a memcg is being offlined. + * + * Since the global shrinker shrink_worker() may hold a reference + * of the memcg, we must check and release the reference in + * zswap_next_shrink. + * + * shrink_worker() must handle the case where this function releases + * the reference of memcg being shrunk. + */ void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) { /* lock out zswap shrinker walking memcg tree */ spin_lock(&zswap_shrink_lock); - if (zswap_next_shrink == memcg) - zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL); + if (zswap_next_shrink == memcg) { + /* + * We advances the cursor to put back the offlined memcg. + * shrink_worker() should not advance the cursor again. + */ + zswap_next_shrink_changed = true; + + do { + zswap_next_shrink = mem_cgroup_iter(NULL, + zswap_next_shrink, NULL); + } while (zswap_next_shrink && + !mem_cgroup_online(zswap_next_shrink)); + /* + * We verified the next memcg is online. Even if the next + * memcg is being offlined here, another cleaner must be + * waiting for our lock. We can leave the online memcg + * reference. + */ + } spin_unlock(&zswap_shrink_lock); } @@ -1319,18 +1347,42 @@ static void shrink_worker(struct work_struct *w) /* Reclaim down to the accept threshold */ thr = zswap_accept_thr_pages(); - /* global reclaim will select cgroup in a round-robin fashion. */ + /* global reclaim will select cgroup in a round-robin fashion. + * + * We save iteration cursor memcg into zswap_next_shrink, + * which can be modified by the offline memcg cleaner + * zswap_memcg_offline_cleanup(). + * + * Since the offline cleaner is called only once, we cannot leave an + * offline memcg reference in zswap_next_shrink. + * We can rely on the cleaner only if we get online memcg under lock. + * + * If we get an offline memcg, we cannot determine the cleaner has + * already been called or will be called later. We must put back the + * reference before returning from this function. Otherwise, the + * offline memcg left in zswap_next_shrink will hold the reference + * until the next run of shrink_worker(). + */ do { spin_lock(&zswap_shrink_lock); - zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL); - memcg = zswap_next_shrink; /* - * We need to retry if we have gone through a full round trip, or if we - * got an offline memcg (or else we risk undoing the effect of the - * zswap memcg offlining cleanup callback). This is not catastrophic - * per se, but it will keep the now offlined memcg hostage for a while. - * + * Start shrinking from the next memcg after zswap_next_shrink. + * To not skip a memcg, do not advance the cursor when it has + * already been advanced by the offline cleaner. + */ + do { + if (zswap_next_shrink_changed) { + /* cleaner advanced the cursor */ + zswap_next_shrink_changed = false; + } else { + zswap_next_shrink = mem_cgroup_iter(NULL, + zswap_next_shrink, NULL); + } + memcg = zswap_next_shrink; + } while (memcg && !mem_cgroup_tryget_online(memcg)); + + /* * Note that if we got an online memcg, we will keep the extra * reference in case the original reference obtained by mem_cgroup_iter * is dropped by the zswap memcg offlining callback, ensuring that the @@ -1344,17 +1396,11 @@ static void shrink_worker(struct work_struct *w) goto resched; } - if (!mem_cgroup_tryget_online(memcg)) { - /* drop the reference from mem_cgroup_iter() */ - mem_cgroup_iter_break(NULL, memcg); - zswap_next_shrink = NULL; - spin_unlock(&zswap_shrink_lock); - - if (++failures == MAX_RECLAIM_RETRIES) - break; - - goto resched; - } + /* + * We verified the memcg is online and got an extra memcg + * reference. Our memcg might be offlined concurrently but the + * respective offline cleaner must be waiting for our lock. + */ spin_unlock(&zswap_shrink_lock); ret = shrink_memcg(memcg); @@ -1368,6 +1414,12 @@ static void shrink_worker(struct work_struct *w) resched: cond_resched(); } while (zswap_total_pages() > thr); + + /* + * We can still hold the original memcg reference. + * The reference is stored in zswap_next_shrink, and then reused + * by the next shrink_worker(). + */ } /********************************* -- 2.43.0