From: Takero Funaki
Date: Tue, 11 Jun 2024 23:50:08 +0900
Subject: Re: [PATCH v1 1/3] mm: zswap: fix global shrinker memcg iteration
To: Yosry Ahmed
Cc: Johannes Weiner, Nhat Pham, Chengming Zhou, Jonathan Corbet, Andrew Morton, Domenico Cerasuolo, linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20240608155316.451600-1-flintglass@gmail.com> <20240608155316.451600-2-flintglass@gmail.com>

On 2024/06/11 4:16, Yosry Ahmed wrote:
>
> I am really finding it difficult to understand what the diff is trying
> to do. We are holding a lock that protects zswap_next_shrink. We
> always access it with the lock held. Why do we need all of this?
>
> Adding READ_ONCE() and WRITE_ONCE() where they are not needed is just
> confusing imo.

I initially thought that re-reading a shared variable inside a loop required protection against compiler optimization. Since zswap_next_shrink is only ever accessed with zswap_shrink_lock held, the lock already covers that, so I will remove the access macros in v2.
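A minimal userspace sketch of why the lock alone is enough, assuming POSIX threads as a stand-in for the kernel spinlock. `next_shrink`, `shrink_lock`, `advance_cursor()` and `run_two_advancers()` are hypothetical names, not zswap code. Every access to the shared cursor happens between lock and unlock, and those calls synchronize memory, so plain loads and stores cannot be cached across iterations; READ_ONCE()/WRITE_ONCE()-style accesses only matter when a variable is also touched without the lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins: next_shrink plays the role of zswap_next_shrink
 * and shrink_lock the role of zswap_shrink_lock. */
static long next_shrink;
static pthread_mutex_t shrink_lock = PTHREAD_MUTEX_INITIALIZER;

#define ITERS 100000

/* Every access to next_shrink happens with shrink_lock held, so plain
 * loads and stores are enough. pthread_mutex_lock()/unlock() synchronize
 * memory, so the compiler may not keep a stale copy of next_shrink in a
 * register across them; no volatile or READ_ONCE()-style access needed. */
static void *advance_cursor(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&shrink_lock);
        long cur = next_shrink;   /* plain read under the lock */
        next_shrink = cur + 1;    /* plain write under the lock */
        pthread_mutex_unlock(&shrink_lock);
    }
    return NULL;
}

/* Run a "worker" and a "cleaner" thread that both advance the cursor.
 * Returns the final cursor value; with correct locking no update is lost. */
long run_two_advancers(void)
{
    pthread_t worker, cleaner;

    next_shrink = 0;
    pthread_create(&worker, NULL, advance_cursor, NULL);
    pthread_create(&cleaner, NULL, advance_cursor, NULL);
    pthread_join(worker, NULL);
    pthread_join(cleaner, NULL);
    return next_shrink;  /* read after join: both threads have finished */
}
```

With this sketch, `run_two_advancers()` should return `2 * ITERS` (200000), because both threads only ever touch the cursor inside the critical section.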
> 'memcg' will always be NULL on the first iteration, so we will always
> start by shrinking 'zswap_next_shrink' for a second time before moving
> the iterator.
>
>> +               } else {
>> +                       /* advance cursor */
>> +                       memcg = mem_cgroup_iter(NULL, memcg, NULL);
>> +                       WRITE_ONCE(zswap_next_shrink, memcg);
> Again, I don't see what this is achieving. The first iteration will
> always set 'memcg' to 'zswap_next_shrink', and then we will always
> move the iterator forward. The only difference I see is that we shrink
> 'zswap_next_shrink' twice in a row now (last 'memcg' in prev call, and
> first 'memcg' in this call).

The reason for checking `memcg != next_memcg` was to ensure that we do not skip a memcg when the cursor is modified by the cleaner. For example, suppose we get memcg A and save it. If the cleaner then advances the cursor from A to B, and we advance it again from B to C and shrink C, B is skipped. We have to check that zswap_next_shrink still points to A before advancing the cursor.

If this approach is overly complicated and skipping B is acceptable, the beginning of the loop can be simplified to:

        do {
+iternext:
                spin_lock(&zswap_shrink_lock);

                zswap_next_shrink = mem_cgroup_iter(NULL, zswap_next_shrink, NULL);
                memcg = zswap_next_shrink;

>> @@ -1434,16 +1468,25 @@ static void shrink_worker(struct work_struct *w)
>>                 }
>>
>>                 if (!mem_cgroup_tryget_online(memcg)) {
>> -                       /* drop the reference from mem_cgroup_iter() */
>> -                       mem_cgroup_iter_break(NULL, memcg);
>> -                       zswap_next_shrink = NULL;
>> +                       /*
>> +                        * It is an offline memcg which we cannot shrink
>> +                        * until its pages are reparented.
>> +                        *
>> +                        * Since we cannot determine if the offline cleaner has
>> +                        * been already called or not, the offline memcg must be
>> +                        * put back unconditonally. We cannot abort the loop while
>> +                        * zswap_next_shrink has a reference of this offline memcg.
>> +                        */
> You actually deleted the code that actually puts the ref to the
> offline memcg above.
>
> Why don't you just replace mem_cgroup_iter_break(NULL, memcg) with
> mem_cgroup_iter(NULL, memcg, NULL) here? I don't understand what the
> patch is trying to do to be honest. This patch is a lot more confusing
> than it should be.
>>                         spin_unlock(&zswap_shrink_lock);
>> -
>> -                       if (++failures == MAX_RECLAIM_RETRIES)
>> -                               break;
>> -
>> -                       goto resched;
>> +                       goto iternext;
>>                 }

Removing the `break` on max failures from the offline branch is required so that we do not leave a reference to the next memcg behind.

If we just replace mem_cgroup_iter_break() with `memcg = zswap_next_shrink = mem_cgroup_iter(NULL, memcg, NULL);` and break the loop on failure, the next memcg will be left in zswap_next_shrink with its reference held. If that memcg is also offline, the reference will be held indefinitely.

When we get an offline memcg, we cannot determine whether the cleaner has already been called or will be called later. We therefore have to put back the offline memcg reference before returning from the worker function. This potential memcg leak is why I think we cannot break the loop here. In this patch, the `goto iternext` ensures the offline memcg is released in the next iteration (or by the cleaner, which waits for our unlock).

>
> Also, I would like Nhat to weigh in here. Perhaps the decision to
> reset the iterator instead of advancing it in this case was made for a
> reason that we should honor. Maybe cgroups are usually offlined
> together so we will keep running into offline cgroups here if we
> continue? I am not sure.

From the comment I removed:

>> -        * We need to retry if we have gone through a full round trip, or if we
>> -        * got an offline memcg (or else we risk undoing the effect of the
>> -        * zswap memcg offlining cleanup callback). This is not catastrophic
>> -        * per se, but it will keep the now offlined memcg hostage for a while.

I think this refers to the potential memcg leak, which this patch resolves by changing how the offline memcg case is handled.
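A minimal userspace model of the reference-counting argument above, under stated assumptions: `struct memcg`, `iter()`, `offline_cleanup()`, `shrink_pass()` and `leaked_refs_after_pass()` are hypothetical names, not kernel code. `iter()` only mimics the reference behaviour of mem_cgroup_iter() (drop the ref on prev, take one on next), `offline_cleanup()` mimics the cursor handling of the zswap memcg offlining cleanup callback, and "break on failure" is simplified to breaking right after stepping over the first offline memcg instead of counting up to MAX_RECLAIM_RETRIES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the shrink cursor; refs counts only the
 * references held through the cursor. */
#define NR_MEMCG 3

struct memcg {
    int refs;
    bool online;
};

static struct memcg memcgs[NR_MEMCG];
static struct memcg *next_shrink;      /* models zswap_next_shrink */

/* Models mem_cgroup_iter(NULL, prev, NULL): drops the reference on prev,
 * takes one on the next memcg in a fixed order, and returns NULL once the
 * walk is complete. */
static struct memcg *iter(struct memcg *prev)
{
    struct memcg *next = &memcgs[0];

    if (prev) {
        prev->refs--;
        next = (prev + 1 < memcgs + NR_MEMCG) ? prev + 1 : NULL;
    }
    if (next)
        next->refs++;
    return next;
}

/* Models the offline cleanup: runs once when a memcg goes offline and
 * advances the cursor only if it currently points at that memcg. */
static void offline_cleanup(struct memcg *memcg)
{
    memcg->online = false;
    if (next_shrink == memcg)
        next_shrink = iter(next_shrink);
}

/* One simplified worker pass. With break_on_failure, an offline memcg is
 * stepped over once and then the loop breaks. Without it, the loop
 * retries ("goto iternext") until it reaches an online memcg or the end. */
static void shrink_pass(bool break_on_failure)
{
    for (;;) {
        next_shrink = iter(next_shrink);
        struct memcg *memcg = next_shrink;

        if (!memcg || memcg->online)
            return;                    /* end of walk, or a memcg to shrink */
        if (break_on_failure) {
            next_shrink = iter(memcg); /* advance past the offline memcg... */
            return;                    /* ...then break out of the loop */
        }
        /* offline: retry; the next iter() call drops this reference */
    }
}

/* Memcgs 0 and 1 go offline while the cursor is elsewhere, so their
 * cleanup has already run; memcg 2 stays online. Returns how many cursor
 * references remain pinned on offline memcgs after one worker pass. */
int leaked_refs_after_pass(bool break_on_failure)
{
    int leaked = 0;

    for (int i = 0; i < NR_MEMCG; i++)
        memcgs[i] = (struct memcg){ .refs = 0, .online = true };
    next_shrink = NULL;
    offline_cleanup(&memcgs[0]);
    offline_cleanup(&memcgs[1]);

    shrink_pass(break_on_failure);

    for (int i = 0; i < NR_MEMCG; i++)
        if (!memcgs[i].online)
            leaked += memcgs[i].refs;
    return leaked;
}
```

In this scenario, `leaked_refs_after_pass(true)` leaves one reference pinned on offline memcg 1, because its cleanup already ran and nothing will move the cursor off it, while `leaked_refs_after_pass(false)` leaves none. The cursor's reference on the online memcg 2 is expected in both models, since that memcg's own offline cleanup would drop it later.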