From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4C319C47258 for ; Thu, 25 Jan 2024 08:30:42 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CA6558D001B; Thu, 25 Jan 2024 03:30:41 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C56058D000C; Thu, 25 Jan 2024 03:30:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AF6668D001B; Thu, 25 Jan 2024 03:30:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 978418D000C for ; Thu, 25 Jan 2024 03:30:41 -0500 (EST) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 766981C0754 for ; Thu, 25 Jan 2024 08:30:41 +0000 (UTC) X-FDA: 81717162282.07.5088230 Received: from mail-oo1-f50.google.com (mail-oo1-f50.google.com [209.85.161.50]) by imf03.hostedemail.com (Postfix) with ESMTP id A510920008 for ; Thu, 25 Jan 2024 08:30:38 +0000 (UTC) Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=NT0pS3mX; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf03.hostedemail.com: domain of zhouchengming@bytedance.com designates 209.85.161.50 as permitted sender) smtp.mailfrom=zhouchengming@bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706171439; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=LwxZ89BJlc+x6KjX9RIj+qjWv5VNKTF5oPpXEEUlSoI=; b=5AAEx6Mwi1hO0rja63US+7RCxDSoka04/cgdALhOrJne0Dx3W5TtzKlXgyV6g0926fbGkj /e3naJ7V/lR12xmiAAF8ZzcLB4c/ehxn3tHtJ+Ad2xXw0wuiXsnjoFioE3QnmCQyBK52VJ tEa0r1G14ofk22vCZyTB70dYgdOF9c8= ARC-Authentication-Results: i=1; imf03.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=NT0pS3mX; dmarc=pass (policy=quarantine) header.from=bytedance.com; spf=pass (imf03.hostedemail.com: domain of zhouchengming@bytedance.com designates 209.85.161.50 as permitted sender) smtp.mailfrom=zhouchengming@bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706171439; a=rsa-sha256; cv=none; b=d2yZUH/xO+kG74oxdGShPCkwLcrak/73wx7nyfWh1CzmsmWw46KhYKMU783wHBY0jlTmJ7 KjLBRc9ZI24hV3WuJfjHi4vdWG5VpcGBzuAvcAK0vYRdcWfzVMQCqZMxwXPLLKaEKYodRo ZXj4FSr0I5Bdk31XGigHx6YnwHRO8Mo= Received: by mail-oo1-f50.google.com with SMTP id 006d021491bc7-599e93683easo101742eaf.1 for ; Thu, 25 Jan 2024 00:30:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1706171437; x=1706776237; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=LwxZ89BJlc+x6KjX9RIj+qjWv5VNKTF5oPpXEEUlSoI=; b=NT0pS3mXNTTaTE6cQf73vjOVnt9SxoKJR1u84hOWEC7ewHznk1mcDpqevQSp1Ncevr OLH1mDeBgQ6zGX2JpvrzoCTQVP7HvSBuH1EbC09/ib+Vnb2TXnptaS52kQV9oYSe1xL7 1aL4maDAm/k8mF/8kU26q1LZnmxN5qqrK6g/dJwo4Y85kSirOBOcZya1rsYbUjy+qv/o OX25eyro55pyWVBPxwe2v6+DzYwfRwWqfCBiIm7lxnowTNtkrmDNfJM1XwO8VIizgwt/ Tn8t3pdtFuMlJ9BvMEHMbL8mMB8AedlNhkrGzPl9r/k7xAdk+jeIVOeH5PBPqugHOcaw bRMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706171437; x=1706776237; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=LwxZ89BJlc+x6KjX9RIj+qjWv5VNKTF5oPpXEEUlSoI=; b=mRsoWHJMPDQMZVasPEP0nGZR/1BeMq/f9oJLGuMLHI+Hbv/pc/ew1N/svUGG0X+jBo iCjTaEA+0H9JbWvOfKI2FVFncv+FBwnwIYiNXkk6zDSKxRT8i6427a7qO10mF+52uedl SZgh/Ss8UDCSd4QQMI6OwDe4asHuyGnBEQMwBscMDD3+q2PCVav8Hu/1H/qImwAJoyd8 4ber7OU/g2bMUCL7SduXTkqthaeGQiGbbEzMYVVAQjlQoBevylogefOr+WTx8lvoECvN diZJBTcIjK0KWi4NbmXq8vQCXD1pzVmx4iZDZGOtWKCTKnSoN3dYR6qMpB3+o98YqEQx S4hw== X-Gm-Message-State: AOJu0YyQ65C3EFcen/I7UjSj+k6BRRzQ/INNo9XLOLXxrmTfgxlejShs 3o4upro+Kd6XTMFzTLyq9n4XxiXZNhWPF3SuJ1dTbMhwM5WZETod3waKDBt7XZ4= X-Google-Smtp-Source: AGHT+IGk/ptGROJhqqMQq2ViqVW7xz+uINt3phL+/LVfUWQOsNEd1sf1i68ZEjnezYTXkBlml1Kp9A== X-Received: by 2002:a05:6358:2244:b0:170:17eb:b4c with SMTP id i4-20020a056358224400b0017017eb0b4cmr618258rwc.54.1706171437371; Thu, 25 Jan 2024 00:30:37 -0800 (PST) Received: from [10.4.195.141] ([139.177.225.254]) by smtp.gmail.com with ESMTPSA id h4-20020a056a00000400b006ddcb9adda1sm919738pfk.163.2024.01.25.00.30.34 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 25 Jan 2024 00:30:37 -0800 (PST) Message-ID: <1496dce3-a4bb-4ccf-92d6-701a45b67da3@bytedance.com> Date: Thu, 25 Jan 2024 16:30:31 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff() Content-Language: en-US To: Yosry Ahmed Cc: Johannes Weiner , Andrew Morton , Nhat Pham , Chris Li , Huang Ying , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20240120024007.2850671-1-yosryahmed@google.com> <20240120024007.2850671-3-yosryahmed@google.com> <20240122201906.GA1567330@cmpxchg.org> <20240123153851.GA1745986@cmpxchg.org> <20240123201234.GC1745986@cmpxchg.org> From: Chengming Zhou In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: A510920008 X-Stat-Signature: k655jky6mgr7cfa3enb95axa6mhxx55e X-Rspam-User: X-HE-Tag: 1706171438-294609 X-HE-Meta: U2FsdGVkX1/obvqI0kGZ9mFWZjE9el63aDvcuNs8Ky2b5xBeIlSvz/9KlST/Fmmw5zUE9/D5h6G+SjBvwDd6GHRaZyZiUsUB+nWU99neTJItPDE/PRoPWVY527V9Gw7zef2rH3tbB/tq6t5dQcXcvCtmmn/Trj5zMoiCxAlgYskxRAdMBDfSsG3VA7s38SIF704Rtcn3l7PX53PFckGT+GxcVqQeZ1KXDtojh0yS5qbBRqmP0hL0NKnT1v9GWK81RsyyalZacQA7fnrN5hm2jLpMRjXSwZyasTJIplAwB3n2+wWT5hrvCyY6EWoKcYJ0praNHCy23MmoW+t+LbTbKr7CpvkWzjTMlpMMLQg/o5obBN7Kut3Mg3QuK7w73unGRCDSPzu9Nu9KC80lx6PDCtPTPiOawWooRggiPMI+Wy0RvXLTqauBU5MHvWKyhHBGwJclWmhdvqKEaCUnO0RU8YiEnSNNp3LCNsdApSU9sMn/i9FwDtjbEqm6aY8/TnLYDcVY4tXR5Ra+nZynM11CnfuBsT8IM2BS4z/+ezbNuSoR530EWVcvGGTNVHDtQAHRDFV/hYtAshE1b8l5HJbyj16XPHFprbH0s1KubfpgMraSanD1H06LtzTIxojjtOUySBS3qOipfJEkaHYhqY7OWBu9QbrR5aGTyTm4MA0W+CcnTtLOdIUuEOFT1xcD6FO8xFccJsJpfhydOknezmmC3WX00xuExITjRft03r0wjzOcfixD6vAd9xRt9Td3GgEVzJn1giHu6ZYVJesI2TN0YUaca6QTcKF7TdzzdTAFIcfl2Tu2JDEpgo579gdJR9amxw769O4FvHjXIsCYVVlmjC2AsogqMdEquSVy3jafKy6nBwE/d/q/lqbYXW4aupCsFn/ha91SunL4eW+KlPfqe517JSo19CKs7j74ztZZbA7Ntf++5RqRGrwTsHWMvSdonxrceWzuC3ekSFCbtLe apAWLQaf IAsVQ1U97vhrQ4H1LpliTEB7Xk60z4d2nVbW5cznAzP6Lq3dzouoSifxDu2KdisFs0pJVqMy5hWqFFLClypEEx5ga5+sXt2l6/SSkLMStqP9Fu57t2s1lBTbZYMJRAzxGAT3nSXeZSwqOCORQu5opVVVJKefWNBczaNfaQFbGL4gz7S0GtEsnLmMKNhzb0xphUAlAoDw+DD22JbMJWYrpN+xO/cmr5ds14eWu5DLOPyb//2euZAWJRyFz7Zc68QRJgFYte/L8gj1DJmCbnaQlKTtcPWrPIcVKvW7m+W0T5Y5KqUkcTB2XsORl84O5RzLgShnKNMljsw/EMcaalRQNQ6KfVrYq7tDoKXsWBV1Ho+P5oC43gSXy73Fj7KxR98dytd7UE9e4phuxtI7j9HKjnE6g+shar2fTHz8GvoDotUfxtGIAqSkDGpb2qAiqGj+Eq/deBPVGRVqsTaY= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/1/25 15:53, Yosry Ahmed wrote: >> Hello, >> >> I also thought about this problem for some time, maybe something like below >> can be changed to fix it? It's likely I missed something, just some thoughts. >> >> IMHO, the problem is caused by the different way in which we use zswap entry >> in the writeback, that should be much like zswap_load(). >> >> The zswap_load() comes in with the folio locked in swap cache, so it has >> stable zswap tree to search and lock... But in writeback case, we don't, >> shrink_memcg_cb() comes in with only a zswap entry with lru list lock held, >> then release lru lock to get tree lock, which maybe freed already. >> >> So we should change here, we read swpentry from entry with lru list lock held, >> then release lru lock, to try to lock corresponding folio in swap cache, >> if we success, the following things is much the same like zswap_load(). >> We can get tree lock, to recheck the invalidate race, if no race happened, >> we can make sure the entry is still right and get refcount of it, then >> release the tree lock. > > Hmm I think you may be onto something here. Moving the swap cache > allocation ahead before referencing the tree should give us the same > guarantees as zswap_load() indeed. We can also consolidate the > invalidate race checks (right now we have one in shrink_memcg_cb() and > another one inside zswap_writeback_entry()). Right, if we successfully lock folio in the swap cache, we can get the tree lock and check the invalidate race, only once. > > We will have to be careful about the error handling path to make sure > we delete the folio from the swap cache only after we know the tree > won't be referenced anymore. Anyway, I think this can work. Yes, we can't reference tree if we early return or after unlocking folio, since the reference of zswap entry can't protect the tree. > > On a separate note, I think there is a bug in zswap_writeback_entry() > when we delete a folio from the swap cache. I think we are missing a > folio_unlock() there. Ah, yes, and folio_put(). > >> >> The main differences between this writeback with zswap_load() is the handling >> of lru entry and the tree lifetime. The whole zswap_load() function has the >> stable reference of zswap tree, but it's not for shrink_memcg_cb() bottom half >> after __swap_writepage() since we unlock the folio after that. So we can't >> reference the tree after that. >> >> This problem is easy to fix, we can zswap_invalidate_entry(tree, entry) early >> in tree lock, since thereafter writeback can't fail. BTW, I think we should >> also zswap_invalidate_entry() early in zswap_load() and only support the >> zswap_exclusive_loads_enabled mode, but that's another topic. > > zswap_invalidate_entry() actually doesn't seem to be using the tree at all. > >> >> The second difference is the handling of lru entry, which is easy that we >> just zswap_lru_del() in tree lock. > > Why do we need zswap_lru_del() at all? We should have already isolated > the entry at that point IIUC. I was thinking how to handle the "zswap_lru_putback()" if not writeback, in which case we can't use the entry actually since we haven't got reference of it. So we can don't isolate at the entry, and only zswap_lru_del() when we are going to writeback actually. Thanks!