From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22F57C47258 for ; Thu, 25 Jan 2024 08:43:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 846C98D001C; Thu, 25 Jan 2024 03:43:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7F7B38D000C; Thu, 25 Jan 2024 03:43:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6BE8B8D001C; Thu, 25 Jan 2024 03:43:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 595168D000C for ; Thu, 25 Jan 2024 03:43:07 -0500 (EST) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 1A8A0807F3 for ; Thu, 25 Jan 2024 08:43:07 +0000 (UTC) X-FDA: 81717193614.16.2A940A6 Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) by imf13.hostedemail.com (Postfix) with ESMTP id 4BF9F20009 for ; Thu, 25 Jan 2024 08:43:05 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=bhTWVgkd; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf13.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.54 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706172185; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=11S0UO+gDmS6HDjilZ1t1nfAQtcwLEq5BD+umSpnkLc=; b=vXsJ1IZ8K5wL3cEeCh+7GqF2dPFQy0mHD9+GL4guIEn5wnsjWVwR9JcXccUGMt4rUozuvH an5TivdTRbQxnkdXwDJHBDAFUjqNxiWUvvW8ZlZIE+RaS5ARltQuAsGkefC7clsSZfDPaW YgdKbIEC9xbfjzyKEmlyIvdSad0hYJ8= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=bhTWVgkd; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf13.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.54 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706172185; a=rsa-sha256; cv=none; b=5g6QT3O3wT+W17hYi9Cx7ngS1k9pAorcLCu9UD6v2B2Xr7BKbCD/SxzgwwTXg5l5dS8pQA 05fGwXElLC2UbtKmr9Rr1jVFyyrEqRBoPsKaZotH2ASpwhws67lWgEz22+ZlFFR5tF9IUP zbH4DnLIzA+2ga9aU+jOI+4uRUtEMj0= Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-a30ed6dbdadso221780366b.1 for ; Thu, 25 Jan 2024 00:43:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706172184; x=1706776984; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=11S0UO+gDmS6HDjilZ1t1nfAQtcwLEq5BD+umSpnkLc=; b=bhTWVgkd4Mkhyb5F29UhaNnFSbyokS1DQ9sSGZ7lQiU0FRpjXor7pwzV/9hb1IPUpa 7+ZUBBPbfHLnXqRqACLZORMBJ8/1gp+YNkdkmVRmQXvoB1wLuVEVdvjijF+7xVcSZnfq lzpAVPLAsJkWpmSRL1bDR1EcUYYCgQyXp82xBX0qrtZyVpBzGkyFEooiZ/ACxx9J84JG RSBLVAhqSIKK884Sglhlu/3dgCqLt8gkR6g2EGyoWffPP+jYG3/m6+/yHn2i1sjPH5oo SCk5XgzGbqzKFzB5tKVxLelLHSlsUB2H0en20PVehbEuGDJm8ol2XJToFedeM5h2v2Fp E+lw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706172184; x=1706776984; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=11S0UO+gDmS6HDjilZ1t1nfAQtcwLEq5BD+umSpnkLc=; b=E6ZfX1SpCM4olxUBsMHPbchqUF00iIri3vMbXlEJagqVEzFpJ5Jrtwsk3SbSdOPtWQ LGzUb8ppkqR0q/+QFcTWwyrXKWICVusixoudEcdcycwmLlwN01LIol+wmR/NtREr1AQj /GVjVf+MV2hJKiZDHXS/SFrcWR/ZUJPz3kJAGenfJ0kNZbT8Ye1PlXSlv1/cppPY492b xy2TWkmciRWgpwvZIt5ZD1kEwcgPvd1na2KzKfhVK7ylwc5AVjrRGVO2eFRnyHie1rpa WkLn/STIvEIU9bABPMFawlaD2mJw67IxxXJ3mzPbgNf2BKWCPA/bJ5Hdo2N+NaxUx5YU bPBg== X-Gm-Message-State: AOJu0YzE6DWk7Why3CgzlMQehQYcbHJcG9TA9gyP7eN90qXgl0mUdSCs ooChpK5Wb85Bf+FKZ8j2tpKXlgDq+SRazQd2V5zQ3xY8a5SUIqUBb3mmKS96qHbKtuHuUOwuKva 3NNNrIGK6ckv5DHArq152yd85+ZEnN1g/bVyA X-Google-Smtp-Source: AGHT+IGTG1XjzgU3aQCCbABZMVQoZ8Bl8q24hej4cTrNRk/VasQ3nHKJnd+7qP97wlCr8mA0ur7Cr/Ch/YQ6qRpRuno= X-Received: by 2002:a17:906:9c8f:b0:a30:150:917f with SMTP id fj15-20020a1709069c8f00b00a300150917fmr460924ejc.154.1706172183687; Thu, 25 Jan 2024 00:43:03 -0800 (PST) MIME-Version: 1.0 References: <20240120024007.2850671-1-yosryahmed@google.com> <20240120024007.2850671-3-yosryahmed@google.com> <20240122201906.GA1567330@cmpxchg.org> <20240123153851.GA1745986@cmpxchg.org> <20240123201234.GC1745986@cmpxchg.org> <1496dce3-a4bb-4ccf-92d6-701a45b67da3@bytedance.com> In-Reply-To: <1496dce3-a4bb-4ccf-92d6-701a45b67da3@bytedance.com> From: Yosry Ahmed Date: Thu, 25 Jan 2024 00:42:27 -0800 Message-ID: Subject: Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff() To: Chengming Zhou Cc: Johannes Weiner , Andrew Morton , Nhat Pham , Chris Li , Huang Ying , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 4BF9F20009 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: 5i1db1j5ca8czzkuj8x96urj3wcdkgpt X-HE-Tag: 1706172185-161189 X-HE-Meta: U2FsdGVkX18rKTzxdEQhIabrjHnzSJj8EmhvXpLgSw/qWJWvpeu0XjIhY8+J4i0+kiAJM3XLy3RVpeZJYyciIQ6p4opO3/++2lPGCmyZ3uCW+HBkDc6Yu2+xgmc8SNzYR01v9b1N3pEPI3CbVqGBew/LtRcR4F5ZLYojCNyQ1pd+qxvi++D6FnQOJjk9NsKWWEXmj+LLJSPigvx/es7pZ/vL875RGhaBM7c7eCSzQH1G8GgBSprvkG2Yaapy8C66PcBIQqGbhNyfullYDMRLvcUYsh5jZYHwO2O0Hxe77PImuEadyfBs6QEx7GcDWwbCPYyF03TCTiIXVA0h4TiH3I8PNsHNRIHNtF1X2EAgipDrLZ+f5exdazck8zZIZ4Yq9NgtLKm729tIqR6PSy7679VeTNpR5Qrz7E07sf+4cvbQyAGltH52rfFYVFuHOJqcVpvmgaTZGf72UEaY7omn1a90Yaqbsx8iPA1yebS8DbEnhaey9pqnMLIUOhWoN4xB+8Eg0zQSJa87pUGNH9VSNyIIKmKvW/lIrwVDe9LmIGRDMnIuis9fyKeAzJQnX4X9O/z55Fp0N+LLFKBl3rK7suZaxok9sSKSGXuPvR1MbRWKnfOeakp+hY/5wtlonnjmREQuel32SE3mLpQ6FkRceLRroZPoVTpITPe5FrBN1xC5YuVPC924S9UjqXtr7Mu1xbhvdZmzhlFX3gaKgm9GgEHumj9AprZFcR460YYmadlXlEx31SNEJNwIBUS+fatHQHyEUO7oeJ2JXKnFKsiLLpfI0m6euI/vRfj7LK//8sb6mzlBcK+8bYTyruise+haXCTy/tlKp0Jo6cWEM3PLC33x33J3JG300vq8OJUuf5PWzA1nGNLlPHcePEyNtGCHDpjsV2yIqspk1yTYkYlCFMGewuWhKkhwcWoXThTiKUMwgJuBN34SgpPY4iOU4BLWiQDIMb36CMZzFWvEG4H tFaSrFnh Ahu351UJ+paYph8dLfX5C+pA1YoSiihf3VG0UocxUlfDHfoFtg0VQi8+WX7xWMRakJq/2es9TYtVaHVC7vGk92oerFLln5JsfAFoT6JcfbcWwh4i5F8N7bAwj8aMH1Hw7g8EAv4LU5ifAgMs6+1Dk5a6CXNfjQycSY7J6/9oTFcuCvs57pHBQGevcs0Mhxy5Q7wGu0Trj0VzTDcfd9dNnyRBorQPKeT2FyNGLu/ebb/rjXIY7bxuqbBTRJ0XwKZSpkqEX00Ke6oe3RGbuvBS+4aGRJ6YOoWWM5/gIxZmqDc4h5oHBvJ2WSUDY8QA72Mz00xawts7A8QOQNLQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Jan 25, 2024 at 12:30=E2=80=AFAM Chengming Zhou wrote: > > On 2024/1/25 15:53, Yosry Ahmed wrote: > >> Hello, > >> > >> I also thought about this problem for some time, maybe something like = below > >> can be changed to fix it? It's likely I missed something, just some th= oughts. > >> > >> IMHO, the problem is caused by the different way in which we use zswap= entry > >> in the writeback, that should be much like zswap_load(). > >> > >> The zswap_load() comes in with the folio locked in swap cache, so it h= as > >> stable zswap tree to search and lock... But in writeback case, we don'= t, > >> shrink_memcg_cb() comes in with only a zswap entry with lru list lock = held, > >> then release lru lock to get tree lock, which maybe freed already. > >> > >> So we should change here, we read swpentry from entry with lru list lo= ck held, > >> then release lru lock, to try to lock corresponding folio in swap cach= e, > >> if we success, the following things is much the same like zswap_load()= . > >> We can get tree lock, to recheck the invalidate race, if no race happe= ned, > >> we can make sure the entry is still right and get refcount of it, then > >> release the tree lock. > > > > Hmm I think you may be onto something here. Moving the swap cache > > allocation ahead before referencing the tree should give us the same > > guarantees as zswap_load() indeed. We can also consolidate the > > invalidate race checks (right now we have one in shrink_memcg_cb() and > > another one inside zswap_writeback_entry()). > > Right, if we successfully lock folio in the swap cache, we can get the > tree lock and check the invalidate race, only once. > > > > > We will have to be careful about the error handling path to make sure > > we delete the folio from the swap cache only after we know the tree > > won't be referenced anymore. Anyway, I think this can work. > > Yes, we can't reference tree if we early return or after unlocking folio, > since the reference of zswap entry can't protect the tree. > > > > > On a separate note, I think there is a bug in zswap_writeback_entry() > > when we delete a folio from the swap cache. I think we are missing a > > folio_unlock() there. > > Ah, yes, and folio_put(). Yes. I am preparing a fix. > > > > >> > >> The main differences between this writeback with zswap_load() is the h= andling > >> of lru entry and the tree lifetime. The whole zswap_load() function ha= s the > >> stable reference of zswap tree, but it's not for shrink_memcg_cb() bot= tom half > >> after __swap_writepage() since we unlock the folio after that. So we c= an't > >> reference the tree after that. > >> > >> This problem is easy to fix, we can zswap_invalidate_entry(tree, entry= ) early > >> in tree lock, since thereafter writeback can't fail. BTW, I think we s= hould > >> also zswap_invalidate_entry() early in zswap_load() and only support t= he > >> zswap_exclusive_loads_enabled mode, but that's another topic. > > > > zswap_invalidate_entry() actually doesn't seem to be using the tree at = all. > > > >> > >> The second difference is the handling of lru entry, which is easy that= we > >> just zswap_lru_del() in tree lock. > > > > Why do we need zswap_lru_del() at all? We should have already isolated > > the entry at that point IIUC. > > I was thinking how to handle the "zswap_lru_putback()" if not writeback, > in which case we can't use the entry actually since we haven't got refere= nce > of it. So we can don't isolate at the entry, and only zswap_lru_del() whe= n > we are going to writeback actually. Why not just call zswap_lru_putback() before we unlock the folio?