From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1BEB7C47258 for ; Thu, 25 Jan 2024 08:04:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9D4138D0019; Thu, 25 Jan 2024 03:04:32 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 982D88D000C; Thu, 25 Jan 2024 03:04:32 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8252A8D0019; Thu, 25 Jan 2024 03:04:32 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 6D2368D000C for ; Thu, 25 Jan 2024 03:04:32 -0500 (EST) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 3353EC0344 for ; Thu, 25 Jan 2024 08:04:32 +0000 (UTC) X-FDA: 81717096384.15.6AE2990 Received: from mail-ej1-f43.google.com (mail-ej1-f43.google.com [209.85.218.43]) by imf23.hostedemail.com (Postfix) with ESMTP id 669E7140011 for ; Thu, 25 Jan 2024 08:04:30 +0000 (UTC) Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="xy97/zHw"; spf=pass (imf23.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.43 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706169870; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=nbEIuFB5A5XX39UC0VCdyx+dlgGNaYXATJU8aqEkgfk=; b=b5YZJEOcaWK6ANtTwPUDjfEyeTfa/jbsh8BWVu0QjUcqwvHPxLZypdK4AMt5NEjElO84z2 9giGs6ga65iTmdtyTnZgFLK63K+0G5K9KLnkKW1Hm7BGtbsSoA+ei66LnSzYU8Lj2OW5Fb GIEzf7Pd7zcSbss7nVS5oHZnbsKA+j0= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706169870; a=rsa-sha256; cv=none; b=p+Q/dnNcJ+7VSgdD73A7X4/QE2czmMsggpfQphlXnGDSoCHPXvDLuXQ+XungLsXokHlARD tbaEo/aWMBtUzg6j3JLqpWLDgotzZw/xemqSbLEM5ivKWvHkQ8r5xoXCOS9/Xhz7iM4J2y VPdIA0TyQuEOu++AMoqQQDo2S5geMNo= ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b="xy97/zHw"; spf=pass (imf23.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.43 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ej1-f43.google.com with SMTP id a640c23a62f3a-a29c4bbb2f4so622511866b.1 for ; Thu, 25 Jan 2024 00:04:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1706169869; x=1706774669; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=nbEIuFB5A5XX39UC0VCdyx+dlgGNaYXATJU8aqEkgfk=; b=xy97/zHwuywNpe61I5bT3YRnIBNOqWsUvUGbhlbEZd/uKR/p7/guihEMX+W5OKGpmu EWT3vEohREEdOy8VJ9iKjx+L6nwy6L1rFmfXbXMEtAU4fWzhGSF6Oyr1oZtkRfXHnuwR 0YtMQH59k0BbLrKe4hF4fHaNVAuqzwNPxo4ibRYdVKWt6CEAvhEiUjT7YncIh8AG9RfO LUd8n4vbs1wvpfkSbEIBs3XD2cSdcTBo3ejrCjxAnbV0y6C8DfOtELQ0kmtgOcAWL0EO it+NSJQcn15VAqtSw6CDcBN8+/rshyOhmz7DsozZoKQ6/kAr5gn/EX7p7qEJzTMc3HKz uEQA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706169869; x=1706774669; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=nbEIuFB5A5XX39UC0VCdyx+dlgGNaYXATJU8aqEkgfk=; b=wb88YjdGrbrYh3QOEnTGmB//5KcFLAC1Iq/JzxAil7LveR1a7uEVLHqcv6Zx7k2C4f 5Tlz/ppf5o8ctX8TVb8FNrPgKJhiIYl48rnsDnmj7ebJk6YWHQtncq9kleQuLlH+fdar hMBiEJtEbWhRBcWf0+DdpeNIMpMbeeIMwYEFpp2JpCb6wsEk0fwfiRMbJLB9TA9Jcn4Y pd36btG9EZKjN3ckVSF6jOEhKehDsgkkPpXJtxZAmRnPMlAjI5s63ZqOytuNLY//fIWG Zg3ueHPODIuXRVkByPxhOGva9d1MaifzZerR0mD9GUYTJKZL3Onm3yFr7HCHIP+O3FBI i0FA== X-Gm-Message-State: AOJu0YzVllcxZtwEn/6VcXuDe6WgpNsmFBJOBBYa3VeRdpF3bdceEEUC iV1OG9IMG0oId6wGleiQ75EDsdiN6UxHM4nbE/DjsJKV13xHakD0RlHkkf1Axw53E3fWNlmI6/o QrMHKdA5f3TcWjMFv+JAnxy8h4wrh5dzc43hl X-Google-Smtp-Source: AGHT+IGIW7mc7I7DA3/R/orIYvaRnOs6vYuyv9HmINqG5ofGjNKk3OYQW9skTMwEegGwIEruiRoZzoCb+htD1FzYTxE= X-Received: by 2002:a17:907:8743:b0:a31:7c5a:bf5b with SMTP id qo3-20020a170907874300b00a317c5abf5bmr260977ejc.5.1706169868749; Thu, 25 Jan 2024 00:04:28 -0800 (PST) MIME-Version: 1.0 References: <20240120024007.2850671-1-yosryahmed@google.com> <20240120024007.2850671-3-yosryahmed@google.com> <20240122201906.GA1567330@cmpxchg.org> <20240123153851.GA1745986@cmpxchg.org> <20240123201234.GC1745986@cmpxchg.org> In-Reply-To: From: Yosry Ahmed Date: Thu, 25 Jan 2024 00:03:52 -0800 Message-ID: Subject: Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff() To: Chengming Zhou Cc: Johannes Weiner , Andrew Morton , Nhat Pham , Chris Li , Huang Ying , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 669E7140011 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: pzozmjwggqkhis47q7opwrmhe3pstrjj X-HE-Tag: 1706169870-993388 X-HE-Meta: U2FsdGVkX18qO8pR0oB3kWTwJPZhEFFMhZM7ImzES/wdY/XxlnAHE6EBfp3rC653TrpF+3l7t+4Sk+krL12Zaf0lcVPd9Umxi5omqR+Qf1yN0/ztpIVzhz8M9DPWqg6T7SN3aHauVprMR2e8XaHkLpEu7UHwsiLbf2pUoN7X6ioKg+Ik3Whxiy/qlEY8noRx3jV7j67y/ZPeSJgn6PSHCWwBKPu+1oqFvAlK4LwYInF1aGU31TiLKnSbEMuRZsv5cIIRrIIw5zmc4eNQp4uQty69VN79ewWPhMSrE0tFm1M7bhP7OdidOOJtMZ+hDN+48yzmJKiGPhapCtdENVgoP84OZAbDI7KvHfcGTUH25m6K1QVbfGNi03jcKFHHOFi//w4CI1x9u70yr8WIhtermh1w46Xrb5yb2qbjDqitZHTFH717It4L4PhcTprlSuQGXfIQFjfQlTrbxHTQKPo3E7kBqCyvN3XLwL1SIWiOJumhrdFOb0a4BsIcYO3boiFqwUlpU/vitNGxtQvNRTfvN+2uVpprzJR3eZSGIdKJnm7UlkilwhNezhm78kWcnJMknJrvXbbz7H6t6clI/5Mgv11ccRXb1L6AWvkCYs0/2sAjaZddvbLFNIgCUbudYl3qM0JKMrlG61OiPTp0CT1+WpzKJscUGu7Ttcftp1WxzffMEYy405tldJyxXjFmgV0NuXhN0tdtBgoCIWi8jN0dUyCEBzpeKgYX+Zs2k8EQZIIRyMHAIip+CKeR48+1BIn5/gALj/CpWtJOjoAHRmP2PnBNvEDzNQy/fq8kiRk0DY82RMMrp3n7/D6ni4HzK6J9ZXaLyqhL6muKr3nps4dVjmh4Wns3nUkFnc1Jq1nNti62bvFDxdfcHI7AUm4Ly7VYKlMSzISE2Ptx9lmXiAcPySm+hWCoZyUK6L7g5jYxS4zokc1SMseySHWmY5bz6SjwnePoGfE4ondF9p4iLSi cVh83FVD RX3Uxes3ksZZFo59Jn7tP9V50KaUyF+wwLoZFsB50Wb9C0o1imFEOazyxVttLyP7zvmiEETyP1lg0zSf7DvvHCRrYY+Vp3l03grjUHh0RDAc82U0v/gnqie2iIKngqBzZ/UeCoz+nA6vkGsWmOlx7dmwfP585D2qDVY6vDd6TlRM1hdQrfF/8MxP+46M7O/vDzQErxT1oJ19VqiaCP2aMdZJ0HIGTQyE7lLs/Zewu6KhKUU/fL3RJCnLCmMt2g8RdHz+1SqVE/FSgHT3qlIjovBS6qYUxQoFZfbx6CUFqu5TP+8Y= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jan 24, 2024 at 11:53=E2=80=AFPM Yosry Ahmed wrote: > > > Hello, > > > > I also thought about this problem for some time, maybe something like b= elow > > can be changed to fix it? It's likely I missed something, just some tho= ughts. > > > > IMHO, the problem is caused by the different way in which we use zswap = entry > > in the writeback, that should be much like zswap_load(). > > > > The zswap_load() comes in with the folio locked in swap cache, so it ha= s > > stable zswap tree to search and lock... But in writeback case, we don't= , > > shrink_memcg_cb() comes in with only a zswap entry with lru list lock h= eld, > > then release lru lock to get tree lock, which maybe freed already. > > > > So we should change here, we read swpentry from entry with lru list loc= k held, > > then release lru lock, to try to lock corresponding folio in swap cache= , > > if we success, the following things is much the same like zswap_load(). > > We can get tree lock, to recheck the invalidate race, if no race happen= ed, > > we can make sure the entry is still right and get refcount of it, then > > release the tree lock. > > Hmm I think you may be onto something here. Moving the swap cache > allocation ahead before referencing the tree should give us the same > guarantees as zswap_load() indeed. We can also consolidate the > invalidate race checks (right now we have one in shrink_memcg_cb() and > another one inside zswap_writeback_entry()). > > We will have to be careful about the error handling path to make sure > we delete the folio from the swap cache only after we know the tree > won't be referenced anymore. Anyway, I think this can work. > > On a separate note, I think there is a bug in zswap_writeback_entry() > when we delete a folio from the swap cache. I think we are missing a > folio_unlock() there. > > > > > The main differences between this writeback with zswap_load() is the ha= ndling > > of lru entry and the tree lifetime. The whole zswap_load() function has= the > > stable reference of zswap tree, but it's not for shrink_memcg_cb() bott= om half > > after __swap_writepage() since we unlock the folio after that. So we ca= n't > > reference the tree after that. > > > > This problem is easy to fix, we can zswap_invalidate_entry(tree, entry)= early > > in tree lock, since thereafter writeback can't fail. BTW, I think we sh= ould > > also zswap_invalidate_entry() early in zswap_load() and only support th= e > > zswap_exclusive_loads_enabled mode, but that's another topic. > > zswap_invalidate_entry() actually doesn't seem to be using the tree at al= l. Never mind, I was looking at zswap_entry_put().