From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4ACCBC47258 for ; Thu, 25 Jan 2024 08:52:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BAEB68D001D; Thu, 25 Jan 2024 03:52:22 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B35DD8D000C; Thu, 25 Jan 2024 03:52:22 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9613B8D001D; Thu, 25 Jan 2024 03:52:22 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 7B3568D000C for ; Thu, 25 Jan 2024 03:52:22 -0500 (EST) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 3F27CA0771 for ; Thu, 25 Jan 2024 08:52:22 +0000 (UTC) X-FDA: 81717216924.08.1B8CEE4 Received: from mail-oi1-f173.google.com (mail-oi1-f173.google.com [209.85.167.173]) by imf16.hostedemail.com (Postfix) with ESMTP id B3102180007 for ; Thu, 25 Jan 2024 08:52:19 +0000 (UTC) Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b="VglBEP/v"; spf=pass (imf16.hostedemail.com: domain of zhouchengming@bytedance.com designates 209.85.167.173 as permitted sender) smtp.mailfrom=zhouchengming@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706172740; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=UkGQoIYNC+nMybNFXcUXZhNlJLPpzMlk/G5qwiD/mfw=; b=SR10EJT455st2qF6se4HAI0Wbfl3mPHOvCTKhzv/ajHbPOjOMHboLtE2+Xs9tk1TS1W9hQ 0i7h2VNBzu4HUMDNXV5HqdLUVtG7NBwzzlXzzaNJSlEGc1sywo/k9ioiib89GvfCSy96dE A6tKtlSQlBkGt6jjC15tR2gf6weA8zI= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706172740; a=rsa-sha256; cv=none; b=rYaJ84/8O4w/J6Nlfh+H1Y8GVQILuQMxb4cxG46ZFmdg43xwJ7FqTG+QsZWpTqyeULiuvs kvOTpQ6XlSV3d2MZjfIQyXmhfgzqaigfoj8c5iAKDYD0z3BFa67ReWJE/mQWVYOb/O62kK IL874G4ewD0nI0R3n7AgzfJAaIv1Sdc= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b="VglBEP/v"; spf=pass (imf16.hostedemail.com: domain of zhouchengming@bytedance.com designates 209.85.167.173 as permitted sender) smtp.mailfrom=zhouchengming@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com Received: by mail-oi1-f173.google.com with SMTP id 5614622812f47-3bbbc6bcc78so5004372b6e.1 for ; Thu, 25 Jan 2024 00:52:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1706172739; x=1706777539; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=UkGQoIYNC+nMybNFXcUXZhNlJLPpzMlk/G5qwiD/mfw=; b=VglBEP/vLbOGD9bCG1XoDZZ5tZZ+7tCaCz9nbDl/3e1e96ClPDIe77T4Su9O4CqhNf nP0hlZ9QFD9BsXPoceM11e/PgWFp/M8Z3lMPk4uFlaPX84FkqoqqKpHcsTfTSxQb4FH3 RJkYi3Y9e0OAlSlR0a/xQyVbUIV4a5J5dCJKEmxU7sylsWStPjkgl+OjEMlb2/VVjr3C ufwYGa0uzLGu/yyan54KiEuUfVcAJXEfUNREqxOruAj9g3ylQfKx+kBf5gdV5T9Xaq1c 5Po0ld+sq9ig6e4vGCVHoIKLuWALgN+ym69N32ldrgHYHymYk3gW1vYEbBq7xT28VxNR adVg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706172739; x=1706777539; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=UkGQoIYNC+nMybNFXcUXZhNlJLPpzMlk/G5qwiD/mfw=; b=dOpRTdOB55/ts1MkwfkYweoaoDSpcDLnj/F8FvWSx18QN/shR+G+iTqcIxAiicUjGH /sWECsAZ8RWYJI0Ucf//suqrjuo86I6HOKDNRA5Dx5aJUZSp8wM89zT+nfvjc2c+YWc8 VYymIht3qfKC98joTQCeHHcB7gWzf3Hab5fPiyjlxhoqsPpMqsi5yE7t9VrRQ7C93Xw1 +udXJjv7JJFuRTTNvVWPp6fqopl80FjDgzi77ITSTj0AlznCZISJM7iF6lm9p75zaPKt n1Itgt3F9zvjx7VWzoZQvkTlf3klLc9Sj56XhIFnjjvc8/jsKUhIMADaMOVw+5Hblxpg Ep3Q== X-Gm-Message-State: AOJu0Yz4VU4Lca/Mi8kR3rsXHP5GuHAWXbCzB0cUIbSsfaF8/vLT+UWb znuhotWxu8h3/jTvP4Q/2JerP4gwSSBVihaaJIglM02386Xap/gEuYVjwaJHRK+6MlpYGxP0tou u X-Google-Smtp-Source: AGHT+IE3KQ6LFPbSUS7KHTooCvwgkX/cufH/YfcEdQp9M1/ylyfGjHKPw8tFuV/tMyB3cXv/EPsLGQ== X-Received: by 2002:a05:6808:148f:b0:3bd:dae2:b9b1 with SMTP id e15-20020a056808148f00b003bddae2b9b1mr603753oiw.3.1706172738781; Thu, 25 Jan 2024 00:52:18 -0800 (PST) Received: from [10.4.195.141] ([139.177.225.254]) by smtp.gmail.com with ESMTPSA id a11-20020aa780cb000000b006dbd5a5dca9sm8764698pfn.185.2024.01.25.00.52.15 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 25 Jan 2024 00:52:18 -0800 (PST) Message-ID: <35c3b0e5-a5eb-44b2-aa7d-3167f4603c73@bytedance.com> Date: Thu, 25 Jan 2024 16:52:13 +0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 2/2] mm: zswap: remove unnecessary tree cleanups in zswap_swapoff() Content-Language: en-US To: Yosry Ahmed Cc: Johannes Weiner , Andrew Morton , Nhat Pham , Chris Li , Huang Ying , linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20240120024007.2850671-1-yosryahmed@google.com> <20240120024007.2850671-3-yosryahmed@google.com> <20240122201906.GA1567330@cmpxchg.org> <20240123153851.GA1745986@cmpxchg.org> <20240123201234.GC1745986@cmpxchg.org> <1496dce3-a4bb-4ccf-92d6-701a45b67da3@bytedance.com> From: Chengming Zhou In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Stat-Signature: pt8beyxdzpf7gtunr66tp5dx3croyqph X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: B3102180007 X-Rspam-User: X-HE-Tag: 1706172739-82631 X-HE-Meta: U2FsdGVkX18liJfwGDYOn0N39xyOkAuFvgHTsCUWSkSFeoOcvjmSk3mSXyTnang7CYFgl/bfIE5sYHqklZEYTdCItWfMJvwXcyLRfuMtix7LMrJGOOW3DF5BAl1Rme9OuUOUp8ycmlX1ODucabCZBUxhuOJWuCt1tloWbKz/KBV0gcBKJUp9mCa80J2y+EW1/yxep8Jc6a3GHQbYpoBJI3c844KDTZAyTNxAplNFHSAE7faZRdgO6FzPBc9XszUQQEiSh3t3v4NnNmdO6sZ/2qC6//dMTIffBz9jLc4QTDM0dRp0NtQouEMGB0H/eWHh4XBQ4S3BTfmztyJXtC//nNmREN8eQNJiYWBmoEC38Ejt1oa1t+Pj2v1SzLWTcCwI+r7jtRe8nY0xjoewyNpbYRAwJlrGjadWmI+ahB50kaY8kwZynCcizeMC9GbTASmU1AALLAaLkJw8o9PiP4kzJPc+eT/IR1eVFBIfXzCoVRF6wYLiAfjhY+gCBRjNVOJbbL27Y7LG7wzg2UY1lj+DO8FV+CIsQrBCgJ7GaHWNEA+TxmE585lkHGuWsJCd5VlnJnU417yAuBtv9MCpBBIKs7BFrwL4+ivZ8t2VEHIaB8n8glhbUgCpECMQU53Vuc8qb01ec+6d28KimAL1zWSbcqnVcaiU0a59uTSDPiKFMJPGvz8MpvR5mv3PEjxVHW7DQbobJ3gfIMkrDdxpF9lqxLnRfa22gShP5W4MVkDmgtYhWjvx5az7SRJYaNSuNtsuY1BkiM2Lja3fidwkUMrDRCx8XZlW1m3/myirrDcBqCm6yGdFbolAcS4Ha65Zhg6GtlX6/X/ncDGAmqFvls/OsnSqFCfUdtpHfR+ZTgM0DjFYnqxReEBaVjaeoMHm+1TCbFekOHVu33TRKtgAg1VdrJUSjq0l7zI5yyQSGZlh83t/zSIr2tqqK7QxZtAgogTEnqAOjfSomAnyKtC41ro HIwansrr WLFutH2t5WRp99iOh0WS/3A56JDpVvbu1NRLBnoNinga4NUVFdi5/fu2UoUrw+hT2bKr17rXkR9R+Vx5mEBgdoRQ6pkwSi4voal2i1fdG+BVBtYJfeeyM0bA8VAgmvl6X8smPNJ/uqGyfWprCoLRAU0kCY3GkOZGkHyS7S8JvRVoNzHG/EmGXRPO3nU9tRioEzahyPlkIvj/t08tS4W0DUVF/PvVbmbi0wr+6f7WqwtvnjhManTOp5CQ5APpnVaNivlgES3//DwkY2gcPiSUgOoXBpT2cQsK50iwK9CjVCPo4X3gs5SFYeG61v6DUouTOLc/f9ATa/4bKpJRG3P2DWMDolSgDp8Fx6+gjQ7nypCJ1vBPKKkwwqXJj64ErL5QPfd7xNnZILC00jMpFb1OHqoK3EVq70gYGYM4nokmve4psSnmvEclxIGKXSq85X8DjpMAIYbguUojnERyK2Kr5TQqXZ25UAVnbsIH3p3SMnEMzyvRHFs7/YmVKlI/EuGPDN5HS X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 2024/1/25 16:42, Yosry Ahmed wrote: > On Thu, Jan 25, 2024 at 12:30 AM Chengming Zhou > wrote: >> >> On 2024/1/25 15:53, Yosry Ahmed wrote: >>>> Hello, >>>> >>>> I also thought about this problem for some time, maybe something like below >>>> can be changed to fix it? It's likely I missed something, just some thoughts. >>>> >>>> IMHO, the problem is caused by the different way in which we use zswap entry >>>> in the writeback, that should be much like zswap_load(). >>>> >>>> The zswap_load() comes in with the folio locked in swap cache, so it has >>>> stable zswap tree to search and lock... But in writeback case, we don't, >>>> shrink_memcg_cb() comes in with only a zswap entry with lru list lock held, >>>> then release lru lock to get tree lock, which maybe freed already. >>>> >>>> So we should change here, we read swpentry from entry with lru list lock held, >>>> then release lru lock, to try to lock corresponding folio in swap cache, >>>> if we success, the following things is much the same like zswap_load(). >>>> We can get tree lock, to recheck the invalidate race, if no race happened, >>>> we can make sure the entry is still right and get refcount of it, then >>>> release the tree lock. >>> >>> Hmm I think you may be onto something here. Moving the swap cache >>> allocation ahead before referencing the tree should give us the same >>> guarantees as zswap_load() indeed. We can also consolidate the >>> invalidate race checks (right now we have one in shrink_memcg_cb() and >>> another one inside zswap_writeback_entry()). >> >> Right, if we successfully lock folio in the swap cache, we can get the >> tree lock and check the invalidate race, only once. >> >>> >>> We will have to be careful about the error handling path to make sure >>> we delete the folio from the swap cache only after we know the tree >>> won't be referenced anymore. Anyway, I think this can work. >> >> Yes, we can't reference tree if we early return or after unlocking folio, >> since the reference of zswap entry can't protect the tree. >> >>> >>> On a separate note, I think there is a bug in zswap_writeback_entry() >>> when we delete a folio from the swap cache. I think we are missing a >>> folio_unlock() there. >> >> Ah, yes, and folio_put(). > > Yes. I am preparing a fix. > >> >>> >>>> >>>> The main differences between this writeback with zswap_load() is the handling >>>> of lru entry and the tree lifetime. The whole zswap_load() function has the >>>> stable reference of zswap tree, but it's not for shrink_memcg_cb() bottom half >>>> after __swap_writepage() since we unlock the folio after that. So we can't >>>> reference the tree after that. >>>> >>>> This problem is easy to fix, we can zswap_invalidate_entry(tree, entry) early >>>> in tree lock, since thereafter writeback can't fail. BTW, I think we should >>>> also zswap_invalidate_entry() early in zswap_load() and only support the >>>> zswap_exclusive_loads_enabled mode, but that's another topic. >>> >>> zswap_invalidate_entry() actually doesn't seem to be using the tree at all. >>> >>>> >>>> The second difference is the handling of lru entry, which is easy that we >>>> just zswap_lru_del() in tree lock. >>> >>> Why do we need zswap_lru_del() at all? We should have already isolated >>> the entry at that point IIUC. >> >> I was thinking how to handle the "zswap_lru_putback()" if not writeback, >> in which case we can't use the entry actually since we haven't got reference >> of it. So we can don't isolate at the entry, and only zswap_lru_del() when >> we are going to writeback actually. > > Why not just call zswap_lru_putback() before we unlock the folio? When early return because __read_swap_cache_async() return NULL or !folio_was_allocated, we don't have a locked folio yet. The entry maybe invalidated and freed concurrently.