From: Yosry Ahmed <yosryahmed@google.com>
Date: Fri, 22 Mar 2024 19:02:35 -0700
Subject: Re: [RFC PATCH] mm: add folio in swapcache if swapin from zswap
To: Johannes Weiner
Cc: Barry Song <21cnbao@gmail.com>, chengming.zhou@linux.dev, nphamcs@gmail.com, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Zhongkun He
In-Reply-To: <20240323015543.GB448621@cmpxchg.org>
References: <20240322163939.17846-1-chengming.zhou@linux.dev> <20240322234826.GA448621@cmpxchg.org> <20240323015543.GB448621@cmpxchg.org>

On Fri, Mar 22, 2024 at 6:55 PM Johannes Weiner wrote:
>
> On Fri, Mar 22, 2024 at 05:14:37PM -0700, Yosry Ahmed wrote:
> > [..]
> > > > > I don't think we want to stop doing exclusive loads in zswap due to this
> > > > > interaction with zram, which shouldn't be common.
> > > > >
> > > > > I think we can solve this by just writing the folio back to zswap upon
> > > > > failure as I mentioned.
> > > >
> > > > Instead of storing again, can we avoid invalidating the entry in the
> > > > first place if the load is not "exclusive"?
> > > >
> > > > The reason for exclusive loads is that the ownership is transferred to
> > > > the swapcache, so there is no point in keeping our copy. With an
> > > > optimistic read that doesn't transfer ownership, this doesn't
> > > > apply. And we can easily tell inside zswap_load() if we're dealing
> > > > with a swapcache read or not by testing the folio.
> > > >
> > > > The synchronous read already has to pin the swp_entry_t to be safe,
> > > > using swapcache_prepare(). That blocks __read_swap_cache_async() which
> > > > means no other (exclusive) loads and no invalidates can occur.
> > > >
> > > > The zswap entry is freed during the regular swap_free() path, which
> > > > the sync fault calls on success. Otherwise we keep it.
> > >
> > > I thought about this, but I was particularly worried about the need to
> > > bring back the refcount that was removed when we switched to only
> > > supporting exclusive loads:
> > > https://lore.kernel.org/lkml/20240201-b4-zswap-invalidate-entry-v2-6-99d4084260a0@bytedance.com/
> > >
> > > It seems to me that we don't need it, because swap_free() will free
> > > the entry as you mentioned before anyone else has the chance to load
> > > it or invalidate it. Writeback used to grab a reference as well, but
> > > it removes the entry from the tree anyway and takes full ownership of
> > > it then frees it, so that should be okay.
> > >
> > > It makes me nervous though, to be honest. For example, not long ago
> > > swap_free() didn't call zswap_invalidate() directly (it used to happen
> > > during swap slots cache draining). Without it, a subsequent load could
> > > race with writeback without refcount protection, right? We would need
> > > to make sure to backport 0827a1fb143f ("mm/zswap: invalidate zswap
> > > entry when swap entry free") with the fix to stable for instance.
> > >
> > > I can't find a problem with your diff, but it just makes me nervous to
> > > have non-exclusive loads without a refcount.
> > > >
> > > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > > index 535c907345e0..686364a6dd86 100644
> > > > --- a/mm/zswap.c
> > > > +++ b/mm/zswap.c
> > > > @@ -1622,6 +1622,7 @@ bool zswap_load(struct folio *folio)
> > > >  	swp_entry_t swp = folio->swap;
> > > >  	pgoff_t offset = swp_offset(swp);
> > > >  	struct page *page = &folio->page;
> > > > +	bool swapcache = folio_test_swapcache(folio);
> > > >  	struct zswap_tree *tree = swap_zswap_tree(swp);
> > > >  	struct zswap_entry *entry;
> > > >  	u8 *dst;
> > > > @@ -1634,7 +1635,8 @@ bool zswap_load(struct folio *folio)
> > > >  		spin_unlock(&tree->lock);
> > > >  		return false;
> > > >  	}
> > > > -	zswap_rb_erase(&tree->rbroot, entry);
> > > > +	if (swapcache)
> > > > +		zswap_rb_erase(&tree->rbroot, entry);
> >
> > On second thought, if we don't remove the entry from the tree here,
> > writeback could free the entry from under us after we drop the lock
> > here, right?
>
> The sync-swapin does swapcache_prepare() and holds SWAP_HAS_CACHE, so
> racing writeback would loop on the -EEXIST in __read_swap_cache_async().
> (Or, if writeback wins the race, sync-swapin fails on swapcache_prepare()
> instead and bails on the fault.)
>
> This isn't coincidental. The sync-swapin needs to, and does, serialize
> against the swap entry moving into swapcache or being invalidated for
> it to be safe. Which is the same requirement that zswap ops have.

You are right. Even if swap_free() isn't called under SWAP_HAS_CACHE's
protection, a subsequent load will also be protected by SWAP_HAS_CACHE
(whether it's swapped in with sync swapin or through the swapcache) -- so
it would be protected against writeback as well.

Now it seems like we may have been able to drop the refcount even
without exclusive loads?

Anyway, I think your fix is sound. Zhongkun, do you mind confirming that
the diff Johannes sent fixes the problem for you?
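
For reference, the serialization being relied on looks roughly like the
sketch below. This is an illustrative simplification of the synchronous
(swapcache-skipping) swapin path in do_swap_page() in mm/memory.c, not
the actual code: locking, error handling, and the surrounding conditions
(e.g. the __swap_count(entry) == 1 check) are elided, and details may
differ from any given tree.

/*
 * Simplified sketch, not compilable as-is. Shows why zswap writeback
 * cannot invalidate the entry under a synchronous swapin.
 */
folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, addr, false);
	/* private folio, NOT added to the swapcache */

if (swapcache_prepare(entry)) {
	/*
	 * SWAP_HAS_CACHE was already set: writeback (or a concurrent
	 * fault) won the race and owns the entry. Bail out and let
	 * the fault be retried.
	 */
	goto out;
}

/*
 * From here until swapcache_clear(), SWAP_HAS_CACHE is held, so
 * __read_swap_cache_async() returns -EEXIST for this entry: zswap
 * writeback can neither move the entry into the swapcache nor
 * invalidate it under us.
 */
swap_read_folio(folio, true, NULL);	/* ends up in zswap_load(); the
					 * folio is not in the swapcache,
					 * so with the diff above zswap
					 * keeps its copy (non-exclusive
					 * load) */

/* ... install the PTE, mapping the page ... */

swap_free(entry);		/* frees the swap slot; since commit
				 * 0827a1fb143f this is what invalidates
				 * and frees the zswap entry */
swapcache_clear(si, entry);	/* release the SWAP_HAS_CACHE pin */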