From: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Date: Wed, 21 Jun 2023 21:36:41 +0200
Subject: Re: [PATCH] mm: zswap: fix double invalidate with exclusive loads
To: Yosry Ahmed
Cc: Andrew Morton, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Konrad Rzeszutek Wilk,
 Seth Jennings, Dan Streetman, Vitaly Wool, Johannes Weiner, Nhat Pham,
 Yu Zhao, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20230621093009.637544-1-yosryahmed@google.com>
On Wed, Jun 21, 2023 at 7:26 PM Yosry Ahmed wrote:
>
> On Wed, Jun 21, 2023 at 3:20 AM Domenico Cerasuolo wrote:
> >
> > On Wed, Jun 21, 2023 at 11:30 AM Yosry Ahmed wrote:
> > >
> > > If exclusive loads are enabled for zswap, we invalidate the entry before
> > > returning from zswap_frontswap_load(), after dropping the local
> > > reference. However, the tree lock is dropped during decompression after
> > > the local reference is acquired, so the entry could be invalidated
> > > before we drop the local ref. If this happens, the entry is freed once
> > > we drop the local ref, and zswap_invalidate_entry() tries to invalidate
> > > an already freed entry.
> > >
> > > Fix this by:
> > > (a) Making sure zswap_invalidate_entry() is always called with a local
> > >     ref held, to avoid being called on a freed entry.
> > > (b) Making sure zswap_invalidate_entry() only drops the ref if the entry
> > >     was actually on the rbtree. Otherwise, another invalidation could
> > >     have already happened, and the initial ref is already dropped.
> > >
> > > With these changes, there is no need to make sure the entry still
> > > exists in the tree in zswap_reclaim_entry() before invalidating it, as
> > > zswap_invalidate_entry() will make this check internally.
> > >
> > > Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
> > > Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > > Signed-off-by: Yosry Ahmed
> > > ---
> > >  mm/zswap.c | 21 ++++++++++++---------
> > >  1 file changed, 12 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index 87b204233115..62195f72bf56 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -355,12 +355,14 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
> > >         return 0;
> > >  }
> > >
> > > -static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > > +static bool zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > >  {
> > >         if (!RB_EMPTY_NODE(&entry->rbnode)) {
> > >                 rb_erase(&entry->rbnode, root);
> > >                 RB_CLEAR_NODE(&entry->rbnode);
> > > +               return true;
> > >         }
> > > +       return false;
> > >  }
> > >
> > >  /*
> > > @@ -599,14 +601,16 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
> > >         return NULL;
> > >  }
> > >
> > > +/*
> > > + * If the entry is still valid in the tree, drop the initial ref and remove it
> > > + * from the tree. This function must be called with an additional ref held,
> > > + * otherwise it may race with another invalidation freeing the entry.
> > > + */
> >
> > On re-reading this comment there's one thing I'm not sure I get, do we
> > really need to hold an additional local ref to call this? As far as I
> > understood, once we check that the entry was in the tree before putting
> > the initial ref, there's no need for an additional local one.
>
> I believe it is, but please correct me if I am wrong. Consider the
> following scenario:
>
> // Initially refcount is at 1
>
> CPU#1:                            CPU#2:
> spin_lock(tree_lock)
> zswap_entry_get() // 2 refs
> spin_unlock(tree_lock)
>                                   spin_lock(tree_lock)
>                                   zswap_invalidate_entry() // 1 ref
>                                   spin_unlock(tree_lock)
> zswap_entry_put() // 0 refs
> zswap_invalidate_entry() // problem
>
> That last zswap_invalidate_entry() call in CPU#1 is problematic. The
> entry would have already been freed. If we check that the entry is on
> the tree by checking RB_EMPTY_NODE(&entry->rbnode), then we are
> reading already freed and potentially re-used memory.
>
> We would need to search the tree to make sure the same entry still
> exists in the tree (aka what zswap_reclaim_entry() currently does).
> This is not ideal in the fault path to have to do the lookups twice.

Thanks for the clarification, it is indeed needed in that case. I was just
wondering if the wording of the comment is exact, in that before calling
zswap_invalidate_entry() one has to ensure that the entry has not been freed,
not specifically by holding an additional reference, if a lookup can serve the
same purpose.

> Also, in zswap_reclaim_entry(), would it be possible if we call
> zswap_invalidate_entry() after we drop the local ref that the swap
> entry has been reused for a different page?
> I didn't look closely, but
> if yes, then the slab allocator may have repurposed the zswap_entry
> and we may find the entry in the tree for the same offset, even though
> it is referring to a different page now. This sounds practically
> unlikely but perhaps theoretically possible.

I'm not sure I understood the scenario: in zswap_reclaim_entry() we keep a
local reference until the end in order to avoid a free.

> I think it's more reliable to call zswap_invalidate_entry() on an
> entry that we know is valid before dropping the local ref. Especially
> that it's easy to do today by just moving a few lines around.

> > >
> > >  static void zswap_invalidate_entry(struct zswap_tree *tree,
> > >                                     struct zswap_entry *entry)
> > >  {
> > > -       /* remove from rbtree */
> > > -       zswap_rb_erase(&tree->rbroot, entry);
> > > -
> > > -       /* drop the initial reference from entry creation */
> > > -       zswap_entry_put(tree, entry);
> > > +       if (zswap_rb_erase(&tree->rbroot, entry))
> > > +               zswap_entry_put(tree, entry);
> > >  }
> > >
> > >  static int zswap_reclaim_entry(struct zswap_pool *pool)
> > > @@ -659,8 +663,7 @@ static int zswap_reclaim_entry(struct zswap_pool *pool)
> > >          * swapcache. Drop the entry from zswap - unless invalidate already
> > >          * took it out while we had the tree->lock released for IO.
> > >          */
> > > -       if (entry == zswap_rb_search(&tree->rbroot, swpoffset))
> > > -               zswap_invalidate_entry(tree, entry);
> > > +       zswap_invalidate_entry(tree, entry);
> > >
> > >  put_unlock:
> > >         /* Drop local reference */
> > > @@ -1466,7 +1469,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> > >                 count_objcg_event(entry->objcg, ZSWPIN);
> > >  freeentry:
> > >         spin_lock(&tree->lock);
> > > -       zswap_entry_put(tree, entry);
> > >         if (!ret && zswap_exclusive_loads_enabled) {
> > >                 zswap_invalidate_entry(tree, entry);
> > >                 *exclusive = true;
> > > @@ -1475,6 +1477,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> > >                 list_move(&entry->lru, &entry->pool->lru);
> > >                 spin_unlock(&entry->pool->lru_lock);
> > >         }
> > > +       zswap_entry_put(tree, entry);
> > >         spin_unlock(&tree->lock);
> > >
> > >         return ret;
> > > --
> > > 2.41.0.162.gfafddb0af9-goog
> > >