From: Yosry Ahmed
Date: Wed, 21 Jun 2023 10:26:17 -0700
Subject: Re: [PATCH] mm: zswap: fix double invalidate with exclusive loads
References: <20230621093009.637544-1-yosryahmed@google.com>
To: Domenico Cerasuolo
Cc: Andrew Morton, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Konrad Rzeszutek Wilk, Seth Jennings, Dan Streetman, Vitaly Wool, Johannes Weiner, Nhat Pham, Yu Zhao, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Wed, Jun 21, 2023 at 3:20 AM Domenico Cerasuolo wrote:
>
> On Wed, Jun 21, 2023 at 11:30 AM Yosry Ahmed wrote:
> >
> > If exclusive loads are enabled for zswap, we invalidate the entry before
> > returning from zswap_frontswap_load(), after dropping the local
> > reference. However, the tree lock is dropped during decompression after
> > the local reference is acquired, so the entry could be invalidated
> > before we drop the local ref.
> > If this happens, the entry is freed once we drop the local ref, and
> > zswap_invalidate_entry() tries to invalidate an already freed entry.
> >
> > Fix this by:
> > (a) Making sure zswap_invalidate_entry() is always called with a local
> >     ref held, to avoid being called on a freed entry.
> > (b) Making sure zswap_invalidate_entry() only drops the ref if the entry
> >     was actually on the rbtree. Otherwise, another invalidation could
> >     have already happened, and the initial ref is already dropped.
> >
> > With these changes, there is no need to make sure the entry still
> > exists in the tree in zswap_reclaim_entry() before invalidating it, as
> > zswap_invalidate_entry() will make this check internally.
> >
> > Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
> > Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> > Signed-off-by: Yosry Ahmed
> > ---
> >  mm/zswap.c | 21 ++++++++++++---------
> >  1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 87b204233115..62195f72bf56 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -355,12 +355,14 @@ static int zswap_rb_insert(struct rb_root *root, struct zswap_entry *entry,
> >  	return 0;
> >  }
> >
> > -static void zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> > +static bool zswap_rb_erase(struct rb_root *root, struct zswap_entry *entry)
> >  {
> >  	if (!RB_EMPTY_NODE(&entry->rbnode)) {
> >  		rb_erase(&entry->rbnode, root);
> >  		RB_CLEAR_NODE(&entry->rbnode);
> > +		return true;
> >  	}
> > +	return false;
> >  }
> >
> >  /*
> > @@ -599,14 +601,16 @@ static struct zswap_pool *zswap_pool_find_get(char *type, char *compressor)
> >  	return NULL;
> >  }
> >
> > +/*
> > + * If the entry is still valid in the tree, drop the initial ref and remove it
> > + * from the tree. This function must be called with an additional ref held,
> > + * otherwise it may race with another invalidation freeing the entry.
> > + */
>
> On re-reading this comment there's one thing I'm not sure I get, do we
> really need to hold an additional local ref to call this? As far as I
> understood, once we check that the entry was in the tree before putting
> the initial ref, there's no need for an additional local one.

I believe we do, but please correct me if I am wrong. Consider the
following scenario:

// Initially refcount is at 1

CPU#1:                                  CPU#2:
spin_lock(tree_lock)
zswap_entry_get() // 2 refs
spin_unlock(tree_lock)
                                        spin_lock(tree_lock)
                                        zswap_invalidate_entry() // 1 ref
                                        spin_unlock(tree_lock)
zswap_entry_put() // 0 refs
zswap_invalidate_entry() // problem

That last zswap_invalidate_entry() call in CPU#1 is problematic: the
entry would have already been freed. If we check that the entry is on
the tree by checking RB_EMPTY_NODE(&entry->rbnode), then we are reading
already freed and potentially reused memory.

We would need to search the tree to make sure the same entry still
exists in the tree (aka what zswap_reclaim_entry() currently does).
Having to do the lookup twice in the fault path is not ideal.

Also, in zswap_reclaim_entry(), if we call zswap_invalidate_entry()
after we drop the local ref, is it possible that the swap entry has
been reused for a different page by then? I didn't look closely, but
if yes, then the slab allocator may have repurposed the zswap_entry
and we may find the entry in the tree for the same offset, even though
it is referring to a different page now. This sounds practically
unlikely but perhaps theoretically possible.

I think it's more reliable to call zswap_invalidate_entry() on an entry
that we know is valid before dropping the local ref, especially since
it's easy to do today by just moving a few lines around.
>
> >  static void zswap_invalidate_entry(struct zswap_tree *tree,
> >  				   struct zswap_entry *entry)
> >  {
> > -	/* remove from rbtree */
> > -	zswap_rb_erase(&tree->rbroot, entry);
> > -
> > -	/* drop the initial reference from entry creation */
> > -	zswap_entry_put(tree, entry);
> > +	if (zswap_rb_erase(&tree->rbroot, entry))
> > +		zswap_entry_put(tree, entry);
> >  }
> >
> >  static int zswap_reclaim_entry(struct zswap_pool *pool)
> > @@ -659,8 +663,7 @@ static int zswap_reclaim_entry(struct zswap_pool *pool)
> >  	 * swapcache. Drop the entry from zswap - unless invalidate already
> >  	 * took it out while we had the tree->lock released for IO.
> >  	 */
> > -	if (entry == zswap_rb_search(&tree->rbroot, swpoffset))
> > -		zswap_invalidate_entry(tree, entry);
> > +	zswap_invalidate_entry(tree, entry);
> >
> >  put_unlock:
> >  	/* Drop local reference */
> > @@ -1466,7 +1469,6 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> >  	count_objcg_event(entry->objcg, ZSWPIN);
> >  freeentry:
> >  	spin_lock(&tree->lock);
> > -	zswap_entry_put(tree, entry);
> >  	if (!ret && zswap_exclusive_loads_enabled) {
> >  		zswap_invalidate_entry(tree, entry);
> >  		*exclusive = true;
> > @@ -1475,6 +1477,7 @@ static int zswap_frontswap_load(unsigned type, pgoff_t offset,
> >  		list_move(&entry->lru, &entry->pool->lru);
> >  		spin_unlock(&entry->pool->lru_lock);
> >  	}
> > +	zswap_entry_put(tree, entry);
> >  	spin_unlock(&tree->lock);
> >
> >  	return ret;
> > --
> > 2.41.0.162.gfafddb0af9-goog
> >