Date: Tue, 6 Feb 2024 10:23:33 +0800
Subject: Re: [PATCH] mm/zswap: invalidate old entry when store fail or !zswap_enabled
To: Yosry Ahmed
Cc: hannes@cmpxchg.org, nphamcs@gmail.com, akpm@linux-foundation.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chengming Zhou
References: <20240204083411.3762683-1-chengming.zhou@linux.dev>
From: Chengming Zhou

On 2024/2/6 06:55, Yosry Ahmed wrote:
> On Sun, Feb 04, 2024 at 08:34:11AM +0000, chengming.zhou@linux.dev wrote:
>> From: Chengming Zhou
>>
>> We may encounter a duplicate entry in zswap_store():
>>
>> 1. A swap slot freed to the per-cpu swap cache doesn't invalidate
>>    its zswap entry, and then gets reused. This has been fixed.
>>
>> 2. In !exclusive load mode, a swapped-in folio leaves its zswap
>>    entry on the tree and is then swapped out again. This has been
>>    removed.
>>
>> 3. A folio can be dirtied again after zswap_store(), so it needs to
>>    go through zswap_store() again. This should be handled correctly.
>>
>> So we must invalidate the old duplicate entry before inserting the
>> new one, which actually doesn't have to be done at the beginning
>> of zswap_store(). And this is a normal situation, so we shouldn't
>> WARN_ON(1) in this case; delete it. (The WARN_ON(1) seems to be
>> meant to detect a swap entry UAF problem? But it isn't really
>> necessary here.)
>>
>> The good point is that we don't need to lock the tree twice in the
>> store success path.
>>
>> Note we still need to invalidate the old duplicate entry in the
>> store failure path, otherwise the new data in the swapfile could be
>> overwritten by the old data in the zswap pool during LRU writeback.
>
> I think this may have been introduced by 42c06a0e8ebe ("mm: kill
> frontswap"). Frontswap used to check if the page was present in
> frontswap and invalidate it before calling into zswap, so it would
> invalidate a previously stored page when it is dirtied and swapped out
> again, even if zswap is disabled.
>
> Johannes, does this sound correct to you? If yes, I think we need a
> proper Fixes tag and a stable backport as this may cause data
> corruption.

I haven't looked into that commit. If this is true, I will add:

Fixes: 42c06a0e8ebe ("mm: kill frontswap")

>
>>
>> We have to do this even when !zswap_enabled since zswap can be
>> disabled anytime. If the folio was stored successfully before, then
>> got dirtied again with zswap disabled, we won't invalidate the old
>> duplicate entry in zswap_store(). So later LRU writeback may
>> overwrite the new data in the swapfile.
>>
>> This fix is not good, since we have to grab the lock to check every
>> time, even when zswap is disabled, but it's simple.
>
> Frontswap had a bitmap that we can query locklessly to find out if there
> is an outdated stored page. I think we can overcome this with the
> xarray, we can do a lockless lookup first, and only take the lock if
> there is an outdated entry to remove.

Yes, agree! We can do a lockless lookup once the xarray conversion lands.
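
For reference, a minimal userspace sketch of the "peek without the lock,
take the lock only if something is there" idea. This is not kernel code
and not the xarray API: `struct lk_index`, `lk_store()` and
`lk_invalidate_old()` are hypothetical names, a C11 atomic pointer array
stands in for the xarray, and an `atomic_flag` spinlock stands in for
tree->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define LK_SLOTS 16

/* Hypothetical index: one atomic pointer per swap offset. */
struct lk_index {
	atomic_flag lock;		/* stand-in for tree->lock */
	_Atomic(void *) slot[LK_SLOTS];	/* stand-in for the xarray */
	unsigned long lock_taken;	/* instrumentation for this sketch only */
};

static void lk_init(struct lk_index *ix)
{
	atomic_flag_clear(&ix->lock);
	for (size_t i = 0; i < LK_SLOTS; i++)
		atomic_init(&ix->slot[i], NULL);
	ix->lock_taken = 0;
}

static void lk_lock(struct lk_index *ix)
{
	while (atomic_flag_test_and_set_explicit(&ix->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
	ix->lock_taken++;
}

static void lk_unlock(struct lk_index *ix)
{
	atomic_flag_clear_explicit(&ix->lock, memory_order_release);
}

/* The store path publishes an entry while holding the lock. */
static void lk_store(struct lk_index *ix, size_t off, void *entry)
{
	assert(off < LK_SLOTS);
	lk_lock(ix);
	atomic_store_explicit(&ix->slot[off], entry, memory_order_release);
	lk_unlock(ix);
}

/* Returns true if a stale entry was found and removed. */
static bool lk_invalidate_old(struct lk_index *ix, size_t off)
{
	void *old;

	assert(off < LK_SLOTS);

	/* Lockless peek: bail out early when nothing is stored here. */
	if (!atomic_load_explicit(&ix->slot[off], memory_order_acquire))
		return false;

	lk_lock(ix);
	/* Recheck under the lock; a racing invalidation may have won. */
	old = atomic_exchange_explicit(&ix->slot[off], NULL,
				       memory_order_acq_rel);
	lk_unlock(ix);
	return old != NULL;
}
```

With nothing stored at an offset, lk_invalidate_old() returns false
without touching the lock, which is the case that matters when zswap is
disabled.
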
>
> Meanwhile I am not sure if acquiring the lock on every swapout even with
> zswap disabled is acceptable, but I think it's the simplest fix for now,
> unless we revive the bitmap.

Yeah, it's simple. I think the bitmap is not needed if we are going to
use the xarray.

>
>>
>> Signed-off-by: Chengming Zhou
>> ---
>>  mm/zswap.c | 33 +++++++++++++++------------------
>>  1 file changed, 15 insertions(+), 18 deletions(-)
>>
>> diff --git a/mm/zswap.c b/mm/zswap.c
>> index cd67f7f6b302..0b7599f4116d 100644
>> --- a/mm/zswap.c
>> +++ b/mm/zswap.c
>> @@ -1518,18 +1518,8 @@ bool zswap_store(struct folio *folio)
>>  		return false;
>>
>>  	if (!zswap_enabled)
>> -		return false;
>> +		goto check_old;
>>
>> -	/*
>> -	 * If this is a duplicate, it must be removed before attempting to store
>> -	 * it, otherwise, if the store fails the old page won't be removed from
>> -	 * the tree, and it might be written back overriding the new data.
>> -	 */
>> -	spin_lock(&tree->lock);
>> -	entry = zswap_rb_search(&tree->rbroot, offset);
>> -	if (entry)
>> -		zswap_invalidate_entry(tree, entry);
>> -	spin_unlock(&tree->lock);
>>  	objcg = get_obj_cgroup_from_folio(folio);
>>  	if (objcg && !obj_cgroup_may_zswap(objcg)) {
>>  		memcg = get_mem_cgroup_from_objcg(objcg);
>> @@ -1608,15 +1598,11 @@ bool zswap_store(struct folio *folio)
>>  	/* map */
>>  	spin_lock(&tree->lock);
>>  	/*
>> -	 * A duplicate entry should have been removed at the beginning of this
>> -	 * function. Since the swap entry should be pinned, if a duplicate is
>> -	 * found again here it means that something went wrong in the swap
>> -	 * cache.
>> +	 * The folio could be dirtied again, invalidate the possible old entry
>> +	 * before insert this new entry.
>>  	 */
>> -	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
>> -		WARN_ON(1);
>> +	while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST)
>>  		zswap_invalidate_entry(tree, dupentry);
>> -	}
>
> I always thought the loop here was confusing.
> We are holding the lock, so it should be guaranteed that if we get
> -EEXIST once and invalidate it, we won't find it the next time around.

Ah, right, this is obvious.

>
> This should really be a cmpxchg operation, which is simple with the
> xarray. We can probably do the same with the rbtree, but perhaps it's
> not worth it if the xarray change is coming soon.
>
> For now, I think an if condition is clearer:
>
> if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
> 	zswap_invalidate_entry(tree, dupentry);
> 	/* Must succeed, we just removed the dup under the lock */
> 	WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));
> }

This is clearer, will change to this version. Thanks!

>
>>  	if (entry->length) {
>>  		INIT_LIST_HEAD(&entry->lru);
>>  		zswap_lru_add(&entry->pool->list_lru, entry);
>> @@ -1638,6 +1624,17 @@ bool zswap_store(struct folio *folio)
>>  reject:
>>  	if (objcg)
>>  		obj_cgroup_put(objcg);
>> +check_old:
>> +	/*
>> +	 * If zswap store fail or zswap disabled, we must invalidate possible
>> +	 * old entry which previously stored by this folio. Otherwise, later
>> +	 * writeback could overwrite the new data in swapfile.
>> +	 */
>> +	spin_lock(&tree->lock);
>> +	entry = zswap_rb_search(&tree->rbroot, offset);
>> +	if (entry)
>> +		zswap_invalidate_entry(tree, entry);
>> +	spin_unlock(&tree->lock);
>>  	return false;
>>
>>  shrink:
>> --
>> 2.40.1
>>
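
To make the resulting control flow concrete, here is a toy userspace
model of the store path as discussed in this thread. It is only a sketch
under assumptions, not the actual patch: `struct toy_tree`,
`toy_insert()`, `toy_invalidate()` and `toy_store()` are hypothetical
names, a plain array indexed by swap offset replaces the rbtree, the
`can_store` flag models compression or allocation succeeding, and
locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 16

/* Hypothetical stand-in for a zswap entry. */
struct toy_entry {
	int data;
	bool live;	/* cleared when the entry is invalidated */
};

/* Hypothetical stand-in for the per-swapfile tree, indexed by offset. */
struct toy_tree {
	struct toy_entry *slot[TOY_SLOTS];
};

/* Same shape as zswap_rb_insert(): -EEXIST plus the duplicate found. */
static int toy_insert(struct toy_tree *t, size_t off, struct toy_entry *e,
		      struct toy_entry **dup)
{
	assert(off < TOY_SLOTS);
	if (t->slot[off]) {
		*dup = t->slot[off];
		return -EEXIST;
	}
	t->slot[off] = e;
	return 0;
}

/* Drop whatever is stored at this offset, if anything. */
static void toy_invalidate(struct toy_tree *t, size_t off)
{
	assert(off < TOY_SLOTS);
	if (t->slot[off]) {
		t->slot[off]->live = false;
		t->slot[off] = NULL;
	}
}

/* `enabled` plays the role of zswap_enabled in this model. */
static bool toy_store(struct toy_tree *t, size_t off, struct toy_entry *e,
		      bool enabled, bool can_store)
{
	struct toy_entry *dup;

	if (!enabled || !can_store)
		goto check_old;

	e->live = true;
	if (toy_insert(t, off, e, &dup) == -EEXIST) {
		toy_invalidate(t, off);
		/* Must succeed: the duplicate was just removed. */
		int ret = toy_insert(t, off, e, &dup);
		assert(ret == 0);
		(void)ret;
	}
	return true;

check_old:
	/* A stale entry must not survive a failed or skipped store. */
	toy_invalidate(t, off);
	return false;
}
```

The success path replaces a duplicate with a single if, and both the
"disabled" and the "store failed" paths fall through to the same
invalidation, so no old entry is left behind to be written back over
newer swapfile data.
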