From: Yosry Ahmed
Date: Mon, 25 Mar 2024 02:40:09 -0700
Subject: Re: [PATCH] mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices
To: Chengming Zhou
Cc: Barry Song <21cnbao@gmail.com>, Johannes Weiner, Andrew Morton,
 Zhongkun He, Chengming Zhou, Chris Li, Nhat Pham, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Kairui Song
References: <20240324210447.956973-1-hannes@cmpxchg.org>
 <1e7ce417-b9dd-4d62-9f54-0adf1ccdae35@linux.dev>

On Mon, Mar 25, 2024 at 2:22 AM Chengming Zhou wrote:
>
> On 2024/3/25 16:38, Yosry Ahmed wrote:
> > On Mon, Mar 25, 2024 at 12:33 AM Chengming Zhou
> > wrote:
> >>
> >> On 2024/3/25 15:06, Yosry Ahmed wrote:
> >>> On Sun, Mar 24, 2024 at 9:54 PM Barry Song <21cnbao@gmail.com>
> >>> wrote:
> >>>>
> >>>> On Mon, Mar 25, 2024 at 10:23 AM Yosry Ahmed wrote:
> >>>>>
> >>>>> On Sun, Mar 24, 2024 at 2:04 PM Johannes Weiner wrote:
> >>>>>>
> >>>>>> Zhongkun He reports data corruption when combining zswap with zram.
> >>>>>>
> >>>>>> The issue is the exclusive loads we're doing in zswap. They assume
> >>>>>> that all reads are going into the swapcache, which can assume
> >>>>>> authoritative ownership of the data and so the zswap copy can go.
> >>>>>>
> >>>>>> However, zram files are marked SWP_SYNCHRONOUS_IO, and faults will try
> >>>>>> to bypass the swapcache. This results in an optimistic read of the
> >>>>>> swap data into a page that will be dismissed if the fault fails due to
> >>>>>> races. In this case, zswap mustn't drop its authoritative copy.
> >>>>>>
> >>>>>> Link: https://lore.kernel.org/all/CACSyD1N+dUvsu8=zV9P691B9bVq33erwOXNTmEaUbi9DrDeJzw@mail.gmail.com/
> >>>>>> Reported-by: Zhongkun He
> >>>>>> Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
> >>>>>> Cc: stable@vger.kernel.org [6.5+]
> >>>>>> Signed-off-by: Johannes Weiner
> >>>>>> Tested-by: Zhongkun He
> >>>>
> >>>> Acked-by: Barry Song
> >>>>
> >>>>>
> >>>>> Do we also want to mention somewhere (commit log or comment) that
> >>>>> keeping the entry in the tree is fine because we are still protected
> >>>>> from concurrent loads/invalidations/writeback by swapcache_prepare()
> >>>>> setting SWAP_HAS_CACHE or so?
> >>>>
> >>>> It seems that Kairui's patch comprehensively addresses the issue at hand.
> >>>> Johannes's solution, on the other hand, appears to align zswap behavior
> >>>> more closely with that of a traditional swap device, only releasing an entry
> >>>> when the corresponding swap slot is freed, particularly in the sync-io case.
> >>>
> >>> It actually worked out quite well that Kairui's fix landed shortly
> >>> before this bug was reported, as this fix wouldn't have been possible
> >>> without it as far as I can tell.
> >>>
> >>>>
> >>>> Johannes' patch has inspired me to consider whether zRAM could achieve
> >>>> a comparable outcome by immediately releasing objects in swap cache
> >>>> scenarios. When I have the opportunity, I plan to experiment with zRAM.
> >>>
> >>> That would be interesting. I am curious if it would be as
> >>> straightforward in zram to just mark the folio as dirty in this case
> >>> like zswap does, given its implementation as a block device.
> >>>
> >>
> >> This makes me wonder who is responsible for marking folio dirty in this swapcache
> >> bypass case? Should we call folio_mark_dirty() after the swap_read_folio()?
> >
> > In shrink_folio_list(), we try to add anonymous folios to the
> > swapcache if they are not there before checking if they are dirty.
> > add_to_swap() calls folio_mark_dirty(), so this should take care of
>
> Right, thanks for your clarification, so should be no problem here.
> Although it was a fix just for MADV_FREE case.
>
> > it. There is an interesting comment there though. It says that PTE
> > should be dirty, so unmapping the folio should have already marked it
> > as dirty by the time we are adding it to the swapcache, except for the
> > MADV_FREE case.
>
> It seems to say the folio will be dirtied when unmap later, supposing the
> PTE is dirty.

Oh yeah, it could mean that the folio will be dirtied later.

>
> >
> > However, I think we actually unmap the folio after we add it to the
> > swapcache in shrink_folio_list(). Also, I don't immediately see why
> > the PTE would be dirty. In do_swap_page(), making the PTE dirty seems
>
> If all anon pages on LRU list are faulted by write, it should be true.
> We could just use the zero page if faulted by read, right?

This applies to the initial fault that creates the folio, but this is
a swap fault. It could be a read fault, and in that case we still need
to make the folio dirty because it's not in the swapcache and we need
to write it out if it's reclaimed, right?
>
> > to be conditional on the fault being a write fault, but I didn't look
> > thoroughly, maybe I missed it. It is also possible that the comment is
> > just outdated.
>
> Yeah, dirty is only marked on write fault.
>
> Thanks.
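
For readers following the thread, the failure mode in the quoted commit
message can be illustrated with a toy model. This is only a sketch in
Python, not kernel code: ToyZswap, ToyFolio, fault_with_retry() and the
always_exclusive switch are hypothetical names invented for this
example, and the model ignores locking, the swap map and writeback. It
assumes the fixed behaviour is "drop the zswap copy and mark the folio
dirty only when the folio is in the swapcache", which is how the
discussion above describes it.

```python
from dataclasses import dataclass


@dataclass
class ToyFolio:
    """Hypothetical stand-in for a folio; not the kernel structure."""
    in_swapcache: bool
    dirty: bool = False
    data: bytes = b""


class ToyZswap:
    """Hypothetical model of zswap's per-slot compressed copies."""

    def __init__(self, always_exclusive):
        # always_exclusive=True models the behaviour the patch fixes:
        # every load drops the zswap copy, swapcache or not.
        self.always_exclusive = always_exclusive
        self.entries = {}

    def store(self, slot, data):
        self.entries[slot] = data

    def load(self, slot, folio):
        """Copy the data into the folio; return False if zswap has none."""
        data = self.entries.get(slot)
        if data is None:
            return False
        folio.data = data
        if self.always_exclusive or folio.in_swapcache:
            # Ownership moves to the folio: drop the zswap copy and make
            # sure reclaim would write the folio out again.
            del self.entries[slot]
            folio.dirty = True
        return True


def fault_with_retry(zswap, slot):
    """A swapcache-bypassing fault that loses a race and is retried."""
    first = ToyFolio(in_swapcache=False)
    zswap.load(slot, first)          # optimistic read
    # The fault fails due to a race here; 'first' is simply discarded.
    second = ToyFolio(in_swapcache=False)
    return zswap.load(slot, second)  # retry needs the data again


for label, flag in (("always exclusive", True), ("swapcache only", False)):
    z = ToyZswap(always_exclusive=flag)
    z.store(7, b"payload")
    print(label, "-> retry finds data:", fault_with_retry(z, 7))
```

In this model the "always exclusive" variant loses the only copy once
the first folio is discarded, while the "swapcache only" variant keeps
the entry so the retry still succeeds. Note that in the second variant
the folio is never marked dirty by the load, which is the gap the
folio_mark_dirty() question above is about.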