From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C316AC54E58 for ; Mon, 25 Mar 2024 08:39:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 57EE96B0095; Mon, 25 Mar 2024 04:39:28 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 52EF76B0096; Mon, 25 Mar 2024 04:39:28 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 41FA06B0098; Mon, 25 Mar 2024 04:39:28 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 31E296B0095 for ; Mon, 25 Mar 2024 04:39:28 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id F24CF40A54 for ; Mon, 25 Mar 2024 08:39:27 +0000 (UTC) X-FDA: 81934912374.08.332BEC4 Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) by imf26.hostedemail.com (Postfix) with ESMTP id 1D65F140009 for ; Mon, 25 Mar 2024 08:39:25 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=V2SwSjga; spf=pass (imf26.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.54 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711355966; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Y0kZP691y+X37CeRc5a1b81NRGe9/CqFEZLGS/IX9lU=; b=z6vAPkZp3YznMfbwcIEWj0MfDe9c/07OjJf9tbgB8Q/SHU+EKe3muZ8uMXXssKHOMyS5vQ HiRrwk15JtsdBP210jWyE3B59O2dLj3b4UVjO0xrZXaEvKkxyfsKFkKZsrIpwGWaG8wPLo yFE2kU1bpYpHCDK9pdQ0AwUfrepKbGQ= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711355966; a=rsa-sha256; cv=none; b=KfhwqKZmBbV+fcSceBuR7yvmtVhebzAZpkV/41gfHs7S5DaXlgQpU8cHtS5QTf2GUmaZoQ 3ZrpaQeuJOwYPuokJp4rS+8k4QAVJrYyvxT0zAeDQMXyojRRzCwfMCJMaa9m4u1BJMVTp4 yw04XfA9V8EJ06WOoHdkq5SVPTSMOYM= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=V2SwSjga; spf=pass (imf26.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.54 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-a4a34516955so83819966b.0 for ; Mon, 25 Mar 2024 01:39:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1711355964; x=1711960764; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Y0kZP691y+X37CeRc5a1b81NRGe9/CqFEZLGS/IX9lU=; b=V2SwSjgaffNfMx0akPSNl5xeCiBGC0PrnnGpX2bZtgbw4k8bM0ueSgSB+fcQUZMwP2 POW/0MVUfUKGbbUhW4NS9DTLDv0hyaxnPJmikViOQ4rCm0P3TPX+EJE3yJiKEPzfxgMq y2lqA0/R3H/Z2V9aSuZyM4VbgzTGEnWkHWGiKCTwyKCHrT4pSZQJNY80K+hPy5INXbQ3 DbrtTUDBv+z3W2mLEjUOf37hRXeuCvOXIae1qZZ37gbEpuRqIgUZcTjNEMRXOoscvLaO DAIFxroN9neCtZUojsMR/LzbwUShZc3CdIMDnngxCJo2kSQ8Q0KMepqAG5O8Y3mcECbY fS3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1711355964; x=1711960764; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Y0kZP691y+X37CeRc5a1b81NRGe9/CqFEZLGS/IX9lU=; b=xHyJiSHr76mDl4EB46HKCO6pTF4i47aKj9hReZ9tWF4Fce8tL8zN/MJ+zOc/DRjidv eyTyeKlv9jsAQtTEn21Rv94YkW8+3Ko7ulhqLBJluUOiH30HtLvhGBDAgUwnLCzVqmmq LXmNGmVLfSoe9ExN8sg2uJr5r3SAskwRAZH/oQbjCOAo0NPEkzlBj03dBt5MPqjzzmEf DpQe5wHAD2lf3NStUYhIyQj7XzijvVe7XV4Mi2Gyrnujs33XjErmu8hHwQXLtOzKKzRg s4z3sKLIKxaodN46AZRLp9Vts9yj7/o1NRIWLXvHTyZ5V6chf+ekrlw0bt/Z1KUJBWgV uNRA== X-Forwarded-Encrypted: i=1; AJvYcCUaiVFESCgvVeIvkdb80AOq0nclu7yuj7gGaBANwv0mowsgHjDNce0ef1xs/tajipFBpzRAq78clz+kvqBRARsuHPc= X-Gm-Message-State: AOJu0Yz+3UCXnefyukQXieyqptnp67t8qhxuM5StuYLXxAKznaYI7/uP 5fNI9fgVVdcEQszddGVLDM9fAldF/ywXdGxcY+9VVOcVxNF1yiWfAxTFWkwh6yWn+WvMv0qkNlr jKeVzLm1kDwUARgUCrgl41PNdu6FTp/Ee9snkqW7XbzKLMJ+tmg== X-Google-Smtp-Source: AGHT+IFgl4zrYBmL1mnbXRObDbHIPSKGVSR/6CfTP061t6LzL7AucWLMmH83VFZD1AhGOhS3BZxguvOWKzC2W1yUwAk= X-Received: by 2002:a17:906:c109:b0:a47:6bd4:cbd9 with SMTP id do9-20020a170906c10900b00a476bd4cbd9mr2433483ejc.52.1711355964219; Mon, 25 Mar 2024 01:39:24 -0700 (PDT) MIME-Version: 1.0 References: <20240324210447.956973-1-hannes@cmpxchg.org> <1e7ce417-b9dd-4d62-9f54-0adf1ccdae35@linux.dev> In-Reply-To: <1e7ce417-b9dd-4d62-9f54-0adf1ccdae35@linux.dev> From: Yosry Ahmed Date: Mon, 25 Mar 2024 01:38:47 -0700 Message-ID: Subject: Re: [PATCH] mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices To: Chengming Zhou Cc: Barry Song <21cnbao@gmail.com>, Johannes Weiner , Andrew Morton , Zhongkun He , Chengming Zhou , Chris Li , Nhat Pham , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kairui Song Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 1D65F140009 X-Rspam-User: X-Rspamd-Server: rspam11 X-Stat-Signature: 453bowb98x4ckr536o7ugk4he4tatbmb X-HE-Tag: 1711355965-703317 X-HE-Meta: U2FsdGVkX18kUcUGcONTTFtDrBfaMd3PWGUE1na93Gy8K2JjjcodW/Jx9+W0jW47PHGv9vxcFwFLc/6PfsEDIWOadW7tkTTCzL0en1Gsab+Bus5DmAQe92AK/7h61p9xPavB9tiq9dVlT6adr/jEnWWd93gZkVGJP1r0i4csrTCXoVDuPukluYSmH1hckqeDVvB/YuHPcnhk3lqHR2z2IR4mK1iuIqfdzROP68dPvLV0LW7nfJTvSOYXl5INFHSez4M/AOGn4tDeoPpuQJQ0d/l2SSybWtOmohhw0+otiHiDTGnVk6Tqy/hMZMycsusHSQPGaV2WY0T9EYQn56VYeltq+pQPRBCIzEuQJQoxgaUihj+viFykSMrJCaKE0yfuoxrljZI2QTUUzUixbUTt28hX4VNQaIoAtlnK6pZGTibk0bw3W0fp/+vXMtDdteAeC0Q7J7tZA8pwphUm8eqz3mGOBrkTdOufh6BoAdsAkA+IQz4uR6llP8Gfybwt/GzSyEspA42NMkC1dK7/TLCmgpIiVxC+n9gmZczSk6IesQoTiD8z2RnWUit1iJWS8u6c7WeNtdxX74MHvresaeEJvIIL/0dat0/qPNFHwQMTHA/v6eKmwar4edbTRcFlfIEpLKUF/9Fbaf8YnltPLBDYk9/n2FocfGzK+B8H7JPNvFDX1/O1OmV0kV8wF52kslzmRI71p9R1axEQFcGyvVVIRQawBBqyMGFZK3epxoSYCOLzRLIFel+5rWwHMGlJJZPhyjVIcmmN4hpjSu2nG+/XFwPM3I8+YN37EpxRspyJOucUBzCJdTyhBw6n+ZIH1KxxQ46dUlQuvvCbSflwd9pwTU9MX7NuMLcNilPrWyrgusoj7UJP7+CRArnrLb/U+KDSUrqIxxcKpRcmzDOnJu9+Zdx8WrZriTpahuAV5BOzHquvYvkkRoFv97jtnO8O8CWFB2XOrTUet1KXKsUZ9RW M6n9DNnb 9TAoJH75oZPwnZ6mn5zvZObmdraAeNjhHh9muS7FU7pdxI0fpa4X4uSS9XJKT8O+U92sFWw78rxIbg+4C8d9+GMoZvSrPxC6YVfy1MLMC/UVt9pgUahbQE9kG8ftHV0F5S5JqkLM7JA7GeD9S+qoCYmBDsLoFms5CA0kaZfUogcSjAKt5k5/XHGuCRePNfi5DwJtzhoG8DJU58V57WJXKdvpQ+qBuKs82UszBK/f+8HYxJmK89bgSGGi+sRJkwdlDAx6Oa5tsqVLCm9l9tExvISwXVxO6BgCoYknS5M/MswWxB/TetuaIbvLjdWhuVibBmDxSw2/9PaRjv9pJkGncp2OL7wpsoAQo24O4L0b7iJMqgpm2DXYvO+cUtcPCiA+/tzOu15Kk4evL2+3H+dihj3KP4zaBi8fhl2YojuzGuBonW4uEoRvIn19IJXZpeDyAawZbo+/7bfmtE2S2sG7gZijq+OVSlaw499X3n6CQOEWb7/D8HdtFBYGCSw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, Mar 25, 2024 at 12:33=E2=80=AFAM Chengming Zhou wrote: > > On 2024/3/25 15:06, Yosry Ahmed wrote: > > On Sun, Mar 24, 2024 at 9:54=E2=80=AFPM Barry Song <21cnbao@gmail.com> = wrote: > >> > >> On Mon, Mar 25, 2024 at 10:23=E2=80=AFAM Yosry Ahmed wrote: > >>> > >>> On Sun, Mar 24, 2024 at 2:04=E2=80=AFPM Johannes Weiner wrote: > >>>> > >>>> Zhongkun He reports data corruption when combining zswap with zram. > >>>> > >>>> The issue is the exclusive loads we're doing in zswap. They assume > >>>> that all reads are going into the swapcache, which can assume > >>>> authoritative ownership of the data and so the zswap copy can go. > >>>> > >>>> However, zram files are marked SWP_SYNCHRONOUS_IO, and faults will t= ry > >>>> to bypass the swapcache. This results in an optimistic read of the > >>>> swap data into a page that will be dismissed if the fault fails due = to > >>>> races. In this case, zswap mustn't drop its authoritative copy. > >>>> > >>>> Link: https://lore.kernel.org/all/CACSyD1N+dUvsu8=3DzV9P691B9bVq33er= wOXNTmEaUbi9DrDeJzw@mail.gmail.com/ > >>>> Reported-by: Zhongkun He > >>>> Fixes: b9c91c43412f ("mm: zswap: support exclusive loads") > >>>> Cc: stable@vger.kernel.org [6.5+] > >>>> Signed-off-by: Johannes Weiner > >>>> Tested-by: Zhongkun He > >> > >> Acked-by: Barry Song > >> > >>> > >>> Do we also want to mention somewhere (commit log or comment) that > >>> keeping the entry in the tree is fine because we are still protected > >>> from concurrent loads/invalidations/writeback by swapcache_prepare() > >>> setting SWAP_HAS_CACHE or so? > >> > >> It seems that Kairui's patch comprehensively addresses the issue at ha= nd. > >> Johannes's solution, on the other hand, appears to align zswap behavio= r > >> more closely with that of a traditional swap device, only releasing an= entry > >> when the corresponding swap slot is freed, particularly in the sync-io= case. > > > > It actually worked out quite well that Kairui's fix landed shortly > > before this bug was reported, as this fix wouldn't have been possible > > without it as far as I can tell. > > > >> > >> Johannes' patch has inspired me to consider whether zRAM could achieve > >> a comparable outcome by immediately releasing objects in swap cache > >> scenarios. When I have the opportunity, I plan to experiment with zRA= M. > > > > That would be interesting. I am curious if it would be as > > straightforward in zram to just mark the folio as dirty in this case > > like zswap does, given its implementation as a block device. > > > > This makes me wonder who is responsible for marking folio dirty in this s= wapcache > bypass case? Should we call folio_mark_dirty() after the swap_read_folio(= )? In shrink_folio_list(), we try to add anonymous folios to the swapcache if they are not there before checking if they are dirty. add_to_swap() calls folio_mark_dirty(), so this should take care of it. There is an interesting comment there though. It says that PTE should be dirty, so unmapping the folio should have already marked it as dirty by the time we are adding it to the swapcache, except for the MADV_FREE case. However, I think we actually unmap the folio after we add it to the swapcache in shrink_folio_list(). Also, I don't immediately see why the PTE would be dirty. In do_swap_page(), making the PTE dirty seems to be conditional on the fault being a write fault, but I didn't look thoroughly, maybe I missed it. It is also possible that the comment is just outdated.