Date: Wed, 20 Mar 2024 16:03:22 -0400
From: Johannes Weiner
To: Yosry Ahmed
Cc: Chris Li, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Nhat Pham, "Matthew Wilcox (Oracle)", Chengming Zhou, Barry Song
Subject: Re: [PATCH v7] zswap: replace RB tree with xarray
Message-ID: <20240320200322.GG294822@cmpxchg.org>
References: <20240319-zswap-xarray-v7-1-e9a03a049e86@kernel.org> <20240320100803.GB294822@cmpxchg.org> <20240320192558.GF294822@cmpxchg.org>

On Wed, Mar 20, 2024 at 07:34:36PM +0000, Yosry Ahmed wrote:
> On Wed, Mar 20, 2024 at 03:25:58PM -0400, Johannes Weiner wrote:
> > On Wed, Mar 20, 2024 at 07:11:38PM +0000, Yosry Ahmed wrote:
> > > On Wed, Mar 20, 2024 at 06:08:03AM -0400, Johannes Weiner wrote:
> > > > On Wed, Mar 20, 2024 at 07:24:27AM +0000, Yosry Ahmed wrote:
> > > > > [..]
> > > > > > > > -	/* map */
> > > > > > > > -	spin_lock(&tree->lock);
> > > > > > > >  	/*
> > > > > > > > -	 * The folio may have been dirtied again, invalidate the
> > > > > > > > -	 * possibly stale entry before inserting the new entry.
> > > > > > > > +	 * We finish initializing the entry while it's already in xarray.
> > > > > > > > +	 * This is safe because:
> > > > > > > > +	 *
> > > > > > > > +	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> > > > > > > > +	 *
> > > > > > > > +	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> > > > > > > > +	 *    The publishing order matters to prevent writeback from seeing
> > > > > > > > +	 *    an incoherent entry.
> > > > > > >
> > > > > > > As I mentioned before, writeback is also protected by the folio lock.
> > > > > > > Concurrent writeback will find the folio in the swapcache and abort. The
> > > > > > > fact that the entry is not on the LRU yet is just additional protection,
> > > > > > > so I don't think the publishing order actually matters here. Right?
> > > > > >
> > > > > > Right. This comment is explaining why this publishing order does not
> > > > > > matter. I think we are talking about the same thing here?
> > > > >
> > > > > The comment literally says "the publishing order matters.." :)
> > > > >
> > > > > I believe Johannes meant that we should only publish the entry to the
> > > > > LRU once it is fully initialized, to prevent writeback from using a
> > > > > partially initialized entry.
> > > > >
> > > > > What I am saying is that, even if we add a partially initialized entry
> > > > > to the zswap LRU, writeback will skip it anyway because the folio is
> > > > > locked in the swapcache.
> > > > >
> > > > > So basically I think the comment should say:
> > > > >
> > > > > 	/*
> > > > > 	 * We finish initializing the entry while it's already in the
> > > > > 	 * xarray. This is safe because the folio is locked in the swap
> > > > > 	 * cache, which should protect against concurrent stores,
> > > > > 	 * invalidations, and writeback.
> > > > > 	 */
> > > > >
> > > > > Johannes, what do you think?
> > > >
> > > > I don't think that's quite right.
> > > >
> > > > Writeback will bail on swapcache insert, yes, but it will access the
> > > > entry before attempting it. If LRU publishing happened before setting
> > > > entry->swpentry e.g., we'd have a problem, while your comment suggests
> > > > it would be safe to rearrange the code like this.
> > > >
> > > > So LRU publishing order does matter.
> > >
> > > Ah yes, you are right. entry->swpentry should be set to make sure we
> > > look up the correct entry in the swapcache and the tree.
> > >
> > > Perhaps we should spell this out in the comment and make the
> > > initialization ordering more explicit? Maybe something like:
> > >
> > > diff --git a/mm/zswap.c b/mm/zswap.c
> > > index d8a14b27adcd7..70924b437743a 100644
> > > --- a/mm/zswap.c
> > > +++ b/mm/zswap.c
> > > @@ -1472,9 +1472,6 @@ bool zswap_store(struct folio *folio)
> > >  		goto put_pool;
> > >
> > >  insert_entry:
> > > -	entry->swpentry = swp;
> > > -	entry->objcg = objcg;
> > > -
> > >  	old = xa_store(tree, offset, entry, GFP_KERNEL);
> > >  	if (xa_is_err(old)) {
> > >  		int err = xa_err(old);
> > > @@ -1491,6 +1488,7 @@ bool zswap_store(struct folio *folio)
> > >  	if (old)
> > >  		zswap_entry_free(old);
> > >
> > > +	entry->objcg = objcg;
> > >  	if (objcg) {
> > >  		obj_cgroup_charge_zswap(objcg, entry->length);
> > >  		count_objcg_event(objcg, ZSWPOUT);
> > > @@ -1498,15 +1496,16 @@ bool zswap_store(struct folio *folio)
> > >
> > >  	/*
> > >  	 * We finish initializing the entry while it's already in xarray.
> > > -	 * This is safe because:
> > > -	 *
> > > -	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> > > +	 * This is safe because the folio is locked in the swapcache, which
> > > +	 * protects against concurrent stores and invalidations.
> > >  	 *
> > > -	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> > > -	 *    The publishing order matters to prevent writeback from seeing
> > > -	 *    an incoherent entry.
> > > +	 * Concurrent writeback is not possible until we add the entry to the
> > > +	 * LRU. We need to at least initialize entry->swpentry *before* adding
> > > +	 * the entry to the LRU to make sure writeback looks up the correct
> > > +	 * entry in the swapcache.
> > >  	 */
> > >  	if (entry->length) {
> > > +		entry->swpentry = swp;
> > >  		INIT_LIST_HEAD(&entry->lru);
> > >  		zswap_lru_add(&zswap_list_lru, entry);
> > >  		atomic_inc(&zswap_nr_stored);
> > >
> > >
> > > This also got me wondering, do we need a write barrier between
> > > initializing entry->swpentry and zswap_lru_add()?
> > >
> > > I guess if we read the wrong swpentry in zswap_writeback_entry() we will
> > > eventually fail the xa_cmpxchg() and drop it anyway, but it seems
> > > bug-prone.
> >
> > I think it's more robust the way Chris has it now. Writeback only
> > derefs ->swpentry today, but who knows if somebody wants to make a
> > change that relies on a different member. Having submembers follow
> > different validity rules and timelines is error prone and makes the
> > code less hackable without buying all that much. The concept of
> > "publishing" an object like this is more common: if you can see it,
> > you can expect it to be coherent.
>
> Fair enough, but don't we still need a barrier there? Couldn't some
> initializations still be reordered after zswap_lru_add()?

Only if it were lockless. The LRU unlocking in zswap_store() implies
RELEASE, the LRU locking in writeback implies ACQUIRE. Those force the
desired ordering - nothing can bleed after RELEASE, nothing can bleed
before ACQUIRE.
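
To make the ordering argument concrete, here is a minimal userspace
sketch, assuming a pthread mutex as a stand-in for the LRU lock. It is
not the zswap code: struct entry, publish(), take() and run_demo() are
hypothetical names made up for illustration. The store side fills in
every field before linking the entry onto the list under the lock; the
writeback side can only find the entry after taking the same lock, so
the unlock/lock pair orders the plain stores ahead of the reads without
any explicit barrier.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zswap_entry; not the kernel layout. */
struct entry {
	unsigned long swpentry;
	unsigned int length;
	struct entry *next;
};

/* Stand-ins for the LRU list and its lock (assumption: one global list). */
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;
static struct entry *lru_head;

/* Store side: initialize everything first, then publish under the lock. */
static void publish(struct entry *e, unsigned long swp, unsigned int len)
{
	e->swpentry = swp;		/* plain stores, no barrier needed */
	e->length = len;

	pthread_mutex_lock(&lru_lock);
	e->next = lru_head;
	lru_head = e;
	pthread_mutex_unlock(&lru_lock);	/* RELEASE: stores above cannot move past this */
}

/* Writeback side: an entry is only reachable through the locked list. */
static struct entry *take(void)
{
	struct entry *e;

	pthread_mutex_lock(&lru_lock);	/* ACQUIRE: reads below cannot move before this */
	e = lru_head;
	if (e)
		lru_head = e->next;
	pthread_mutex_unlock(&lru_lock);
	return e;
}

static void *writeback_thread(void *arg)
{
	struct entry *e;

	/* Poll until the store side has published something. */
	while (!(e = take()))
		sched_yield();

	/* If we can see the entry, it is expected to be coherent. */
	assert(e->length != 0);
	*(unsigned long *)arg = e->swpentry;
	return NULL;
}

/* Runs one publish/take round and returns the swpentry writeback observed. */
unsigned long run_demo(void)
{
	static struct entry e;
	unsigned long seen = 0;
	pthread_t t;

	lru_head = NULL;
	if (pthread_create(&t, NULL, writeback_thread, &seen))
		return 0;
	publish(&e, 42, 4096);
	pthread_join(t, NULL);
	return seen;
}
```

Calling run_demo() from a small main() and building with -pthread should
return the value passed to publish(). Moving the two field stores in
publish() below the unlock would reintroduce the reordering problem that
the lock pairing otherwise rules out.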