From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 27 Sep 2023 17:02:06 -0400
From: Johannes Weiner
To: Domenico Cerasuolo
Cc: Yosry Ahmed, Nhat Pham, akpm@linux-foundation.org,
	sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
	mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev, linux-mm@kvack.org, kernel-team@meta.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, Chris Li
Subject: Re: [PATCH v2 1/2] zswap: make shrinking memcg-aware
Message-ID: <20230927210206.GC399644@cmpxchg.org>
References: <20230919171447.2712746-1-nphamcs@gmail.com>
	<20230919171447.2712746-2-nphamcs@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Wed, Sep 27, 2023 at 09:48:10PM +0200, Domenico Cerasuolo wrote:
> > > @@ -485,6 +487,17 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > >  	__folio_set_locked(folio);
> > >  	__folio_set_swapbacked(folio);
> > >
> > > +	/*
> > > +	 * Page fault might itself trigger reclaim, on a zswap object that
> > > +	 * corresponds to the same swap entry. However, as the swap entry has
> > > +	 * previously been pinned, the task will run into an infinite loop trying
> > > +	 * to pin the swap entry again.
> > > +	 *
> > > +	 * To prevent this from happening, we remove it from the zswap
> > > +	 * LRU to prevent its reclamation.
> > > +	 */
> > > +	zswap_lru_removed = zswap_remove_swpentry_from_lru(entry);
> > > +
> >
> > This will add a zswap lookup (and potentially an insertion below) in
> > every single swap fault path, right?. Doesn't this introduce latency
> > regressions? I am also not a fan of having zswap-specific details in
> > this path.
> >
> > When you say "pinned", do you mean the call to swapcache_prepare()
> > above (i.e. setting SWAP_HAS_CACHE)? IIUC, the scenario you are
> > worried about is that the following call to charge the page may invoke
> > reclaim, go into zswap, and try to writeback the same page we are
> > swapping in here. The writeback call will recurse into
> > __read_swap_cache_async(), call swapcache_prepare() and get EEXIST,
> > and keep looping indefinitely. Is this correct?

Yeah, exactly.

> > If yes, can we handle this by adding a flag to
> > __read_swap_cache_async() that basically says "don't wait for
> > SWAP_HAS_CACHE and the swapcache to be consistent, if
> > swapcache_prepare() returns EEXIST just fail and return"? The zswap
> > writeback path can pass in this flag and skip such pages. We might
> > want to modify the writeback code to put back those pages at the end
> > of the lru instead of in the beginning.
>
> Thanks for the suggestion, this actually works and it seems cleaner so I think
> we'll go for your solution.

That sounds like a great idea.

It should be pointed out that these aren't perfectly equivalent.
Removing the entry from the LRU eliminates the lock recursion scenario
on that very specific entry. Having writeback skip on -EEXIST will
make it skip *any* pages that are concurrently entering the swapcache,
even when it *could* wait for them to finish.

However, pages that are concurrently read back into memory are a poor
choice for writeback anyway, and likely to be removed from swap soon.
So it happens to work out just fine in this case. I'd just add a
comment that explains the recursion deadlock, as well as the
implication of skipping any busy entry and why that's okay.
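
To make the difference between waiting and skipping concrete, here is a
toy userspace model of the behavior discussed above. It is only a sketch
and not kernel code: toy_has_cache stands in for SWAP_HAS_CACHE,
toy_pin_entry() for the pin step in __read_swap_cache_async(), and
toy_writeback() for the zswap writeback walk. All toy_* names, the
bounded retry count, and the -EDEADLK return are assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of the toy_* names are kernel APIs. */
#define TOY_NR_ENTRIES	4
#define TOY_MAX_RETRIES	1000

/* Stand-in for SWAP_HAS_CACHE: true while a swap-in owns the entry. */
static bool toy_has_cache[TOY_NR_ENTRIES];

/*
 * Stand-in for pinning a swap entry. With skip_if_exists set, a busy
 * entry fails immediately with -EEXIST. Without it, the caller retries;
 * if the owner is the caller's own task (the recursion case), the flag
 * never clears. The retry bound exists only so this model terminates.
 */
static int toy_pin_entry(size_t entry, bool skip_if_exists)
{
	int retries = 0;

	assert(entry < TOY_NR_ENTRIES);

	while (toy_has_cache[entry]) {
		if (skip_if_exists)
			return -EEXIST;
		if (++retries >= TOY_MAX_RETRIES)
			return -EDEADLK;	/* would spin forever */
	}

	toy_has_cache[entry] = true;
	return 0;
}

/*
 * Walk an LRU (index 0 is coldest) and "write back" each entry.
 * Any entry that is busy is skipped and recorded in skipped[], whether
 * or not waiting for it would have been safe. Returns the number of
 * entries written back.
 */
static size_t toy_writeback(const size_t *lru, size_t n,
			    size_t *skipped, size_t *nr_skipped)
{
	size_t written = 0;

	*nr_skipped = 0;
	for (size_t i = 0; i < n; i++) {
		if (toy_pin_entry(lru[i], true) == -EEXIST) {
			skipped[(*nr_skipped)++] = lru[i];
			continue;
		}
		/* Pretend the data was written out, then drop the pin. */
		toy_has_cache[lru[i]] = false;
		written++;
	}
	return written;
}
```

With entry 2 marked busy, toy_writeback() over the LRU {0, 2, 1} writes
back entries 0 and 1 and reports entry 2 as skipped, leaving its pin
untouched. A caller could then requeue the skipped entries at the end of
the LRU, as suggested in the quoted text.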