From: Yosry Ahmed <yosryahmed@google.com>
Date: Wed, 27 Sep 2023 14:07:45 -0700
Subject: Re: [PATCH v2 1/2] zswap: make shrinking memcg-aware
To: Johannes Weiner
Cc: Domenico Cerasuolo, Nhat Pham, akpm@linux-foundation.org, sjenning@redhat.com,
 ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org,
 roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
 linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, Chris Li
In-Reply-To: <20230927210206.GC399644@cmpxchg.org>
References: <20230919171447.2712746-1-nphamcs@gmail.com> <20230919171447.2712746-2-nphamcs@gmail.com>
 <20230927210206.GC399644@cmpxchg.org>

On Wed, Sep 27, 2023 at 2:02 PM Johannes Weiner wrote:
>
> On Wed, Sep 27, 2023 at 09:48:10PM +0200, Domenico Cerasuolo wrote:
> > > > @@ -485,6 +487,17 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > > >  		__folio_set_locked(folio);
> > > >  		__folio_set_swapbacked(folio);
> > > >
> > > > +		/*
> > > > +		 * Page fault might itself trigger reclaim, on a zswap object that
> > > > +		 * corresponds to the same swap entry. However, as the swap entry has
> > > > +		 * previously been pinned, the task will run into an infinite loop trying
> > > > +		 * to pin the swap entry again.
> > > > +		 *
> > > > +		 * To prevent this from happening, we remove it from the zswap
> > > > +		 * LRU to prevent its reclamation.
> > > > +		 */
> > > > +		zswap_lru_removed = zswap_remove_swpentry_from_lru(entry);
> > > > +
> > >
> > > This will add a zswap lookup (and potentially an insertion below) in
> > > every single swap fault path, right?. Doesn't this introduce latency
> > > regressions? I am also not a fan of having zswap-specific details in
> > > this path.
> > >
> > > When you say "pinned", do you mean the call to swapcache_prepare()
> > > above (i.e. setting SWAP_HAS_CACHE)? IIUC, the scenario you are
> > > worried about is that the following call to charge the page may invoke
> > > reclaim, go into zswap, and try to writeback the same page we are
> > > swapping in here. The writeback call will recurse into
> > > __read_swap_cache_async(), call swapcache_prepare() and get EEXIST,
> > > and keep looping indefinitely. Is this correct?
> >
> > Yeah, exactly.
> >
> > > If yes, can we handle this by adding a flag to
> > > __read_swap_cache_async() that basically says "don't wait for
> > > SWAP_HAS_CACHE and the swapcache to be consistent, if
> > > swapcache_prepare() returns EEXIST just fail and return"? The zswap
> > > writeback path can pass in this flag and skip such pages. We might
> > > want to modify the writeback code to put back those pages at the end
> > > of the lru instead of in the beginning.
> >
> > Thanks for the suggestion, this actually works and it seems cleaner so I think
> > we'll go for your solution.
>
> That sounds like a great idea.
>
> It should be pointed out that these aren't perfectly
> equivalent. Removing the entry from the LRU eliminates the lock
> recursion scenario on that very specific entry.
>
> Having writeback skip on -EEXIST will make it skip *any* pages that
> are concurrently entering the swapcache, even when it *could* wait for
> them to finish.
>
> However, pages that are concurrently read back into memory are a poor
> choice for writeback anyway, and likely to be removed from swap soon.
>
> So it happens to work out just fine in this case. I'd just add a
> comment that explains the recursion deadlock, as well as the
> implication of skipping any busy entry and why that's okay.

Good point, we will indeed skip even if the concurrent insertion into the
swapcache is coming from a different CPU. As you said, it works out just
fine in this case, as the page will be removed from zswap momentarily
anyway. A comment is indeed due.
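
For completeness, the kind of change I have in mind is roughly the sketch
below. It is untested and only illustrative: "skip_if_exists" is a
placeholder name, and the error-path cleanup (dropping the freshly
allocated folio, the swap device reference, etc.) is elided. The only
behavioral change would be inside the swapcache_prepare() retry loop of
__read_swap_cache_async(), with the zswap writeback path being the only
caller that passes true:

	/*
	 * Sketch only: __read_swap_cache_async() grows a bool
	 * skip_if_exists parameter; everything outside the retry loop
	 * stays as it is today.
	 */
	for (;;) {
		int err;

		/* ... swap cache lookup and folio allocation, unchanged ... */

		/*
		 * -EEXIST means another task already holds SWAP_HAS_CACHE
		 * for this entry but has not finished adding the page to
		 * the swapcache yet.
		 */
		err = swapcache_prepare(entry);
		if (!err)
			break;
		if (err != -EEXIST)
			return NULL;

		/*
		 * zswap writeback must not wait for the racing swapin:
		 * that task may itself be blocked in reclaim on this very
		 * entry, so spinning here would never terminate. Skip the
		 * busy entry and let writeback pick another page.
		 */
		if (skip_if_exists)
			return NULL;

		/* Everyone else keeps waiting for the swapcache, as today. */
		schedule_timeout_uninterruptible(1);
	}

On that early return, writeback would simply treat the entry as busy and
move on to the next page on the zswap LRU (possibly rotating the skipped
entry to the tail, as suggested above).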