From mboxrd@z Thu Jan  1 00:00:00 1970
From: Nhat Pham <nphamcs@gmail.com>
Date: Thu, 3 Apr 2025 13:38:26 -0700
Subject: Re: [PATCH 0/2] minimize swapping on zswap store failure
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Yosry Ahmed, akpm@linux-foundation.org, hannes@cmpxchg.org,
	cerasuolodomenico@gmail.com, sjenning@redhat.com, ddstreet@ieee.org,
	vitaly.wool@konsulko.com, hughd@google.com, corbet@lwn.net,
	konrad.wilk@oracle.com, senozhatsky@chromium.org, rppt@kernel.org,
	linux-mm@kvack.org, kernel-team@meta.com, linux-kernel@vger.kernel.org,
	david@ixit.cz, Minchan Kim, Shakeel Butt, Chengming Zhou, Kairui Song
In-Reply-To: <20250402200651.1224617-1-joshua.hahnjy@gmail.com>
References: <20250402200651.1224617-1-joshua.hahnjy@gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Wed, Apr 2, 2025 at 1:06 PM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> On Mon, 16 Oct 2023 17:57:31 -0700 Yosry Ahmed wrote:
>
> > On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham wrote:
> > I thought before about having a
> > special list_head that allows us to
> > use the lower bits of the pointers as markers, similar to the xarray.
> > The markers can be used to place different objects on the same list.
> > We can have a list that is a mixture of struct page and struct
> > zswap_entry. I never pursued this idea, and I am sure someone will
> > scream at me for suggesting it. Maybe there is a less convoluted way
> > to keep the LRU ordering intact without allocating memory on the
> > reclaim path.
>
> Hi Yosry,
>
> Apologies for reviving an old thread, but I wasn't sure whether opening an
> entirely new thread was a better choice :-)
>
> So I've implemented your idea, using the lower 2 bits of the list_head's
> prev pointer (the last bit indicates whether the list_head belongs to a
> page or a zswap_entry, and the second-to-last bit was repurposed for the
> second-chance algorithm).
>
> For a very high-level overview of what I did in the patch:
> - When a page fails to compress, I remove the page mapping and tag both
>   the xarray entry (tag == set lowest bit to 1) and the page's list_head
>   prev ptr, then store the page directly into the zswap LRU.
> - In zswap_load, we take the entry out of the xarray and check if it's
>   tagged.
>   - If it is tagged, then instead of decompressing, we just copy the
>     page's contents to the newly allocated page.
> - (There are more details about how to teach vmscan / page_io / list
>   iterators to handle this, but we can gloss over those for now.)
>
> I have a working version, but have been holding off because I have only
> been seeing regressions. I wasn't really sure where they were coming
> from, but after going through some perf traces with Nhat, I found that
> the regressions come from the page faults caused by initially unmapping
> the page and then re-allocating it on every load. This causes (1) more
> memcg flushing, and (2) extra allocations ==> more pressure ==> more
> reclaim, even though we only temporarily keep the extra page.

Thanks for your effort on this idea :)

> Just wanted to put this here in case you were still thinking about this
> idea. What do you think? Ideally, there would be a way to keep the page
> around in the zswap LRU without having to re-allocate a new page on a
> fault, but this seems like a bigger task.

I wonder if we can return the page itself in the event of a page fault.
We'll need to keep it in the swap cache for this to work:

1. On reclaim, do the same thing as your prototype, but keep the page in
the swap cache (i.e. do not remove_mapping() it).

2. On page fault (do_swap_page), before returning, check if the page is
in the zswap LRU. If it is, invalidate the zswap LRU linkage and put it
back on one of the proper LRUs.

Johannes, do you feel like this is possible?

> Ultimately the goal is to prevent an incompressible page from hoarding
> the compression algorithm on multiple reclaim attempts, but if we are
> spending more time by allocating new pages... maybe this isn't the
> correct approach :(

Hmmm, IIUC this problem also exists with zram, since zram allocates a
PAGE_SIZE-sized buffer to hold the original page's content. I will note,
though, that zram seems to favor these kinds of pages for writeback :)
Maybe this is why...? (+ Minchan)

> Please let me know if you have any thoughts on this :-)

Well, worst case scenario, there is still the special incompressible LRU
idea. We'll need some worker thread to check for write access to these
pages to promote them, though. (+ Shakeel)

> Have a great day!
> Joshua
>
> Sent using hkml (https://github.com/sjp38/hackermail)