From: Yosry Ahmed <yosryahmed@google.com>
Date: Tue, 24 Sep 2024 17:47:21 -0700
Subject: Re: [PATCH v7 5/8] mm: zswap: Compress and store a specific page in a folio.
To: "Sridhar, Kanchana P"
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org,
 nphamcs@gmail.com, chengming.zhou@linux.dev, usamaarif642@gmail.com,
 shakeel.butt@linux.dev, ryan.roberts@arm.com, "Huang, Ying",
 21cnbao@gmail.com, akpm@linux-foundation.org, "Zou, Nanhai",
 "Feghali, Wajdi K", "Gopal, Vinodh"
References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com>
 <20240924011709.7037-6-kanchana.p.sridhar@intel.com>
[..]
> > > +{
> > > +	swp_entry_t swp = folio->swap;
> > > +	int type = swp_type(swp);
> > > +	pgoff_t offset = swp_offset(swp) + index;
> > > +	struct page *page = folio_page(folio, index);
> > > +	struct xarray *tree = swap_zswap_tree(swp);
> > > +	struct zswap_entry *entry;
> > > +
> > > +	if (objcg)
> > > +		obj_cgroup_get(objcg);
> > > +
> > > +	if (zswap_check_limits())
> > > +		goto reject;
> > > +
> > > +	/* allocate entry */
> > > +	entry = zswap_entry_cache_alloc(GFP_KERNEL, folio_nid(folio));
> > > +	if (!entry) {
> > > +		zswap_reject_kmemcache_fail++;
> > > +		goto reject;
> > > +	}
> > > +
> > > +	/* if entry is successfully added, it keeps the reference */
> > > +	if (!zswap_pool_get(pool))
> > > +		goto freepage;
> >
> > I think we can batch this for all pages in zswap_store(), maybe first
> > add zswap_pool_get_many().
> >
> > I am also wondering if it would be better to batch the limit checking
> > and allocating the entries, to front load any failures before we start
> > compression. Not sure if that's overall better though.
> >
> > To batch allocate entries we will have to also allocate an array to
> > hold them. To batch the limit checking we will have to either allow
> > going further over limit for mTHPs, or check if there is enough
> > clearance to allow for compressing all the pages. Using the
> > uncompressed size will lead to false negatives though, so maybe we can
> > start tracking the average compression ratio for better limit
> > checking.
> >
> > Nhat, Johannes, any thoughts here? I need someone to tell me if I am
> > overthinking this :)
>
> These are all good points. I suppose I was thinking along the same lines
> as what Nhat mentioned in an earlier comment. I was trying the
> incremental zswap_pool_get() and limit checks and shrinker invocations
> in case of (all) error conditions, to allow different concurrent stores
> to make progress without favoring only one process's mTHP store. I was
> thinking this would have minimal impact on the process(es) that see the
> zswap limit being exceeded, and that this would be better than
> preemptively checking for the entire mTHP and failing (this could also
> complicate things where no one makes progress because multiple processes
> run the batch checks and fail, when realistically one/many could have
> triggered the shrinker before erroring out, and at least one could have
> made progress).

On the other hand, if we allow concurrent mTHP swapouts to do limit checks
incrementally, they may all fail at the last page, whereas if they all do
the limit checks beforehand, one of them can proceed. I think we need to
agree on a higher-level strategy for limit checking, both global and
per-memcg. The per-memcg limit should be stricter though, so we may end up
having different policies.

> Would appreciate your perspectives on how this should be handled,
> and will implement a solution in v8 accordingly.
>
> Thanks,
> Kanchana
>
> > > +
> > > +	entry->pool = pool;
> > > +
> > > +	if (!zswap_compress(page, entry))
> > > +		goto put_pool;
> > > +
> > > +	entry->swpentry = swp_entry(type, offset);
> > > +	entry->objcg = objcg;
> > > +	entry->referenced = true;
> > > +
> > > +	if (!zswap_store_entry(tree, entry))
> > > +		goto store_failed;
> > > +
> > > +	if (objcg) {
> > > +		obj_cgroup_charge_zswap(objcg, entry->length);
> > > +		count_objcg_event(objcg, ZSWPOUT);
> > > +	}
> > > +
> > > +	/*
> > > +	 * We finish initializing the entry while it's already in xarray.
> > > +	 * This is safe because:
> > > +	 *
> > > +	 * 1. Concurrent stores and invalidations are excluded by folio lock.
> > > +	 *
> > > +	 * 2. Writeback is excluded by the entry not being on the LRU yet.
> > > +	 *    The publishing order matters to prevent writeback from seeing
> > > +	 *    an incoherent entry.
> > > +	 */
> > > +	if (entry->length) {
> > > +		INIT_LIST_HEAD(&entry->lru);
> > > +		zswap_lru_add(&zswap_list_lru, entry);
> > > +	}
> > > +
> > > +	/* update stats */
> > > +	atomic_inc(&zswap_stored_pages);
> > > +	count_vm_event(ZSWPOUT);
> >
> > We should probably also batch updating the stats. It actually seems
> > like now we don't handle rolling them back upon failure.
>
> Good point! I assume you are referring only to the "ZSWPOUT" vm event
> stats updates and not the "zswap_stored_pages" (since the latter is used
> in limit checking)?

I actually meant both. Do we roll back changes to zswap_stored_pages when
some stores succeed and some of them fail? I think it's more correct and
efficient to update the atomic once after all the pages are successfully
compressed and stored.
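To make the up-front option more concrete, something along these lines is
what I have in mind. This is a rough, untested sketch: zswap_pool_get_many()
and zswap_check_limits_batch() do not exist today, and it assumes the pool
reference stays a percpu_ref (as zswap_pool_get() uses) and a
zswap_max_pages()-style helper for the global limit:

static bool zswap_pool_get_many(struct zswap_pool *pool, unsigned long nr)
{
	/* batched version of zswap_pool_get(): one tryget for all subpages */
	return percpu_ref_tryget_many(&pool->ref, nr);
}

static bool zswap_check_limits_batch(unsigned long nr_pages)
{
	unsigned long cur = zswap_total_pages();
	unsigned long max = zswap_max_pages();	/* assumed helper */

	/*
	 * Estimating with the uncompressed size is conservative and can
	 * reject stores that would actually have fit; tracking an average
	 * compression ratio could tighten this estimate later.
	 */
	return cur + nr_pages > max;
}

Whether front-loading the checks like this is worth it probably depends on
how often we actually run close to the limit.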
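Roughly, and just to illustrate (untested; it assumes zswap_store() has the
folio's subpage count available as nr_pages):

	/*
	 * Bump the counters once per folio, after every subpage has been
	 * compressed and stored; nothing needs rolling back on failure
	 * because nothing was counted yet.
	 */
	atomic_add(nr_pages, &zswap_stored_pages);
	count_vm_events(ZSWPOUT, nr_pages);

	/*
	 * The objcg side would need a batched variant too, e.g. a
	 * hypothetical count_objcg_events(objcg, ZSWPOUT, nr_pages).
	 */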