From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B1723CDE026 for ; Thu, 26 Sep 2024 17:35:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 403C76B009B; Thu, 26 Sep 2024 13:35:32 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 3B30C6B009D; Thu, 26 Sep 2024 13:35:32 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2538C6B00A0; Thu, 26 Sep 2024 13:35:32 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 033416B009B for ; Thu, 26 Sep 2024 13:35:31 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 997611201B1 for ; Thu, 26 Sep 2024 17:35:31 +0000 (UTC) X-FDA: 82607591262.22.21DD709 Received: from mail-ej1-f53.google.com (mail-ej1-f53.google.com [209.85.218.53]) by imf21.hostedemail.com (Postfix) with ESMTP id A4A521C001D for ; Thu, 26 Sep 2024 17:35:29 +0000 (UTC) Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=ITZ6V0oS; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf21.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.53 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1727372040; a=rsa-sha256; cv=none; b=M2vQe4wFkR1IBIE4rxM2eOBh9IGtFNnJGQPspmRxAsYvczI8ZdUyfcOzgQLBq09F/OHmzk BEVsWmawunXIoDYFt/WIxNDf+oWKoccv9AA/9TbefANpzj7Nba1ZijaLO2HTetJo04vCsT uWokorDO01SJ4rKz9LdkRhSWo1UQsnc= ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=ITZ6V0oS; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf21.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.53 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1727372040; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=90Zj4vWTFKM9hGI/ZhiBg+/nwuUtOIRWahkWQsv/IQo=; b=DwzdXoFuRIN6HFNxtFqeWAHZ+59vUatHmUnJcdJWd1juCkwN39Jo5Wfq4pBqslobmS6ms9 fhAMnnNcTk0rH8FLUl4sFySap+QpoZcpAIUNTeWxMkNlWCWf+yVbPQ7gTsB+HBhNghNS/V PF2IH+Isly/blD9/tmWLYB+9QFEErbE= Received: by mail-ej1-f53.google.com with SMTP id a640c23a62f3a-a8d446adf6eso172561866b.2 for ; Thu, 26 Sep 2024 10:35:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1727372128; x=1727976928; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=90Zj4vWTFKM9hGI/ZhiBg+/nwuUtOIRWahkWQsv/IQo=; b=ITZ6V0oSMadBmpC5yIXuP3Gisw8kIaG3DpTrnWtVRokOsCBmwxZQ/Ec0+Mt8ovgCFM fw5aGFCjp2rUlr6S0OE145HkZqJD3C0CmMY0QuY1MKLaTCeKFThX/mvTBPpVxuUPseGe ls/vWXzkoh565+e4gHY1tKp2LNaEXc/Ae8KvNe88g7J8m/QonUE2JEOvfWpFjy7lSTsB 1X+UWYi5bwngc3P2FJWv5DMAAKbNMsL+CNeX5gFWklIpOLyc9SN7CAneb6xfOU4h4KHE 7pslu27Q4FlcvfvLmdczfZc/GACXsqKz6hQ+PircxrZCcSS4ijrJc6lQO+nmgpMYptFR p4uA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1727372128; x=1727976928; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=90Zj4vWTFKM9hGI/ZhiBg+/nwuUtOIRWahkWQsv/IQo=; b=fooQuM6wpTYCGJQ4TYZdQ7v1pZEAjKDDM2TZrISej/wb4NhAUUqc/LfBvUPIYwkFnD dZSRIaTG+djdOF2+PZZdEHUE0ks5rqi84HpvFPjTSK2LrareTu7kc5ad355Vk8NangYv V0ACKJ04p4hzqPS8CDusWhG02N5VD39FWaMMLT05lBzkXD+jbkTIGVpPe48wiOI+LJeG yklZVo+kgKzfykZ85gO2jorY59YamMMHm8Lz3ojurbcTGe4cyVuFxShI2hkN9+bIF9xL gRqfQa+0G5LkE25w61kJ8KGUJh8VJYIlv7Hf3RG9xCvSCtqHxanNcwZ2ACEc/0SFmTFg hIqw== X-Forwarded-Encrypted: i=1; AJvYcCWw22ob7nGiabtQGvcbbvGEg+q+ryBAi6PJI8NloeSfs5sNPcO8EV1uca5+ILErZhl2e0pdqKdMXA==@kvack.org X-Gm-Message-State: AOJu0YzpULvx9+p1gRFTWUUHS13UEhto0rAECe4+BAMM4RSqe8GFKUaK xFlR0l8rGD5jObYfccW+ynHXeMA75Dahxo/Ud+pijjE3naT/fR97lAWa1rMl7dGGMwujQxoUXoJ YQKZfZRUIN28I/g6GlijfelBzeBFF8IffxA6+ X-Google-Smtp-Source: AGHT+IHCP+Zznjpd3YjrmHAE50PhdAm4vSwOqQym8ahagT8NDlpt1wX9i/4gffYt0cbbR2b6ytzqgY8SXtldqDcxqhs= X-Received: by 2002:a17:907:e91:b0:a7d:c148:ec85 with SMTP id a640c23a62f3a-a93c4aa6b89mr25072866b.62.1727372127804; Thu, 26 Sep 2024 10:35:27 -0700 (PDT) MIME-Version: 1.0 References: <20240924011709.7037-1-kanchana.p.sridhar@intel.com> <20240924011709.7037-7-kanchana.p.sridhar@intel.com> <20240925134008.GA875661@cmpxchg.org> <20240925192006.GB876370@cmpxchg.org> <20240925201323.GA880690@cmpxchg.org> In-Reply-To: From: Yosry Ahmed Date: Thu, 26 Sep 2024 10:34:49 -0700 Message-ID: Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in zswap_store(). To: "Sridhar, Kanchana P" Cc: Johannes Weiner , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "nphamcs@gmail.com" , "chengming.zhou@linux.dev" , "usamaarif642@gmail.com" , "shakeel.butt@linux.dev" , "ryan.roberts@arm.com" , "Huang, Ying" , "21cnbao@gmail.com" <21cnbao@gmail.com>, "akpm@linux-foundation.org" , "Zou, Nanhai" , "Feghali, Wajdi K" , "Gopal, Vinodh" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: A4A521C001D X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: 9jmcz1jnc3xfq919nf563hq13jzjbmg7 X-HE-Tag: 1727372129-791387 X-HE-Meta: U2FsdGVkX18FymhZsmYASx0J1+F3TRJ7S6eLzDAOdhiDxFIJWHCbpfQwOdCaP2E5FFMHUHsEFdO5M5RoEsWASSXtFwGveTSL9NXuKHSYuaDvqw3osLmSU/4GhoXcBhu37PE2C2+xfmyC1GpRPoRSYa/0R5ZADLkTE0ZXG0tb72hH9E3VyMYB2CqvhjBbL9HRkz3PSCHjBjyurBqvcMbAvPx79xmyQ87qWF3WD6K8Y7eBjo0upUTFNlX9LcHf3QckoLV7BQMhc2makIG2hOTqfiXOurnWcpfpCSw43jw2wB+EvEIk7FzZj7U8suTHcb6J1S7htt83SnQzWv2OfiHn97wDL05zsbQtJwZ8SWlVtVfWXiQT1h4gbSF5/dwJ01gbSvd+E8lLXr2/7vweBqVrMGaiQjp/pDSw5QWe05ssdcw0iDXCtjRibnd0pyVEAmO4rlkoxPEsDLT3+U2JZPFbF1xAU/88WFvJN9mJu3E9BZzKHCXspYVqO0L9bQaLtKZuTXP2mn39KctXgK4NKjVbdmTpKDXLYQOYcBfx7uIfzWGChVMmnQNl2BJ77Jb5Kua3kA5KXkfnIpyTGN6i+HsgmL75tYPtWy0ZOnllBVsMmr1ft9VHwaAO3WCMzLOOKVK8j0+LMFEH3kKwC75IFiHe9EOemZVAk4s39ueFAyye/OaTT8XpHAVyfYHFJtHFM/CUQTWrdvj+8R5uSXe6QoRUfGbLb5WL1JNFrn+cnRKqF0l11L5JPLyGqRdnyxOWuv8vs+N8Rzt8LZPcJ2Vh+HICljV9+uR0P+/p+sYdbMaSKVilnLKQ1EKjEmeRPfiMMTUwAtnfa33VD/Bs+yzPaHbd9pdkouE8WYO6TJdEFc2RSOPSLvAUg6i6ZjwPVzbIsUMTdapn4sSUq3PwElWTaTtD5oDmvRjlcMTsiDbleko4Ksaqi0KmyEcpYlk+5tWBV2RdGyZ5nrwMAyV4R24ty9d MERP8dOs ZcsqQ23GY1nQ9fU1vCSsu7FX05t4LHiURtWn9wn+D6myGNAkAjCFODFaNzFXJ8pYY06EcbsFh/aHIjbSIErzExTEQJrLkz6lAIy+KziII06f12KJj5RgwBTIa3m0ABMPDMxLZMPYmQr1C2bG7oJDBZvBxPezWg8JTNq8IT+27M1AIWRBI1hzimYY6SbBf6ehSx9+Hf2HBVucexksjSJYgcuM9X7z19AlvSZLTAF4WhqYqS9EuLs1iE4g30C4eB+BTjKehatZFSj6/fTkakEHD1SG5gTkKDU4wk4fZNgtl6O+zmcSvyN7fJzDpgh25kvhACEeUNyhuBqIq2BdjWK0VKDzcZaMqIFSNbviqnwwKsGR/yJ6wvOD3z0bkKyAQRVqggDslywi3QolQjy9GBwS8A9m1wHEHR3/qzN9YKKqN/lX+dgb+T5xgPUH+U2u3QfcNUF/iua9Ve04sZPFJGAJUjRN5N+V8h4vX6wQD2IDnNU4RCNWimDk3tzSEWyqkYn0dGDksClPELSOY7DThMiPghBqFxsjGvojDlysZKHG6Hm7WSOLi5SSMLTHS4JaJiHcFIsNu2w4W5FXA37dvWjFtcI9JJBW47otc7tQ0Ly24QfS5N/Y= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Sep 26, 2024 at 10:29=E2=80=AFAM Sridhar, Kanchana P wrote: > > > -----Original Message----- > > From: Yosry Ahmed > > Sent: Thursday, September 26, 2024 10:20 AM > > To: Sridhar, Kanchana P > > Cc: Johannes Weiner ; linux-kernel@vger.kernel.org; > > linux-mm@kvack.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > > usamaarif642@gmail.com; shakeel.butt@linux.dev; ryan.roberts@arm.com; > > Huang, Ying ; 21cnbao@gmail.com; akpm@linux- > > foundation.org; Zou, Nanhai ; Feghali, Wajdi K > > ; Gopal, Vinodh > > Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in > > zswap_store(). > > > > On Thu, Sep 26, 2024 at 9:40=E2=80=AFAM Sridhar, Kanchana P > > wrote: > > > > > > > -----Original Message----- > > > > From: Yosry Ahmed > > > > Sent: Wednesday, September 25, 2024 9:52 PM > > > > To: Sridhar, Kanchana P > > > > Cc: Johannes Weiner ; linux- > > kernel@vger.kernel.org; > > > > linux-mm@kvack.org; nphamcs@gmail.com; chengming.zhou@linux.dev; > > > > usamaarif642@gmail.com; shakeel.butt@linux.dev; > > ryan.roberts@arm.com; > > > > Huang, Ying ; 21cnbao@gmail.com; akpm@linux- > > > > foundation.org; Zou, Nanhai ; Feghali, Wajdi = K > > > > ; Gopal, Vinodh > > > > Subject: Re: [PATCH v7 6/8] mm: zswap: Support mTHP swapout in > > > > zswap_store(). > > > > > > > > [..] > > > > > > > > > > One thing I realized while reworking the patches for the batched = checks > > is: > > > > > within zswap_store_page(), we set the entry->objcg and entry->poo= l > > before > > > > > adding it to the xarray. Given this, wouldn't it be safer to get = the objcg > > > > > and pool reference per sub-page, locally in zswap_store_page(), r= ather > > than > > > > > obtaining batched references at the end if the store is successfu= l? If we > > > > want > > > > > zswap_store_page() to be self-contained and correct as far as the= entry > > > > > being created and added to the xarray, it seems like the right th= ing to > > do? > > > > > I am a bit apprehensive about the entry being added to the xarray > > without > > > > > a reference obtained on the objcg and pool, because any page- > > > > faults/writeback > > > > > that occur on sub-pages added to the xarray before the entire fol= io has > > been > > > > > stored, would run into issues. > > > > > > > > We definitely should not obtain references to the pool and objcg af= ter > > > > initializing the entries with them. We can obtain all references in > > > > zswap_store() before zswap_store_page(). IOW, the batching in this > > > > case should be done before the per-page operations, not after. > > > > > > Thanks Yosry. IIUC, we should obtain all references to the objcg and = to the > > > zswap_pool at the start of zswap_store. > > > > > > In the case of error on any sub-page, we will unwind state for potent= ially > > > only the stored pages or the entire folio if it happened to already b= e in > > zswap > > > and is being re-written. We might need some additional book-keeping t= o > > > keep track of which sub-pages were found in the xarray and > > zswap_entry_free() > > > got called (nr_sb). Assuming I define a new "obj_cgroup_put_many()", = I > > would need > > > to call this with (folio_nr_pages() - nr_sb). > > > > > > As far as zswap_pool_get(), there is some added complexity if we want= to > > > keep the existing implementation that calls "percpu_ref_tryget()", an= d > > assuming > > > this is extended to provide a new "zswap_pool_get_many()" that calls > > > "percpu_ref_tryget_many()". Is there a reason we use percpu_ref_tryge= t() > > instead > > > of percpu_ref_get()? Reason I ask is, with tryget(), if for some reas= on the > > pool->ref > > > is 0, no further increments will be made. If so, upon unwinding state= in > > > zswap_store(), I would need to special-case to catch this before call= ing a > > new > > > "zswap_pool_put_many()". > > > > > > Things could be a little simpler if zswap_pool_get() can use > > "percpu_ref_get()" > > > which will always increment the refcount. Since the zswap pool->ref i= s > > initialized > > > to "1", this seems Ok, but I don't know if there will be unintended > > consequences. > > > > > > Can you please advise on what is the simplest/cleanest approach: > > > > > > 1) Proceed with the above changes without changing percpu_ref_tryget = in > > > zswap_pool_get. Needs special-casing in zswap_store to detect po= ol- > > >ref > > > being "0" before calling zswap_pool_put[_many]. > > > > My assumption is that we can reorder the code such that if > > zswap_pool_get_many() fails we don't call zswap_pool_put_many() to > > begin with (e.g. jump to a label after zswap_pool_put_many()). > > However, the pool refcount could change between the start and end of > zswap_store. I am not sure what you mean. If zswap_pool_get_many() fails then we just do not call zswap_pool_put_many() at all and abort. > > > > > > 2) Modify zswap_pool_get/zswap_pool_get_many to use > > percpu_ref_get_many > > > and avoid special-casing to detect pool->ref being "0" before cal= ling > > > zswap_pool_put[_many]. > > > > I don't think we can simply switch the tryget to a get, as I believe > > we can race with the pool being destroyed. > > That was my initial thought as well, but I figured this couldn't happen > since the pool->ref is initialized to "1", and based on the existing > implementation. In any case, I can understand the intent of the use > of "tryget"; it is just that it adds to the considerations for reference > batching. The initial ref can be dropped in __zswap_param_set() if a new pool is created (see the call to ercpu_ref_kill(()). > > > > > > 3) Keep the approach in v7 where obj_cgroup_get/put is localized to > > > zswap_store_page for both success and error conditions, and any > > unwinding > > > state in zswap_store will take care of dropping references obtain= ed from > > > prior successful writes (from this or prior invocations of zswap_= store). > > > > I am also fine with doing that and doing the reference batching as a fo= llow up. > > I think so too! We could try and improve upon (3) with reference batching > in a follow-up patch. SGTM.