From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7C35BC0218D for ; Wed, 29 Jan 2025 05:48:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0F4B328002C; Wed, 29 Jan 2025 00:48:50 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0A520280029; Wed, 29 Jan 2025 00:48:50 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EAF0428002C; Wed, 29 Jan 2025 00:48:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id CC23E280029 for ; Wed, 29 Jan 2025 00:48:49 -0500 (EST) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 76493B0441 for ; Wed, 29 Jan 2025 05:48:49 +0000 (UTC) X-FDA: 83059410378.29.A6B8599 Received: from mail-lf1-f43.google.com (mail-lf1-f43.google.com [209.85.167.43]) by imf13.hostedemail.com (Postfix) with ESMTP id 7B61E20004 for ; Wed, 29 Jan 2025 05:48:47 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=dVqOUNvZ; spf=pass (imf13.hostedemail.com: domain of 42.hyeyoo@gmail.com designates 209.85.167.43 as permitted sender) smtp.mailfrom=42.hyeyoo@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1738129727; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=X4Flo92Ws1CKjLfySXJwEmj1yJvKOvZWv8ayX8yn5X4=; b=P6NYe+HZTCwiSP0MduEo+t6c6gJF5mqQuqkVURj060K7wKzh1uutdCYcVlETUqDlLDDFfq Pe/NUBTn7qgg46Xj+Y1+djkVfO273zaeTp1+5hfON1yRg6IDpcxT/Sc7WvBCNAbRCXgplm bdPRqUXuXF3JKBnq8amSkOQRl3BFaB0= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1738129727; a=rsa-sha256; cv=none; b=shQeYq6wczrFVFdK+5W02W9ABsM3/NbT0/TwSGMfgsPKuJb649rdW4pXiSh6h4M7eqDQJt 2dSMlF7ZWd8Ul8xstlnhAQwsPc6/RGkruyjkxDVY7a1KeHEdr3X2A45hTmFKDKWRaj+Hnr aoYSTZ0jfQi+Z+tCJ5y7HZMU/rl84+I= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=dVqOUNvZ; spf=pass (imf13.hostedemail.com: domain of 42.hyeyoo@gmail.com designates 209.85.167.43 as permitted sender) smtp.mailfrom=42.hyeyoo@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-lf1-f43.google.com with SMTP id 2adb3069b0e04-53f22fd6887so7011304e87.2 for ; Tue, 28 Jan 2025 21:48:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738129726; x=1738734526; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=X4Flo92Ws1CKjLfySXJwEmj1yJvKOvZWv8ayX8yn5X4=; b=dVqOUNvZAHzHEBZIPOKfsHizXYvC5HZ0J1RTApXSUsHec0HudxegTVGVvS6VWYwdN9 lLAg38LAfp1oTuAZS4tPCDEvc/DTfh3ELB5Qp3G40MQpRtmsp3RxKQDLTeryBQtBQrcV xl4AJBHF6NHCt7TxNlLR1zUnhZEYUOmzx/aZLnBUyWHHhk6IQUz91OijLxAx5j7LmZK4 TMI1f2zMwy2KaQ6Bc06ZI50Ta+3LsOSMeFJUcvbdiVL1/qisLJCpYFlGRIjwfu+j+qbX UP4sVEL6m19+aEuiQbHaVSRk5mm174pKpe/ijDn3m0PjGwmnH0gqQAlFwdhWJEUpwnRF InxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738129726; x=1738734526; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=X4Flo92Ws1CKjLfySXJwEmj1yJvKOvZWv8ayX8yn5X4=; b=Yyichs7w+MsgN8F6tmzUQLje9SiFPj7yDGlWyk7vzXmzZEqzzB+UJR8Utohf6hlOTV auOBuKZru9zlHa+I3H8enfgFSAqCnZ8010wVs8MwnjAhvdutnKHu0QExaC9pyAGQgzQ8 e1SuDcfIjB2HtqlkTzDyGSeAK9R/cO5TMKz/WIqICdArC1lqQSIVvkKzxaN8bheUT8vx vqALB9/QkZStlkDU2Z36qw9g/AT5av9cySAJ76HoQ7Md5OdzfHAjergl3D9KYbaHxgDi ts57tLHhrt7JrDoNS8AUycKW5B93+VsexMelxWxrM/qlFrheE3ALqFXGGDinsNdo9VHu ATKA== X-Forwarded-Encrypted: i=1; AJvYcCXQUOY2z+HfS8UKnRsuXtNPrcRtWE19ldCHV6KE7QwYV6wyJUuPC4+WSO5OdTFEB7NlmvqdVADaYw==@kvack.org X-Gm-Message-State: AOJu0Yyq2malg9CHoGqPmQ+EdZm6aZ7hYWZ+lKMDNEkQoTSKww22I1ly RP8/Wge5beS0BAq0pvt/e5UgjTWs3sRw/npLgtzSGAEZ1+72so7Zn3uEIo+Jb0a0lQadrsbpypH aoKtay9GSaaOhZQ41CzYSsLCOpD8= X-Gm-Gg: ASbGncsyQmCT1E0CoKbXFNaPQwRXZpOXGps7x61IUL8tO4V5bvvP3Y56mhCK8I4T52G aULBGjKiXNERRFozKEbgvv07uFLhLO88lWX3eesQ4lOgfFa48XECh1APmEsMRonsFUmXeBCMoFH 4= X-Google-Smtp-Source: AGHT+IH79ysMd+zfQo6uBZ0P4S8xhWIAGh1/HlmQybma6Yavn/m0erKd2wqY31sKpFEx9Z05ILrtR9RrQbjx39Y9TrI= X-Received: by 2002:a05:6512:2344:b0:540:1b07:e033 with SMTP id 2adb3069b0e04-543e4c3e1afmr517020e87.45.1738129725446; Tue, 28 Jan 2025 21:48:45 -0800 (PST) MIME-Version: 1.0 References: <20250128185507.2176-1-42.hyeyoo@gmail.com> In-Reply-To: From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 29 Jan 2025 14:48:33 +0900 X-Gm-Features: AWEUYZmVHVUggp5omvsnyTnYuvT9BZPcI1izdir3u0U3m6Yig7rXwfsLREKHx2o Message-ID: Subject: Re: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when zswap_store_page() fails To: Yosry Ahmed Cc: "Sridhar, Kanchana P" , Johannes Weiner , Nhat Pham , Chengming Zhou , Andrew Morton , "linux-mm@kvack.org" , "stable@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: 1wczeo15kne9yk8ek6r3r7gki1ps1aw3 X-Rspam-User: X-Rspamd-Queue-Id: 7B61E20004 X-Rspamd-Server: rspam03 X-HE-Tag: 1738129727-910884 X-HE-Meta: U2FsdGVkX19y3hVm5ljZn5cV/oWaNpQoHduZ2YDurld4UpnV/fWehI+q71mWte9vL9JrU+TGoqStogcHD1+uVgC1iAVTFU6doMsaPDMk6zh6eXxkURdYB14mP8s1ONcNyHKRj73GdwftYKyvLGlO5viIIzCKfRdxBg829Orq5ai1Iz9AK3PTszWXo0w50GZle55HuSfTXsCsQZppWBSLoB/9zglO6KLuHb5uK0eLa6fWI7U6NuzqtSqI502bMPe1YoBkuDQT23KAdFFYo5+CdL64fEA/rf9xT7s3Ft8ID+eArcJMdg3/1ri04YCI+ZdvCLQnT+EZ6/z1JlRwpEG5W2vtlEMVxtrihg90aQiQN3kuLEgdHez+UO8X+NmCC3sWeUTreq3uO2//qjyq4da1z/tfqqLCdVeSxPqEJaT6+q4sc/Z7zfhLpOa/WhYs/suAmkp9YRBz6bfM7batZdJfauJxy+OLLZhytCnAxKD0zXKRq2s/hxf3tyGD5nuQfhOo8KpesFX19QO1AIAmzY/FU4xcF8VFWxKPocJ/DoPcYe5yiiS6CWbR01eLImQYypbBzEZxKiIHWxbtXTKOJNyYJfFEuLgagPv0YoM6tqbL1dqH53ZpZcqQlkWOus5dlbyepBDQZCbGDRwIi9b9OORuLV4C+lTcrg8TkpRHODh5TsCT+rJ+MLS+iAcQ4CLqfr0vSIaSB18sX0bsxFScsoyzyOvKfCyizEvWtI1wuKwxFEl9qNAXOTSJL7eEYqZ/xEvPjNm7526PbEl0CEJpRlIrbpUcbet9WPWW0dxkqg1wKPCxM7HpgO8XJb7B1xRhJLiImwkfZ7oMPMrFSh29UQe2klDzbT20ShhM2UOuKTmPWvQDrmMJNVC8iOQrg409MLsfIs+pgOhGvySk8COhX6DQj7dP4bOiLRFH7T63tg59zi+3xMPYaIr/gT/IRhn036t1JnhZEXTwd28ggiUEWaJ rtd0OWGZ +EbwOL2CL+NujQaqUelmqDHCdBntoX+jr3dYtNqJtUpdDoPZ3d4+W+j0Vbpn+hjz7DcMyIKTrYb26DyGK+hLZuXVW8b/ixC6FRQWvoRiYCG2PhzMdjTWEi/lPZ0WL2e0luTjufw0Gfe9G3+DGUeD65tALmVTfSmZLV2TdL+DSjVnh90aIoWwMF4/Q1ED6C4Ur3vvMxepdqTAQs2GhX7MIXh8DOzsmi6IWyoPbjFbZ4BtORE4uvT2DnmEEmEJKPEeh/JS8AphuwQzpthAkeMe9sHslfEbMZ2lUxSuHzVQ9MRwsMyGYj3nXq1wPv/yiolZyghocN4tec3Pj/3tdxHJs5VE7AU8mhnyVGpBUCk9mMRIL/GaamxWYSw160e/5u0pkGjzD2SaBZDdrda2jKofsm43S1XBIsdIsFeyIggYje9r+5vQszQmxh5ENhteNkggDtMVIB+yE48v9UpcjPDBVftbD6dS+A9uqHde+5UTpG02SmjXYM/hUF6alL9wbUw3VKCOEVlXWZIjnvVo/UVoWjVJ3yJnhUDTF4Wpgch8ZyvPkzTwfbCn1yXvG5gTCUj04zOYLysuMRXMvcsr54hfgzTmN/BiuKbbh7uWEv1/Q1Y16F9gMp91uQSukMfvGy7PMLF2pv6EKYWGktVfGJT2hvW92QG26ge3Oz21Svk6J2unQNeca04K1+c+aFS8gKsffyqrulncaJhEpumA= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jan 29, 2025 at 4:19=E2=80=AFAM Yosry Ahmed = wrote: > > On Tue, Jan 28, 2025 at 07:09:05PM +0000, Sridhar, Kanchana P wrote: > > Hi Hyeonggon, > > > > > -----Original Message----- > > > From: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > > Sent: Tuesday, January 28, 2025 10:55 AM > > > To: Sridhar, Kanchana P ; Johannes Wein= er > > > ; Yosry Ahmed ; Nhat > > > Pham ; Chengming Zhou > > > ; Andrew Morton > > foundation.org> > > > Cc: linux-mm@kvack.org; Hyeonggon Yoo <42.hyeyoo@gmail.com>; > > > stable@vger.kernel.org > > > Subject: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging w= hen > > > zswap_store_page() fails > > > > > > Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store(= )") > > > skips charging any zswapped base pages when it failed to zswap the en= tire > > > folio. > > > > > > However, when some base pages are zswapped but it failed to zswap > > > the entire folio, the zswap operation is rolled back. > > > When freeing zswap entries for those pages, zswap_entry_free() unchar= ges > > > the pages that were not previously charged, causing zswap charging to > > > become inconsistent. > > > > > > This inconsistency triggers two warnings with following steps: > > > # On a machine with 64GiB of RAM and 36GiB of zswap > > > $ stress-ng --bigheap 2 # wait until the OOM-killer kills stress-ng > > > $ sudo reboot > > > > > > Two warnings are: > > > in mm/memcontrol.c:163, function obj_cgroup_release(): > > > WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1)); > > > > > > in mm/page_counter.c:60, function page_counter_cancel(): > > > if (WARN_ONCE(new < 0, "page_counter underflow: %ld > > > nr_pages=3D%lu\n", > > > new, nr_pages)) > > > > > > While objcg events should only be accounted for when the entire folio= is > > > zswapped, objcg charging should be performed regardlessly. > > > Fix accordingly. > > > > > > After resolving the inconsistency, these warnings disappear. > > > > > > Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store(= )") > > > Cc: stable@vger.kernel.org > > > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > > --- > > > > > > v1->v2: > > > > > > Fixed objcg events being accounted for on zswap failure. > > > > > > Fixed the incorrect description. I misunderstood that the base pages= are > > > going to be stored in zswap, but their zswap entries are freed immed= iately. > > > > > > Added a comment on why it charges pages that are going to be removed > > > from zswap. > > > > > > mm/zswap.c | 14 ++++++++++---- > > > 1 file changed, 10 insertions(+), 4 deletions(-) > > > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > > index 6504174fbc6a..10b30ac46deb 100644 > > > --- a/mm/zswap.c > > > +++ b/mm/zswap.c > > > @@ -1568,20 +1568,26 @@ bool zswap_store(struct folio *folio) > > > > > > bytes =3D zswap_store_page(page, objcg, pool); > > > if (bytes < 0) > > > - goto put_pool; > > > + goto charge_zswap; > > > compressed_bytes +=3D bytes; > > > } > > > > > > - if (objcg) { > > > - obj_cgroup_charge_zswap(objcg, compressed_bytes); > > > + if (objcg) > > > count_objcg_events(objcg, ZSWPOUT, nr_pages); > > > - } > > > > > > atomic_long_add(nr_pages, &zswap_stored_pages); > > > count_vm_events(ZSWPOUT, nr_pages); > > > > > > ret =3D true; > > > > > > +charge_zswap: > > > + /* > > > + * Charge zswapped pages even when it failed to zswap the entire > > > folio, > > > + * because zswap_entry_free() will uncharge them anyway. > > > + * Otherwise zswap charging will become inconsistent. > > > + */ > > > + if (objcg) > > > + obj_cgroup_charge_zswap(objcg, compressed_bytes); > > > > Thanks for finding this bug! I am thinking it might make sense to charg= e > > and increment the zswap_stored_pages counter in zswap_store_page(). > > Something like: > > > > diff --git a/mm/zswap.c b/mm/zswap.c > > index b84c20d889b1..fd2a72598a8a 100644 > > --- a/mm/zswap.c > > +++ b/mm/zswap.c > > @@ -1504,11 +1504,14 @@ static ssize_t zswap_store_page(struct page *pa= ge, > > entry->pool =3D pool; > > entry->swpentry =3D page_swpentry; > > entry->objcg =3D objcg; > > + if (objcg) > > + obj_cgroup_charge_zswap(objcg, entry->length); > > entry->referenced =3D true; > > if (entry->length) { > > INIT_LIST_HEAD(&entry->lru); > > zswap_lru_add(&zswap_list_lru, entry); > > } > > + atomic_long_inc(&zswap_stored_pages); > > > > return entry->length; > > > > @@ -1526,7 +1529,6 @@ bool zswap_store(struct folio *folio) > > struct obj_cgroup *objcg =3D NULL; > > struct mem_cgroup *memcg =3D NULL; > > struct zswap_pool *pool; > > - size_t compressed_bytes =3D 0; > > bool ret =3D false; > > long index; > > > > @@ -1569,15 +1571,11 @@ bool zswap_store(struct folio *folio) > > bytes =3D zswap_store_page(page, objcg, pool); > > if (bytes < 0) > > goto put_pool; > > - compressed_bytes +=3D bytes; > > } > > > > - if (objcg) { > > - obj_cgroup_charge_zswap(objcg, compressed_bytes); > > + if (objcg) > > count_objcg_events(objcg, ZSWPOUT, nr_pages); > > - } > > > > - atomic_long_add(nr_pages, &zswap_stored_pages); > > count_vm_events(ZSWPOUT, nr_pages); > > > > ret =3D true; > > > > What do you think? > > > > Yosry, Nhat, Johannes, please let me know if this would be a cleaner > > approach. If so, I don't think we would be losing a lot of performance > > by not doing the one-time charge per folio, but please let me know > > your thoughts as well. > > This is certainly cleaner, and thanks for catching that > zswap_stored_pages cleanup is also wrong. Oh, yeah. Thanks for catching zswap_stored_pages. > I am not sure if this has meaningful impact on performance, but it seems > like we are doing a bit more work in the common success case to avoid > the work in the uncommon failure case. Right, but at the same time I think we need some evaluation to sacrifice readability for performance. No need to be in a hurry to optimize when fixing a bug, I think. > Moving the charge (and atomic addition) above the zswap_store_page() > loop would be doing the opposite, albeit less clean. I think that wouldn't work because zswap won't uncharge for pages that zswap_store_page() fails to store. And we don't know how many pages are going to be stored in zswap in advance. IMO v2 of this patch is efficient and it works while charging just to uncharge doesn't look great. > I don't feel strongly either way, but I slightly prefer the latter. I'd prefer writing more readable code because it's a hotfix. Let me post v3 with Sridhar's feedback (the former) adjusted!