From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0BCB1C25B76 for ; Thu, 30 May 2024 16:20:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 969A86B009B; Thu, 30 May 2024 12:20:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 919BB6B009D; Thu, 30 May 2024 12:20:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7E1206B009E; Thu, 30 May 2024 12:20:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 609356B009B for ; Thu, 30 May 2024 12:20:49 -0400 (EDT) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 17C7980F78 for ; Thu, 30 May 2024 16:20:49 +0000 (UTC) X-FDA: 82175575818.25.3BC6AD1 Received: from mail-ej1-f46.google.com (mail-ej1-f46.google.com [209.85.218.46]) by imf27.hostedemail.com (Postfix) with ESMTP id 2EB9F40026 for ; Thu, 30 May 2024 16:20:46 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=c3tpq0s+; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf27.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.46 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1717086047; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=Kl/GGM9FEQNmC7lNrCDznGozkd6NHwhfF13Ha04rMtw=; b=iGcEbAQfynPjbPsJTlBLAw3D7R2xgSvG+PJ9eAm/of24nQju6xAfoqsx1weAIRStpc0cYh eYjnsHwUBDnYnmHCd+3IxGU9CmHSRGDa1IP/1VinnN6wfr5N9kCksRv7nso8pwCjJfVQ4v 2Daev/n4J4I2aJSoLqpzmEZ1dGlQQDc= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=c3tpq0s+; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf27.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.46 as permitted sender) smtp.mailfrom=yosryahmed@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1717086047; a=rsa-sha256; cv=none; b=GKMCSEjj+QZMfDLMAqDaDR8yRkDpPQa1N+y4i9j0Iq32zJcydmfijMMD7nzevNfmgCSdkj lp65f2m5tdAP3+nxO+b/CqxY7cjBGi+h3iP7T1HVxx+NLiXif7bnZ+B+1TcSpnI8vZ4Um9 WZA+0o8Z0prHxVL1QLzrfeZQPtDQfTA= Received: by mail-ej1-f46.google.com with SMTP id a640c23a62f3a-a673a60f544so36513166b.3 for ; Thu, 30 May 2024 09:20:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1717086046; x=1717690846; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Kl/GGM9FEQNmC7lNrCDznGozkd6NHwhfF13Ha04rMtw=; b=c3tpq0s+rd9faRkLaoigSH3tGUt3MgtiZh/1G9Ms/drD3c9gHFt6qWVJOotyNOxRXI Wvd1Z8A7xQArYThNjRAC+XEhH+b9GpnbJbJ7sy89YCZkJRRFmr231yZ/He8L3f+uoJn4 4SXl9Rtc10TEp0jo5vsondqQWrJ34trRJXbuk2VbNzTOBySsamKYYR7PHMYp2KLLbRrM U7dql81WK+axK8KTSHYUCzP6+BeFT2DBTQOKFzqyyBGdL6AZH1G+OlVEhEqp4ff2Fjkf iIazPiIDo1P8VD9u+EO2cjskVlJc0TCA/NVmJPWQMf5fDJ+Di5poRy2VALfePxqb81ua euLg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717086046; x=1717690846; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Kl/GGM9FEQNmC7lNrCDznGozkd6NHwhfF13Ha04rMtw=; b=ofBPruPW8UQCfTrOhoNRLE3G8k/L4/NMfFUKbW9YwjroDSaIwr0CLFa8vhoVZeJ0Is Fyec7TLOjNr9mJUpAHJKOnqub3o4OaZoaj1ieZxBPVR7nweWGZz1y0ZAlyrdk3gj6qBk VMWID30paDoaSJZpgZGsd2DAtYH2GruYmdyGaMjIARfcK2TJ3ouqXW5ODXSkBmW7/Jfd Ou9iHRkUo4piJYVVnx3RSUKc53lw76CcD4CzfhxSDhFkqIUBQ2B5SVK3bV5A+HLN6jan y4gDHWbXDAt0ERPdtdS8Iu4CATzN7Nc3H3VNWumuUAarFyH2lkK/Q7b8ATgsREQpHcdw oS0Q== X-Forwarded-Encrypted: i=1; AJvYcCV26N/CSjzFJZZJhiqTBRIXsI+dG4C4x4LQfN71aUZJl4sNkMxTLvWJSlu2zpHDP7JG5DiuJPKrshpBin8GAGA+WU0= X-Gm-Message-State: AOJu0YwIGiNJvPh1gI04bYnjYqdfyYhhCbUTnG3pyRy4n8WrcGnb8C71 WbJp7NdNDIunXrncKy8Brlnr6uTekmjaxhMrhPZiXJ6DhsDYEYJlQvLzQBTxcr9PfqWMoBh+F4k oPYAAAiLhMG3l+hzYEDVbqTYeKPfmlzzL0Ylj X-Google-Smtp-Source: AGHT+IE99vKgOVmVQYgmhzEBWXRWfHbQ0PozVSW5DerT4/50qVdSiaudvPuYP6CQetEqvnOnFqyicngxT1Zw39JQ8gI= X-Received: by 2002:a17:906:11d2:b0:a59:c62c:212d with SMTP id a640c23a62f3a-a65e8d36768mr160285466b.14.1717086045429; Thu, 30 May 2024 09:20:45 -0700 (PDT) MIME-Version: 1.0 References: <20240530102126.357438-1-usamaarif642@gmail.com> <20240530102126.357438-2-usamaarif642@gmail.com> In-Reply-To: <20240530102126.357438-2-usamaarif642@gmail.com> From: Yosry Ahmed Date: Thu, 30 May 2024 09:20:06 -0700 Message-ID: Subject: Re: [PATCH 1/2] mm: store zero pages to be swapped out in a bitmap To: Usama Arif Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Ying Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 2EB9F40026 X-Stat-Signature: a9akx8chjoi41dsgsws7h4ne616swc1m X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1717086046-513388 X-HE-Meta: U2FsdGVkX195PC4zTOe8bEi/4WmpSTbEADMHMJX+FzRdigi1WnVufze/nv/6EIkeHM1Fsk9GaG3uqaoyfYjaHp1TdAe3hgtbthn9zqR8FLXyqcVoTsb+UblaiVZ8nkyzC/VuA+l5pfx6Lp8KVPaAC7LFsi4r3clZ055noRr5AT/Bi0528mRRz6/2MvhHj+Goyhs1vFOZGG2XtVOjZWVd6cqe/J+lbnCwulKrJTUPkh74RHmhDqnXyxlaa9WzH5Giehrd0+Qac7a/4kHB5XqWaUq7eSq28qfG5Ei2a+cNhZbbklthHlwva2g3VH/QasBGBPvTaoGf1YL8y0FjuX5h33bUIwc4UzVpF7hmWlSCq8vQGEbgiX7jlCpQM50JzyBZm7KBGUfPeR6VJ6kXehajdpNPjamA5Vc8FemaLvyseEmdYcmumWPiU43x+kWDqHrRB/nrM8505KXosxHQFoDWJHxeT3r/8MdYeaiyDrwiLmuKSHYGwoWrAfWgx6NadbXqkojYnFKEqt+Ce8iSHz/QdynV7Z3ZmQ3HxTV5tR8zu4kjFtCJzEduDcjZLa5hDM/tZeP7m01YTXyhuKub82NJDmSxk4irUnQ5hMbMsRBoWxN7dB/qZ9xrlFAbMskKA2AGCoW8iSkrI22wB6qAfiAqqL0rjq/3FfUTww+8xuz+Gzc2eALuFAxqc+83Sc6RIeOYF+zoJWHTI5BavVS9Vku+vTuqVOAjBDwp5vTVNB1ioZhvSBW4lL3ixDntFaMgkfU7HqDkClYYP8pnOsHv4Tv9rWRMiR3BXYV88AxEg1MYAIEhLoBYVOtDYawgTTFAoVfbW34xbAbQh913xRNyCIMxMxuYQAV9cn28q3k5/XDZDh/FYRhW0g6eTlvG9gr3HgQL5SMYyb0Os3fi3coU91cZgl0PrDF3VyM+MhLqg+zAfnILu5XLTQ9G67qV7c6+KuVGScIcLbJpa+pMNyWdd8l O/mg7TKy TWg7623Zh5ztvCOTWrUiQhlVN9sIe+chQ8K8Zof75xNSL9MtMT/NHF5XxjFGeoW7pB0i8eN9MNqfgKgEUvKWwzDCh+uiScnawKgDGxLC4zHgVYpVUXfRdCyGU3/tb593FFCP4xNW8X+CfYBHvwf924+1xa2tPxHJfX7EpNrJwzycb5njG+OgaNGX2mKULpUnDO194AjCk3vO4myBHz61gqtPsSVkCvwI40I9rjLbuGB/1FFjBEeYk9PzLK2S07Y7jDBiacnb4gaJFMg/hy+YbeSfcKBzdb3opFa+uXs/v+ACynUwZDqz75nQhzeplaAWeJuDmW1uQXW5M+3wONDQk8nBh2uzGY1rRUc456fwJtrWd3Vo= X-Bogosity: Ham, tests=bogofilter, spamicity=0.001392, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, May 30, 2024 at 3:21=E2=80=AFAM Usama Arif = wrote: > > Approximately 10-20% of pages to be swapped out are zero pages [1]. > Rather than reading/writing these pages to flash resulting > in increased I/O and flash wear, a bitmap can be used to mark these > pages as zero at write time, and the pages can be filled at > read time if the bit corresponding to the page is set. > With this patch, NVMe writes in Meta server fleet decreased > by almost 10% with conventional swap setup (zswap disabled). Great work. I thought about doing this after my attempt to drop some of the same-filled pages handling in zswap, now we can drop all of it :) Make sure to CC other non-zswap folks on next iterations like Ying. I see Johannes already did that in his response. I suspect get_maintainers will give you a few extra names. > > [1]https://lore.kernel.org/all/20171018104832epcms5p1b2232e2236258de3d03d= 1344dde9fce0@epcms5p1/ > > Signed-off-by: Usama Arif > --- > include/linux/swap.h | 1 + > mm/page_io.c | 86 ++++++++++++++++++++++++++++++++++++++++++-- > mm/swapfile.c | 10 ++++++ > 3 files changed, 95 insertions(+), 2 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index a11c75e897ec..e88563978441 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -299,6 +299,7 @@ struct swap_info_struct { > signed char type; /* strange name for an index */ > unsigned int max; /* extent of the swap_map */ > unsigned char *swap_map; /* vmalloc'ed array of usage coun= ts */ > + unsigned long *zeromap; /* vmalloc'ed bitmap to track zer= o pages */ > struct swap_cluster_info *cluster_info; /* cluster info. Only for= SSD */ > struct swap_cluster_list free_clusters; /* free clusters list */ > unsigned int lowest_bit; /* index of first free in swap_ma= p */ > diff --git a/mm/page_io.c b/mm/page_io.c > index a360857cf75d..ab043b4ad577 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -172,6 +172,77 @@ int generic_swapfile_activate(struct swap_info_struc= t *sis, > goto out; > } > > +static bool is_folio_page_zero_filled(struct folio *folio, int i) > +{ > + unsigned long *page; I just recently renamed this variable in the zswap version of this function because it is very confusing, especially when looking for struct page references. 'page' is usually used for struct page. Let's use a different name here. > + unsigned int pos; > + bool ret =3D false; > + > + page =3D kmap_local_folio(folio, i * PAGE_SIZE); In the zswap version, we compare against the end of the page first before the loop, in case the page just has a bunch of zeros at its beginning. The patch that added it to zswap reported better performance in some cases [1]. You can also use memchr_inv() to compare the range against 0 instead of the loop. [1]https://lore.kernel.org/all/20230205190036.1730134-1-taejoon.song@lge.co= m/ > + for (pos =3D 0; pos < PAGE_SIZE / sizeof(*page); pos++) { > + if (page[pos] !=3D 0) > + goto out; > + } > + ret =3D true; > +out: > + kunmap_local(page); > + return ret; > +} > + > +static bool is_folio_zero_filled(struct folio *folio) > +{ > + unsigned int i; > + > + for (i =3D 0; i < folio_nr_pages(folio); i++) { > + if (!is_folio_page_zero_filled(folio, i)) > + return false; > + } > + return true; > +} Is there value in splitting this into two functions and having a separate loop here? Can we just have a single function that kmaps the entire folio and loops over it in one go? > + > +static void folio_page_zero_fill(struct folio *folio, int i) > +{ > + unsigned long *page; > + > + page =3D kmap_local_folio(folio, i * PAGE_SIZE); > + memset_l(page, 0, PAGE_SIZE / sizeof(unsigned long)); > + kunmap_local(page); > +} > + > +static void folio_zero_fill(struct folio *folio) > +{ > + unsigned int i; > + > + for (i =3D 0; i < folio_nr_pages(folio); i++) > + folio_page_zero_fill(folio, i); I think you can just use clear_highpage() here and drop folio_page_zero_fill(). It should be more optimized as well in some cases. I don't have further comments about the rest beyond Johannes's comments.