From: Nhat Pham
Date: Thu, 30 May 2024 12:18:13 -0700
Subject: Re: [PATCH 1/2] mm: store zero pages to be swapped out in a bitmap
To: Yosry Ahmed
Cc: Johannes Weiner, Usama Arif, akpm@linux-foundation.org, chengming.zhou@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Hugh Dickins, Huang Ying
References: <20240530102126.357438-1-usamaarif642@gmail.com> <20240530102126.357438-2-usamaarif642@gmail.com> <20240530122715.GB1222079@cmpxchg.org>

On Thu, May 30, 2024 at 9:24 AM Yosry Ahmed wrote:
>
> On Thu, May 30, 2024 at 5:27 AM Johannes Weiner wrote:
> >
> > On Thu, May 30, 2024 at 11:19:07AM +0100, Usama Arif wrote:
> > > Approximately 10-20% of pages to be swapped out are zero pages [1].
> > > Rather than reading/writing these pages to flash resulting
> > > in increased I/O and flash wear, a bitmap can be used to mark these
> > > pages as zero at write time, and the pages can be filled at
> > > read time if the bit corresponding to the page is set.
> > > With this patch, NVMe writes in Meta server fleet decreased
> > > by almost 10% with conventional swap setup (zswap disabled).
> > >
> > > [1]https://lore.kernel.org/all/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1/
> > >
> > > Signed-off-by: Usama Arif
> >
> > This is awesome.
> >
> > > ---
> > >  include/linux/swap.h |  1 +
> > >  mm/page_io.c         | 86 ++++++++++++++++++++++++++++++++++++++++++--
> > >  mm/swapfile.c        | 10 ++++++
> > >  3 files changed, 95 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > > index a11c75e897ec..e88563978441 100644
> > > --- a/include/linux/swap.h
> > > +++ b/include/linux/swap.h
> > > @@ -299,6 +299,7 @@ struct swap_info_struct {
> > >       signed char     type;           /* strange name for an index */
> > >       unsigned int    max;            /* extent of the swap_map */
> > >       unsigned char *swap_map;        /* vmalloc'ed array of usage counts */
> > > +     unsigned long *zeromap;         /* vmalloc'ed bitmap to track zero pages */
> >
> > One bit per swap slot, so 1 / (4096 * 8) = 0.003% static memory
> > overhead for configured swap space. That seems reasonable for what
> > appears to be a fairly universal 10% reduction in swap IO.
> >
> > An alternative implementation would be to reserve a bit in
> > swap_map. This would be no overhead at idle, but would force
> > continuation counts earlier on heavily shared page tables, and AFAICS
> > would get complicated in terms of locking, whereas this one is pretty
> > simple (atomic ops protect the map, swapcache lock protects the bit).
> >
> > So I prefer this version.
> > But a few comments below:
>
> I am wondering if it's even possible to take this one step further and
> avoid reclaiming zero-filled pages in the first place. Can we just
> unmap them and let the first read fault allocate a zero'd page like
> uninitialized memory, or point them at the zero page and make them
> read-only, or something? Then we could free them directly without
> going into the swap code to begin with.
>
> That's how I thought about it initially when I attempted to support
> only zero-filled pages in zswap. It could be a more complex
> implementation though.

We can aim for this eventually, but yeah the implementation will be
more complex. We'll need to be careful in handling shared zero pages,
synchronizing accesses and maintaining reference counts. I think we
will need to special-case swap cache and swap map for these zero pages
(a ghost zero swap device perhaps), or reinvent the wheel to manage
these pieces of information.

Not impossible, but annoying :) For now, I think Usama's approach is
clean enough and does the job.
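
For readers following the thread, the bitmap idea under discussion can be
illustrated with a small userspace sketch. This is not the code from the
patch: every name here (struct zeromap, swap_out, swap_in, SKETCH_PAGE_SIZE)
is hypothetical, 4096-byte pages are assumed, and the atomic bit operations
and swapcache locking that the kernel version relies on are left out.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed page size; the real kernel uses its own PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for a per-swap-device zeromap: one bit per slot. */
struct zeromap {
    unsigned long *bits;
    size_t nslots;
};

static bool zeromap_init(struct zeromap *zm, size_t nslots)
{
    size_t nlongs = (nslots + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

    /* calloc gives an all-clear bitmap: no slot is marked zero yet. */
    zm->bits = calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
    zm->nslots = nslots;
    return zm->bits != NULL;
}

static void zeromap_destroy(struct zeromap *zm)
{
    free(zm->bits);
    zm->bits = NULL;
    zm->nslots = 0;
}

/* Returns true if every byte of the page is zero. */
static bool page_is_zero(const void *page)
{
    const unsigned char *p = page;

    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
        if (p[i] != 0)
            return false;
    return true;
}

/*
 * Write side: mark the slot if the page is zero-filled. Returns true when
 * the write to the backing device can be skipped. Caller must pass a slot
 * below zm->nslots.
 */
static bool swap_out(struct zeromap *zm, size_t slot, const void *page)
{
    unsigned long mask = 1UL << (slot % SKETCH_BITS_PER_LONG);

    if (page_is_zero(page)) {
        zm->bits[slot / SKETCH_BITS_PER_LONG] |= mask;
        return true;
    }
    /* Clear any stale bit left over from a previous user of the slot. */
    zm->bits[slot / SKETCH_BITS_PER_LONG] &= ~mask;
    return false; /* a real implementation would issue the device write */
}

/*
 * Read side: if the slot's bit is set, fill the page with zeros and return
 * true; otherwise return false so the caller reads from the device.
 */
static bool swap_in(const struct zeromap *zm, size_t slot, void *page)
{
    unsigned long mask = 1UL << (slot % SKETCH_BITS_PER_LONG);

    if (zm->bits[slot / SKETCH_BITS_PER_LONG] & mask) {
        memset(page, 0, SKETCH_PAGE_SIZE);
        return true;
    }
    return false;
}
```

With one bit per 4096-byte slot, the bitmap costs 1 / (4096 * 8) of the
configured swap space, which is the 0.003% figure quoted above.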