From: Barry Song <21cnbao@gmail.com>
Date: Tue, 29 Oct 2024 07:03:11 +0800
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin
To: Yosry Ahmed
Cc: Usama Arif, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Johannes Weiner,
 David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen,
 Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts,
 joshua.hahnjy@gmail.com

On Tue, Oct 29, 2024 at 6:54 AM Yosry Ahmed wrote:
>
> On Mon, Oct 28, 2024 at 3:52 PM Barry Song <21cnbao@gmail.com> wrote:
> >
> > On Tue, Oct 29, 2024 at 6:33 AM Yosry Ahmed wrote:
> > >
> > > [..]
> > > > > > By the way, I recently had an idea: if we can conduct the zeromap check
> > > > > > earlier - for example - before allocating swap slots and pageout(), could
> > > > > > we completely eliminate swap slot occupation and allocation/release
> > > > > > for zeromap data? For example, we could use a special swap
> > > > > > entry value in the PTE to indicate zero content and directly fill it with
> > > > > > zeros when swapping back. We've observed that swap slot allocation and
> > > > > > freeing can consume a lot of CPU and slow down functions like
> > > > > > zap_pte_range and swap-in. If we can entirely skip these steps, it
> > > > > > could improve performance. However, I'm uncertain about the benefits we
> > > > > > would gain if we only have 1-2% zeromap data.
> > > > >
> > > > > If I remember correctly this was one of the ideas floated around in the
> > > > > initial version of the zeromap series, but it was evaluated as a lot more
> > > > > complicated to do than what the current zeromap code looks like. But I
> > > > > think its definitely worth looking into!
> > >
> > > Yup, I did suggest this on the first version:
> > > https://lore.kernel.org/linux-mm/CAJD7tkYcTV_GOZV3qR6uxgFEvYXw1rP-h7WQjDnsdwM=g9cpAw@mail.gmail.com/
> > >
> > > , and Usama took a stab at implementing it in the second version:
> > > https://lore.kernel.org/linux-mm/20240604105950.1134192-1-usamaarif642@gmail.com/
> > >
> > > David and Shakeel pointed out a few problems. I think they are
> > > fixable, but the complexity/benefit tradeoff was getting unclear at
> > > that point.
> > >
> > > If we can make it work without too much complexity, that would be
> > > great of course.
> > >
> > > >
> > > > Sorry for the noise. I didn't review the initial discussion. But my feeling
> > > > is that it might be valuable considering the report from Zhiguo:
> > > >
> > > > https://lore.kernel.org/linux-mm/20240805153639.1057-1-justinjiang@vivo.com/
> > > >
> > > > In fact, our recent benchmark also indicates that swap free could account
> > > > for a significant portion in do_swap_page().
> > >
> > > As Shakeel mentioned in a reply to Usama's patch mentioned above, we
> > > would need to check the contents of the page after it's unmapped. So
> > > likely we need to allocate a swap slot, walk the rmap and unmap, check
> > > contents, walk the rmap again and update the PTEs, free the swap slot.
> > >
> >
> > So the issue is that we can't check the content before allocating slots and
> > unmapping during reclamation? If we find the content is zero, can we skip
> > all slot operations and go directly to rmap/unmap by using a special PTE?
>
> We need to unmap first before checking the content, otherwise the
> content can change right after we check it.

Well, do we have a way to terminate the unmap if we find pte_dirty and ensure
the folio is still mapped after try_to_unmap_one()? Then we could activate it
again after try_to_unmap.

It might just be noise. Let me take some more time to think about it. :-)