From: Barry Song <21cnbao@gmail.com>
Date: Tue, 29 Oct 2024 06:51:57 +0800
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin
To: Yosry Ahmed
Cc: Usama Arif, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Johannes Weiner, David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen, Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts, joshua.hahnjy@gmail.com

On Tue, Oct 29, 2024 at 6:33 AM Yosry Ahmed wrote:
>
> [..]
> > > > By the way, I recently had an idea: if we can conduct the zeromap check
> > > > earlier - for example - before allocating swap slots and pageout(), could
> > > > we completely eliminate swap slot occupation and allocation/release
> > > > for zeromap data? For example, we could use a special swap
> > > > entry value in the PTE to indicate zero content and directly fill it with
> > > > zeros when swapping back. We've observed that swap slot allocation and
> > > > freeing can consume a lot of CPU and slow down functions like
> > > > zap_pte_range and swap-in. If we can entirely skip these steps, it
> > > > could improve performance. However, I'm uncertain about the benefits we
> > > > would gain if we only have 1-2% zeromap data.
> > >
> > > If I remember correctly this was one of the ideas floated around in the
> > > initial version of the zeromap series, but it was evaluated as a lot more
> > > complicated to do than what the current zeromap code looks like. But I
> > > think its definitely worth looking into!
>
> Yup, I did suggest this on the first version:
> https://lore.kernel.org/linux-mm/CAJD7tkYcTV_GOZV3qR6uxgFEvYXw1rP-h7WQjDnsdwM=g9cpAw@mail.gmail.com/
>
> , and Usama took a stab at implementing it in the second version:
> https://lore.kernel.org/linux-mm/20240604105950.1134192-1-usamaarif642@gmail.com/
>
> David and Shakeel pointed out a few problems. I think they are
> fixable, but the complexity/benefit tradeoff was getting unclear at
> that point.
>
> If we can make it work without too much complexity, that would be
> great of course.
> >
> > Sorry for the noise. I didn't review the initial discussion. But my feeling
> > is that it might be valuable considering the report from Zhiguo:
> >
> > https://lore.kernel.org/linux-mm/20240805153639.1057-1-justinjiang@vivo.com/
> >
> > In fact, our recent benchmark also indicates that swap free could account
> > for a significant portion in do_swap_page().
>
> As Shakeel mentioned in a reply to Usama's patch mentioned above, we
> would need to check the contents of the page after it's unmapped. So
> likely we need to allocate a swap slot, walk the rmap and unmap, check
> contents, walk the rmap again and update the PTEs, free the swap slot.
>

So the issue is that we can't check the content before allocating slots and
unmapping during reclamation? If we find the content is zero, can we skip
all slot operations and go directly to rmap/unmap by using a special PTE?

> So the swap free will be essentially moved from the fault path to the
> reclaim path, not eliminated. It may still be worth it, not sure. We
> also need to make sure we keep the rmap intact after the first walk
> and unmap in case we need to go back and update the PTEs again.
>
> Overall, I think the complexity is unlikely to be low.
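
To make the "special PTE" idea discussed above a bit more concrete, here is a
minimal userspace sketch in plain C. It is only a toy model and not kernel
code: TOY_PAGE_SIZE, TOY_ZERO_ENTRY and the toy_* helpers are all hypothetical
names made up for illustration. It shows the intended effect only (a
zero-filled page gets a marker entry and never consumes a slot, and swap-in
for that marker just refills zeros). It deliberately ignores the rmap walks
and the requirement to check the contents after unmapping, which is where the
complexity raised in this thread comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical marker standing in for a "special swap entry value". */
#define TOY_ZERO_ENTRY UINT64_MAX

/* Return true if every byte of the page-sized buffer is zero. */
static inline bool toy_page_is_zero(const unsigned char *page)
{
	assert(page != NULL);
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		if (page[i] != 0)
			return false;
	return true;
}

/*
 * Toy "swap-out": a zero-filled page gets the marker entry and does not
 * consume a slot; any other page takes the next slot number.
 */
static inline uint64_t toy_swap_out(const unsigned char *page,
				    uint64_t *next_slot)
{
	if (toy_page_is_zero(page))
		return TOY_ZERO_ENTRY;
	return (*next_slot)++;
}

/*
 * Toy "swap-in" for the marker entry: refill the page with zeros. There is
 * no slot to look up or free. Returns false for any other entry, which a
 * real implementation would have to read back from the swap device.
 */
static inline bool toy_swap_in_zero(uint64_t entry, unsigned char *page)
{
	if (entry != TOY_ZERO_ENTRY)
		return false;
	memset(page, 0, TOY_PAGE_SIZE);
	return true;
}
```

In this model the slot counter is never touched for zero pages, which is the
saving being asked about. Whether the real reclaim path can get there without
first allocating a slot is exactly the open question.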