From: Nhat Pham
Date: Tue, 29 Oct 2024 10:46:51 -0700
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin
To: Barry Song <21cnbao@gmail.com>
Cc: Yosry Ahmed, Usama Arif, akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Johannes Weiner,
    David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen,
    Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts,
    joshua.hahnjy@gmail.com

On Mon, Oct 28, 2024 at 4:03 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Oct 29, 2024 at 6:54 AM Yosry Ahmed wrote:
> >
> > On Mon, Oct 28, 2024 at 3:52 PM Barry Song <21cnbao@gmail.com> wrote:
> > >
> > > On Tue, Oct 29, 2024 at 6:33 AM Yosry Ahmed wrote:
> > > >
> > > > [..]
> > > > > > > By the way, I recently had an idea: if we can conduct the zeromap check
> > > > > > > earlier - for example - before allocating swap slots and pageout(), could
> > > > > > > we completely eliminate swap slot occupation and allocation/release
> > > > > > > for zeromap data? For example, we could use a special swap
> > > > > > > entry value in the PTE to indicate zero content and directly fill it with
> > > > > > > zeros when swapping back. We've observed that swap slot allocation and
> > > > > > > freeing can consume a lot of CPU and slow down functions like
> > > > > > > zap_pte_range and swap-in. If we can entirely skip these steps, it
> > > > > > > could improve performance. However, I'm uncertain about the benefits we
> > > > > > > would gain if we only have 1-2% zeromap data.
> > > > > >
> > > > > > If I remember correctly this was one of the ideas floated around in the
> > > > > > initial version of the zeromap series, but it was evaluated as a lot more
> > > > > > complicated to do than what the current zeromap code looks like. But I
> > > > > > think its definitely worth looking into!
> > > >
> > > > Yup, I did suggest this on the first version:
> > > > https://lore.kernel.org/linux-mm/CAJD7tkYcTV_GOZV3qR6uxgFEvYXw1rP-h7WQjDnsdwM=g9cpAw@mail.gmail.com/
> > > >
> > > > , and Usama took a stab at implementing it in the second version:
> > > > https://lore.kernel.org/linux-mm/20240604105950.1134192-1-usamaarif642@gmail.com/
> > > >
> > > > David and Shakeel pointed out a few problems. I think they are
> > > > fixable, but the complexity/benefit tradeoff was getting unclear at
> > > > that point.
> > > >
> > > > If we can make it work without too much complexity, that would be
> > > > great of course.
> > > >
> > > > >
> > > > > Sorry for the noise. I didn't review the initial discussion. But my feeling
> > > > > is that it might be valuable considering the report from Zhiguo:
> > > > >
> > > > > https://lore.kernel.org/linux-mm/20240805153639.1057-1-justinjiang@vivo.com/
> > > > >
> > > > > In fact, our recent benchmark also indicates that swap free could account
> > > > > for a significant portion in do_swap_page().
> > > >
> > > > As Shakeel mentioned in a reply to Usama's patch mentioned above, we
> > > > would need to check the contents of the page after it's unmapped. So
> > > > likely we need to allocate a swap slot, walk the rmap and unmap, check
> > > > contents, walk the rmap again and update the PTEs, free the swap slot.
> > > >
> > >
> > > So the issue is that we can't check the content before allocating slots and
> > > unmapping during reclamation? If we find the content is zero, can we skip
> > > all slot operations and go directly to rmap/unmap by using a special PTE?
> >
> > We need to unmap first before checking the content, otherwise the
> > content can change right after we check it.
>
> Well, do we have a way to terminate the unmap if we find pte_dirty and ensure
> the folio is still mapped after try_to_unmap_one()? Then we could
> activate it again
> after try_to_unmap.
>
> It might just be noise. Let me take some more time to think about it. :-)

FWIW, the swap abstraction layer Yosry proposed last year (and I'm
working on right now) will allow you to store these zeromapped swap
entries without requiring any swap slots allocated on the swapfile.
It's basically the same thing as swap/zswap decoupling.

Not stopping you guys from optimizing it, since all I have right now
is a (most certainly buggy) prototype + there might be benefits if we
can get around the swap subsystem altogether for these zero mapped
entries. Just letting you know there's a backup plan :)
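
For readers following the thread, here is a minimal userspace sketch of the
ordering discussed above: unmap first, check the contents only once they can
no longer change, and record a slot-free "zero" marker when the page turns out
to be all zeroes. Every name in it (toy_page, toy_reclaim, TOY_ENTRY_ZERO and
so on) is hypothetical and none of it is kernel code. It deliberately ignores
the rmap walks, and it ignores the fact that the real reclaim path needs a
swap entry to install in the PTEs at unmap time, which is exactly the
complication raised in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical outcomes of the toy reclaim step; not kernel types. */
enum toy_entry {
	TOY_ENTRY_SLOT,	/* contents must be written out, a slot is consumed */
	TOY_ENTRY_ZERO,	/* special "all zeroes" marker, no slot consumed */
};

struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
	bool mapped;	/* while mapped, a writer could still change data */
};

/* Return true if every byte of the page is zero. */
static bool toy_page_is_zero(const struct toy_page *page)
{
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		if (page->data[i] != 0)
			return false;
	return true;
}

/*
 * Toy reclaim: unmap first, then inspect the contents. Checking before the
 * unmap would be racy, because a writer could dirty the page right after
 * the check. A slot is only accounted when the page is not zero-filled.
 */
static enum toy_entry toy_reclaim(struct toy_page *page, size_t *slots_used)
{
	assert(page != NULL && slots_used != NULL);

	page->mapped = false;		/* step 1: unmap */
	if (toy_page_is_zero(page))	/* step 2: contents are stable now */
		return TOY_ENTRY_ZERO;

	(*slots_used)++;		/* step 3: take a slot only if needed */
	return TOY_ENTRY_SLOT;
}

/* Toy swap-in for a zero marker: fill with zeroes, no backing store read. */
static void toy_swapin_zero(struct toy_page *page)
{
	memset(page->data, 0, TOY_PAGE_SIZE);
	page->mapped = true;
}
```

In this model a zero-filled page never touches the slot counter, which is the
saving being discussed; whether that can be achieved in the kernel without
allocating a slot before the unmap is the open question in the thread.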