From: Yosry Ahmed
Date: Mon, 28 Oct 2024 15:54:02 -0700
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin
To: Barry Song <21cnbao@gmail.com>
Cc: Usama Arif, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Johannes Weiner, David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen, Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts, joshua.hahnjy@gmail.com

On Mon, Oct 28, 2024 at 3:52 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Tue, Oct 29, 2024 at 6:33 AM Yosry Ahmed wrote:
> >
> > [..]
> > > > > By the way, I recently had an idea: if we can conduct the zeromap check
> > > > > earlier - for example - before allocating swap slots and pageout(), could
> > > > > we completely eliminate swap slot occupation and allocation/release
> > > > > for zeromap data? For example, we could use a special swap
> > > > > entry value in the PTE to indicate zero content and directly fill it with
> > > > > zeros when swapping back. We've observed that swap slot allocation and
> > > > > freeing can consume a lot of CPU and slow down functions like
> > > > > zap_pte_range and swap-in. If we can entirely skip these steps, it
> > > > > could improve performance. However, I'm uncertain about the benefits we
> > > > > would gain if we only have 1-2% zeromap data.
> > > >
> > > > If I remember correctly this was one of the ideas floated around in the
> > > > initial version of the zeromap series, but it was evaluated as a lot more
> > > > complicated to do than what the current zeromap code looks like. But I
> > > > think its definitely worth looking into!
> >
> > Yup, I did suggest this on the first version:
> > https://lore.kernel.org/linux-mm/CAJD7tkYcTV_GOZV3qR6uxgFEvYXw1rP-h7WQjDnsdwM=g9cpAw@mail.gmail.com/
> >
> > , and Usama took a stab at implementing it in the second version:
> > https://lore.kernel.org/linux-mm/20240604105950.1134192-1-usamaarif642@gmail.com/
> >
> > David and Shakeel pointed out a few problems. I think they are
> > fixable, but the complexity/benefit tradeoff was getting unclear at
> > that point.
> >
> > If we can make it work without too much complexity, that would be
> > great of course.
> >
> > >
> > > Sorry for the noise. I didn't review the initial discussion. But my feeling
> > > is that it might be valuable considering the report from Zhiguo:
> > >
> > > https://lore.kernel.org/linux-mm/20240805153639.1057-1-justinjiang@vivo.com/
> > >
> > > In fact, our recent benchmark also indicates that swap free could account
> > > for a significant portion in do_swap_page().
> >
> > As Shakeel mentioned in a reply to Usama's patch mentioned above, we
> > would need to check the contents of the page after it's unmapped. So
> > likely we need to allocate a swap slot, walk the rmap and unmap, check
> > contents, walk the rmap again and update the PTEs, free the swap slot.
> >
>
> So the issue is that we can't check the content before allocating slots and
> unmapping during reclamation? If we find the content is zero, can we skip
> all slot operations and go directly to rmap/unmap by using a special PTE?

We need to unmap first before checking the content, otherwise the
content can change right after we check it.
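
To make the ordering problem concrete, here is a toy userspace sketch. It is
not kernel code: struct toy_page, toy_user_write(), toy_unmap() and the other
names are all made up for illustration, and the "concurrent" write is simulated
by a plain call placed between the two steps. It only shows why a zero check
done while the page is still mapped can go stale, while a check done after
unmapping cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_WORDS 512 /* assumption: a 4 KiB page seen as 512 words */

/* Hypothetical stand-in for a page plus its "is it mapped" state. */
struct toy_page {
    unsigned long data[TOY_PAGE_WORDS];
    bool mapped;
};

/* Word-by-word scan: true only if every word in the page is zero. */
static bool toy_page_is_zero(const struct toy_page *p)
{
    for (size_t i = 0; i < TOY_PAGE_WORDS; i++)
        if (p->data[i])
            return false;
    return true;
}

/* Simulated userspace store: it only succeeds while the page is mapped. */
static bool toy_user_write(struct toy_page *p, size_t idx, unsigned long v)
{
    if (!p->mapped)
        return false;
    p->data[idx] = v;
    return true;
}

static void toy_unmap(struct toy_page *p)
{
    p->mapped = false;
}

/* Check first, unmap later: a write can land in between. */
static bool check_then_unmap_is_stale(void)
{
    struct toy_page p = { .mapped = true };
    bool looked_zero = toy_page_is_zero(&p);

    toy_user_write(&p, 0, 42); /* succeeds, page is still mapped */
    toy_unmap(&p);
    /* Stale if we decided "zero" but the page no longer is. */
    return looked_zero && !toy_page_is_zero(&p);
}

/* Unmap first, then check: later writes are refused. */
static bool unmap_then_check_is_stale(void)
{
    struct toy_page p = { .mapped = true };

    toy_unmap(&p);
    bool looked_zero = toy_page_is_zero(&p);

    toy_user_write(&p, 0, 42); /* refused, page is unmapped */
    return looked_zero && !toy_page_is_zero(&p);
}

/* Runs both orderings; call it from main() in a test program. */
static void toy_demo(void)
{
    assert(check_then_unmap_is_stale());
    assert(!unmap_then_check_is_stale());
}
```

In this toy model the check-then-unmap order ends up treating a non-zero page
as zero, which in the real case would mean losing data. That is why the check
has to come after the unmap.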