From: Yosry Ahmed
Date: Mon, 28 Oct 2024 15:32:33 -0700
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin
To: Barry Song <21cnbao@gmail.com>
Cc: Usama Arif, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Johannes Weiner, David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen, Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts, joshua.hahnjy@gmail.com

[..]
> > > By the way, I recently had an idea: if we can conduct the zeromap check
> > > earlier - for example - before allocating swap slots and pageout(), could
> > > we completely eliminate swap slot occupation and allocation/release
> > > for zeromap data? For example, we could use a special swap
> > > entry value in the PTE to indicate zero content and directly fill it with
> > > zeros when swapping back.
> > > We've observed that swap slot allocation and
> > > freeing can consume a lot of CPU and slow down functions like
> > > zap_pte_range and swap-in. If we can entirely skip these steps, it
> > > could improve performance. However, I'm uncertain about the benefits we
> > > would gain if we only have 1-2% zeromap data.
> >
> > If I remember correctly this was one of the ideas floated around in the
> > initial version of the zeromap series, but it was evaluated as a lot more
> > complicated to do than what the current zeromap code looks like. But I
> > think it's definitely worth looking into!

Yup, I did suggest this on the first version:
https://lore.kernel.org/linux-mm/CAJD7tkYcTV_GOZV3qR6uxgFEvYXw1rP-h7WQjDnsdwM=g9cpAw@mail.gmail.com/

and Usama took a stab at implementing it in the second version:
https://lore.kernel.org/linux-mm/20240604105950.1134192-1-usamaarif642@gmail.com/

David and Shakeel pointed out a few problems. I think they are
fixable, but the complexity/benefit tradeoff was getting unclear at
that point.

If we can make it work without too much complexity, that would be
great of course.

>
> Sorry for the noise. I didn't review the initial discussion. But my feeling
> is that it might be valuable considering the report from Zhiguo:
>
> https://lore.kernel.org/linux-mm/20240805153639.1057-1-justinjiang@vivo.com/
>
> In fact, our recent benchmark also indicates that swap free could account
> for a significant portion in do_swap_page().

As Shakeel noted in a reply to Usama's patch linked above, we would
need to check the contents of the page after it's unmapped. So we
would likely need to allocate a swap slot, walk the rmap and unmap,
check the contents, walk the rmap again and update the PTEs, and then
free the swap slot.

So the swap free would essentially be moved from the fault path to the
reclaim path, not eliminated. It may still be worth it, not sure.
We also need to make sure we keep the rmap intact after the first walk
and unmap in case we need to go back and update the PTEs again.

Overall, I think the complexity is unlikely to be low.
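
To make the ordering above concrete, here is a rough user-space toy
model of that reclaim sequence. It is only a sketch and not kernel
code: struct toy_pte, struct toy_stats, TOY_ZERO_MARKER and the toy_*
functions are all hypothetical names made up for illustration, the
"rmap walks" are reduced to a single PTE update, and a swapped-out
page just keeps its buffer as a stand-in for the swap device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a PTE; not the kernel's pte_t/swp_entry_t. */
enum toy_pte_kind { TOY_PRESENT, TOY_SWAPPED, TOY_ZERO_MARKER };

struct toy_pte {
	enum toy_pte_kind kind;
	unsigned char *data;	/* mapped page, or swapped-out copy */
	long slot;		/* valid only when kind == TOY_SWAPPED */
};

struct toy_stats {
	unsigned long slot_allocs;
	unsigned long slot_frees;
};

/* Return true if every byte of the page is zero. */
bool toy_page_is_zero(const unsigned char *page)
{
	for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
		if (page[i])
			return false;
	return true;
}

/* Reclaim one present page, following the ordering described above. */
void toy_reclaim(struct toy_pte *pte, struct toy_stats *st, long *next_slot)
{
	assert(pte->kind == TOY_PRESENT);

	/* 1. Allocate a swap slot before unmapping. */
	pte->slot = (*next_slot)++;
	st->slot_allocs++;

	/* 2. First "rmap walk": unmap, the PTE now holds the swap entry. */
	pte->kind = TOY_SWAPPED;

	/* 3. Only now can nobody write the page, so check the contents. */
	if (toy_page_is_zero(pte->data)) {
		/* 4. Second "rmap walk": rewrite the PTE to the marker. */
		pte->kind = TOY_ZERO_MARKER;
		/* 5. Free the slot again, on the reclaim path. */
		pte->slot = -1;
		st->slot_frees++;
		free(pte->data);
		pte->data = NULL;
	}
	/* Non-zero page: data stays behind as the toy "swap device" copy. */
}

/* Fault a page back in. Returns false only if allocation fails. */
bool toy_swap_in(struct toy_pte *pte, struct toy_stats *st)
{
	if (pte->kind == TOY_ZERO_MARKER) {
		/* No slot to look up or free, just hand back zeros. */
		pte->data = calloc(1, TOY_PAGE_SIZE);
		if (!pte->data)
			return false;
	} else if (pte->kind == TOY_SWAPPED) {
		/* Regular swap-in: the slot free lands on the fault path. */
		pte->slot = -1;
		st->slot_frees++;
	}
	pte->kind = TOY_PRESENT;
	return true;
}
```

In this model a zero page still costs one slot allocation and one slot
free. The free simply shows up in toy_reclaim() instead of
toy_swap_in(), which is the "moved, not eliminated" point. It says
nothing about the real cost of the extra rmap walk.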