From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, Barry Song, Hugh Dickins, Yosry Ahmed,
    "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Baolin Wang,
    Kalesh Singh, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 0/7] mm, swap: remove swap slot cache
Date: Fri, 14 Mar 2025 00:59:28 +0800
Message-ID: <20250313165935.63303-1-ryncsn@gmail.com>
Reply-To: Kairui Song

From: Kairui Song

Slot cache was initially introduced by commit 67afa38e012e ("mm/swap: add
cache for swap slots allocation") to reduce the lock contention of si->lock.

Previous series "mm, swap: rework of swap allocator locks" [1] removed the
swap slot cache from the freeing path, as the freeing path no longer touches
si->lock in most cases. The allocation path also has little to no contention
on si->lock since that series, but the slot cache still helps to reduce other
overheads, like counters and the plist.

This series removes the slot cache from the allocation path too, by using
the cluster as the allocation fast path and also reducing other overheads.

Now the slot cache is completely gone. The code is much simpler, with no
obvious feature or performance change, and the related workarounds are
cleaned up. This should also avoid other potential issues, e.g. the long
pinning of swap slots: the swap slot cache pins swap slots with HAS_CACHE,
so reclaim or allocation fails to use these slots when scanning.

The only behavior change is the swap device allocation rotation mechanism,
as explained in the patch "mm, swap: use percpu cluster as allocation fast
path".

Test results look good after deleting the swap slot cache:

- vm-scalability with: `usemem --init-time -O -y -x -R -31 1G`, 12G memory
  cgroup using simulated pmem as SWAP (32G pmem, 32 CPUs), 16 test runs for
  each case, measuring the total throughput:

                          Before (KB/s) (stdev)     After (KB/s) (stdev)
  Random (4K):             424907.60 (24410.78)      414745.92 (34554.78)
  Random (64K):            163308.82 (11635.72)      167314.50 (18434.99)
  Sequential (4K, !-R):   6150056.79 (103205.90)    6321469.06 (115878.16)

- Build linux kernel with make -j96, using 4K folio with 1.5G memory cgroup
  limit and 64K folio with 2G memory cgroup limit, on top of tmpfs, 12 test
  runs, measuring the system time:

                      Before (s) (stdev)    After (s) (stdev)
  make -j96 (4K):     6445.69 (61.95)       6408.80 (69.46)
  make -j96 (64K):    6841.71 (409.04)      6437.99 (435.55)

The performance is unchanged, slightly better in some cases.

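To make "unchanged, slightly better in some cases" easier to check against the results above, here is a minimal sketch that computes the relative change of each mean. It is not part of the series. The `within_noise` rule (difference of means no larger than the larger stdev) is an assumption made for illustration only, not a statistical test, and the names `RESULTS`, `relative_change` and `within_noise` are hypothetical.

```python
# Figures copied from the results quoted above: ((before mean, stdev), (after mean, stdev)).
RESULTS = {
    "usemem random (4K), KB/s": ((424907.60, 24410.78), (414745.92, 34554.78)),
    "usemem random (64K), KB/s": ((163308.82, 11635.72), (167314.50, 18434.99)),
    "usemem sequential (4K), KB/s": ((6150056.79, 103205.90), (6321469.06, 115878.16)),
    "make -j96 (4K), sys time s": ((6445.69, 61.95), (6408.80, 69.46)),
    "make -j96 (64K), sys time s": ((6841.71, 409.04), (6437.99, 435.55)),
}


def relative_change(before, after):
    """Percentage change of the 'after' mean relative to the 'before' mean."""
    return (after - before) / before * 100.0


def within_noise(before, before_sd, after, after_sd):
    """Crude heuristic (an assumption of this sketch, not a significance test):
    treat the change as noise if the means differ by no more than the larger
    of the two standard deviations."""
    return abs(after - before) <= max(before_sd, after_sd)


for name, ((b, b_sd), (a, a_sd)) in RESULTS.items():
    verdict = "within noise" if within_noise(b, b_sd, a, a_sd) else "outside noise"
    print(f"{name:30s} {relative_change(b, a):+6.2f}%  ({verdict})")
```

Note that higher is better for the usemem throughput rows and lower is better for the system time rows. Under this rough rule, only the sequential usemem case moves by more than one standard deviation, and it moves in the favorable direction.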
[1] https://lore.kernel.org/linux-mm/20250113175732.48099-1-ryncsn@gmail.com/

---

V2: https://lore.kernel.org/linux-mm/20250224180212.22802-1-ryncsn@gmail.com/
Updates from V2:
- Make folio_alloc_swap() inline to fix build error [Stephen Rothwell]
- Flush the global percpu cluster cache on swapoff to prevent new swapon
  devices using the old invalid values. Based on:
  https://lore.kernel.org/linux-mm/CAMgjq7AkRmb5ote-VZErM_2UdEC575j9WcrstcQOypEb+T-DLA@mail.gmail.com/
- Minor update for patch 5/7: in the slow path, also try the local cluster
  first to avoid fragmentation. This is an intermediate change for easier
  testing in case someone lands on this patch during a bisect; the final
  code after the whole series is applied is unchanged.
- Need to call mem_cgroup_try_charge_swap even if swap allocation failed,
  for cgroup events.
- Collect reviews and minor improvements [Baoquan He].

V1: https://lore.kernel.org/linux-mm/20250214175709.76029-1-ryncsn@gmail.com/
Updates from V1:
- Check the cluster with cluster_is_usable and cluster_is_empty in the fast
  path too, to improve performance and avoid fragmentation.
- Fix a build warning and error for !SWAP build reported by test bot.
- Global cluster array also records the device for each order [Baoquan He]
- Adjust comments and function name [Baoquan He]
- Collect Reviewed-by [Baoquan He]
- Minor function style improvement [Matthew Wilcox]

Kairui Song (7):
  mm, swap: avoid reclaiming irrelevant swap cache
  mm, swap: drop the flag TTRS_DIRECT
  mm, swap: avoid redundant swap device pinning
  mm, swap: don't update the counter up-front
  mm, swap: use percpu cluster as allocation fast path
  mm, swap: remove swap slot cache
  mm, swap: simplify folio swap allocation

 include/linux/swap.h       |  22 +--
 include/linux/swap_slots.h |  28 ---
 mm/Makefile                |   2 +-
 mm/shmem.c                 |  21 +--
 mm/swap.h                  |   6 -
 mm/swap_slots.c            | 295 ------------------------------
 mm/swap_state.c            |  79 +--------
 mm/swapfile.c              | 355 ++++++++++++++++++++-----------------
 mm/vmscan.c                |  16 +-
 mm/zswap.c                 |   6 +
 10 files changed, 232 insertions(+), 598 deletions(-)
 delete mode 100644 include/linux/swap_slots.h
 delete mode 100644 mm/swap_slots.c

-- 
2.48.1