From: Kairui Song <ryncsn@gmail.com>
Date: Wed, 15 Jan 2025 18:51:57 +0800
Subject: Re: [PATCH v4 12/13] mm, swap: use a global swap cluster for non-rotation devices
To: Andrew Morton, linux-mm@kvack.org
Cc: Chris Li, Barry Song, Ryan Roberts, Hugh Dickins, Yosry Ahmed, "Huang, Ying", Baoquan He, Nhat Pham, Johannes Weiner, Kalesh Singh, linux-kernel@vger.kernel.org
References: <20250113175732.48099-1-ryncsn@gmail.com> <20250113175732.48099-13-ryncsn@gmail.com>
In-Reply-To: <20250113175732.48099-13-ryncsn@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Jan 14, 2025 at 2:00 AM Kairui Song wrote:
>
> From: Kairui Song
>
> Non-rotational devices (SSD / ZRAM) can tolerate fragmentation, so the
> goal of the SWAP allocator is to avoid contention for clusters. It uses
> a per-CPU cluster design, and each CPU will use a different cluster as
> much as possible.
>
> However, HDDs are very sensitive to fragmentation, and contention is
> trivial in comparison. Therefore, we use one global cluster instead.
> This ensures that each order will be written to the same cluster as
> much as possible, which helps make the I/O more continuous.
>
> This ensures that the performance of the cluster allocator is as good
> as that of the old allocator. Tests after this commit compared to those
> before this series, using 'make -j32' with tinyconfig, a 1G memcg
> limit, and HDD swap:
>
> Before this series:
> 114.44user 29.11system 39:42.90elapsed 6%CPU (0avgtext+0avgdata 157284maxresident)k
> 2901232inputs+0outputs (238877major+4227640minor)pagefaults
>
> After this commit:
> 113.90user 23.81system 38:11.77elapsed 6%CPU (0avgtext+0avgdata 157260maxresident)k
> 2548728inputs+0outputs (235471major+4238110minor)pagefaults
>
> Suggested-by: Chris Li
> Signed-off-by: Kairui Song
> ---
>  include/linux/swap.h |  2 ++
>  mm/swapfile.c        | 51 ++++++++++++++++++++++++++++++++------------
>  2 files changed, 39 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 4c1d2e69689f..b13b72645db3 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -318,6 +318,8 @@ struct swap_info_struct {
>         unsigned int pages;             /* total of usable pages of swap */
>         atomic_long_t inuse_pages;      /* number of those currently in use */
>         struct percpu_cluster __percpu *percpu_cluster; /* per cpu's swap location */
> +       struct percpu_cluster *global_cluster; /* Use one global cluster for rotating device */
> +       spinlock_t global_cluster_lock; /* Serialize usage of global cluster */
>         struct rb_root swap_extent_root;/* root of the swap extent rbtree */
>         struct block_device *bdev;      /* swap device or bdev of swap file */
>         struct file *swap_file;         /* seldom referenced */
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 37d540fa0310..793b2fd1a2a8 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -820,7 +820,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
>  out:
>         relocate_cluster(si, ci);
>         unlock_cluster(ci);
> -       __this_cpu_write(si->percpu_cluster->next[order], next);
> +       if (si->flags & SWP_SOLIDSTATE)
> +               __this_cpu_write(si->percpu_cluster->next[order], next);
> +       else
> +               si->global_cluster->next[order] = next;
>         return found;
>  }
>
> @@ -881,9 +884,16 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>         struct swap_cluster_info *ci;
>         unsigned int offset, found = 0;
>
> -       /* Fast path using per CPU cluster */
> -       local_lock(&si->percpu_cluster->lock);
> -       offset = __this_cpu_read(si->percpu_cluster->next[order]);
> +       if (si->flags & SWP_SOLIDSTATE) {
> +               /* Fast path using per CPU cluster */
> +               local_lock(&si->percpu_cluster->lock);
> +               offset = __this_cpu_read(si->percpu_cluster->next[order]);
> +       } else {
> +               /* Serialize HDD SWAP allocation for each device. */
> +               spin_lock(&si->global_cluster_lock);
> +               offset = si->global_cluster->next[order];
> +       }
> +
>         if (offset) {
>                 ci = lock_cluster(si, offset);
>                 /* Cluster could have been used by another order */
> @@ -975,8 +985,10 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>                 }
>         }
>  done:
> -       local_unlock(&si->percpu_cluster->lock);
> -
> +       if (si->flags & SWP_SOLIDSTATE)
> +               local_unlock(&si->percpu_cluster->lock);
> +       else
> +               spin_unlock(&si->global_cluster_lock);
>         return found;
>  }
>
> @@ -2784,6 +2796,8 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
>         mutex_unlock(&swapon_mutex);
>         free_percpu(p->percpu_cluster);
>         p->percpu_cluster = NULL;
> +       kfree(p->global_cluster);
> +       p->global_cluster = NULL;
>         vfree(swap_map);
>         kvfree(zeromap);
>         kvfree(cluster_info);
> @@ -3189,17 +3203,24 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
>         for (i = 0; i < nr_clusters; i++)
>                 spin_lock_init(&cluster_info[i].lock);
>
> -       si->percpu_cluster = alloc_percpu(struct percpu_cluster);
> -       if (!si->percpu_cluster)
> -               goto err_free;
> +       if (si->flags & SWP_SOLIDSTATE) {
> +               si->percpu_cluster = alloc_percpu(struct percpu_cluster);
> +               if (!si->percpu_cluster)
> +                       goto err_free;
>
> -       for_each_possible_cpu(cpu) {
> -               struct percpu_cluster *cluster;
> +               for_each_possible_cpu(cpu) {
> +                       struct percpu_cluster *cluster;
>
> -               cluster = per_cpu_ptr(si->percpu_cluster, cpu);
> +                       cluster = per_cpu_ptr(si->percpu_cluster, cpu);
> +                       for (i = 0; i < SWAP_NR_ORDERS; i++)
> +                               cluster->next[i] = SWAP_ENTRY_INVALID;
> +                       local_lock_init(&cluster->lock);
> +               }
> +       } else {
> +               si->global_cluster = kmalloc(sizeof(*si->global_cluster), GFP_KERNEL);

Hi Andrew,

Sorry, I just found I missed this tiny fix here. Can you help fold it
into the current unstable tree?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index e57e5453a25b..559b8e62ff71 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3212,6 +3212,8 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
                 }
         } else {
                 si->global_cluster = kmalloc(sizeof(*si->global_cluster), GFP_KERNEL);
+                if (!si->global_cluster)
+                        goto err_free;
                 for (i = 0; i < SWAP_NR_ORDERS; i++)
                         si->global_cluster->next[i] = SWAP_ENTRY_INVALID;
                 spin_lock_init(&si->global_cluster_lock);
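
For readers following the discussion outside the kernel tree, below is a
minimal standalone C sketch of the split that cluster_alloc_swap_entry()
makes after this patch. It is not the kernel code: the struct, field, and
helper names are invented for illustration, and a pthread mutex stands in
for the kernel's local_lock/spin_lock primitives. It only shows the idea
that SSD/ZRAM devices keep a per-CPU "next cluster" hint with no shared
lock, while a rotating disk shares one hint serialized for the whole scan
so same-order writes stay contiguous.

/*
 * Illustrative sketch only (assumed names, not mm/swapfile.c).
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_ORDERS 4
#define NR_CPUS   8

struct cluster_hint {
        unsigned int next[NR_ORDERS];        /* next offset to try; 0 = none */
};

struct swap_device {
        bool solid_state;                    /* SSD/ZRAM vs. HDD */
        struct cluster_hint percpu[NR_CPUS]; /* stand-in for the per-CPU hints */
        struct cluster_hint global;          /* single hint for rotating media */
        pthread_mutex_t global_lock;         /* stand-in for global_cluster_lock */
};

/* Hypothetical scan: pretend the cluster at 'start' always has a free slot. */
static unsigned int scan_cluster(struct swap_device *dev, unsigned int start,
                                 int order)
{
        (void)dev;
        (void)order;
        return start ? start : 1;            /* fall back to a fresh cluster */
}

static unsigned int alloc_swap_offset(struct swap_device *dev, int cpu, int order)
{
        unsigned int found;

        if (dev->solid_state) {
                /* Fast path: each CPU scans from its own hint, no shared lock. */
                found = scan_cluster(dev, dev->percpu[cpu].next[order], order);
                dev->percpu[cpu].next[order] = found + 1;
        } else {
                /* HDD: one global hint, serialized for the whole allocation. */
                pthread_mutex_lock(&dev->global_lock);
                found = scan_cluster(dev, dev->global.next[order], order);
                dev->global.next[order] = found + 1;
                pthread_mutex_unlock(&dev->global_lock);
        }
        return found;
}

int main(void)
{
        struct swap_device hdd = { .solid_state = false };

        pthread_mutex_init(&hdd.global_lock, NULL);
        printf("HDD order-0 allocation lands at offset %u\n",
               alloc_swap_offset(&hdd, 0, 0));
        printf("next order-0 hint is now %u\n", hdd.global.next[0]);
        return 0;
}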