Date: Thu, 9 Jan 2025 10:07:54 +0800
From: Baoquan He <bhe@redhat.com>
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, Chris Li, Barry Song, Ryan Roberts,
    Hugh Dickins, Yosry Ahmed, "Huang, Ying", Nhat Pham, Johannes Weiner,
    Kalesh Singh, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 10/13] mm, swap: simplify percpu cluster updating
References: <20241230174621.61185-1-ryncsn@gmail.com>
 <20241230174621.61185-11-ryncsn@gmail.com>
In-Reply-To: <20241230174621.61185-11-ryncsn@gmail.com>

On 12/31/24 at 01:46am, Kairui Song wrote:
> From: Kairui Song
>
> Instead of using a returning argument, we can simply store the next
> cluster offset to the fixed percpu location, which reduce the stack
> usage and simplify the function:
>
> Object size:
> ./scripts/bloat-o-meter mm/swapfile.o mm/swapfile.o.new
> add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-271 (-271)
> Function                                old     new   delta
> get_swap_pages                         2847    2733    -114
> alloc_swap_scan_cluster                 894     737    -157
> Total: Before=30833, After=30562, chg -0.88%
>
> Stack usage:
> Before:
> swapfile.c:1190:5:get_swap_pages        240     static
>
> After:
> swapfile.c:1185:5:get_swap_pages        216     static
>
> Signed-off-by: Kairui Song
> ---
>  include/linux/swap.h |  4 +--
>  mm/swapfile.c        | 66 +++++++++++++++++++-------------------------
>  2 files changed, 31 insertions(+), 39 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index c4ff31cb6bde..4c1d2e69689f 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -275,9 +275,9 @@ enum swap_cluster_flags {
>   * The first page in the swap file is the swap header, which is always marked
>   * bad to prevent it from being allocated as an entry. This also prevents the
>   * cluster to which it belongs being marked free. Therefore 0 is safe to use as
> - * a sentinel to indicate next is not valid in percpu_cluster.
> + * a sentinel to indicate an entry is not valid.
>   */
> -#define SWAP_NEXT_INVALID        0
> +#define SWAP_ENTRY_INVALID       0
>
>  #ifdef CONFIG_THP_SWAP
>  #define SWAP_NR_ORDERS           (PMD_ORDER + 1)
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index dadd4fead689..60a650ba88fd 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -759,23 +759,23 @@ static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster
>          return true;
>  }
>
> -static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigned long offset,
> -                                            unsigned int *foundp, unsigned int order,
> +/* Try use a new cluster for current CPU and allocate from it. */
> +static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si,
> +                                            struct swap_cluster_info *ci,
> +                                            unsigned long offset,
> +                                            unsigned int order,
>                                              unsigned char usage)
>  {
> -        unsigned long start = offset & ~(SWAPFILE_CLUSTER - 1);
> +        unsigned int next = SWAP_ENTRY_INVALID, found = SWAP_ENTRY_INVALID;
> +        unsigned long start = ALIGN_DOWN(offset, SWAPFILE_CLUSTER);
>          unsigned long end = min(start + SWAPFILE_CLUSTER, si->max);
>          unsigned int nr_pages = 1 << order;
>          bool need_reclaim, ret;
> -        struct swap_cluster_info *ci;
>
> -        ci = &si->cluster_info[offset / SWAPFILE_CLUSTER];
>          lockdep_assert_held(&ci->lock);
>
> -        if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER) {
> -                offset = SWAP_NEXT_INVALID;
> +        if (end < nr_pages || ci->count + nr_pages > SWAPFILE_CLUSTER)
>                  goto out;
> -        }
>
>          for (end -= nr_pages; offset <= end; offset += nr_pages) {
>                  need_reclaim = false;
> @@ -789,34 +789,27 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne
>                           * cluster has no flag set, and change of list
>                           * won't cause fragmentation.
>                           */
> -                        if (!cluster_is_usable(ci, order)) {
> -                                offset = SWAP_NEXT_INVALID;
> +                        if (!cluster_is_usable(ci, order))
>                                  goto out;
> -                        }
>                          if (cluster_is_free(ci))
>                                  offset = start;
>                          /* Reclaim failed but cluster is usable, try next */
>                          if (!ret)
>                                  continue;
>                  }
> -                if (!cluster_alloc_range(si, ci, offset, usage, order)) {
> -                        offset = SWAP_NEXT_INVALID;
> -                        goto out;
> -                }
> -                *foundp = offset;
> -                if (ci->count == SWAPFILE_CLUSTER) {
> -                        offset = SWAP_NEXT_INVALID;
> -                        goto out;
> -                }
> +                if (!cluster_alloc_range(si, ci, offset, usage, order))
> +                        break;
> +                found = offset;
>                  offset += nr_pages;
> +                if (ci->count < SWAPFILE_CLUSTER && offset <= end)
> +                        next = offset;
>                  break;
>          }
> -        if (offset > end)
> -                offset = SWAP_NEXT_INVALID;
>  out:
>          relocate_cluster(si, ci);
>          unlock_cluster(ci);
> -        return offset;
> +        __this_cpu_write(si->percpu_cluster->next[order], next);
> +        return found;
>  }
>
>  /* Return true if reclaimed a whole cluster */
> @@ -885,8 +878,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>                  if (cluster_is_usable(ci, order)) {
>                          if (cluster_is_free(ci))
>                                  offset = cluster_offset(si, ci);
> -                        offset = alloc_swap_scan_cluster(si, offset, &found,
> -                                                         order, usage);
> +                        found = alloc_swap_scan_cluster(si, ci, offset,
> +                                                        order, usage);
>                  } else {
>                          unlock_cluster(ci);
>                  }
> @@ -897,8 +890,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>  new_cluster:
>          ci = cluster_isolate_lock(si, &si->free_clusters);
>          if (ci) {
> -                offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -                                                 &found, order, usage);
> +                found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +                                                order, usage);
>                  if (found)
>                          goto done;
>          }
> @@ -911,8 +904,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>                  unsigned int frags = 0, frags_existing;
>
>                  while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[order]))) {
> -                        offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -                                                         &found, order, usage);
> +                        found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +                                                        order, usage);
>                          /*
>                           * With `fragmenting` set to true, it will surely take
>                           * the cluster off nonfull list
> @@ -932,8 +925,8 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>                           * per-CPU usage, but they could contain newly released
>                           * reclaimable (eg. lazy-freed swap cache) slots.
>                           */
> -                        offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -                                                         &found, order, usage);
> +                        found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +                                                        order, usage);
>                          if (found)
>                                  goto done;
>                          frags++;
> @@ -959,21 +952,20 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
>                   */
>                  while ((ci = cluster_isolate_lock(si, &si->frag_clusters[o]))) {
>                          atomic_long_dec(&si->frag_cluster_nr[o]);
> -                        offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -                                                         &found, order, usage);
> +                        found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +                                                        0, usage);
>                          if (found)
>                                  goto done;
>                  }
>
>                  while ((ci = cluster_isolate_lock(si, &si->nonfull_clusters[o]))) {
> -                        offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci),
> -                                                         &found, order, usage);
> +                        found = alloc_swap_scan_cluster(si, ci, cluster_offset(si, ci),
> +                                                        0, usage);
>                          if (found)
>                                  goto done;
>                  }
>          }
>  done:
> -        __this_cpu_write(si->percpu_cluster->next[order], offset);
>          local_unlock(&si->percpu_cluster->lock);

Do you think we still need to hold si->percpu_cluster->lock until the end of
the cluster_alloc_swap_entry() invocation? If so, we may need to hold the lock
during the whole period of walking through percpu_cluster->next, free_clusters,
nonfull_clusters and frag_clusters until we get one available slot, even though
we keep updating si->percpu_cluster->next[order]. I can't see the point of
changing it like this.

>
>          return found;
> @@ -3194,7 +3186,7 @@ static struct swap_cluster_info *setup_clusters(struct swap_info_struct *si,
>
>                  cluster = per_cpu_ptr(si->percpu_cluster, cpu);
>                  for (i = 0; i < SWAP_NR_ORDERS; i++)
> -                        cluster->next[i] = SWAP_NEXT_INVALID;
> +                        cluster->next[i] = SWAP_ENTRY_INVALID;
>                  local_lock_init(&cluster->lock);
>          }
>
> --
> 2.47.1
>
>
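For readers following the calling-convention change in the quoted patch, below
is a minimal userspace C sketch (not the real swapfile.c code; every type and
helper here is a simplified stand-in) contrasting the old style, which returned
the next scan offset and reported the allocated slot through an out-parameter,
with the new style, which returns the found slot and stashes the next offset in
a per-CPU-like slot:

/*
 * Illustrative sketch only: simplified stand-ins, not the real kernel
 * structures or locking.
 */
#include <stdio.h>

#define SWAP_ENTRY_INVALID 0
#define SWAPFILE_CLUSTER   512

/* Simplified stand-in for the per-CPU next-offset cache. */
struct fake_percpu_cluster {
	unsigned int next;	/* next offset to try, or SWAP_ENTRY_INVALID */
};

static struct fake_percpu_cluster pcp;	/* pretend this is per-CPU data */

/* Old style: return the next offset, report the allocation via *foundp. */
static unsigned int scan_cluster_old(unsigned int offset, unsigned int *foundp)
{
	*foundp = offset;			/* pretend this slot was allocated */
	offset += 1;
	if (offset % SWAPFILE_CLUSTER == 0)
		return SWAP_ENTRY_INVALID;	/* cluster exhausted */
	return offset;
}

/* New style: return the found slot, write the next offset to per-CPU state. */
static unsigned int scan_cluster_new(unsigned int offset)
{
	unsigned int found = offset;		/* pretend this slot was allocated */
	unsigned int next = offset + 1;

	if (next % SWAPFILE_CLUSTER == 0)
		next = SWAP_ENTRY_INVALID;
	pcp.next = next;	/* the real code uses __this_cpu_write() here */
	return found;
}

int main(void)
{
	unsigned int found_old, next_old, found_new;

	next_old = scan_cluster_old(100, &found_old);
	found_new = scan_cluster_new(100);

	printf("old: found=%u next=%u\n", found_old, next_old);
	printf("new: found=%u next(percpu)=%u\n", found_new, pcp.next);
	return 0;
}

In the quoted patch itself, the per-CPU write happens inside
alloc_swap_scan_cluster() just before it returns, after unlock_cluster(ci),
while cluster_alloc_swap_entry() still holds the local per-CPU lock until its
done: label; the review question above is about whether that lock needs to be
held across the whole scan at all.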