From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9990FC27C53 for ; Wed, 19 Jun 2024 07:53:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1F4F58D0075; Wed, 19 Jun 2024 03:53:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 17DDB8D004F; Wed, 19 Jun 2024 03:53:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F38AC8D0075; Wed, 19 Jun 2024 03:53:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id CC1D78D004F for ; Wed, 19 Jun 2024 03:53:07 -0400 (EDT) Received: from smtpin03.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 788C91C1C91 for ; Wed, 19 Jun 2024 07:53:07 +0000 (UTC) X-FDA: 82246872414.03.5E8094F Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) by imf08.hostedemail.com (Postfix) with ESMTP id BE85016000A for ; Wed, 19 Jun 2024 07:53:04 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=dHiolX7u; spf=pass (imf08.hostedemail.com: domain of ying.huang@intel.com designates 198.175.65.18 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1718783581; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=htV+QPzaHZh9TFZufL5pvo9Bl9JYKfsNMsyYXcK396I=; b=XwPo2AbyyQF7iIk6mAwNS/RsarYurWDRTSa56BsmLWDNERU7u7/PUDOkbUwTPK4nfGM6Ln 24pXCKgPclw3am8MX11GSGE5u2hXbl7q3eRe4twvdvb6O69EbcnqEz+HATQgcQKmBTXxJL 1fPw6cHYic9HjzlufxsPZ99S8qFhcO0= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=intel.com header.s=Intel header.b=dHiolX7u; spf=pass (imf08.hostedemail.com: domain of ying.huang@intel.com designates 198.175.65.18 as permitted sender) smtp.mailfrom=ying.huang@intel.com; dmarc=pass (policy=none) header.from=intel.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1718783581; a=rsa-sha256; cv=none; b=WqwiHPT1wjeeD3acA0uJ8FoxNPZp8cFVyhnqnVG7HM2o/TrOu1mJzZ6PCl2qbTXlY/clk+ eQQuCz/byPYXHGZ21O13QuK/bkkTQibmnsKTDYBRhrWlpP6tQLVsZVtiQC9+4j7S2+RIyw PPX+R08vk2DXLYnT/LE/Crz4yuaUr84= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1718783585; x=1750319585; h=from:to:cc:subject:in-reply-to:references:date: message-id:mime-version:content-transfer-encoding; bh=HqAayMXfsb3bAlmNU8C06hZ7IHnX6K11yuzyPkwsHy0=; b=dHiolX7uKH0N4IjSQj/dpaDH9TCrWqD4RjsmIIPRj89SV/Hhn6jDoxvX /nRPucoVsWeGrddQ+hZJCmt1ikjIReUlCQ4mUpnkL/7Irv6K4uJW8kx8q f++iVOTeQ6x2muQGy98pkjVgfA9te29t7uu9j9fRLF0nZTNmhqPR+YsEC OpJkGedyw4P860ex6Wu/XviaJH/wSUZb2ucrdehOcne7OH897nZGuc2la xfLNRGA2svI8VWSYeQPjcA5MWKJ2Ek02dgVPDc7CJmW9JGmroISaa1dZP SzDnrN69IUdvhf48Ox0iN9pYHU7f6nq7+iGGDcZV/DDALF2wsxgEZdTGV A==; X-CSE-ConnectionGUID: ko555kucTC+UP/U1mPNfMw== X-CSE-MsgGUID: oH8Jtg4kTcGwu/o461KkZw== X-IronPort-AV: E=McAfee;i="6700,10204,11107"; a="15846957" X-IronPort-AV: E=Sophos;i="6.08,249,1712646000"; d="scan'208";a="15846957" Received: from fmviesa010.fm.intel.com ([10.60.135.150]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2024 00:53:03 -0700 X-CSE-ConnectionGUID: zfLGzj9NQCe8JAkYt4MAxg== X-CSE-MsgGUID: G1L5H5Z0R2ic3jRkOdIBbQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.08,249,1712646000"; d="scan'208";a="41978623" Received: from unknown (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by fmviesa010-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Jun 2024 00:53:00 -0700 From: "Huang, Ying" To: Chris Li Cc: Andrew Morton , Kairui Song , Ryan Roberts , Kalesh Singh , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song Subject: Re: [PATCH v2 1/2] mm: swap: swap cluster switch to double link list In-Reply-To: (Chris Li's message of "Tue, 18 Jun 2024 03:01:42 -0700") References: <20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org> <20240614-swap-allocator-v2-1-2a513b4a7f2f@kernel.org> <87frtc5bxm.fsf@yhuang6-desk2.ccr.corp.intel.com> <87y1724re8.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Wed, 19 Jun 2024 15:51:09 +0800 Message-ID: <87le314bgy.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: BE85016000A X-Stat-Signature: 9soyx45axut8jh3mosq6j438frf64p4h X-HE-Tag: 1718783584-117418 X-HE-Meta: U2FsdGVkX18csTs6fzFI0hYPG4wM5DfpdvM/J22rWSZKX5QrlI7Yy26biJg4SUKI/t/9t8iH6o5Rb2CVd7AXxPty8C5r/Jv58hneOG1TyC+4AOprgy9w7vwKwpk5WJYvHvPDtV3Kw6T+bxoukXbiEQee3Ij7kIo9xhsKKHE+FzhQhb8YACEb9rHbwj3k7HdrGglalmg8rPTW0A3GMACbpRjU2Qaujce3ImvSSigFB5UbBGAVkW85CjFB3PAHzd7EaqvPEE/2IOD62KvriqpVa5BmrnrwDVKYq0Dl5Vq33Z5Mf1MWG1+2607LT5LklykhX8dYhkk8z565I4i5qTCyCWnMnbJZl4loGDwW9EN468n1e3x7ELXcxFU6HykTV6Cmtm+Lpv4qBbokfdnTpzVIFVo3bK5l7t24RwEKTVkCanpGN6z6Y2KWkRnc+3W6dcqWg0bu7evUMM/mVIwghphWEyvpSPke54jOJEYmTFhCTlN0463Rvpd/+yIxPEURybHvaFBF/wfHBc5Glirz1mGDMe+L4OOj1NyXYAS6c48hzsrPzdyaYRpnsPTQXEg9ZmIrRqLWc4jUPM7tpSXDh86nghkn2PpBvi/IMFhpLjOLmzm3pzaXPY7445dDZRFWwlk7tlrKHbebcBLHleLcbWDxWySMxkSxf/YpZJoMxzEiWhIR/jo6ShZY3o8+ZJRppWlpc3SZxGeXFDrzo6AM9JL58VrSSE3X4+/Bu/bg8oaQIhDKsWt6Xm6J1OTmIaGirJJQRV7BBW5lE//uLupoTHqL0yrTl8BwqURO+2EDQwfaQW7XT56tqMDAZjA35Sp7AX7wwB3EYEMixtot+CyZRyoUy2id66lAgDiJXo2Ynz9c+F4rj+i4rI7Y+ju/1MvD8r7JkPZ3rgAaS6bgW6cyBMFMnB/aXRe96Vtu9oQs0N2pI0/al3U5lGJIUwK5TegcZATzy+mnYY2JkjC47Xb2vrt LQc1qqF1 aeWAeNhcoMbhlL3If/8NPN++eqdGHtAvItQuzlL0haNGTA4991eKBJKVs26Vqv3w8YLNxA5GN6rJ/E0sxZhgpYelwaexixysMi+T0fGjkLHW4ZhRFtmSlPVuZii8zprUoAOOFVVTnZeYN+xdi9UYwUpqQJnT/1QsD3mJmKbphzq9nPDQ80bpzhxlDUZE4B2YaGcp43BmVtuCp2gYIG9udrc/xIwW02hW4i4sr X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Chris Li writes: > On Tue, Jun 18, 2024 at 12:56=E2=80=AFAM Huang, Ying wrote: >> >> Chris Li writes: >> >> > On Sun, Jun 16, 2024 at 11:21=E2=80=AFPM Huang, Ying wrote: >> >> >> >> Hi, Chris, >> >> >> >> Chris Li writes: [snip] >> >> > diff --git a/include/linux/swap.h b/include/linux/swap.h >> >> > index 3df75d62a835..cd9154a3e934 100644 >> >> > --- a/include/linux/swap.h >> >> > +++ b/include/linux/swap.h >> >> > @@ -242,23 +242,22 @@ enum { >> >> > * space with SWAPFILE_CLUSTER pages long and naturally aligns in = disk. All >> >> > * free clusters are organized into a list. We fetch an entry from= the list to >> >> > * get a free cluster. >> >> > - * >> >> > - * The data field stores next cluster if the cluster is free or cl= uster usage >> >> > - * counter otherwise. The flags field determines if a cluster is f= ree. This is >> >> > - * protected by swap_info_struct.lock. >> >> > */ >> >> > struct swap_cluster_info { >> >> > spinlock_t lock; /* >> >> > - * Protect swap_cluster_info fields >> >> > - * and swap_info_struct->swap_map >> >> > + * Protect swap_cluster_info count an= d state >> >> >> >> Protect swap_cluster_info fields except 'list' ? >> > >> > I change it to protect the swap_cluster_info bitfields in the second p= atch. >> >> Although I still prefer my version, I will not insist on that. > > Sure, I actually don't have a strong preference about that. It is just co= mments. > >> >> >> >> >> > + * field and swap_info_struct->swap_m= ap >> >> > * elements correspond to the swap >> >> > * cluster >> >> > */ >> >> > - unsigned int data:24; >> >> > - unsigned int flags:8; >> >> > + unsigned int count:12; >> >> > + unsigned int state:3; >> >> >> >> I still prefer normal data type over bit fields. How about >> >> >> >> u16 usage; >> >> u8 state; >> > >> > I don't mind the "count" rename to "usage". That is probably a better >> > name. However I have another patch intended to add more bit fields in >> > the cluster info struct. The second patch adds "order" and the later >> > patch will add more. That is why I choose bitfield to be more condense >> > with bits. >> >> We still have space for another "u8" for "order". It appears trivial to >> change it to bit fields when necessary in the future. > > We can, I don't see it necessary to change from bit field to u8 and > back to bit field in the future. It is more of a personal preference > issue. I have to say that I don't think that it's just a personal preference. IMO, if it's unnecessary, we shouldn't use bit fields. You cannot guarantee that your future changes will be merged in its current state. So, I still think that it's better to avoid bit fields for now. >> >> >> >> And, how about use 'usage' instead of 'count'? Personally I think th= at >> >> it is more clear. But I don't have strong opinions on this. >> >> >> >> > + struct list_head list; /* Protected by swap_info_struct->loc= k */ >> >> > }; >> >> > -#define CLUSTER_FLAG_FREE 1 /* This cluster is free */ >> >> > -#define CLUSTER_FLAG_NEXT_NULL 2 /* This cluster has no next clust= er */ >> >> > + >> >> > +#define CLUSTER_STATE_FREE 1 /* This cluster is free */ >> >> [snip] >> >> > /* >> >> > @@ -481,21 +371,22 @@ static void __free_cluster(struct swap_info_s= truct *si, unsigned long idx) >> >> > */ >> >> > static void swap_do_scheduled_discard(struct swap_info_struct *si) >> >> > { >> >> > - struct swap_cluster_info *info, *ci; >> >> > + struct swap_cluster_info *ci; >> >> > unsigned int idx; >> >> > >> >> > - info =3D si->cluster_info; >> >> > - >> >> > - while (!cluster_list_empty(&si->discard_clusters)) { >> >> > - idx =3D cluster_list_del_first(&si->discard_clusters,= info); >> >> > + while (!list_empty(&si->discard_clusters)) { >> >> > + ci =3D list_first_entry(&si->discard_clusters, struct= swap_cluster_info, list); >> >> > + list_del(&ci->list); >> >> > + idx =3D ci - si->cluster_info; >> >> > spin_unlock(&si->lock); >> >> > >> >> > discard_swap_cluster(si, idx * SWAPFILE_CLUSTER, >> >> > SWAPFILE_CLUSTER); >> >> > >> >> > spin_lock(&si->lock); >> >> > - ci =3D lock_cluster(si, idx * SWAPFILE_CLUSTER); >> >> > - __free_cluster(si, idx); >> >> > + >> >> > + spin_lock(&ci->lock); >> >> >> >> Personally, I still prefer to use lock_cluster(), which is more reada= ble >> >> and matches unlock_cluster() below. >> > >> > lock_cluster() uses an index which is not matching unlock_cluster() >> > which is using a pointer to cluster. >> >> lock_cluster()/unlock_cluster() are pair and fit original design >> well. They use different parameter because swap cluster is optional. >> >> > When you get the cluster from the list, you have a cluster pointer. I >> > feel it is unnecessary to convert to index then back convert to >> > cluster pointer inside lock_cluster(). I actually feel using indexes >> > to refer to the cluster is error prone because we also have offset. >> >> I don't think so, it's common to use swap offset. > > Swap offset is not an issue, it is all over the place. The cluster > index(offset/512) is the one I try to avoid. > I have made some mistakes myself on offset vs index. Yes. That's not good, but it's hard to be avoided too. Can we make the variable name more consistent? index: cluster index, offset: swap offset. And, in fact, swap offset is the parameter of lock_cluster() instead of cluster index. >> > >> >> >> >> > + __free_cluster(si, ci); >> >> > memset(si->swap_map + idx * SWAPFILE_CLUSTER, >> >> > 0, SWAPFILE_CLUSTER); >> >> > unlock_cluster(ci); >> >> > @@ -521,20 +412,19 @@ static void swap_users_ref_free(struct percpu= _ref *ref) >> >> > complete(&si->comp); >> >> > } >> >> > [snip] >> >> > @@ -611,10 +497,10 @@ scan_swap_map_ssd_cluster_conflict(struct swa= p_info_struct *si, >> >> > { >> >> > struct percpu_cluster *percpu_cluster; >> >> > bool conflict; >> >> > - >> >> >> >> Usually we use one blank line after local variable declaration. >> > Ack. >> > >> >> >> >> > + struct swap_cluster_info *first =3D list_first_entry(&si->fre= e_clusters, struct swap_cluster_info, list); >> >> > offset /=3D SWAPFILE_CLUSTER; >> >> > - conflict =3D !cluster_list_empty(&si->free_clusters) && >> >> > - offset !=3D cluster_list_first(&si->free_clusters) && >> >> > + conflict =3D !list_empty(&si->free_clusters) && >> >> > + offset !=3D first - si->cluster_info && >> >> > cluster_is_free(&si->cluster_info[offset]); >> >> > >> >> > if (!conflict) >> >> > @@ -655,10 +541,14 @@ static bool scan_swap_map_try_ssd_cluster(str= uct swap_info_struct *si, >> >> > cluster =3D this_cpu_ptr(si->percpu_cluster); >> >> > tmp =3D cluster->next[order]; >> >> > if (tmp =3D=3D SWAP_NEXT_INVALID) { >> >> > - if (!cluster_list_empty(&si->free_clusters)) { >> >> > - tmp =3D cluster_next(&si->free_clusters.head)= * >> >> > - SWAPFILE_CLUSTER; >> >> > - } else if (!cluster_list_empty(&si->discard_clusters)= ) { >> >> > + if (!list_empty(&si->free_clusters)) { >> >> > + ci =3D list_first_entry(&si->free_clusters, s= truct swap_cluster_info, list); >> >> > + list_del(&ci->list); >> >> >> >> The free cluster is deleted from si->free_clusters now. But later you >> >> will call scan_swap_map_ssd_cluster_conflict() and may abandon the >> >> cluster. And in alloc_cluster() later, it may be deleted again. >> > >> > Yes, that is a bug. Thanks for catching that. >> > >> >> >> >> > + spin_lock(&ci->lock); >> >> > + ci->state =3D CLUSTER_STATE_PER_CPU; >> >> >> >> Need to change ci->state when move a cluster off the percpu_cluster. >> > >> > In the next patch. This patch does not use the off state yet. >> >> But that is confusing to use wrong state name, the really meaning is >> something like CLUSTER_STATE_NON_FREE. But as I suggested above, we can > > It can be FREE and on the per cpu pointer as well. That is the tricky par= t. > It can happen on the current code as well. cluster_set_count_flag(0, 0) is called in alloc_cluster(). So, it's not an issue in current code. If you need more, that shouldn't be done in this patch. >> keep swap_cluster_info.flags and CLUSTER_FLAG_FREE in this patch. > > Maybe. Will consider that. > >> >> >> >> >> > + spin_unlock(&ci->lock); >> >> > + tmp =3D (ci - si->cluster_info) * SWAPFILE_CL= USTER; >> >> > + } else if (!list_empty(&si->discard_clusters)) { >> >> > /* >> >> > * we don't have free cluster but have some c= lusters in >> >> > * discarding, do discard now and reclaim the= m, then >> >> > @@ -1062,8 +952,8 @@ static void swap_free_cluster(struct swap_info= _struct *si, unsigned long idx) >> >> > >> >> > ci =3D lock_cluster(si, offset); >> >> > memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER); >> >> > - cluster_set_count_flag(ci, 0, 0); >> >> > - free_cluster(si, idx); >> >> > + ci->count =3D 0; >> >> > + free_cluster(si, ci); >> >> > unlock_cluster(ci); >> >> > swap_range_free(si, offset, SWAPFILE_CLUSTER); >> >> > } [snip] -- Best Regards, Huang, Ying