From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6EED0C27C53 for ; Wed, 19 Jun 2024 09:03:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id DE8A78D0079; Wed, 19 Jun 2024 05:03:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D98358D004F; Wed, 19 Jun 2024 05:03:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C393F8D0079; Wed, 19 Jun 2024 05:03:27 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id A31418D004F for ; Wed, 19 Jun 2024 05:03:27 -0400 (EDT) Received: from smtpin12.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 5013F160FDA for ; Wed, 19 Jun 2024 09:03:27 +0000 (UTC) X-FDA: 82247049654.12.7ACB11D Received: from sin.source.kernel.org (sin.source.kernel.org [145.40.73.55]) by imf04.hostedemail.com (Postfix) with ESMTP id B37B84000F for ; Wed, 19 Jun 2024 09:03:23 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=EobCimQm; spf=pass (imf04.hostedemail.com: domain of chrisl@kernel.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=chrisl@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1718787799; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=fAqbwlW+1BmBj054BSA0bK7CR5VLzZl5hEVu+ujAPaY=; b=s4h5fp8Cb3usWahtrIi8SYF02jCIwYNzTV0MZpbKu/gTs2ntizUSLYx4C6taAs27gtGEjg L4vxUVpMc9dZRTD/JLj1FtyB39btgo6sbRBc0SB4gyAIfLMuhqWL70OB4tHhWxqzu5Oj2B gPaed0GQk4IK4T4xAq8ncigOTaBRH7Q= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=EobCimQm; spf=pass (imf04.hostedemail.com: domain of chrisl@kernel.org designates 145.40.73.55 as permitted sender) smtp.mailfrom=chrisl@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1718787799; a=rsa-sha256; cv=none; b=ZQO1+fcSRpPZM4793osE5HNSqYcPNb8E3BeCZsv2evdNQL18WnSbu3T4/+7cJbeaC6oXq8 VtGB0dYkCSYTwN5i03ztzXhR5SiVXc1Ev4159FKgGZm4NURO4xUyVvQznYMzw6MMg4iwHi /yH6M3KCjkXuLwuheViyu5Ge/BogY1o= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sin.source.kernel.org (Postfix) with ESMTP id 48EF8CE1D6E for ; Wed, 19 Jun 2024 09:03:19 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 88313C4AF1A for ; Wed, 19 Jun 2024 09:03:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1718787798; bh=JbBH3S104EQeGS0iLj4NUJU7t2tF7dtaAplk8RZ/iWQ=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=EobCimQmbeR6hU1uWmZ/N8wUqGdGj6/Atp9MgZMZCFN/az6FM3KKN0ziSDc/tXSi8 mAQ8cMHvNtV04jYuSU3r+UQd3H5PYnFjvYePpC09PCDZofb/TXZlMEBYQqSX2hniSm DBz8YjWPR+4c3fvtTo5QWPywZNxx0D55YqRk6in/sf05YDHnRARLJgCufUcGcmNdj7 AXE3vm06YuiYGC1VNnLE5A5VTWVjrD39YjWk8M4ZxttroBma35y+4gn7dljZNvvRY2 xu+A8KfeUZqWgUdhvTzx5qGCcJ1vZFPDCXpr5Daw5vQFzCTv0va9WwXAVctkAdwvPL SXii+Z8zh0WOg== Received: by mail-il1-f180.google.com with SMTP id e9e14a558f8ab-3761f8d689fso2234195ab.3 for ; Wed, 19 Jun 2024 02:03:18 -0700 (PDT) X-Forwarded-Encrypted: i=1; AJvYcCVueeJaUKSpWFuIVfLRwo+q6BXijKhdpV39NZzgsUksGHqixaLQLJPy1Kfz0rHSc0lvBBcKJmRsOeBiKW0ndECGIR8= X-Gm-Message-State: AOJu0Yyr4YpNNnFZAHSNZiBlOdx/Hhx0VSwBtanh0R3HpDy8r9pP2mrN rCi83+j97rxOW17mSvPCQ69Wt4eEJA7/tXkWGTn9WWgPQl4dm1dbDN1YdudmVRp8dynDyQEIpK1 k3bv++zSOkO/1ilS2C+4wRBWo5bj0IOsnE5Wn X-Google-Smtp-Source: AGHT+IFO25qdvGFeh7yc2Mt2mOE+VvHaS6U1XJjuIk4I5bm/bruh1Rs3HtLeD0snMLUL9L9xhraymvvncI73EgEKXI8= X-Received: by 2002:a05:6e02:1c2a:b0:375:a6cd:dff2 with SMTP id e9e14a558f8ab-3761d65728emr22019775ab.5.1718787797822; Wed, 19 Jun 2024 02:03:17 -0700 (PDT) MIME-Version: 1.0 References: <20240614-swap-allocator-v2-0-2a513b4a7f2f@kernel.org> <20240614-swap-allocator-v2-1-2a513b4a7f2f@kernel.org> <87frtc5bxm.fsf@yhuang6-desk2.ccr.corp.intel.com> <87y1724re8.fsf@yhuang6-desk2.ccr.corp.intel.com> <87le314bgy.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <87le314bgy.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Chris Li Date: Wed, 19 Jun 2024 02:03:05 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v2 1/2] mm: swap: swap cluster switch to double link list To: "Huang, Ying" Cc: Andrew Morton , Kairui Song , Ryan Roberts , Kalesh Singh , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: B37B84000F X-Stat-Signature: p9cxuuihaanoxncfkztycbyn3tt7hjhf X-Rspam-User: X-HE-Tag: 1718787803-255755 X-HE-Meta: U2FsdGVkX18uuRl5obCss5syLnfSOwkYjA7KTGETL+LmP5ESguyePI8cFjyi7QOFODGK/X52A9QpiKkuwkBrQd0nH3EueiVKpV/zhh7nPB8XjujvNwuZO9lt1TVVPgRl4+11tgRdBDYCdlUN2FeDhFx4JEWPPUR1uc5nlN7HHqDXeYRSJLGrt8iB9UZnmhOeySs/Lld3HEDZBmGPsiY6fS3C/z4CMfP1mdSb0Yy6vwdX7HYMkiRs6V8rPjK3LCEV9HWRfro+gjnIycBJNLFBFqpqauc7rIMZ0XcmeN5pQsQKLtlKKaRKMxk7v+zaQ3+cW21mvjRBFT+YWTA+1ysvJ5n5v/97/ZU3nu2WXCY6c50Ur3PlHv0kMz0G/Qxfav2uyGMc672kI4z6b7qMLRnUKg9CPDTH7YLPOQxqol2xpTEsSJR0C8+uP9C0/mNyPusF8o2r3KJDnKp6d+1K31SDo5r9wv2o925O+4EnLMO6CoSVcpGI3/UjWnslatFqEau90YhFSx1qojvCMRanRAewiUnu2AMR7Skq9ijCo9dJ3ercOgqirfN3F86GI8WXYYR7zVov5FvL5l5e0nGT39uwRYf94whXrEAi3L3Mv6L9eQDR7E79j3HVVohUiZMLSLRvQU47Pc9eK5JWeCTbPTpqdiqBO9JM5VbTOocKW8m0ctkkS7rARYVXnfYeFjoKvQRBeutIrg0/8JpA5hxBRbH/oCg2hDqx3lrZkrK94YpktUgZ6rOURVzoKeuWnwULsuihVdXgc+9dnmhEuJuf9OtdBm09AG3zSgKV1NVED1ykkY9mIt4UcbpmGTnhEepHT25Wl6K5YG7wzpmLKIzyTmXEiF5PxNkfL075orkb3YtsjHTSKLZqD4emUwtTDBo7cxGL4XZYK8YpR53AmF46jSMpCoSVNMjh98HOJ9qkFlG3cjkNW/pnUXRWdI4pybPWK3BqrezvJMZTV6h4QID5HXh kXLg9HkX JWaGVkm1a5d/2gDqO5i0BCnIQ2f2gj7xV3iA1zF7b5DlVY3TiEu68dTAEqQrFXpmUI5sK5ATSaRhdgtruAI720AOqgLwk19L6zTPTEBjamnJ4aF3aG+2NVAEe91fRQfnILnkWK4Zw8yFfFX4zDKMfhDO2KTGVD9t8J1CUweE8Igrr4dlFPEVE+byHbi4vuvi0Fgh89/kRXIE/D95NYm2WfJZkwymayrHWyBU4zC6mXkijgj50owpUvGP+JAICT65la4FkyfQxTivz/UuYfgDvIwsGxBNS2aXG2bqkDP+BbJAccLraSBnivxP1GdFvSmzsj/x6DffQ981y+dU= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jun 19, 2024 at 12:53=E2=80=AFAM Huang, Ying = wrote: > > Chris Li writes: > > > On Tue, Jun 18, 2024 at 12:56=E2=80=AFAM Huang, Ying wrote: > >> > >> Chris Li writes: > >> > >> > On Sun, Jun 16, 2024 at 11:21=E2=80=AFPM Huang, Ying wrote: > >> >> > >> >> Hi, Chris, > >> >> > >> >> Chris Li writes: > > [snip] > > >> >> > diff --git a/include/linux/swap.h b/include/linux/swap.h > >> >> > index 3df75d62a835..cd9154a3e934 100644 > >> >> > --- a/include/linux/swap.h > >> >> > +++ b/include/linux/swap.h > >> >> > @@ -242,23 +242,22 @@ enum { > >> >> > * space with SWAPFILE_CLUSTER pages long and naturally aligns i= n disk. All > >> >> > * free clusters are organized into a list. We fetch an entry fr= om the list to > >> >> > * get a free cluster. > >> >> > - * > >> >> > - * The data field stores next cluster if the cluster is free or = cluster usage > >> >> > - * counter otherwise. The flags field determines if a cluster is= free. This is > >> >> > - * protected by swap_info_struct.lock. > >> >> > */ > >> >> > struct swap_cluster_info { > >> >> > spinlock_t lock; /* > >> >> > - * Protect swap_cluster_info fields > >> >> > - * and swap_info_struct->swap_map > >> >> > + * Protect swap_cluster_info count = and state > >> >> > >> >> Protect swap_cluster_info fields except 'list' ? > >> > > >> > I change it to protect the swap_cluster_info bitfields in the second= patch. > >> > >> Although I still prefer my version, I will not insist on that. > > > > Sure, I actually don't have a strong preference about that. It is just = comments. > > > >> > >> >> > >> >> > + * field and swap_info_struct->swap= _map > >> >> > * elements correspond to the swap > >> >> > * cluster > >> >> > */ > >> >> > - unsigned int data:24; > >> >> > - unsigned int flags:8; > >> >> > + unsigned int count:12; > >> >> > + unsigned int state:3; > >> >> > >> >> I still prefer normal data type over bit fields. How about > >> >> > >> >> u16 usage; > >> >> u8 state; > >> > > >> > I don't mind the "count" rename to "usage". That is probably a bette= r > >> > name. However I have another patch intended to add more bit fields i= n > >> > the cluster info struct. The second patch adds "order" and the later > >> > patch will add more. That is why I choose bitfield to be more conden= se > >> > with bits. > >> > >> We still have space for another "u8" for "order". It appears trivial = to > >> change it to bit fields when necessary in the future. > > > > We can, I don't see it necessary to change from bit field to u8 and > > back to bit field in the future. It is more of a personal preference > > issue. > > I have to say that I don't think that it's just a personal preference. > IMO, if it's unnecessary, we shouldn't use bit fields. You cannot > guarantee that your future changes will be merged in its current state. > So, I still think that it's better to avoid bit fields for now. That is surprising to hear, I am not dependent on any hardware physical bit location. Anyway, not too big a deal for me. I changed it to u16/u8. > >> > When you get the cluster from the list, you have a cluster pointer. = I > >> > feel it is unnecessary to convert to index then back convert to > >> > cluster pointer inside lock_cluster(). I actually feel using indexes > >> > to refer to the cluster is error prone because we also have offset. > >> > >> I don't think so, it's common to use swap offset. > > > > Swap offset is not an issue, it is all over the place. The cluster > > index(offset/512) is the one I try to avoid. > > I have made some mistakes myself on offset vs index. > > Yes. That's not good, but it's hard to be avoided too. Can we make the > variable name more consistent? index: cluster index, offset: swap > offset. > > And, in fact, swap offset is the parameter of lock_cluster() instead of > cluster index. Right, when you get the cluster pointer from the list, it can't directly use lock_cluster(). > > >> > > >> >> > >> >> > + __free_cluster(si, ci); > >> >> > memset(si->swap_map + idx * SWAPFILE_CLUSTER, > >> >> > 0, SWAPFILE_CLUSTER); > >> >> > unlock_cluster(ci); > >> >> > @@ -521,20 +412,19 @@ static void swap_users_ref_free(struct perc= pu_ref *ref) > >> >> > complete(&si->comp); > >> >> > } > >> >> > > > [snip] > > >> >> > @@ -611,10 +497,10 @@ scan_swap_map_ssd_cluster_conflict(struct s= wap_info_struct *si, > >> >> > { > >> >> > struct percpu_cluster *percpu_cluster; > >> >> > bool conflict; > >> >> > - > >> >> > >> >> Usually we use one blank line after local variable declaration. > >> > Ack. > >> > > >> >> > >> >> > + struct swap_cluster_info *first =3D list_first_entry(&si->f= ree_clusters, struct swap_cluster_info, list); > >> >> > offset /=3D SWAPFILE_CLUSTER; > >> >> > - conflict =3D !cluster_list_empty(&si->free_clusters) && > >> >> > - offset !=3D cluster_list_first(&si->free_clusters) = && > >> >> > + conflict =3D !list_empty(&si->free_clusters) && > >> >> > + offset !=3D first - si->cluster_info && > >> >> > cluster_is_free(&si->cluster_info[offset]); > >> >> > > >> >> > if (!conflict) > >> >> > @@ -655,10 +541,14 @@ static bool scan_swap_map_try_ssd_cluster(s= truct swap_info_struct *si, > >> >> > cluster =3D this_cpu_ptr(si->percpu_cluster); > >> >> > tmp =3D cluster->next[order]; > >> >> > if (tmp =3D=3D SWAP_NEXT_INVALID) { > >> >> > - if (!cluster_list_empty(&si->free_clusters)) { > >> >> > - tmp =3D cluster_next(&si->free_clusters.hea= d) * > >> >> > - SWAPFILE_CLUSTER; > >> >> > - } else if (!cluster_list_empty(&si->discard_cluster= s)) { > >> >> > + if (!list_empty(&si->free_clusters)) { > >> >> > + ci =3D list_first_entry(&si->free_clusters,= struct swap_cluster_info, list); > >> >> > + list_del(&ci->list); > >> >> > >> >> The free cluster is deleted from si->free_clusters now. But later = you > >> >> will call scan_swap_map_ssd_cluster_conflict() and may abandon the > >> >> cluster. And in alloc_cluster() later, it may be deleted again. > >> > > >> > Yes, that is a bug. Thanks for catching that. > >> > > >> >> > >> >> > + spin_lock(&ci->lock); > >> >> > + ci->state =3D CLUSTER_STATE_PER_CPU; > >> >> > >> >> Need to change ci->state when move a cluster off the percpu_cluster= . > >> > > >> > In the next patch. This patch does not use the off state yet. > >> > >> But that is confusing to use wrong state name, the really meaning is > >> something like CLUSTER_STATE_NON_FREE. But as I suggested above, we c= an > > > > It can be FREE and on the per cpu pointer as well. That is the tricky p= art. > > It can happen on the current code as well. > > cluster_set_count_flag(0, 0) is called in alloc_cluster(). So, it's not > an issue in current code. If you need more, that shouldn't be done in > this patch. Revert to V1 like using the flags. Chris