E=Sophos;i="6.10,158,1719903600"; d="scan'208";a="22104652" Received: from fmviesa008.fm.intel.com ([10.60.135.148]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Aug 2024 01:11:48 -0700 X-CSE-ConnectionGUID: l1pTCZ3nSXa2SDrwZJV1OA== X-CSE-MsgGUID: KDNXLOrqQgi6J07Pje1Oyw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.10,158,1719903600"; d="scan'208";a="60285744" Received: from yhuang6-desk2.sh.intel.com (HELO yhuang6-desk2.ccr.corp.intel.com) ([10.238.208.55]) by fmviesa008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Aug 2024 01:11:36 -0700 From: "Huang, Ying" To: Chris Li Cc: Andrew Morton , Kairui Song , Hugh Dickins , Ryan Roberts , Kalesh Singh , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Barry Song Subject: Re: [PATCH v5 2/9] mm: swap: mTHP allocate swap entries from nonfull list In-Reply-To: (Chris Li's message of "Fri, 16 Aug 2024 01:01:41 -0700") References: <20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org> <20240730-swap-allocator-v5-2-cb9c148b9297@kernel.org> <87bk23250r.fsf@yhuang6-desk2.ccr.corp.intel.com> Date: Mon, 19 Aug 2024 16:08:03 +0800 Message-ID: <871q2lhr4s.fsf@yhuang6-desk2.ccr.corp.intel.com> User-Agent: Gnus/5.13 (Gnus v5.13) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Queue-Id: B5EA2100014 X-Rspamd-Server: rspam01 X-Stat-Signature: qatug9tx6oqjkotqotymczykc3zd7jsr X-HE-Tag: 1724055109-920176 X-HE-Meta: U2FsdGVkX1/Z3sfgh9WNIJFxcn/S3IslG6RV8EXvRgMISOhzbF5cc4r6BW7XBGu8DmjE7vukonhrAc+F1/FKbsScXZAFMn45pj5y+F0QZmspgIRl2MRh9ntaPaO1GDw0WM6Oe6QQ3Acdzmy7QebQGmEXKCTlVQcHJ4D63mtOHIt/2efvN9BLQXwFXgj5oTq5lAOCDO5LCJiUj5yHzC96jqfxRsUsuTsOvvBZErHhp7XgxVhp4PuHnxUvyI+s8jF9B2qmwz3KWLv7He+RqGbMVWp4dObBPJ9iOOSQjpPMUMeWi7gv9ceKYOHYBGad/Uildt9UHdFRkR++Pge0LQdMKkOim/0CyeJD44lLvLxHPq9Dz5NkOTrA3NOPFqwlhbsBZyaZH1VT62L173Oln/KEoZZyvWvzw+Ulwcx4dnIFx7X7iYbo0zY5EA6bvVu8DHs6ii+Cf05pGNj9pUFygRiackr9CQ3Tru+Acjr9a0XOu1afnYq3c4UbZ8hYBvjg1XFwTnx/q92GGIvWItVEhPoBsZxzvBnn21pEd/cPwSp79LhpecoeBTzyfdhheeScRMhkSgEj3TJP3t8Zp8jSrrT6Hd1L7DAl0ZwSlYhcX4q0ulb6fw3PZlh4GEDssVV5eQQXiMsDCyYNAAmu37Rs96mjK3hLAfWDJmJvCfcVHC5/Eyay3xTSmsGeooyC2/WCWOAMbEi79XAgIb4U4ywjJjC8LVFfryHqa0zvWSr+/QO8ftOnILkVk1T/e/0jd/UCcsKcag+btto51AK9aWcjMAfEdwPAh+nuJ/88E+BNhKz6A5iXa6VTXVS6uHeNsVqSOrBPk+ca3nq6CxiEf9UoFvhSfvv71DHiDAkTfaFWkL+BaNtWBBnauRgtG2iCjbAMHUrUR3gl4o3N9y4mosy7ssQOh39WlAWkIH8JfNnnqriuUbxiv/QaBmh1aWjrx9jBdlAahlUWV5hX+fjjECDx1FU lYjgwpQW LfS6H2klSpA9f+8TwoxUs8R1Q016SPwpCopnK1OF0xupWoMZBrTdxF96ulFegvJoO3JOuuP/NXgxDe2ZD3xlRC55JHubqPfI1U9XRRiBJVydxY428votyiSxjW0D68zaUz6ZQKfb8aPag0QAVEVlHE0AZkvDeAyR0zHNBAwyknK25myCl71HLBKq2xP1gLuWeW67TwC1C8uah49TzobMyD4fGKEAxsd+an86n0oObZDajrw2B8H/hMwd9OjXrAB+OCzmyyEHX4jTlP3lVmY2vukQK6g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Chris Li writes: > On Wed, Aug 7, 2024 at 6:14=E2=80=AFPM Huang, Ying = wrote: >> >> Chris Li writes: >> [snip] >> > >> > /* >> > @@ -553,6 +568,19 @@ static bool scan_swap_map_try_ssd_cluster(struct = swap_info_struct *si, >> > if (tmp =3D=3D SWAP_NEXT_INVALID) { >> > if (!list_empty(&si->free_clusters)) { >> > ci =3D list_first_entry(&si->free_clusters, stru= ct swap_cluster_info, list); >> > + list_del(&ci->list); >> > + spin_lock(&ci->lock); >> > + ci->order =3D order; >> > + ci->flags =3D 0; >> > + spin_unlock(&ci->lock); >> > + tmp =3D cluster_index(si, ci) * SWAPFILE_CLUSTER; >> 
> BTW, what is your take on my previous analysis that the current
> policy of preferring to write a new (free) cluster can wear out the
> SSD faster?

No, I don't agree with you on that.  However, my knowledge of SSD
wear-out behavior is quite limited.

> I think it might be useful to give users an option to allocate from
> the nonfull list first.  That trade-off is friendlier to SSD wear
> than preferring to write new blocks.  If you keep swapping long
> enough, there will be no free clusters left anyway.

It depends on the workload.  Some workloads may demonstrate better
spatial locality.

> The example I gave in this email:
>
> https://lore.kernel.org/linux-mm/CACePvbXGBNC9WzzL4s2uB2UciOkV6nb4bKKkc5TBZP6QuHS_aQ@mail.gmail.com/
>
> Chris

>>
>> >  /*
>> > @@ -967,6 +995,7 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
>> >  	ci = lock_cluster(si, offset);
>> >  	memset(si->swap_map + offset, 0, SWAPFILE_CLUSTER);
>> >  	ci->count = 0;
>> > +	ci->order = 0;
>> >  	ci->flags = 0;
>> >  	free_cluster(si, ci);
>> >  	unlock_cluster(ci);
>> > @@ -2922,6 +2951,9 @@ static int setup_swap_map_and_extents(struct swap_info_struct *p,
>> >  	INIT_LIST_HEAD(&p->free_clusters);
>> >  	INIT_LIST_HEAD(&p->discard_clusters);
>> >
>> > +	for (i = 0; i < SWAP_NR_ORDERS; i++)
>> > +		INIT_LIST_HEAD(&p->nonfull_clusters[i]);
>> > +
>> >  	for (i = 0; i < swap_header->info.nr_badpages; i++) {
>> >  		unsigned int page_nr = swap_header->info.badpages[i];
>> >  		if (page_nr == 0 || page_nr > swap_header->info.last_page)

--
Best Regards,
Huang, Ying