From: Chris Li <chrisl@kernel.org>
Date: Wed, 3 Apr 2024 15:12:34 -0700
Subject: Re: [PATCH v6 1/6] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
To: Ryan Roberts
Cc: Andrew Morton, David Hildenbrand, Matthew Wilcox, Huang Ying, Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko, Kefeng Wang, Barry Song <21cnbao@gmail.com>, Lance Yang, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240403114032.1162100-1-ryan.roberts@arm.com> <20240403114032.1162100-2-ryan.roberts@arm.com>
In-Reply-To: <20240403114032.1162100-2-ryan.roberts@arm.com>
Hi Ryan,

Sorry for the late reply. I wanted to review this series but didn't have
the chance to get to it sooner.

On Wed, Apr 3, 2024 at 4:40 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> As preparation for supporting small-sized THP in the swap-out path,
> without first needing to split to order-0, remove CLUSTER_FLAG_HUGE,
> which, when present, always implies PMD-sized THP, which is the same as
> the cluster size.
>
> The only use of the flag was to determine whether a swap entry refers to
> a single page or a PMD-sized THP in swap_page_trans_huge_swapped().
> Instead of relying on the flag, we now pass in nr_pages, which
> originates from the folio's number of pages. This allows the logic to
> work for folios of any order.
>
> The one snag is that one of the swap_page_trans_huge_swapped() call
> sites does not have the folio. But it was only being called there to
> shortcut a call to __try_to_reclaim_swap() in some cases.
> __try_to_reclaim_swap() gets the folio and (via some other functions)
> calls swap_page_trans_huge_swapped(). So I've removed the problematic
> call site and believe the new logic should be functionally equivalent.
>
> That said, removing the fast path means that we will take a reference
> and trylock a large folio much more often, which we would like to avoid.
> The next patch will solve this.
>
> Removing CLUSTER_FLAG_HUGE also means we can remove split_swap_cluster()
> which used to be called during folio splitting, since
> split_swap_cluster()'s only job was to remove the flag.

This seems necessary for removing the assumption that a large folio is
PMD-sized.

Acked-by: Chris Li

>
> Reviewed-by: "Huang, Ying"
> Signed-off-by: Ryan Roberts
> ---
>  include/linux/swap.h | 10 ----------
>  mm/huge_memory.c     |  3 ---
>  mm/swapfile.c        | 47 ++++++++------------------------------------
>  3 files changed, 8 insertions(+), 52 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index a211a0383425..f6f78198f000 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -259,7 +259,6 @@ struct swap_cluster_info {
>  };
>  #define CLUSTER_FLAG_FREE 1 /* This cluster is free */
>  #define CLUSTER_FLAG_NEXT_NULL 2 /* This cluster has no next cluster */
> -#define CLUSTER_FLAG_HUGE 4 /* This cluster is backing a transparent huge page */
>
>  /*
>   * We assign a cluster to each CPU, so each CPU can allocate swap entry from
> @@ -590,15 +589,6 @@ static inline int add_swap_extent(struct swap_info_struct *sis,
>  }
>  #endif /* CONFIG_SWAP */
>
> -#ifdef CONFIG_THP_SWAP
> -extern int split_swap_cluster(swp_entry_t entry);
> -#else
> -static inline int split_swap_cluster(swp_entry_t entry)
> -{
> -	return 0;
> -}
> -#endif
> -
>  #ifdef CONFIG_MEMCG
>  static inline int mem_cgroup_swappiness(struct mem_cgroup *memcg)
>  {
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index ea6d1f09a0b9..3ca9282a0dc9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2844,9 +2844,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  		shmem_uncharge(folio->mapping->host, nr_dropped);
>  	remap_page(folio, nr);
>
> -	if (folio_test_swapcache(folio))
> -		split_swap_cluster(folio->swap);
> -
>  	/*
>  	 * set page to its compound_head when split to non order-0 pages, so
>  	 * we can skip unlocking it below, since PG_locked is transferred to
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 5e6d2304a2a4..0d44ee2b4f9c 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -343,18 +343,6 @@ static inline void cluster_set_null(struct swap_cluster_info *info)
>  	info->data = 0;
>  }
>
> -static inline bool cluster_is_huge(struct swap_cluster_info *info)
> -{
> -	if (IS_ENABLED(CONFIG_THP_SWAP))
> -		return info->flags & CLUSTER_FLAG_HUGE;
> -	return false;
> -}
> -
> -static inline void cluster_clear_huge(struct swap_cluster_info *info)
> -{
> -	info->flags &= ~CLUSTER_FLAG_HUGE;
> -}
> -
>  static inline struct swap_cluster_info *lock_cluster(struct swap_info_struct *si,
>  						     unsigned long offset)
>  {
> @@ -1027,7 +1015,7 @@ static int swap_alloc_cluster(struct swap_info_struct *si, swp_entry_t *slot)
>  	offset = idx * SWAPFILE_CLUSTER;
>  	ci = lock_cluster(si, offset);
>  	alloc_cluster(si, idx);
> -	cluster_set_count_flag(ci, SWAPFILE_CLUSTER, CLUSTER_FLAG_HUGE);
> +	cluster_set_count(ci, SWAPFILE_CLUSTER);
>
>  	memset(si->swap_map + offset, SWAP_HAS_CACHE, SWAPFILE_CLUSTER);
>  	unlock_cluster(ci);
> @@ -1365,7 +1353,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
>
>  	ci = lock_cluster_or_swap_info(si, offset);
>  	if (size == SWAPFILE_CLUSTER) {
> -		VM_BUG_ON(!cluster_is_huge(ci));
>  		map = si->swap_map + offset;
>  		for (i = 0; i < SWAPFILE_CLUSTER; i++) {
>  			val = map[i];
> @@ -1373,7 +1360,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
>  			if (val == SWAP_HAS_CACHE)
>  				free_entries++;
>  		}
> -		cluster_clear_huge(ci);
>  		if (free_entries == SWAPFILE_CLUSTER) {
>  			unlock_cluster_or_swap_info(si, ci);
>  			spin_lock(&si->lock);
> @@ -1395,23 +1381,6 @@ void put_swap_folio(struct folio *folio, swp_entry_t entry)
>  	unlock_cluster_or_swap_info(si, ci);
>  }
>
> -#ifdef CONFIG_THP_SWAP
> -int split_swap_cluster(swp_entry_t entry)
> -{
> -	struct swap_info_struct *si;
> -	struct swap_cluster_info *ci;
> -	unsigned long offset = swp_offset(entry);
> -
> -	si = _swap_info_get(entry);
> -	if (!si)
> -		return -EBUSY;
> -	ci = lock_cluster(si, offset);
> -	cluster_clear_huge(ci);
> -	unlock_cluster(ci);
> -	return 0;
> -}
> -#endif
> -
>  static int swp_entry_cmp(const void *ent1, const void *ent2)
>  {
>  	const swp_entry_t *e1 = ent1, *e2 = ent2;
> @@ -1519,22 +1488,23 @@ int swp_swapcount(swp_entry_t entry)
>  }
>
>  static bool swap_page_trans_huge_swapped(struct swap_info_struct *si,
> -					 swp_entry_t entry)
> +					 swp_entry_t entry,
> +					 unsigned int nr_pages)
>  {
>  	struct swap_cluster_info *ci;
>  	unsigned char *map = si->swap_map;
>  	unsigned long roffset = swp_offset(entry);
> -	unsigned long offset = round_down(roffset, SWAPFILE_CLUSTER);
> +	unsigned long offset = round_down(roffset, nr_pages);

It is obvious that this code only works when nr_pages is a power of
two; SWAPFILE_CLUSTER is a power of two. If we switch to an API that
takes nr_pages, we might want to warn about, or reject, a
non-power-of-two nr_pages.

>  	int i;
>  	bool ret = false;
>
>  	ci = lock_cluster_or_swap_info(si, offset);
> -	if (!ci || !cluster_is_huge(ci)) {
> +	if (!ci || nr_pages == 1) {
>  		if (swap_count(map[roffset]))
>  			ret = true;
>  		goto unlock_out;
>  	}
> -	for (i = 0; i < SWAPFILE_CLUSTER; i++) {
> +	for (i = 0; i < nr_pages; i++) {

Here we assume the swap entry offsets are contiguous. That is beyond
your patch's scope. If in the future we want to swap out large pages
using non-contiguous swap entries, we will need to find and change all
the places that assume contiguous swap entries.

Chris

>  		if (swap_count(map[offset + i])) {
>  			ret = true;
>  			break;
> @@ -1556,7 +1526,7 @@ static bool folio_swapped(struct folio *folio)
>  	if (!IS_ENABLED(CONFIG_THP_SWAP) || likely(!folio_test_large(folio)))
>  		return swap_swapcount(si, entry) != 0;
>
> -	return swap_page_trans_huge_swapped(si, entry);
> +	return swap_page_trans_huge_swapped(si, entry, folio_nr_pages(folio));
>  }
>
>  /**
> @@ -1622,8 +1592,7 @@ int free_swap_and_cache(swp_entry_t entry)
>  	}
>
>  	count = __swap_entry_free(p, entry);
> -	if (count == SWAP_HAS_CACHE &&
> -	    !swap_page_trans_huge_swapped(p, entry))
> +	if (count == SWAP_HAS_CACHE)
>  		__try_to_reclaim_swap(p, swp_offset(entry),
>  				      TTRS_UNMAPPED | TTRS_FULL);
>  	put_swap_device(p);
> --
> 2.25.1
>