From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH 7/7] mm/khugepaged: try to free transhuge swapcache when possible
From: Miaohe Lin <linmiaohe@huawei.com>
To: Yang Shi
Cc: Andrew Morton, Andrea Arcangeli, Matthew Wilcox, Vlastimil Babka,
 David Howells, NeilBrown, Alistair Popple, David Hildenbrand,
 Suren Baghdasaryan, Peter Xu, Linux MM, Linux Kernel Mailing List
Date: Fri, 17 Jun 2022 10:26:58 +0800
Message-ID: <17d610f7-4d85-e9b4-6429-4ad89274cb48@huawei.com>
References: <20220611084731.55155-1-linmiaohe@huawei.com>
 <20220611084731.55155-8-linmiaohe@huawei.com>
 <87617483-7945-30e2-471e-578da4f4d9c7@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
On 2022/6/16 23:53, Yang Shi wrote:
> On Thu, Jun 16, 2022 at 12:42 AM Miaohe Lin wrote:
>>
>> On 2022/6/16 7:58, Yang Shi wrote:
>>> On Sat, Jun 11, 2022 at 1:47 AM Miaohe Lin wrote:
>>>>
>>>> Transhuge swapcache pages won't be freed in __collapse_huge_page_copy()
>>>> because release_pte_page() is not called for them, so
>>>> free_page_and_swap_cache() can't grab the page lock. These pages won't
>>>> be freed from the swap cache, even if we are the only user, until the
>>>> next reclaim pass. That shouldn't hurt, but we could try to free these
>>>> pages to save more memory for the system.
>>>
>>>> Signed-off-by: Miaohe Lin
>>>> ---
>>>>  include/linux/swap.h | 5 +++++
>>>>  mm/khugepaged.c      | 1 +
>>>>  mm/swap.h            | 5 -----
>>>>  3 files changed, 6 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>>>> index 8672a7123ccd..ccb83b12b724 100644
>>>> --- a/include/linux/swap.h
>>>> +++ b/include/linux/swap.h
>>>> @@ -456,6 +456,7 @@ static inline unsigned long total_swapcache_pages(void)
>>>>  	return global_node_page_state(NR_SWAPCACHE);
>>>>  }
>>>>
>>>> +extern void free_swap_cache(struct page *page);
>>>>  extern void free_page_and_swap_cache(struct page *);
>>>>  extern void free_pages_and_swap_cache(struct page **, int);
>>>>  /* linux/mm/swapfile.c */
>>>> @@ -540,6 +541,10 @@ static inline void put_swap_device(struct swap_info_struct *si)
>>>>  /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
>>>>  #define free_swap_and_cache(e) is_pfn_swap_entry(e)
>>>>
>>>> +static inline void free_swap_cache(struct page *page)
>>>> +{
>>>> +}
>>>> +
>>>>  static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask)
>>>>  {
>>>>  	return 0;
>>>>
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index ee0a719c8be9..52109ad13f78 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -756,6 +756,7 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>>>  	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>>>  		list_del(&src_page->lru);
>>>>  		release_pte_page(src_page);
>>>> +		free_swap_cache(src_page);
>>>
>>> Will this really work? free_swap_cache() will just decrement the
>>> refcount without putting the page back to the buddy allocator, so the
>>> hugepage is not actually freed at all. Am I missing something?
>>
>> Thanks for catching this! If the page is on the percpu lru_pvecs cache,
>> it will be released when the lru_pvecs are drained. But if not,
>> free_swap_cache() won't free the page, as it assumes the caller holds a
>> reference on the page and thus only does page_ref_sub(). Does the below
>> change make sense to you?
>
> THP gets drained immediately, so they won't stay in pagevecs.

Yes, you're right. I missed this.
>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index 52109ad13f78..b8c96e33591d 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -755,8 +755,12 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
>>
>>  	list_for_each_entry_safe(src_page, tmp, compound_pagelist, lru) {
>>  		list_del(&src_page->lru);
>> -		release_pte_page(src_page);
>> +		mod_node_page_state(page_pgdat(src_page),
>> +				    NR_ISOLATED_ANON + page_is_file_lru(src_page),
>> +				    -compound_nr(src_page));
>> +		unlock_page(src_page);
>>  		free_swap_cache(src_page);
>> +		putback_lru_page(src_page);
>
> I'm not sure if it is worth it or not for a rare corner case, since THP
> should not stay in swapcache unless try_to_unmap() in vmscan fails,

IIUC, even if try_to_unmap() in vmscan succeeds, the THP might still be in
the swapcache if shrink_page_list() is not called for it again after
writeback is done, e.g. when shrink_page_list() is called from madvise (so
there might be no memory pressure), or when do_swap_page() puts the THP
back into the page table. Also, the THP might not be split if
deferred_split_shrinker is not invoked, e.g. because there is no memory
shortage. Even under memory pressure, the THP will stay in the swapcache
until the next round of page reclaim handles it. So there should be a
non-negligible window during which the THP stays in the swapcache. Or am I
missing something?

> IIUC. And it is not guaranteed that free_swap_cache() will get the
> page lock.

IMHO, we're not guaranteed that free_swap_cache() will get the page lock
for a normal page anyway.

Thanks!

>
>> }
>> }
>>
>> Thanks!
>>
>>>
>>>>  }
>>>>  }
>>>>
>>>> diff --git a/mm/swap.h b/mm/swap.h
>>>> index 0193797b0c92..863f6086c916 100644
>>>> --- a/mm/swap.h
>>>> +++ b/mm/swap.h
>>>> @@ -41,7 +41,6 @@ void __delete_from_swap_cache(struct page *page,
>>>>  void delete_from_swap_cache(struct page *page);
>>>>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>>>>  				  unsigned long end);
>>>> -void free_swap_cache(struct page *page);
>>>>  struct page *lookup_swap_cache(swp_entry_t entry,
>>>>  			       struct vm_area_struct *vma,
>>>>  			       unsigned long addr);
>>>> @@ -81,10 +80,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
>>>>  	return NULL;
>>>>  }
>>>>
>>>> -static inline void free_swap_cache(struct page *page)
>>>> -{
>>>> -}
>>>> -
>>>>  static inline void show_swap_cache_info(void)
>>>>  {
>>>>  }
>>>> --
>>>> 2.23.0
>>>>
>>>>
>>> .
>>>
>>
> .
>