From: Yang Shi
Date: Thu, 7 Apr 2022 10:27:50 -0700
Subject: Re: [PATCH] mm: swap: determine swap device by using page nid
To: Michal Hocko
Cc: Huang Ying, Andrew Morton, Linux MM, Linux Kernel Mailing List, Aaron Lu
References: <20220407020953.475626-1-shy828301@gmail.com>

On Thu, Apr 7, 2022 at 12:52 AM
Michal Hocko wrote:
>
> [Cc Aaron who has introduced the per node swap changes]
>
> On Wed 06-04-22 19:09:53, Yang Shi wrote:
> > The swap devices are linked to per-node priority lists; the swap device
> > closer to a node has higher priority on that node's priority list.
> > This is supposed to improve I/O latency, particularly for some fast
> > devices. But the current code gets the nid by calling numa_node_id(),
> > which returns the nid that the reclaimer is running on instead of the
> > nid that the page belongs to.
> >
> > Pass the page's nid down to get_swap_pages() in order to pick the
> > right swap device. But this doesn't work for the swap slots cache, which
> > is per cpu. We could skip the swap slots cache if the current node is
> > not the page's node, but that may be overkill. So keep using the current
> > node's swap slots cache. The issue was found by visual code inspection,
> > so it is unclear how much improvement can be achieved, given the lack
> > of a suitable test device. But in any case the current code does violate
> > the design.
>
> Do you have any perf numbers for this change?

No, it was found by visual code inspection and an offline discussion
with Huang Ying.
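To make the distinction concrete, here is a small userspace model of a
two-node system with one swap device near each node. This is only a sketch,
not kernel code: every model_* name and all priority values are made up for
illustration, and the real code walks plists rather than scanning an array.
It only shows how choosing by the reclaimer's node and choosing by the
page's node can pick different devices.

```c
/* Userspace model only; none of the model_* names exist in the kernel. */
#include <assert.h>
#include <stddef.h>

#define NR_NODES 2
#define NR_DEVS  2

/* Hypothetical swap device: prio[n] is its priority as seen from node n. */
struct model_swap_dev {
	const char *name;
	int prio[NR_NODES];
};

/* Assumed layout: each device has the higher priority on its nearest node. */
static const struct model_swap_dev model_devs[NR_DEVS] = {
	{ "swap-near-node0", { 10,  5 } },
	{ "swap-near-node1", {  5, 10 } },
};

/* Return the index of the highest-priority device on @node's list. */
static int model_pick_dev(int node)
{
	int best = 0;

	assert(node >= 0 && node < NR_NODES);
	for (int i = 1; i < NR_DEVS; i++)
		if (model_devs[i].prio[node] > model_devs[best].prio[node])
			best = i;
	return best;
}

/* Old behaviour: the node the reclaimer runs on decides the device. */
static int model_pick_old(int reclaimer_nid, int page_nid)
{
	(void)page_nid;
	return model_pick_dev(reclaimer_nid);
}

/* Patched behaviour: the node the page belongs to decides the device. */
static int model_pick_new(int reclaimer_nid, int page_nid)
{
	(void)reclaimer_nid;
	return model_pick_dev(page_nid);
}
```

With a reclaimer on node 0 reclaiming a page that lives on node 1,
model_pick_old(0, 1) selects the device near node 0, while
model_pick_new(0, 1) selects the device near node 1. When the reclaimer and
the page are on the same node, the two agree.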
>
> > Cc: Huang Ying
> > Signed-off-by: Yang Shi
> > ---
> >  include/linux/swap.h | 3 ++-
> >  mm/swap_slots.c      | 7 ++++---
> >  mm/swapfile.c        | 5 ++---
> >  3 files changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 27093b477c5f..e442cf6b61ea 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -497,7 +497,8 @@ extern void si_swapinfo(struct sysinfo *);
> >  extern swp_entry_t get_swap_page(struct page *page);
> >  extern void put_swap_page(struct page *page, swp_entry_t entry);
> >  extern swp_entry_t get_swap_page_of_type(int);
> > -extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size);
> > +extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size,
> > +			  int node);
> >  extern int add_swap_count_continuation(swp_entry_t, gfp_t);
> >  extern void swap_shmem_alloc(swp_entry_t);
> >  extern int swap_duplicate(swp_entry_t);
> > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > index 2b5531840583..a1c5cf6a4302 100644
> > --- a/mm/swap_slots.c
> > +++ b/mm/swap_slots.c
> > @@ -264,7 +264,7 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
> >  	cache->cur = 0;
> >  	if (swap_slot_cache_active)
> >  		cache->nr = get_swap_pages(SWAP_SLOTS_CACHE_SIZE,
> > -					   cache->slots, 1);
> > +					   cache->slots, 1, numa_node_id());
> >
> >  	return cache->nr;
> >  }
> > @@ -305,12 +305,13 @@ swp_entry_t get_swap_page(struct page *page)
> >  {
> >  	swp_entry_t entry;
> >  	struct swap_slots_cache *cache;
> > +	int nid = page_to_nid(page);
> >
> >  	entry.val = 0;
> >
> >  	if (PageTransHuge(page)) {
> >  		if (IS_ENABLED(CONFIG_THP_SWAP))
> > -			get_swap_pages(1, &entry, HPAGE_PMD_NR);
> > +			get_swap_pages(1, &entry, HPAGE_PMD_NR, nid);
> >  		goto out;
> >  	}
> >
> > @@ -342,7 +343,7 @@ swp_entry_t get_swap_page(struct page *page)
> >  			goto out;
> >  	}
> >
> > -	get_swap_pages(1, &entry, 1);
> > +	get_swap_pages(1, &entry, 1, nid);
> >  out:
> >  	if (mem_cgroup_try_charge_swap(page, entry)) {
> >  		put_swap_page(page, entry);
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 63c61f8b2611..151fffe0fd60 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1036,13 +1036,13 @@ static void swap_free_cluster(struct swap_info_struct *si, unsigned long idx)
> >  	swap_range_free(si, offset, SWAPFILE_CLUSTER);
> >  }
> >
> > -int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
> > +int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size,
> > +		   int node)
> >  {
> >  	unsigned long size = swap_entry_size(entry_size);
> >  	struct swap_info_struct *si, *next;
> >  	long avail_pgs;
> >  	int n_ret = 0;
> > -	int node;
> >
> >  	/* Only single cluster request supported */
> >  	WARN_ON_ONCE(n_goal > 1 && size == SWAPFILE_CLUSTER);
> > @@ -1060,7 +1060,6 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
> >  	atomic_long_sub(n_goal * size, &nr_swap_pages);
> >
> >  start_over:
> > -	node = numa_node_id();
> >  	plist_for_each_entry_safe(si, next, &swap_avail_heads[node], avail_lists[node]) {
> >  		/* requeue si to after same-priority siblings */
> >  		plist_requeue(&si->avail_lists[node], &swap_avail_heads[node]);
> > --
> > 2.26.3
>
> --
> Michal Hocko
> SUSE Labs