From: "Huang, Ying"
To: Kairui Song
Cc: linux-mm@kvack.org, Kairui Song, Andrew Morton, Chris Li,
	Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko,
	Yosry Ahmed, David Hildenbrand, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 5/7] mm/swap: avoid a duplicated swap cache lookup for SWP_SYNCHRONOUS_IO
In-Reply-To: <20240129175423.1987-6-ryncsn@gmail.com> (Kairui Song's message of "Tue, 30 Jan 2024 01:54:20 +0800")
References: <20240129175423.1987-1-ryncsn@gmail.com> <20240129175423.1987-6-ryncsn@gmail.com>
Date: Tue, 30 Jan 2024 14:51:48 +0800
Message-ID: <87jznrgvp7.fsf@yhuang6-desk2.ccr.corp.intel.com>

Kairui Song writes:

> From: Kairui Song
>
> When an xa_value is returned by the swap cache lookup, keep it to be
> used later for the workingset refault check instead of looking it up
> again in swapin_no_readahead.
>
> The shadow lookup and workingset check are skipped for swapoff to
> reduce overhead; workingset checking for anon pages upon swapoff is
> not helpful, and simply considering all pages inactive makes more
> sense, since swapoff doesn't mean the pages are being accessed.
>
> After this commit, swapin is about 4% faster for ZRAM. Micro-benchmark
> result (madvise is used to swap out 10G of zero-filled data to ZRAM,
> which is then read back in):
>
> Before: 11143285 us
> After:  10692644 us (+4.1%)
>
> Signed-off-by: Kairui Song

LGTM, Thanks!
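A side note for readers following the series: the change is essentially
a plumbing exercise, threading one lookup result through an
out-parameter so the fault path never queries the xarray twice. Below
is a minimal, self-contained userspace sketch of that pattern; the
names are hypothetical stand-ins, not the kernel API (the kernel side
uses filemap_get_entry(), xa_is_value(), and workingset_refault()):

#include <stdint.h>
#include <stdio.h>

/* Low bit set marks a shadow left behind by eviction (cf. xa_is_value()). */
static int is_shadow(void *p)
{
	return (uintptr_t)p & 1;
}

struct cache {
	void *slot;	/* live entry, shadow, or NULL */
};

/*
 * Single lookup: a live entry is returned directly; a shadow is handed
 * back through *shadowp so the caller can run its refault check without
 * a second lookup.  A NULL shadowp skips shadow handling entirely.
 */
static void *cache_get(struct cache *c, void **shadowp)
{
	void *entry = c->slot;

	if (is_shadow(entry)) {
		if (shadowp)
			*shadowp = entry;
		return NULL;
	}
	return entry;
}

int main(void)
{
	/* The slot holds a shadow value, i.e. the entry was evicted. */
	struct cache c = { .slot = (void *)(((uintptr_t)42 << 1) | 1) };
	void *shadow = NULL;

	if (!cache_get(&c, &shadow) && shadow)
		printf("miss: reuse shadow %p for the refault check\n", shadow);

	if (!cache_get(&c, NULL))	/* swapoff-style caller */
		printf("miss: workingset check skipped\n");

	return 0;
}

Passing NULL for the out-parameter mirrors how the swapoff and shmem
callers opt out of the workingset check entirely.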
Reviewed-by: "Huang, Ying"

> ---
>  mm/memory.c     |  5 +++--
>  mm/shmem.c      |  2 +-
>  mm/swap.h       | 11 ++++++-----
>  mm/swap_state.c | 23 +++++++++++++----------
>  mm/swapfile.c   |  4 ++--
>  5 files changed, 25 insertions(+), 20 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8711f8a07039..349946899f8d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3800,6 +3800,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	struct swap_info_struct *si = NULL;
>  	rmap_t rmap_flags = RMAP_NONE;
>  	bool exclusive = false;
> +	void *shadow = NULL;
>  	swp_entry_t entry;
>  	pte_t pte;
>  	vm_fault_t ret = 0;
> @@ -3858,14 +3859,14 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	if (unlikely(!si))
>  		goto out;
>
> -	folio = swap_cache_get_folio(entry, vma, vmf->address);
> +	folio = swap_cache_get_folio(entry, vma, vmf->address, &shadow);
>  	if (folio)
>  		page = folio_file_page(folio, swp_offset(entry));
>  	swapcache = folio;
>
>  	if (!folio) {
>  		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
> -				     vmf, &swapcache);
> +				     vmf, &swapcache, shadow);
>  		if (!folio) {
>  			/*
>  			 * Back out if somebody else faulted in this pte
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d7c84ff62186..698a31bf7baa 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1873,7 +1873,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>  	}
>
>  	/* Look it up and read it in.. */
> -	folio = swap_cache_get_folio(swap, NULL, 0);
> +	folio = swap_cache_get_folio(swap, NULL, 0, NULL);
>  	if (!folio) {
>  		/* Or update major stats only when swapin succeeds?? */
>  		if (fault_type) {
> diff --git a/mm/swap.h b/mm/swap.h
> index 8f8185d3865c..ca9cb472a263 100644
> --- a/mm/swap.h
> +++ b/mm/swap.h
> @@ -42,7 +42,8 @@ void delete_from_swap_cache(struct folio *folio);
>  void clear_shadow_from_swap_cache(int type, unsigned long begin,
>  				  unsigned long end);
>  struct folio *swap_cache_get_folio(swp_entry_t entry,
> -		struct vm_area_struct *vma, unsigned long addr);
> +		struct vm_area_struct *vma, unsigned long addr,
> +		void **shadowp);
>  struct folio *filemap_get_incore_folio(struct address_space *mapping,
>  		pgoff_t index);
>
> @@ -54,8 +55,8 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_flags,
>  		bool skip_if_exists);
>  struct folio *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
>  		struct mempolicy *mpol, pgoff_t ilx);
> -struct folio *swapin_entry(swp_entry_t entry, gfp_t flag,
> -		struct vm_fault *vmf, struct folio **swapcached);
> +struct folio *swapin_entry(swp_entry_t entry, gfp_t flag, struct vm_fault *vmf,
> +		struct folio **swapcached, void *shadow);
>
>  static inline unsigned int folio_swap_flags(struct folio *folio)
>  {
> @@ -87,7 +88,7 @@ static inline struct folio *swap_cluster_readahead(swp_entry_t entry,
>  }
>
>  static inline struct folio *swapin_entry(swp_entry_t swp, gfp_t gfp_mask,
> -			struct vm_fault *vmf, struct folio **swapcached)
> +			struct vm_fault *vmf, struct folio **swapcached, void *shadow)
>  {
>  	return NULL;
>  }
> @@ -98,7 +99,7 @@ static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
>  }
>
>  static inline struct folio *swap_cache_get_folio(swp_entry_t entry,
> -		struct vm_area_struct *vma, unsigned long addr)
> +		struct vm_area_struct *vma, unsigned long addr, void **shadowp)
>  {
>  	return NULL;
>  }
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 5e06b2e140d4..e41a137a6123 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -330,12 +330,18 @@ static inline bool swap_use_vma_readahead(void)
>   * Caller must lock the swap device or hold a reference to keep it valid.
>   */
>  struct folio *swap_cache_get_folio(swp_entry_t entry,
> -		struct vm_area_struct *vma, unsigned long addr)
> +		struct vm_area_struct *vma, unsigned long addr, void **shadowp)
>  {
>  	struct folio *folio;
>
> -	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
> -	if (!IS_ERR(folio)) {
> +	folio = filemap_get_entry(swap_address_space(entry), swp_offset(entry));
> +	if (xa_is_value(folio)) {
> +		if (shadowp)
> +			*shadowp = folio;
> +		return NULL;
> +	}
> +
> +	if (folio) {
>  		bool vma_ra = swap_use_vma_readahead();
>  		bool readahead;
>
> @@ -365,8 +371,6 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
>  		if (!vma || !vma_ra)
>  			atomic_inc(&swapin_readahead_hits);
>  		}
> -	} else {
> -		folio = NULL;
>  	}
>
>  	return folio;
> @@ -866,16 +870,16 @@ static struct folio *swap_vma_readahead(swp_entry_t targ_entry, gfp_t gfp_mask,
>   * @entry: swap entry of this memory
>   * @gfp_mask: memory allocation flags
>   * @vmf: fault information
> + * @shadow: workingset shadow corresponding to entry
>   *
>   * Returns the struct folio for entry and addr after the swap entry is read
>   * in.
>   */
>  static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
> -		struct vm_fault *vmf)
> +		struct vm_fault *vmf, void *shadow)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct folio *folio;
> -	void *shadow = NULL;
>
>  	/* skip swapcache */
>  	folio = vma_alloc_folio(gfp_mask, 0,
> @@ -892,7 +896,6 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
>  	}
>  	mem_cgroup_swapin_uncharge_swap(entry);
>
> -	shadow = get_shadow_from_swap_cache(entry);
>  	if (shadow)
>  		workingset_refault(folio, shadow);
>
> @@ -922,7 +925,7 @@ static struct folio *swapin_direct(swp_entry_t entry, gfp_t gfp_mask,
>   * or skip the readahead(ie, ramdisk based swap device).
>   */
>  struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
> -		struct vm_fault *vmf, struct folio **swapcache)
> +		struct vm_fault *vmf, struct folio **swapcache, void *shadow)
>  {
>  	struct mempolicy *mpol;
>  	struct folio *folio;
> @@ -930,7 +933,7 @@ struct folio *swapin_entry(swp_entry_t entry, gfp_t gfp_mask,
>
>  	if (data_race(swp_swap_info(entry)->flags & SWP_SYNCHRONOUS_IO) &&
>  	    __swap_count(entry) == 1) {
> -		folio = swapin_direct(entry, gfp_mask, vmf);
> +		folio = swapin_direct(entry, gfp_mask, vmf, shadow);
>  	} else {
>  		mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
>  		if (swap_use_vma_readahead())
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 1cf7e72e19e3..aac26f5a6cec 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1865,7 +1865,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  	pte_unmap(pte);
>  	pte = NULL;
>
> -	folio = swap_cache_get_folio(entry, vma, addr);
> +	folio = swap_cache_get_folio(entry, vma, addr, NULL);
>  	if (!folio) {
>  		struct vm_fault vmf = {
>  			.vma = vma,
> @@ -1875,7 +1875,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
>  		};
>
>  		folio = swapin_entry(entry, GFP_HIGHUSER_MOVABLE,
> -				     &vmf, NULL);
> +				     &vmf, NULL, NULL);
>  	}
>  	if (!folio) {
>  		/*