From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kairui Song <ryncsn@gmail.com>
Date: Fri, 24 Nov 2023 16:42:00 +0800
Subject: Re: [PATCH 11/24] mm/swap: also handle swapcache lookup in swapin_readahead
To: Chris Li
Cc: linux-mm@kvack.org, Andrew Morton, "Huang, Ying", David Hildenbrand,
 Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko,
 linux-kernel@vger.kernel.org
References: <20231119194740.94101-1-ryncsn@gmail.com> <20231119194740.94101-12-ryncsn@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Chris Li wrote on Wed, 22 Nov 2023 at 00:07:
>
> On Sun, Nov 19, 2023 at 11:48 AM Kairui Song wrote:
> >
> > From: Kairui Song
> >
> > No feature change, just prepare for later commits.
>
> You need to have a proper commit message why this change needs to happen.
> Preparing is too generic, it does not give any real information.
> For example, it seems you want to reduce one swap cache lookup because
> swap_readahead already has it?
>
> I am a bit puzzled at this patch. It shuffles a lot of sensitive code.
> However I do not get the value.
> It seems like this patch should be merged with the later patch that
> depends on it to be judged together.
>
> >
> > Signed-off-by: Kairui Song
> > ---
> >  mm/memory.c     | 61 +++++++++++++++++++++++--------------------------
> >  mm/swap.h       | 10 ++++++--
> >  mm/swap_state.c | 26 +++++++++++++--------
> >  mm/swapfile.c   | 30 +++++++++++-------------
> >  4 files changed, 66 insertions(+), 61 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index f4237a2e3b93..22af9f3e8c75 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3786,13 +3786,13 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
> >  vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  {
> >         struct vm_area_struct *vma = vmf->vma;
> > -       struct folio *swapcache, *folio = NULL;
> > +       struct folio *swapcache = NULL, *folio = NULL;
> > +       enum swap_cache_result cache_result;
> >         struct page *page;
> >         struct swap_info_struct *si = NULL;
> >         rmap_t rmap_flags = RMAP_NONE;
> >         bool exclusive = false;
> >         swp_entry_t entry;
> > -       bool swapcached;
> >         pte_t pte;
> >         vm_fault_t ret = 0;
> >
> > @@ -3850,42 +3850,37 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >         if (unlikely(!si))
> >                 goto out;
> >
> > -       folio = swap_cache_get_folio(entry, vma, vmf->address);
> > -       if (folio)
> > -               page = folio_file_page(folio, swp_offset(entry));
> > -       swapcache = folio;
>
> Is the motivation that swap_readahead() already has a swap cache lookup, so you
> remove this lookup here?

Yes, the cache lookup is moved into swapin_readahead and shared there. This also makes it possible for that lookup to return a shadow when the entry is not a present page, so another shadow lookup can be saved on the synchronous (ZRAM) swapin path. This improves ZRAM performance by ~4% on a 10G ZRAM device, and should help even more as the swap cache tree grows larger.
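To illustrate the idea, a rough sketch (simplified, not the exact code in this series; swap_cache_get_folio_shadow() is a hypothetical helper standing in for the shadow-returning lookup that a later patch introduces):

/*
 * Inside swapin_readahead(): a single lookup serves every caller.
 * On a hit we return early; on a miss the same tree walk can hand
 * back the shadow entry, so the sync (ZRAM) path does not need a
 * second call like get_shadow_from_swap_cache() before feeding
 * workingset_refault().
 */
folio = swap_cache_get_folio_shadow(entry, vmf->vma, vmf->address, &shadow);
if (folio) {
        *result = SWAP_CACHE_HIT;
        return folio_file_page(folio, swp_offset(entry));
}
/* Miss: fall through to readahead or sync swapin, reusing "shadow". */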
> > -
> > -       if (!folio) {
> > -               page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> > -                                       vmf, &swapcached);
> > -               if (page) {
> > -                       folio = page_folio(page);
> > -                       if (swapcached)
> > -                               swapcache = folio;
> > -               } else {
> > +       page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> > +                               vmf, &cache_result);
> > +       if (page) {
> > +               folio = page_folio(page);
> > +               if (cache_result != SWAP_CACHE_HIT) {
> > +                       /* Had to read the page from swap area: Major fault */
> > +                       ret = VM_FAULT_MAJOR;
> > +                       count_vm_event(PGMAJFAULT);
> > +                       count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
> > +               }
> > +               if (cache_result != SWAP_CACHE_BYPASS)
> > +                       swapcache = folio;
> > +               if (PageHWPoison(page)) {
>
> There is a lot of code shuffle here. From the diff it is hard to tell
> if they are doing the same thing as before.
>
> >                         /*
> > -                        * Back out if somebody else faulted in this pte
> > -                        * while we released the pte lock.
> > +                        * hwpoisoned dirty swapcache pages are kept for killing
> > +                        * owner processes (which may be unknown at hwpoison time)
> >                          */
> > -                       vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> > -                                       vmf->address, &vmf->ptl);
> > -                       if (likely(vmf->pte &&
> > -                                  pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
> > -                               ret = VM_FAULT_OOM;
> > -                       goto unlock;
> > +                       ret = VM_FAULT_HWPOISON;
> > +                       goto out_release;
> >                 }
> > -
> > -               /* Had to read the page from swap area: Major fault */
> > -               ret = VM_FAULT_MAJOR;
> > -               count_vm_event(PGMAJFAULT);
> > -               count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
> > -       } else if (PageHWPoison(page)) {
> > +       } else {
> >                 /*
> > -                * hwpoisoned dirty swapcache pages are kept for killing
> > -                * owner processes (which may be unknown at hwpoison time)
> > +                * Back out if somebody else faulted in this pte
> > +                * while we released the pte lock.
> >                  */
> > -               ret = VM_FAULT_HWPOISON;
> > -               goto out_release;
> > +               vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> > +                               vmf->address, &vmf->ptl);
> > +               if (likely(vmf->pte &&
> > +                          pte_same(ptep_get(vmf->pte), vmf->orig_pte)))
> > +                       ret = VM_FAULT_OOM;
> > +               goto unlock;
> >         }
> >
> >         ret |= folio_lock_or_retry(folio, vmf);
> > diff --git a/mm/swap.h b/mm/swap.h
> > index a9a654af791e..ac9136eee690 100644
> > --- a/mm/swap.h
> > +++ b/mm/swap.h
> > @@ -30,6 +30,12 @@ extern struct address_space *swapper_spaces[];
> >         (&swapper_spaces[swp_type(entry)][swp_offset(entry) \
> >                 >> SWAP_ADDRESS_SPACE_SHIFT])
> >
> > +enum swap_cache_result {
> > +       SWAP_CACHE_HIT,
> > +       SWAP_CACHE_MISS,
> > +       SWAP_CACHE_BYPASS,
> > +};
>
> Does any function later care about CACHE_BYPASS?
>
> Again, better introduce it with the function that uses it. Don't
> introduce it for "just in case I might use it later".

Yes, callers in shmem will also need to know whether the page is in the swap cache, and they need a value to indicate the bypass case. I can add comments here to document the intended usage.
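As a rough example (hypothetical sketch, not part of this patch, details such as locking omitted), a later shmem caller could consume the result like this:

/* Hypothetical future caller in mm/shmem.c */
page = swapin_readahead(swap, GFP_HIGHUSER_MOVABLE, &vmf, &result);
if (!page)
        return -ENOMEM;
folio = page_folio(page);
if (result != SWAP_CACHE_BYPASS) {
        /*
         * The folio went through the swap cache and must be removed
         * from it before being installed into the page cache.
         */
        delete_from_swap_cache(folio);
}
/* SWAP_CACHE_BYPASS: the folio never entered the swap cache. */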
>
> > +
> >  void show_swap_cache_info(void);
> >  bool add_to_swap(struct folio *folio);
> >  void *get_shadow_from_swap_cache(swp_entry_t entry);
> > @@ -55,7 +61,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> >  struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
> >                                     struct mempolicy *mpol, pgoff_t ilx);
> >  struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
> > -                             struct vm_fault *vmf, bool *swapcached);
> > +                             struct vm_fault *vmf, enum swap_cache_result *result);
> >
> >  static inline unsigned int folio_swap_flags(struct folio *folio)
> >  {
> > @@ -92,7 +98,7 @@ static inline struct page *swap_cluster_readahead(swp_entry_t entry,
> >  }
> >
> >  static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
> > -                       struct vm_fault *vmf, bool *swapcached)
> > +                       struct vm_fault *vmf, enum swap_cache_result *result)
> >  {
> >         return NULL;
> >  }
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index d87c20f9f7ec..e96d63bf8a22 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -908,8 +908,7 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask,
> >   * @entry: swap entry of this memory
> >   * @gfp_mask: memory allocation flags
> >   * @vmf: fault information
> > - * @swapcached: pointer to a bool used as indicator if the
> > - *              page is swapped in through swapcache.
> > + * @result: a return value to indicate swap cache usage.
> >   *
> >   * Returns the struct page for entry and addr, after queueing swapin.
> >   *
> > @@ -918,30 +917,39 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask,
> >   * or vma-based(ie, virtual address based on faulty address) readahead.
> >   */
> >  struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
> > -                             struct vm_fault *vmf, bool *swapcached)
> > +                             struct vm_fault *vmf, enum swap_cache_result *result)
> >  {
> > +       enum swap_cache_result cache_result;
> >         struct swap_info_struct *si;
> >         struct mempolicy *mpol;
> > +       struct folio *folio;
> >         struct page *page;
> >         pgoff_t ilx;
> > -       bool cached;
> > +
> > +       folio = swap_cache_get_folio(entry, vmf->vma, vmf->address);
> > +       if (folio) {
> > +               page = folio_file_page(folio, swp_offset(entry));
> > +               cache_result = SWAP_CACHE_HIT;
> > +               goto done;
> > +       }
> >
> >         si = swp_swap_info(entry);
> >         mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
> >         if (swap_use_no_readahead(si, swp_offset(entry))) {
> >                 page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm);
> > -               cached = false;
> > +               cache_result = SWAP_CACHE_BYPASS;
> >         } else if (swap_use_vma_readahead(si)) {
> >                 page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
> > -               cached = true;
> > +               cache_result = SWAP_CACHE_MISS;
> >         } else {
> >                 page = swap_cluster_readahead(entry, gfp_mask, mpol, ilx);
> > -               cached = true;
> > +               cache_result = SWAP_CACHE_MISS;
> >         }
> >         mpol_cond_put(mpol);
> >
> > -       if (swapcached)
> > -               *swapcached = cached;
> > +done:
> > +       if (result)
> > +               *result = cache_result;
> >
> >         return page;
> >  }
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 01c3f53b6521..b6d57fff5e21 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1822,13 +1822,21 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
> >
> >         si = swap_info[type];
> >         do {
> > -               struct folio *folio;
> > +               struct page *page;
> >                 unsigned long offset;
> >                 unsigned char swp_count;
> > +               struct folio *folio = NULL;
> >                 swp_entry_t entry;
> >                 int ret;
> >                 pte_t ptent;
> >
> > +               struct vm_fault vmf = {
> > +                       .vma = vma,
> > +                       .address = addr,
> > +                       .real_address = addr,
> > +                       .pmd = pmd,
> > +               };
>
> Is this code move caused by skipping the swap cache lookup here?

Yes.

>
> This is very sensitive code related to swap cache racing. It needs
> very careful reviewing. Better not shuffle it for no good reason.

Thanks for the suggestion. I'll try to avoid this kind of shuffling, but since the cache lookup is moved into swapin_readahead, some changes in the original callers are unavoidable...
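To make the equivalence easier to check, here is the do_swap_page() caller pattern condensed (simplified from the diff above, error handling omitted):

/* Before: do_swap_page() did its own cache lookup first. */
folio = swap_cache_get_folio(entry, vma, vmf->address);
if (!folio)
        page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf, &swapcached);

/* After: swapin_readahead() owns the lookup and reports the outcome. */
page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf, &cache_result);
/*
 * SWAP_CACHE_HIT:    found in the swap cache, no major fault counted
 * SWAP_CACHE_MISS:   read via readahead, page added to the swap cache
 * SWAP_CACHE_BYPASS: synchronous swapin, never enters the swap cache
 */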