From: Kairui Song
Date: Tue, 28 Nov 2023 19:22:10 +0800
Subject: Re: [PATCH 18/24] mm/swap: introduce a helper non fault swapin
To: Chris Li
Cc: linux-mm@kvack.org, Andrew Morton, "Huang, Ying", David Hildenbrand,
	Hugh Dickins, Johannes Weiner, Matthew Wilcox, Michal Hocko,
	linux-kernel@vger.kernel.org
References: <20231119194740.94101-1-ryncsn@gmail.com>
	<20231119194740.94101-19-ryncsn@gmail.com>
Chris Li wrote on Wed, 22 Nov 2023 at 12:41:
>
> On Sun, Nov 19, 2023 at 11:49 AM Kairui Song wrote:
> >
> > From: Kairui Song
> >
> > There are two places where swapin is not direct caused by page fault:
> > shmem swapin is invoked through shmem mapping, swapoff cause swapin by
> > walking the page table. They used to construct a pseudo vmfault struct
> > for swapin function.
> >
> > Shmem has dropped the pseudo vmfault recently in commit ddc1a5cbc05d
> > ("mempolicy: alloc_pages_mpol() for NUMA policy without vma"). Swapoff
> > path is still using a pseudo vmfault.
> >
> > Introduce a helper for them both, this help save stack usage for swapoff
> > path, and help apply a unified swapin cache and readahead policy check.
> >
> > Also prepare for follow up commits.
> >
> > Signed-off-by: Kairui Song
> > ---
> >  mm/shmem.c      | 51 ++++++++++++++++---------------------------------
> >  mm/swap.h       | 11 +++++++++++
> >  mm/swap_state.c | 38 ++++++++++++++++++++++++++++++++++++
> >  mm/swapfile.c   | 23 +++++++++++-----------
> >  4 files changed, 76 insertions(+), 47 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index f9ce4067c742..81d129aa66d1 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1565,22 +1565,6 @@ static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo)
> >  static struct mempolicy *shmem_get_pgoff_policy(struct shmem_inode_info *info,
> >                                 pgoff_t index, unsigned int order, pgoff_t *ilx);
> >
> > -static struct folio *shmem_swapin_cluster(swp_entry_t swap, gfp_t gfp,
> > -               struct shmem_inode_info *info, pgoff_t index)
> > -{
> > -       struct mempolicy *mpol;
> > -       pgoff_t ilx;
> > -       struct page *page;
> > -
> > -       mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> > -       page = swap_cluster_readahead(swap, gfp, mpol, ilx);
> > -       mpol_cond_put(mpol);
> > -
> > -       if (!page)
> > -               return NULL;
> > -       return page_folio(page);
> > -}
> > -
>
> Nice. Thank you.
>
> > /*
> >  * Make sure huge_gfp is always more limited than limit_gfp.
> >  * Some of the flags set permissions, while others set limitations.
> > @@ -1854,9 +1838,12 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >  {
> >         struct address_space *mapping = inode->i_mapping;
> >         struct shmem_inode_info *info = SHMEM_I(inode);
> > -       struct swap_info_struct *si;
> > +       enum swap_cache_result result;
> >         struct folio *folio = NULL;
> > +       struct mempolicy *mpol;
> > +       struct page *page;
> >         swp_entry_t swap;
> > +       pgoff_t ilx;
> >         int error;
> >
> >         VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
> > @@ -1866,34 +1853,30 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >         if (is_poisoned_swp_entry(swap))
> >                 return -EIO;
> >
> > -       si = get_swap_device(swap);
> > -       if (!si) {
> > +       mpol = shmem_get_pgoff_policy(info, index, 0, &ilx);
> > +       page = swapin_page_non_fault(swap, gfp, mpol, ilx, fault_mm, &result);

Hi Chris,

I've been trying to address these issues in V2. Most issues in the other
patches have a straightforward solution, and some could be discussed in a
separate series, but I have some thoughts on this one:

> Notice this "result" CAN be outdated. e.g. after this call, the swap
> cache can be changed by another thread generating the swap page fault
> and installing the folio into the swap cache or removing it.

This is true, and it seems a potential race also exists before this
series in the direct (no swapcache) swapin path (do_swap_page), if I
understand it correctly:

In the do_swap_page path, multiple processes could swap in the same page
at the same time (a page mapped only once can still be shared by
sub-threads), and they could end up with different folios. The pte lock
and pte_same check taken later are not enough, because while one process
is not holding the pte lock, another process could swap the page in,
swap_free the entry, then swap the page out again reusing the same
entry: an ABA problem. The race is unlikely to happen in reality, but it
is possible in theory.
The same issue exists for shmem: there are the
shmem_confirm_swap/shmem_add_to_page_cache checks later to prevent
re-installing into the shmem mapping after a direct swap-in, but they
are also not enough. Another process could swap the entry in and then
re-swapout it using the same entry, so the mapping entry appears
unchanged during the time window. Again, very unlikely to happen in
reality, but not impossible.

When the swapcache is used there is no such issue, since the swap lock
and swap_map are used to synchronize all readers, and while one reader
is still holding the folio, the entry stays locked through the
swapcache; if a folio is removed from the swapcache,
folio_test_swapcache will fail and the reader can retry.

I'm trying to come up with a better locking scheme for direct swap-in;
am I missing anything here? Correct me if I got it wrong...