From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9BBAAC433F5 for ; Thu, 24 Feb 2022 21:29:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0B2C98D0003; Thu, 24 Feb 2022 16:29:14 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 062BE8D0002; Thu, 24 Feb 2022 16:29:14 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E44D68D0003; Thu, 24 Feb 2022 16:29:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id D21ED8D0002 for ; Thu, 24 Feb 2022 16:29:13 -0500 (EST) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id A0E1C20C6F for ; Thu, 24 Feb 2022 21:29:13 +0000 (UTC) X-FDA: 79178964186.07.13986B2 Received: from mail-ej1-f42.google.com (mail-ej1-f42.google.com [209.85.218.42]) by imf03.hostedemail.com (Postfix) with ESMTP id 2FED420002 for ; Thu, 24 Feb 2022 21:29:13 +0000 (UTC) Received: by mail-ej1-f42.google.com with SMTP id p9so7031468ejd.6 for ; Thu, 24 Feb 2022 13:29:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=p919IJnsnxJq1H265GJu8r+hD/eZdrOi1yYZQKFMV+E=; b=Moc4yYvuNR+kOXi0dMtqEFC0MRIYMl13knqJXH21OTCo+3w9xCGKl7euc4ENAM/oTJ DNAmmbDe1Ksr77xAdiOBK+AMLekVjjuo7qRuOBiFyNPkCmzH717LN6/y5ZaAFXxKpM3L +RkcYu+DSztds3BgdvkGvhdhM62wm/bpmZ/jU0QhzngqLgCA26/5EMIQUeMCyBjNU3Za 6G02IynUPG2MgY07YX5vCROSv3NzcHLN8kKLKGYQ3VbLEn6iRF83v9pzfWQj1D24/2QG DPiE/TkaWlMCw+zNjzWyBwdQfhGPDmv7fvtoJ7PQ0I0vVPP1An5xiYWuYUCwQRhuRqhf qtNQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=p919IJnsnxJq1H265GJu8r+hD/eZdrOi1yYZQKFMV+E=; b=nPNbucvcVEbOM9AV3kM9BEAo2WinPIch0UKHwru2XGjt4EZcGrVcNRbfctqoWzF3BZ px+UiMk5anRWeD7GFxW/ereapUrdGHYIeuFBx88EH/+MMuhzi3zqSOy4oo2+ax9AUyLp LkEV6+k1aNVeHaKCVkasqsTAJmahlgSccKunt6CZZcEQQ9zS3ymrAczY52VigYik+4nK hEhQBqwJIe1t4TR1u+csBu8k/n2LnEZ89tlBStaYHFsEHSgBQbEBvF2XY41uc7sQmIN0 7BhKeS/uWQ81IPqWuMif61q5hMZN269amLsscLML/C6ZHFH4IvFiSDt7zsODlZVpZeI2 vMzw== X-Gm-Message-State: AOAM532mdVrfY4fef1+X1qw2ZKf3aVKIyN5xE9htJfqxtH+nkKYIYZj9 c+J7fT66b/4v7uWFCrItr7u/MkIINdVaIs/lehg= X-Google-Smtp-Source: ABdhPJyM+vMdSAsGICxq1iupm3Mo5fNMk2WAQfSdCgxu+icpYGk9N1inssIZcsE7JM5WnaZ4TEJNyEdulNbHWPKTx54= X-Received: by 2002:a17:906:86c7:b0:6a8:49fa:a3f5 with SMTP id j7-20020a17090686c700b006a849faa3f5mr3800053ejy.421.1645738151725; Thu, 24 Feb 2022 13:29:11 -0800 (PST) MIME-Version: 1.0 References: <20210514093007.4117906-1-linmiaohe@huawei.com> <00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz> In-Reply-To: <00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz> From: Yang Shi Date: Thu, 24 Feb 2022 13:29:00 -0800 Message-ID: Subject: Re: [PATCH v4] mm/huge_memory.c: add missing read-only THP checking in transparent_hugepage_enabled() To: Vlastimil Babka Cc: Miaohe Lin , Andrew Morton , Zi Yan , William Kucharski , Matthew Wilcox , Yang Shi , "Aneesh Kumar K.V" , Ralph Campbell , Song Liu , "Kirill A. Shutemov" , Rik van Riel , Johannes Weiner , Minchan Kim , Linux Kernel Mailing List , Linux MM , David Hildenbrand Content-Type: text/plain; charset="UTF-8" Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=Moc4yYvu; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf03.hostedemail.com: domain of shy828301@gmail.com designates 209.85.218.42 as permitted sender) smtp.mailfrom=shy828301@gmail.com X-Rspam-User: X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 2FED420002 X-Stat-Signature: f8tzr1r5bdju5fp39zfamsbfpit5mai6 X-HE-Tag: 1645738153-330620 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Feb 24, 2022 at 10:51 AM Vlastimil Babka wrote: > > On 5/14/21 11:30, Miaohe Lin wrote: > > Since commit 99cb0dbd47a1 ("mm,thp: add read-only THP support for > > (non-shmem) FS"), read-only THP file mapping is supported. But it > > forgot to add checking for it in transparent_hugepage_enabled(). > > To fix it, we add checking for read-only THP file mapping and also > > introduce helper transhuge_vma_enabled() to check whether thp is > > enabled for specified vma to reduce duplicated code. We rename > > transparent_hugepage_enabled to transparent_hugepage_active to make > > the code easier to follow as suggested by David Hildenbrand. > > > > Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") > > Reviewed-by: Yang Shi > > Signed-off-by: Miaohe Lin > > FYI, I stumbled upon needing this for my reproducer for the read() > corruption [1] to work, and I think it's still not ideal. I also needed > madvise(MADV_HUGEPAGE) for the executable file mapping to make sure > khugepaged acts on my process. Yeah, commit 99cb0dbd47a1 suggests madvise. > Yet, khugepaged will happily collapse file mappings even without madvise > anyway. However, it might not know about the mm at all unless > __khugepaged_enter() has been applied at least on one of its vma's. > madvise() is one way to do it for executable file mappings, but it can also > happen through e.g. do_huge_pmd_anonymous_page() on another mapping, which > has nothing to do with the file mapping. > So what I'm trying to say is that we are somewhat inconsistent - the rules > to consider a vma in khugepaged seem to be clear and result of admin > configuration and madvise, but the rules to consider a mm for khugepaged > (which is the only way to collapse file mappings) are not completely > identical and there might be "random luck" involved. Yes, khugepaged_enter() is not called for file vma explicitly. My wild guess is THP for readonly fs was assumed to be collapsed by explicit demand from the users, for example, madvise. To achieve more consistent behavior we could call khugepaged_enter() for file fault just like what huge anonymous fault does. Of course we need to fix khugepaged_enter() to do the proper check as well. > > [1] https://lore.kernel.org/all/df3b5d1c-a36b-2c73-3e27-99e74983de3a@suse.cz/ > > > --- > > v3->v4: > > collect Reviewed-by tag > > define transhuge_vma_enabled next to transhuge_vma_suitable > > --- > > fs/proc/task_mmu.c | 2 +- > > include/linux/huge_mm.h | 57 +++++++++++++++++++++++++---------------- > > mm/huge_memory.c | 11 +++++++- > > mm/khugepaged.c | 4 +-- > > mm/shmem.c | 3 +-- > > 5 files changed, 48 insertions(+), 29 deletions(-) > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > index fc9784544b24..7389df326edd 100644 > > --- a/fs/proc/task_mmu.c > > +++ b/fs/proc/task_mmu.c > > @@ -832,7 +832,7 @@ static int show_smap(struct seq_file *m, void *v) > > __show_smap(m, &mss, false); > > > > seq_printf(m, "THPeligible: %d\n", > > - transparent_hugepage_enabled(vma)); > > + transparent_hugepage_active(vma)); > > > > if (arch_pkeys_enabled()) > > seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma)); > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > index 0a526f211fec..7b7f7b52ccb8 100644 > > --- a/include/linux/huge_mm.h > > +++ b/include/linux/huge_mm.h > > @@ -115,9 +115,34 @@ extern struct kobj_attribute shmem_enabled_attr; > > > > extern unsigned long transparent_hugepage_flags; > > > > +static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > + unsigned long haddr) > > +{ > > + /* Don't have to check pgoff for anonymous vma */ > > + if (!vma_is_anonymous(vma)) { > > + if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff, > > + HPAGE_PMD_NR)) > > + return false; > > + } > > + > > + if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end) > > + return false; > > + return true; > > +} > > + > > +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma, > > + unsigned long vm_flags) > > +{ > > + /* Explicitly disabled through madvise. */ > > + if ((vm_flags & VM_NOHUGEPAGE) || > > + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) > > + return false; > > + return true; > > +} > > + > > /* > > * to be used on vmas which are known to support THP. > > - * Use transparent_hugepage_enabled otherwise > > + * Use transparent_hugepage_active otherwise > > */ > > static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) > > { > > @@ -128,15 +153,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) > > if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX)) > > return false; > > > > - if (vma->vm_flags & VM_NOHUGEPAGE) > > + if (!transhuge_vma_enabled(vma, vma->vm_flags)) > > return false; > > > > if (vma_is_temporary_stack(vma)) > > return false; > > > > - if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) > > - return false; > > - > > if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG)) > > return true; > > > > @@ -150,22 +172,7 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) > > return false; > > } > > > > -bool transparent_hugepage_enabled(struct vm_area_struct *vma); > > - > > -static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > - unsigned long haddr) > > -{ > > - /* Don't have to check pgoff for anonymous vma */ > > - if (!vma_is_anonymous(vma)) { > > - if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff, > > - HPAGE_PMD_NR)) > > - return false; > > - } > > - > > - if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end) > > - return false; > > - return true; > > -} > > +bool transparent_hugepage_active(struct vm_area_struct *vma); > > > > #define transparent_hugepage_use_zero_page() \ > > (transparent_hugepage_flags & \ > > @@ -351,7 +358,7 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) > > return false; > > } > > > > -static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma) > > +static inline bool transparent_hugepage_active(struct vm_area_struct *vma) > > { > > return false; > > } > > @@ -362,6 +369,12 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > return false; > > } > > > > +static inline bool transhuge_vma_enabled(struct vm_area_struct *vma, > > + unsigned long vm_flags) > > +{ > > + return false; > > +} > > + > > static inline void prep_transhuge_page(struct page *page) {} > > > > static inline bool is_transparent_hugepage(struct page *page) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index 76ca1eb2a223..4f37867eed12 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -63,7 +63,14 @@ static struct shrinker deferred_split_shrinker; > > static atomic_t huge_zero_refcount; > > struct page *huge_zero_page __read_mostly; > > > > -bool transparent_hugepage_enabled(struct vm_area_struct *vma) > > +static inline bool file_thp_enabled(struct vm_area_struct *vma) > > +{ > > + return transhuge_vma_enabled(vma, vma->vm_flags) && vma->vm_file && > > + !inode_is_open_for_write(vma->vm_file->f_inode) && > > + (vma->vm_flags & VM_EXEC); > > +} > > + > > +bool transparent_hugepage_active(struct vm_area_struct *vma) > > { > > /* The addr is used to check if the vma size fits */ > > unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE; > > @@ -74,6 +81,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma) > > return __transparent_hugepage_enabled(vma); > > if (vma_is_shmem(vma)) > > return shmem_huge_enabled(vma); > > + if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) > > + return file_thp_enabled(vma); > > > > return false; > > } > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > index 6c0185fdd815..d97b20fad6e8 100644 > > --- a/mm/khugepaged.c > > +++ b/mm/khugepaged.c > > @@ -442,9 +442,7 @@ static inline int khugepaged_test_exit(struct mm_struct *mm) > > static bool hugepage_vma_check(struct vm_area_struct *vma, > > unsigned long vm_flags) > > { > > - /* Explicitly disabled through madvise. */ > > - if ((vm_flags & VM_NOHUGEPAGE) || > > - test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) > > + if (!transhuge_vma_enabled(vma, vm_flags)) > > return false; > > > > /* Enabled via shmem mount options or sysfs settings. */ > > diff --git a/mm/shmem.c b/mm/shmem.c > > index a08cedefbfaa..1dcbec313c70 100644 > > --- a/mm/shmem.c > > +++ b/mm/shmem.c > > @@ -4032,8 +4032,7 @@ bool shmem_huge_enabled(struct vm_area_struct *vma) > > loff_t i_size; > > pgoff_t off; > > > > - if ((vma->vm_flags & VM_NOHUGEPAGE) || > > - test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) > > + if (!transhuge_vma_enabled(vma, vma->vm_flags)) > > return false; > > if (shmem_huge == SHMEM_HUGE_FORCE) > > return true; >