From: Yang Shi
Date: Mon, 11 Jul 2022 13:57:32 -0700
Subject: Re: [mm-unstable v7 07/18] mm/thp: add flag to enforce sysfs THP in hugepage_vma_check()
To: "Zach O'Keefe"
Cc: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox, Michal Hocko,
 Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park, Song Liu,
 Vlastimil Babka, Zi Yan, Linux MM, Andrea Arcangeli, Andrew Morton,
 Arnd Bergmann, Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller,
 Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin, Minchan Kim,
 Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer
In-Reply-To: <20220706235936.2197195-8-zokeefe@google.com>
References: <20220706235936.2197195-1-zokeefe@google.com> <20220706235936.2197195-8-zokeefe@google.com>

On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe wrote:
>
> MADV_COLLAPSE is not
> coupled to the kernel-oriented sysfs THP settings[1].
>
> hugepage_vma_check() is the authority on determining if a VMA is eligible
> for THP allocation/collapse, and currently enforces the sysfs THP settings.
> Add a flag to disable these checks. For now, only apply this arg to anon
> and file, which use /sys/kernel/transparent_hugepage/enabled. We can
> expand this to shmem, which uses
> /sys/kernel/transparent_hugepage/shmem_enabled, later.
>
> Use this flag in collapse_pte_mapped_thp() where previously the VMA flags
> passed to hugepage_vma_check() were OR'd with VM_HUGEPAGE to elide the
> VM_HUGEPAGE check in "madvise" THP mode. Prior to "mm: khugepaged: check
> THP flag in hugepage_vma_check()", this check also didn't check "never" THP
> mode. As such, this restores the previous behavior of
> collapse_pte_mapped_thp() where sysfs THP settings are ignored. See
> comment in code for justification why this is OK.
>
> [1] https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/
>
> Signed-off-by: Zach O'Keefe

Reviewed-by: Yang Shi

> ---
>  fs/proc/task_mmu.c      |  2 +-
>  include/linux/huge_mm.h |  9 ++++-----
>  mm/huge_memory.c        | 14 ++++++--------
>  mm/khugepaged.c         | 25 ++++++++++++++-----------
>  mm/memory.c             |  4 ++--
>  5 files changed, 27 insertions(+), 27 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 34d292cec79a..f8cd58846a28 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -866,7 +866,7 @@ static int show_smap(struct seq_file *m, void *v)
>  	__show_smap(m, &mss, false);
>
>  	seq_printf(m, "THPeligible: %d\n",
> -		   hugepage_vma_check(vma, vma->vm_flags, true, false));
> +		   hugepage_vma_check(vma, vma->vm_flags, true, false, true));
>
>  	if (arch_pkeys_enabled())
>  		seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 37f2f11a6d7e..00312fc251c1 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -168,9 +168,8 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
>  	       !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
>  }
>
> -bool hugepage_vma_check(struct vm_area_struct *vma,
> -			unsigned long vm_flags,
> -			bool smaps, bool in_pf);
> +bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> +			bool smaps, bool in_pf, bool enforce_sysfs);
>
>  #define transparent_hugepage_use_zero_page()				\
>  	(transparent_hugepage_flags &					\
> @@ -321,8 +320,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
>  }
>
>  static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> -				      unsigned long vm_flags,
> -				      bool smaps, bool in_pf)
> +				      unsigned long vm_flags, bool smaps,
> +				      bool in_pf, bool enforce_sysfs)
>  {
>  	return false;
>  }
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index da300ce9dedb..4fbe43dc1568 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -69,9 +69,8 @@ static atomic_t huge_zero_refcount;
>  struct page *huge_zero_page __read_mostly;
>  unsigned long huge_zero_pfn __read_mostly = ~0UL;
>
> -bool hugepage_vma_check(struct vm_area_struct *vma,
> -			unsigned long vm_flags,
> -			bool smaps, bool in_pf)
> +bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> +			bool smaps, bool in_pf, bool enforce_sysfs)
>  {
>  	if (!vma->vm_mm)		/* vdso */
>  		return false;
> @@ -120,11 +119,10 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
>  	if (!in_pf && shmem_file(vma->vm_file))
>  		return shmem_huge_enabled(vma);
>
> -	if (!hugepage_flags_enabled())
> -		return false;
> -
> -	/* THP settings require madvise. */
> -	if (!(vm_flags & VM_HUGEPAGE) && !hugepage_flags_always())
> +	/* Enforce sysfs THP requirements as necessary */
> +	if (enforce_sysfs &&
> +	    (!hugepage_flags_enabled() || (!(vm_flags & VM_HUGEPAGE) &&
> +					   !hugepage_flags_always())))
>  		return false;
>
>  	/* Only regular file is valid */
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index d89056d8cbad..b0e20db3f805 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -478,7 +478,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
>  {
>  	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
>  	    hugepage_flags_enabled()) {
> -		if (hugepage_vma_check(vma, vm_flags, false, false))
> +		if (hugepage_vma_check(vma, vm_flags, false, false, true))
>  			__khugepaged_enter(vma->vm_mm);
>  	}
>  }
> @@ -844,7 +844,8 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
>   */
>
>  static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> -				   struct vm_area_struct **vmap)
> +				   struct vm_area_struct **vmap,
> +				   struct collapse_control *cc)
>  {
>  	struct vm_area_struct *vma;
>
> @@ -855,7 +856,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>  	if (!vma)
>  		return SCAN_VMA_NULL;
>
> -	if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
> +	if (!hugepage_vma_check(vma, vma->vm_flags, false, false,
> +				cc->is_khugepaged))
>  		return SCAN_VMA_CHECK;
>  	/*
>  	 * Anon VMA expected, the address may be unmapped then
> @@ -974,7 +976,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>  		goto out_nolock;
>
>  	mmap_read_lock(mm);
> -	result = hugepage_vma_revalidate(mm, address, &vma);
> +	result = hugepage_vma_revalidate(mm, address, &vma, cc);
>  	if (result != SCAN_SUCCEED) {
>  		mmap_read_unlock(mm);
>  		goto out_nolock;
> @@ -1006,7 +1008,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
>  	 * handled by the anon_vma lock + PG_lock.
>  	 */
>  	mmap_write_lock(mm);
> -	result = hugepage_vma_revalidate(mm, address, &vma);
> +	result = hugepage_vma_revalidate(mm, address, &vma, cc);
>  	if (result != SCAN_SUCCEED)
>  		goto out_up_write;
>  	/* check if the pmd is still valid */
> @@ -1350,12 +1352,13 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
>  		return;
>
>  	/*
> -	 * This vm_flags may not have VM_HUGEPAGE if the page was not
> -	 * collapsed by this mm. But we can still collapse if the page is
> -	 * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> -	 * will not fail the vma for missing VM_HUGEPAGE
> +	 * If we are here, we've succeeded in replacing all the native pages
> +	 * in the page cache with a single hugepage. If a mm were to fault-in
> +	 * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
> +	 * and map it by a PMD, regardless of sysfs THP settings. As such, let's
> +	 * analogously elide sysfs THP settings here.
>  	 */
> -	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
> +	if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
>  		return;
>
>  	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> @@ -2042,7 +2045,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
>  			progress++;
>  			break;
>  		}
> -		if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
> +		if (!hugepage_vma_check(vma, vma->vm_flags, false, false, true)) {
>  skip:
>  			progress++;
>  			continue;
> diff --git a/mm/memory.c b/mm/memory.c
> index 8917bea2f0bc..96cd776e84f1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5001,7 +5001,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  		return VM_FAULT_OOM;
>  retry_pud:
>  	if (pud_none(*vmf.pud) &&
> -	    hugepage_vma_check(vma, vm_flags, false, true)) {
> +	    hugepage_vma_check(vma, vm_flags, false, true, true)) {
>  		ret = create_huge_pud(&vmf);
>  		if (!(ret & VM_FAULT_FALLBACK))
>  			return ret;
> @@ -5035,7 +5035,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
>  		goto retry_pud;
>
>  	if (pmd_none(*vmf.pmd) &&
> -	    hugepage_vma_check(vma, vm_flags, false, true)) {
> +	    hugepage_vma_check(vma, vm_flags, false, true, true)) {
>  		ret = create_huge_pmd(&vmf);
>  		if (!(ret & VM_FAULT_FALLBACK))
>  			return ret;
> --
> 2.37.0.rc0.161.g10f37bed90-goog