Date: Tue, 12 Jul 2022 09:58:04 -0700
From: Zach O'Keefe
To: Yang Shi
Cc: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
	Michal Hocko, Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park,
	Song Liu, Vlastimil Babka, Zi Yan, Linux MM, Andrea Arcangeli,
	Andrew Morton, Arnd Bergmann, Axel Rasmussen, Chris Kennelly,
	Chris Zankel, Helge Deller, Hugh Dickins, Ivan Kokshaysky,
	"James E.J. Bottomley", Jens Axboe, "Kirill A. Shutemov", Matt Turner,
	Max Filippov, Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov,
	Thomas Bogendoerfer
Subject: Re: [mm-unstable v7 07/18] mm/thp: add flag to enforce sysfs THP in hugepage_vma_check()
References: <20220706235936.2197195-1-zokeefe@google.com>
	<20220706235936.2197195-8-zokeefe@google.com>

On Jul 11 13:57, Yang Shi wrote:
> On Wed, Jul 6, 2022 at 5:06 PM Zach O'Keefe wrote:
> >
> > MADV_COLLAPSE is not coupled to the kernel-oriented sysfs THP settings[1].
> >
> > hugepage_vma_check() is the authority on determining if a VMA is eligible
> > for THP allocation/collapse, and currently enforces the sysfs THP settings.
> > Add a flag to disable these checks. For now, only apply this arg to anon
> > and file, which use /sys/kernel/transparent_hugepage/enabled. We can
> > expand this to shmem, which uses
> > /sys/kernel/transparent_hugepage/shmem_enabled, later.
> >
> > Use this flag in collapse_pte_mapped_thp() where previously the VMA flags
> > passed to hugepage_vma_check() were OR'd with VM_HUGEPAGE to elide the
> > VM_HUGEPAGE check in "madvise" THP mode. Prior to "mm: khugepaged: check
> > THP flag in hugepage_vma_check()", this check also didn't check "never" THP
> > mode. As such, this restores the previous behavior of
> > collapse_pte_mapped_thp() where sysfs THP settings are ignored. See
> > comment in code for justification why this is OK.
> >
> > [1] https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/
> >
> > Signed-off-by: Zach O'Keefe
>
> Reviewed-by: Yang Shi

Thanks for the review!

Best,
Zach

> > ---
> >  fs/proc/task_mmu.c      |  2 +-
> >  include/linux/huge_mm.h |  9 ++++-----
> >  mm/huge_memory.c        | 14 ++++++--------
> >  mm/khugepaged.c         | 25 ++++++++++++++-----------
> >  mm/memory.c             |  4 ++--
> >  5 files changed, 27 insertions(+), 27 deletions(-)
> >
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 34d292cec79a..f8cd58846a28 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -866,7 +866,7 @@ static int show_smap(struct seq_file *m, void *v)
> >  	__show_smap(m, &mss, false);
> >
> >  	seq_printf(m, "THPeligible: %d\n",
> > -		   hugepage_vma_check(vma, vma->vm_flags, true, false));
> > +		   hugepage_vma_check(vma, vma->vm_flags, true, false, true));
> >
> >  	if (arch_pkeys_enabled())
> >  		seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 37f2f11a6d7e..00312fc251c1 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -168,9 +168,8 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma)
> >  	       !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode);
> >  }
> >
> > -bool hugepage_vma_check(struct vm_area_struct *vma,
> > -			unsigned long vm_flags,
> > -			bool smaps, bool in_pf);
> > +bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> > +			bool smaps, bool in_pf, bool enforce_sysfs);
> >
> >  #define transparent_hugepage_use_zero_page()	\
> >  	(transparent_hugepage_flags &	\
> > @@ -321,8 +320,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
> >  }
> >
> >  static inline bool hugepage_vma_check(struct vm_area_struct *vma,
> > -				      unsigned long vm_flags,
> > -				      bool smaps, bool in_pf)
> > +				      unsigned long vm_flags, bool smaps,
> > +				      bool in_pf, bool enforce_sysfs)
> >  {
> >  	return false;
> >  }
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index da300ce9dedb..4fbe43dc1568 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -69,9 +69,8 @@ static atomic_t huge_zero_refcount;
> >  struct page *huge_zero_page __read_mostly;
> >  unsigned long huge_zero_pfn __read_mostly = ~0UL;
> >
> > -bool hugepage_vma_check(struct vm_area_struct *vma,
> > -			unsigned long vm_flags,
> > -			bool smaps, bool in_pf)
> > +bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> > +			bool smaps, bool in_pf, bool enforce_sysfs)
> >  {
> >  	if (!vma->vm_mm)		/* vdso */
> >  		return false;
> > @@ -120,11 +119,10 @@ bool hugepage_vma_check(struct vm_area_struct *vma,
> >  	if (!in_pf && shmem_file(vma->vm_file))
> >  		return shmem_huge_enabled(vma);
> >
> > -	if (!hugepage_flags_enabled())
> > -		return false;
> > -
> > -	/* THP settings require madvise. */
> > -	if (!(vm_flags & VM_HUGEPAGE) && !hugepage_flags_always())
> > +	/* Enforce sysfs THP requirements as necessary */
> > +	if (enforce_sysfs &&
> > +	    (!hugepage_flags_enabled() || (!(vm_flags & VM_HUGEPAGE) &&
> > +					   !hugepage_flags_always())))
> >  		return false;
> >
> >  	/* Only regular file is valid */
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index d89056d8cbad..b0e20db3f805 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -478,7 +478,7 @@ void khugepaged_enter_vma(struct vm_area_struct *vma,
> >  {
> >  	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
> >  	    hugepage_flags_enabled()) {
> > -		if (hugepage_vma_check(vma, vm_flags, false, false))
> > +		if (hugepage_vma_check(vma, vm_flags, false, false, true))
> >  			__khugepaged_enter(vma->vm_mm);
> >  	}
> >  }
> > @@ -844,7 +844,8 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
> >   */
> >
> >  static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > -				   struct vm_area_struct **vmap)
> > +				   struct vm_area_struct **vmap,
> > +				   struct collapse_control *cc)
> >  {
> >  	struct vm_area_struct *vma;
> >
> > @@ -855,7 +856,8 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> >  	if (!vma)
> >  		return SCAN_VMA_NULL;
> >
> > -	if (!hugepage_vma_check(vma, vma->vm_flags, false, false))
> > +	if (!hugepage_vma_check(vma, vma->vm_flags, false, false,
> > +				cc->is_khugepaged))
> >  		return SCAN_VMA_CHECK;
> >  	/*
> >  	 * Anon VMA expected, the address may be unmapped then
> > @@ -974,7 +976,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >  		goto out_nolock;
> >
> >  	mmap_read_lock(mm);
> > -	result = hugepage_vma_revalidate(mm, address, &vma);
> > +	result = hugepage_vma_revalidate(mm, address, &vma, cc);
> >  	if (result != SCAN_SUCCEED) {
> >  		mmap_read_unlock(mm);
> >  		goto out_nolock;
> > @@ -1006,7 +1008,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >  	 * handled by the anon_vma lock + PG_lock.
> >  	 */
> >  	mmap_write_lock(mm);
> > -	result = hugepage_vma_revalidate(mm, address, &vma);
> > +	result = hugepage_vma_revalidate(mm, address, &vma, cc);
> >  	if (result != SCAN_SUCCEED)
> >  		goto out_up_write;
> >  	/* check if the pmd is still valid */
> > @@ -1350,12 +1352,13 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
> >  		return;
> >
> >  	/*
> > -	 * This vm_flags may not have VM_HUGEPAGE if the page was not
> > -	 * collapsed by this mm. But we can still collapse if the page is
> > -	 * the valid THP. Add extra VM_HUGEPAGE so hugepage_vma_check()
> > -	 * will not fail the vma for missing VM_HUGEPAGE
> > +	 * If we are here, we've succeeded in replacing all the native pages
> > +	 * in the page cache with a single hugepage. If a mm were to fault-in
> > +	 * this memory (mapped by a suitably aligned VMA), we'd get the hugepage
> > +	 * and map it by a PMD, regardless of sysfs THP settings. As such, let's
> > +	 * analogously elide sysfs THP settings here.
> >  	 */
> > -	if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE, false, false))
> > +	if (!hugepage_vma_check(vma, vma->vm_flags, false, false, false))
> >  		return;
> >
> >  	/* Keep pmd pgtable for uffd-wp; see comment in retract_page_tables() */
> > @@ -2042,7 +2045,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> >  			progress++;
> >  			break;
> >  		}
> > -		if (!hugepage_vma_check(vma, vma->vm_flags, false, false)) {
> > +		if (!hugepage_vma_check(vma, vma->vm_flags, false, false, true)) {
> >  skip:
> >  			progress++;
> >  			continue;
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8917bea2f0bc..96cd776e84f1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5001,7 +5001,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >  		return VM_FAULT_OOM;
> >  retry_pud:
> >  	if (pud_none(*vmf.pud) &&
> > -	    hugepage_vma_check(vma, vm_flags, false, true)) {
> > +	    hugepage_vma_check(vma, vm_flags, false, true, true)) {
> >  		ret = create_huge_pud(&vmf);
> >  		if (!(ret & VM_FAULT_FALLBACK))
> >  			return ret;
> > @@ -5035,7 +5035,7 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
> >  		goto retry_pud;
> >
> >  	if (pmd_none(*vmf.pmd) &&
> > -	    hugepage_vma_check(vma, vm_flags, false, true)) {
> > +	    hugepage_vma_check(vma, vm_flags, false, true, true)) {
> >  		ret = create_huge_pmd(&vmf);
> >  		if (!(ret & VM_FAULT_FALLBACK))
> >  			return ret;
> > --
> > 2.37.0.rc0.161.g10f37bed90-goog
> >
> >