From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20220426144412.742113-1-zokeefe@google.com> <20220426144412.742113-2-zokeefe@google.com>
From: "Zach O'Keefe" <zokeefe@google.com>
Date: Wed, 27 Apr 2022 08:48:39 -0700
Subject: Re: [PATCH v3 01/12] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP
To: Peter Xu
Cc: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox, Michal Hocko,
 Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Yang Shi, Zi Yan,
 linux-mm@kvack.org, Andrea Arcangeli, Andrew Morton, Arnd Bergmann,
 Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe, "Kirill A.
 Shutemov", Matt Turner, Max Filippov, Miaohe Lin, Minchan Kim, Patrick Xia,
 Pavel Begunkov, Thomas Bogendoerfer, kernel test robot
Content-Type: text/plain; charset="UTF-8"

Thanks for taking the time to review, Peter!

On Tue, Apr 26, 2022 at 5:26 PM Peter Xu wrote:
>
> Hi, Zach,
>
> On Tue, Apr 26, 2022 at 07:44:01AM -0700, Zach O'Keefe wrote:
> > When scanning an anon pmd to see if it's eligible for collapse, return
> > SCAN_PMD_MAPPED if the pmd already maps a THP. Note that
> > SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the
> > file-collapse path, since the latter might identify pte-mapped compound
> > pages. This is required by MADV_COLLAPSE which necessarily needs to
> > know what hugepage-aligned/sized regions are already pmd-mapped.
> >
> > Signed-off-by: Zach O'Keefe
> > Reported-by: kernel test robot
>
> IIUC we don't need to attach this reported-by if this is not a bugfix. I
> think you can simply fix all issues reported by the test bot and only
> attach the line if the patch is fixing the problem that the bot was
> reporting explicitly.
>

Ya, I wasn't entirely sure what to do here, but including it seems not to
be without precedent, e.g. commit 92bbef67d459 ("mm: make
alloc_contig_range work at pageblock granularity"), and likewise I just
wanted to give credit where I thought it was due.
Though, I suppose folks who catch bugs in the review process aren't ack'd
similarly, so perhaps it does make sense to remove this.

> > ---
> >  include/trace/events/huge_memory.h |  3 ++-
> >  mm/internal.h                      |  1 +
> >  mm/khugepaged.c                    | 30 ++++++++++++++++++++++++++----
> >  mm/rmap.c                          | 15 +++++++++++++--
> >  4 files changed, 42 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> > index d651f3437367..9faa678e0a5b 100644
> > --- a/include/trace/events/huge_memory.h
> > +++ b/include/trace/events/huge_memory.h
> > @@ -33,7 +33,8 @@
> >  	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
> >  	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
> >  	EM( SCAN_TRUNCATED,		"truncated")			\
> > -	EMe(SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
> > +	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
> > +	EMe(SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
>
> Nit: IMHO it can be put even in the middle so we don't need to touch the
> EMe() every time. :)
>
> Apart from that, it does sound proper to me to put SCAN_PMD_MAPPED to be
> right after SCAN_PMD_NULL anyway.
>

Makes sense to me. Done.
> >
> >  #undef EM
> >  #undef EMe
> > diff --git a/mm/internal.h b/mm/internal.h
> > index 0667abd57634..51ae9f71a2a3 100644
> > --- a/mm/internal.h
> > +++ b/mm/internal.h
> > @@ -172,6 +172,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
> >  /*
> >   * in mm/rmap.c:
> >   */
> > +pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address);
> >  extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
> >
> >  /*
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index ba8dbd1825da..2933b13fc975 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -51,6 +51,7 @@ enum scan_result {
> >  	SCAN_CGROUP_CHARGE_FAIL,
> >  	SCAN_TRUNCATED,
> >  	SCAN_PAGE_HAS_PRIVATE,
> > +	SCAN_PMD_MAPPED,
> >  };
> >
> >  #define CREATE_TRACE_POINTS
> > @@ -987,6 +988,29 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> >  	return 0;
> >  }
> >
> > +static int find_pmd_or_thp_or_none(struct mm_struct *mm,
> > +				   unsigned long address,
> > +				   pmd_t **pmd)
> > +{
> > +	pmd_t pmde;
> > +
> > +	*pmd = mm_find_pmd_raw(mm, address);
> > +	if (!*pmd)
> > +		return SCAN_PMD_NULL;
> > +
> > +	pmde = pmd_read_atomic(*pmd);
> > +
> > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > +	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
> > +	barrier();
> > +#endif
> > +	if (!pmd_present(pmde) || pmd_none(pmde))
>
> Could we drop the pmd_none() check?  I assume !pmd_present() should have
> covered that case already?
>

I opted for safety here since I didn't know if pmd_present() always
implied !pmd_none() on all archs, but given mm_find_pmd() elides the
check, perhaps it's safe to do so here. Thanks for the suggestion.

> > +		return SCAN_PMD_NULL;
> > +	if (pmd_trans_huge(pmde))
> > +		return SCAN_PMD_MAPPED;
> > +	return SCAN_SUCCEED;
> > +}
> > +
> >  /*
> >   * Bring missing pages in from swap, to complete THP collapse.
> >   * Only done if khugepaged_scan_pmd believes it is worthwhile.
> > @@ -1238,11 +1262,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
> >
> >  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
> >
> > -	pmd = mm_find_pmd(mm, address);
> > -	if (!pmd) {
> > -		result = SCAN_PMD_NULL;
> > +	result = find_pmd_or_thp_or_none(mm, address, &pmd);
> > +	if (result != SCAN_SUCCEED)
> >  		goto out;
> > -	}
> >
> >  	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
> >  	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 61e63db5dc6f..49817f35e65c 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -759,13 +759,12 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
> >  	return vma_address(page, vma);
> >  }
> >
> > -pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
> > +pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address)
> >  {
> >  	pgd_t *pgd;
> >  	p4d_t *p4d;
> >  	pud_t *pud;
> >  	pmd_t *pmd = NULL;
> > -	pmd_t pmde;
> >
> >  	pgd = pgd_offset(mm, address);
> >  	if (!pgd_present(*pgd))
> > @@ -780,6 +779,18 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
> >  		goto out;
> >
> >  	pmd = pmd_offset(pud, address);
> > +out:
> > +	return pmd;
> > +}
> > +
> > +pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
> > +{
> > +	pmd_t pmde;
> > +	pmd_t *pmd;
> > +
> > +	pmd = mm_find_pmd_raw(mm, address);
> > +	if (!pmd)
> > +		goto out;
> >  	/*
> >  	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
> >  	 * without holding anon_vma lock for write. So when looking for a
> > --
> > 2.36.0.rc2.479.g8af0fa9b8e-goog
> >
>
> --
> Peter Xu
>