From: Yang Shi
Date: Mon, 6 Jun 2022 13:45:48 -0700
Subject: Re: [PATCH v6 02/15] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP
To: "Zach O'Keefe"
Cc: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox, Michal Hocko, Pasha Tatashin, Peter Xu, Rongwei Wang, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan, Linux MM, Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe, "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer
References: <20220604004004.954674-1-zokeefe@google.com> <20220604004004.954674-3-zokeefe@google.com>
In-Reply-To: <20220604004004.954674-3-zokeefe@google.com>

On Fri, Jun 3, 2022 at 5:40 PM Zach O'Keefe wrote:
>
> When scanning an anon pmd to see if it's eligible for collapse, return
> SCAN_PMD_MAPPED if the pmd already maps a THP. Note that
> SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the
> file-collapse path, since the latter might identify pte-mapped compound
> pages. This is required by MADV_COLLAPSE which necessarily needs to
> know what hugepage-aligned/sized regions are already pmd-mapped.
>
> Signed-off-by: Zach O'Keefe
> ---
>  include/trace/events/huge_memory.h |  1 +
>  mm/internal.h                      |  1 +
>  mm/khugepaged.c                    | 32 ++++++++++++++++++++++++++----
>  mm/rmap.c                          | 15 ++++++++++++--
>  4 files changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index d651f3437367..55392bf30a03 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -11,6 +11,7 @@
>  	EM( SCAN_FAIL,			"failed")			\
>  	EM( SCAN_SUCCEED,		"succeeded")			\
>  	EM( SCAN_PMD_NULL,		"pmd_null")			\
> +	EM( SCAN_PMD_MAPPED,		"page_pmd_mapped")		\
>  	EM( SCAN_EXCEED_NONE_PTE,	"exceed_none_pte")		\
>  	EM( SCAN_EXCEED_SWAP_PTE,	"exceed_swap_pte")		\
>  	EM( SCAN_EXCEED_SHARED_PTE,	"exceed_shared_pte")		\
> diff --git a/mm/internal.h b/mm/internal.h
> index 6e14749ad1e5..f768c7fae668 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -188,6 +188,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
>  /*
>   * in mm/rmap.c:
>   */
> +pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address);
>  extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
>
>  /*
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index cc3d6fb446d5..7a914ca19e96 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -28,6 +28,7 @@ enum scan_result {
>  	SCAN_FAIL,
>  	SCAN_SUCCEED,
>  	SCAN_PMD_NULL,
> +	SCAN_PMD_MAPPED,
>  	SCAN_EXCEED_NONE_PTE,
>  	SCAN_EXCEED_SWAP_PTE,
>  	SCAN_EXCEED_SHARED_PTE,
> @@ -901,6 +902,31 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
>  	return 0;
>  }
>
> +static int find_pmd_or_thp_or_none(struct mm_struct *mm,
> +				   unsigned long address,
> +				   pmd_t **pmd)
> +{
> +	pmd_t pmde;
> +
> +	*pmd = mm_find_pmd_raw(mm, address);
> +	if (!*pmd)
> +		return SCAN_PMD_NULL;
> +
> +	pmde = pmd_read_atomic(*pmd);
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
> +	barrier();
> +#endif
> +	if (!pmd_present(pmde))
> +		return SCAN_PMD_NULL;
> +	if (pmd_trans_huge(pmde))
> +		return SCAN_PMD_MAPPED;
> +	if (pmd_bad(pmde))
> +		return SCAN_FAIL;

khugepaged didn't handle pmd_bad before; IIRC it would just return SCAN_SUCCEED if everything else looked good. It is fine to add the check, but it may be better to return SCAN_PMD_NULL?

> +	return SCAN_SUCCEED;
> +}
> +
>  /*
>   * Bring missing pages in from swap, to complete THP collapse.
>   * Only done if khugepaged_scan_pmd believes it is worthwhile.
> @@ -1146,11 +1172,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
>
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>
> -	pmd = mm_find_pmd(mm, address);
> -	if (!pmd) {
> -		result = SCAN_PMD_NULL;
> +	result = find_pmd_or_thp_or_none(mm, address, &pmd);
> +	if (result != SCAN_SUCCEED)

There are a couple of other callsites of mm_find_pmd(); you may need to change all of them to find_pmd_or_thp_or_none() for MADV_COLLAPSE, since khugepaged may collapse the area before MADV_COLLAPSE reacquires mmap_lock IIUC, and MADV_COLLAPSE does care about this case. It is fine w/o MADV_COLLAPSE since khugepaged doesn't care whether it is PMD-mapped or not. So it may be better to move this patch to right before MADV_COLLAPSE is introduced?

>  		goto out;
> -	}
>
>  	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
>  	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 04fac1af870b..c9979c6ad7a1 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -767,13 +767,12 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
>  	return vma_address(page, vma);
>  }
>
> -pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
> +pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address)

It may be better to have some comments for mm_find_pmd_raw() and mm_find_pmd().
>  {
>  	pgd_t *pgd;
>  	p4d_t *p4d;
>  	pud_t *pud;
>  	pmd_t *pmd = NULL;
> -	pmd_t pmde;
>
>  	pgd = pgd_offset(mm, address);
>  	if (!pgd_present(*pgd))
>  		goto out;
> @@ -788,6 +787,18 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
>  		goto out;
>
>  	pmd = pmd_offset(pud, address);
> +out:
> +	return pmd;
> +}
> +
> +pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
> +{
> +	pmd_t pmde;
> +	pmd_t *pmd;
> +
> +	pmd = mm_find_pmd_raw(mm, address);
> +	if (!pmd)
> +		goto out;
>  	/*
>  	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
>  	 * without holding anon_vma lock for write. So when looking for a
> --
> 2.36.1.255.ge46751e96f-goog
>