From: Yang Shi
Date: Wed, 9 Mar 2022 16:41:32 -0800
Subject: Re: [RFC PATCH 07/14] mm/khugepaged: add vm_flags_ignore to hugepage_vma_revalidate_pmd_count()
To: "Zach O'Keefe"
Cc: Alex Shi, David Hildenbrand, David Rientjes, Michal Hocko,
    Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan,
    Linux MM, Andrea Arcangeli, Andrew Morton, Arnd Bergmann,
    Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller,
    Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
    "Kirill A. Shutemov", Matthew Wilcox, Matt Turner, Max Filippov,
    Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
    Thomas Bogendoerfer
References: <20220308213417.1407042-1-zokeefe@google.com> <20220308213417.1407042-8-zokeefe@google.com>

On Wed, Mar 9, 2022 at 4:01 PM Zach O'Keefe wrote:
>
> > On Tue, Mar 8, 2022 at 1:35 PM Zach O'Keefe wrote:
> > >
> > > In madvise collapse context, we optionally want to be able to ignore
> > > advice from MADV_NOHUGEPAGE-marked regions.
> >
> > Could you please elaborate why this usecase is valid? Typically
> > MADV_NOHUGEPAGE is set when the users really don't want to have THP
> > for this area. So it doesn't make too much sense to ignore it IMHO.
> >
>
> Hey Yang, thanks for taking time to review and comment.
>
> Semantically, the way I see it, is that MADV_NOHUGEPAGE is a way for
> the user to say "I don't want hugepages here", so that the kernel
> knows not to do so when faulting memory, and khugepaged can stay away.
> However, in MADV_COLLAPSE, the user is explicitly requesting this be
> backed by hugepages - so presumably that is exactly what they want.
>
> IOW, if the user didn't want this memory to be backed by hugepages,
> they wouldn't be MADV_COLLAPSE'ing it. If there was a range of memory
> the user wanted collapsed, but that had some sub-areas marked
> MADV_NOHUGEPAGE, they could always issue multiple MADV_COLLAPSE
> operations around the excluded regions.
>
> In terms of use cases, I don't have a concrete example, but a user
> could hypothetically choose to exclude regions from management by
> khugepaged, but still be able to collapse the memory themselves,
> when/if they deem appropriate.

I see. It seems you thought MADV_COLLAPSE actually unsets
VM_NOHUGEPAGE, and is kind of equal to MADV_HUGEPAGE + doing collapse
right away, right? To some degree, it makes some sense. If this is the
behavior you'd like to achieve, I'd suggest making it more explicit,
for example, setting VM_HUGEPAGE for the MADV_COLLAPSE area rather
than ignoring or changing vm flags silently. When using madvise mode,
but not having VM_HUGEPAGE set, the vma check should fail in the
current code (I didn't look hard at whether you already covered this
or not).

> >
> > >
> > > Add a vm_flags_ignore argument to hugepage_vma_revalidate_pmd_count()
> > > which can be used to ignore vm flags used when considering thp
> > > eligibility.
> > >
> > > Signed-off-by: Zach O'Keefe
> > > ---
> > >  mm/khugepaged.c | 18 ++++++++++++------
> > >  1 file changed, 12 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index 1d20be47bcea..ecbd3fc41c80 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -964,10 +964,14 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
> > >  #endif
> > >
> > >  /*
> > > - * Revalidate a vma's eligibility to collapse nr hugepages.
> > > + * Revalidate a vma's eligibility to collapse nr hugepages. vm_flags_ignore
> > > + * can be used to ignore certain vma_flags that would otherwise be checked -
> > > + * the principal example being VM_NOHUGEPAGE which is ignored in madvise
> > > + * collapse context.
> > >   */
> > >  static int hugepage_vma_revalidate_pmd_count(struct mm_struct *mm,
> > >                                              unsigned long address, int nr,
> > > +                                            unsigned long vm_flags_ignore,
> > >                                              struct vm_area_struct **vmap)
> > >  {
> > >         struct vm_area_struct *vma;
> > > @@ -986,7 +990,7 @@ static int hugepage_vma_revalidate_pmd_count(struct mm_struct *mm,
> > >         hend = vma->vm_end & HPAGE_PMD_MASK;
> > >         if (address < hstart || (address + nr * HPAGE_PMD_SIZE) > hend)
> > >                 return SCAN_ADDRESS_RANGE;
> > > -       if (!hugepage_vma_check(vma, vma->vm_flags))
> > > +       if (!hugepage_vma_check(vma, vma->vm_flags & ~vm_flags_ignore))
> > >                 return SCAN_VMA_CHECK;
> > >         /* Anon VMA expected */
> > >         if (!vma->anon_vma || vma->vm_ops)
> > > @@ -1000,9 +1004,11 @@ static int hugepage_vma_revalidate_pmd_count(struct mm_struct *mm,
> > >   */
> > >
> > >  static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
> > > +                                  unsigned long vm_flags_ignore,
> > >                                    struct vm_area_struct **vmap)
> > >  {
> > > -       return hugepage_vma_revalidate_pmd_count(mm, address, 1, vmap);
> > > +       return hugepage_vma_revalidate_pmd_count(mm, address, 1,
> > > +                                                vm_flags_ignore, vmap);
> > >  }
> > >
> > >  /*
> > > @@ -1043,7 +1049,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
> > >                 /* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
> > >                 if (ret & VM_FAULT_RETRY) {
> > >                         mmap_read_lock(mm);
> > > -                       if (hugepage_vma_revalidate(mm, haddr, &vma)) {
> > > +                       if (hugepage_vma_revalidate(mm, haddr, VM_NONE, &vma)) {
> > >                                 /* vma is no longer available, don't continue to swapin */
> > >                                 trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
> > >                                 return false;
> > > @@ -1200,7 +1206,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> > >         count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
> > >
> > >         mmap_read_lock(mm);
> > > -       result = hugepage_vma_revalidate(mm, address, &vma);
> > > +       result = hugepage_vma_revalidate(mm, address, VM_NONE, &vma);
> > >         if (result) {
> > >                 mmap_read_unlock(mm);
> > >                 goto out_nolock;
> > > @@ -1232,7 +1238,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> > >          */
> > >         mmap_write_lock(mm);
> > >
> > > -       result = hugepage_vma_revalidate(mm, address, &vma);
> > > +       result = hugepage_vma_revalidate(mm, address, VM_NONE, &vma);
> > >         if (result)
> > >                 goto out_up_write;
> > >         /* check if the pmd is still valid */
> > > --
> > > 2.35.1.616.g0bdcbb4464-goog
> > >
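
To make the two options discussed in this thread concrete, here is a
rough user-space sketch in C. It is only an illustration under stated
assumptions: every SK_/sk_ name is hypothetical, the flag values are
not the kernel's, and sk_vma_check() is a simplified stand-in for the
madvise-mode eligibility rule described in the reply, not the real
hugepage_vma_check().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration only; not the kernel's values. */
#define SK_VM_NONE       0x0UL
#define SK_VM_HUGEPAGE   0x1UL
#define SK_VM_NOHUGEPAGE 0x2UL

/*
 * Simplified model of THP eligibility in "madvise" mode (an assumption):
 * a region qualifies only if HUGEPAGE is set and NOHUGEPAGE is not.
 */
bool sk_vma_check(unsigned long vm_flags)
{
	if (vm_flags & SK_VM_NOHUGEPAGE)
		return false;
	return (vm_flags & SK_VM_HUGEPAGE) != 0;
}

/* Option A (as in the patch): mask out the ignored bits before checking. */
bool sk_check_ignoring(unsigned long vm_flags, unsigned long ignore)
{
	return sk_vma_check(vm_flags & ~ignore);
}

/* Option B (the suggestion): clear NOHUGEPAGE and set HUGEPAGE explicitly. */
unsigned long sk_collapse_flags(unsigned long vm_flags)
{
	return (vm_flags & ~SK_VM_NOHUGEPAGE) | SK_VM_HUGEPAGE;
}

/* Small self-check contrasting the two options under this model. */
void sk_self_check(void)
{
	/* Masking NOHUGEPAGE alone still fails when HUGEPAGE is not set. */
	assert(!sk_check_ignoring(SK_VM_NOHUGEPAGE, SK_VM_NOHUGEPAGE));
	/* Setting HUGEPAGE explicitly makes the same region eligible. */
	assert(sk_vma_check(sk_collapse_flags(SK_VM_NOHUGEPAGE)));
}
```

Under this model, masking VM_NOHUGEPAGE at check time is not enough
for a region that never had VM_HUGEPAGE set, which is the case the
reply flags; whether the real code behaves this way would need to be
confirmed against mm/khugepaged.c.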