From: "Zach O'Keefe" <zokeefe@google.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Saurabh Singh Sengar <ssengar@microsoft.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Dan Williams <dan.j.williams@intel.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [EXTERNAL] [PATCH] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"
Date: Mon, 21 Aug 2023 08:08:42 -0700
Message-ID: <CAAa6QmRLkh-g4ge4D9nQge=wHFwTz8CKB7AsjcJ9akDV8d0Z_A@mail.gmail.com>
In-Reply-To: <CAHbLzko1J9ds_JfZe83JwEx=395sPExB7mQ0faju6OSaQ2tmnQ@mail.gmail.com>

On Fri, Aug 18, 2023 at 2:21 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Thu, Aug 17, 2023 at 11:29 AM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > On Thu, Aug 17, 2023 at 10:47 AM Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > On Wed, Aug 16, 2023 at 2:48 PM Zach O'Keefe <zokeefe@google.com> wrote:
> > > >
> > > > > > We have an out-of-tree driver that maps huge pages through a file handle and
> > > > > > relies on ->huge_fault. It used to work in 5.19 kernels, but 6.1 changed this
> > > > > > behaviour.
> > > > > >
> > > > > > I don’t think restoring the earlier fault-path behaviour for huge pages would
> > > > > > impact the kernel negatively.
> > > > > >
> > > > > > Do you think we can restore the earlier kernel behaviour, so that huge pages
> > > > > > can be faulted in via ->huge_fault?
> > > >
> > > > That seems reasonable to me. I think using the existence of a
> > > > ->huge_fault() handler as a predicate to return "true" makes sense to
> > > > me. The "normal" flow for file-backed memory along fault path still
> > > > needs to return "false", so that we correctly fallback to ->fault()
> > > > handler. Unless there are objections, I can do that in a v2.
> > >
> > > Sorry for chiming in late. I'm just back from vacation and trying to catch up...
> > >
> > > IIUC the out-of-tree driver tries to allocate huge page and install
> > > PMD mapping via huge_fault() handler, but the cleanup of
> > > hugepage_vma_check() prevents this due to the check to
> > > VM_NO_KHUGEPAGED?
> > >
> > > So you would like to check whether a huge_fault() handler existed
> > > instead of vma_is_dax()?
> >
> > Sorry for the multiple threads here. There are two problems: (a) the
> > VM_NO_KHUGEPAGED check along fault path, and (b) we don't give
> > ->huge_fault() a fair shake, if it exists, along fault path. The
> > current code assumes vma_is_dax() iff ->huge_fault() exists.
> >
> > (a) is easy enough to fix. For (b), I'm currently looking at the
> > possibility of not worrying about ->huge_fault() in
> > hugepage_vma_check(), and just letting create_huge_pud() /
> > create_huge_pmd() check and fallback as necessary. I think we'll need
> > the explicit DAX check still, since we want to keep khugepaged and
> > MADV_COLLAPSE away, and the presence / absence of ->huge_fault() isn't
> > enough to know that (well.. today it kind of is, but we shouldn't
> > depend on it).
>
> You meant something like:
>
> if (vma->vm_ops->huge_fault) {
>     if (vma_is_dax(vma))
>         return in_pf;
>
>     /* Fall through */
> }

I don't think this will work for Saurabh's case since, IIUC, they
aren't using DAX; they are using VM_HUGEPAGE|VM_MIXEDMAP, faulted in
via ->huge_fault().

The old (v5.19) fault path looked like:

static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
                                          unsigned long vm_flags)
{
        /* Explicitly disabled through madvise. */
        if ((vm_flags & VM_NOHUGEPAGE) ||
            test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
                return false;
        return true;
}

/*
 * to be used on vmas which are known to support THP.
 * Use transparent_hugepage_active otherwise
 */
static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
{

        /*
         * If the hardware/firmware marked hugepage support disabled.
         */
        if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
                return false;

        if (!transhuge_vma_enabled(vma, vma->vm_flags))
                return false;

        if (vma_is_temporary_stack(vma))
                return false;

        if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
                return true;

        if (vma_is_dax(vma))
                return true;

        if (transparent_hugepage_flags &
                                (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
                return !!(vma->vm_flags & VM_HUGEPAGE);

        return false;
}

For non-anonymous VMAs, the next check (in create_huge_*) would be for the
->huge_fault handler, falling back as necessary if it didn't exist.
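
To make that fallback concrete, here is a minimal userspace sketch of
the decision described above. It is not the kernel's code: every type
and function name in it (model_vma, model_vm_ops,
model_create_huge_pmd, and so on) is a hypothetical stand-in, and the
anonymous case is reduced to a single return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's types. */
enum model_result { MODEL_HANDLED_HUGE, MODEL_FALLBACK };

struct model_vma;

struct model_vm_ops {
	/* Optional huge-page fault handler; NULL when the driver has none. */
	enum model_result (*huge_fault)(struct model_vma *vma);
};

struct model_vma {
	bool anonymous;
	const struct model_vm_ops *ops;
};

/* A driver handler that always manages to install a huge mapping. */
static enum model_result model_driver_huge_fault(struct model_vma *vma)
{
	(void)vma;
	return MODEL_HANDLED_HUGE;
}

/*
 * Models the check described above: a non-anonymous VMA is handed to
 * its ->huge_fault handler if one exists; otherwise the caller falls
 * back to the regular small-page ->fault() path.
 */
static enum model_result model_create_huge_pmd(struct model_vma *vma)
{
	if (vma->anonymous)
		return MODEL_HANDLED_HUGE; /* anonymous memory takes its own path */

	if (vma->ops && vma->ops->huge_fault)
		return vma->ops->huge_fault(vma);

	return MODEL_FALLBACK;
}
```

In this model, a file-backed VMA whose ops lack a handler simply
reports MODEL_FALLBACK; it is not rejected earlier on the grounds that
it is not DAX.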

The patch I sent out last week[1] somewhat restores this logic; the
only difference is that we also check for ->huge_fault in
hugepage_vma_check(). This is so smaps can surface this
possibility with some accuracy. I just realized, however, that it will
erroneously return "true" for the collapse path.
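
The collapse-path problem can be shown with a small userspace model.
This is not the code from [1], and all names here (model_path,
model_file_vma, model_naive_check, model_gated_check) are hypothetical.
It covers only the ->huge_fault branch of such a predicate, and gating
on the calling path is just one possible way to avoid the problem,
assumed here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callers of a THP eligibility predicate. */
enum model_path { MODEL_PATH_FAULT, MODEL_PATH_SMAPS, MODEL_PATH_COLLAPSE };

/* Hypothetical stand-in for a file-backed VMA. */
struct model_file_vma {
	bool has_huge_fault;
};

/*
 * Keyed only on the handler's existence: reports "true" for every
 * caller, including the collapse path, which is the error noted above.
 */
static bool model_naive_check(const struct model_file_vma *vma,
			      enum model_path path)
{
	(void)path;
	return vma->has_huge_fault;
}

/*
 * Also keyed on the caller: the handler counts for the fault path and
 * for smaps reporting, but never for collapse.
 */
static bool model_gated_check(const struct model_file_vma *vma,
			      enum model_path path)
{
	if (!vma->has_huge_fault)
		return false;
	return path != MODEL_PATH_COLLAPSE;
}
```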

Maybe Matthew was right about unifying everything here :P That's 2
mistakes I've made in trying to fix this issue (but maybe that's just
me).

[1] https://lore.kernel.org/linux-mm/20230818211533.2523697-1-zokeefe@google.com/
