linux-mm.kvack.org archive mirror
From: Dominique Martinet <asmadeus@codewreck.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org
Subject: Re: How to use huge pages in drivers?
Date: Thu, 5 Sep 2019 20:50:48 +0200
Message-ID: <20190905185048.GA23588@nautica>
In-Reply-To: <20190905181555.GQ29434@bombadil.infradead.org>

Matthew Wilcox wrote on Thu, Sep 05, 2019:
> On Thu, Sep 05, 2019 at 05:44:00PM +0200, Dominique Martinet wrote:
> > Question though - is it ok to insert small pages if the huge_fault
> > handler is called with PE_SIZE_PMD ?
> > (I think the pte insertion will automatically create the pmd, but would
> > be good to confirm)
> 
> No, you need to return VM_FAULT_FALLBACK, at which point the generic code
> will create a PMD for you and then call your ->fault handler which can
> insert PTEs.

Hmm, that's a shame actually.
There is a rather costly round-trip between Linux and mckernel to
determine which page size the remote side uses for this virtual address
and to get the corresponding physical address, so when we get the fault
we basically do not know whether it will be a PMD or a PTE.

I'd rather avoid doing one round-trip at the PMD stage, getting told
this is a PTE, temporarily giving up, and then doing a second round-trip
when called again with PE_SIZE_PTE.
I didn't see any field in struct vm_fault that I could piggy-back on to
remember something from the previous call, and I'm pretty sure it would
be a bad idea to use the vma's vm_private_data here, because there could
be multiple faults in parallel on other threads.


Looking at vmf_insert_pfn(), it will allocate the pmd if needed through
get_locked_pte() in insert_pfn(), so inserting a PTE from the PMD-level
handler does end up working. (I never return a page: we always return
VM_FAULT_NOPAGE on success, so I do not see the harm in doing it early
when we can.)

Following the code in __handle_mm_fault(), and assuming the PMD-level
fault had returned VM_FAULT_FALLBACK, I do not see any harm here either:
the pmd has actually already been allocated at that point (during the
PMD-level fault); it is just set to none.

Not exactly pretty, though, and definitely no guarantee it will keep
working... I'll at least add a comment saying what we should do :P
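To make the round-trip cost concrete, here is a small userspace model in
C (an illustration only, not kernel code). The enums, the handler names,
and the remote_size/round_trips globals are all hypothetical stand-ins,
and the model routes the PTE level through the same handler even though
the kernel actually calls ->fault for it. It counts remote lookups for a
strict handler that falls back whenever the remote page is smaller than
requested, versus one that inserts the PTE early from the PMD-level call.

```c
#include <assert.h>

/*
 * Userspace model only. These enums are hypothetical stand-ins named after
 * PE_SIZE_* and VM_FAULT_* from the thread; they are not the kernel's
 * definitions. Sizes are ordered smallest to largest.
 */
enum pe_size { PE_SIZE_PTE, PE_SIZE_PMD, PE_SIZE_PUD };
enum fault_ret { FAULT_NOPAGE, FAULT_FALLBACK };

/* Hypothetical: page size the remote (mckernel) side uses for the address. */
static enum pe_size remote_size;
/* Number of simulated linux <-> mckernel lookups. */
static int round_trips;

typedef enum fault_ret (*handler_fn)(enum pe_size req);

/* Strict variant: fall back whenever the remote page is smaller. */
static enum fault_ret strict_fault(enum pe_size req)
{
	if (req == PE_SIZE_PUD)
		return FAULT_FALLBACK; /* 1GB assumed never used: no lookup */
	round_trips++;                 /* ask the remote side */
	if (remote_size < req)
		return FAULT_FALLBACK; /* retried one level down: new lookup */
	return FAULT_NOPAGE;           /* mapped at the requested size */
}

/*
 * Early variant: at the PMD level, insert a PTE right away when the remote
 * page is small (the vmf_insert_pfn() shortcut from the mail).
 */
static enum fault_ret early_fault(enum pe_size req)
{
	if (req == PE_SIZE_PUD)
		return FAULT_FALLBACK;
	round_trips++;
	return FAULT_NOPAGE; /* PMD or PTE inserted, whichever remote uses */
}

/*
 * Model of the generic fault path: try PUD, then PMD, then PTE until one
 * level returns FAULT_NOPAGE. Returns the number of remote lookups made.
 */
static int count_round_trips(handler_fn h, enum pe_size remote)
{
	int resolved = 0;

	remote_size = remote;
	round_trips = 0;
	for (int req = PE_SIZE_PUD; req >= PE_SIZE_PTE && !resolved; req--)
		resolved = (h((enum pe_size)req) == FAULT_NOPAGE);
	assert(resolved); /* the PTE level always succeeds in this model */
	return round_trips;
}
```

With a 4K page on the remote side, count_round_trips(strict_fault,
PE_SIZE_PTE) gives 2 while count_round_trips(early_fault, PE_SIZE_PTE)
gives 1; with a 2MB remote page both give 1.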

> It works the same way from PUDs to PMDs by the way, in case you ever
> have a 1GB mapping ;-)

Yes, I already return VM_FAULT_FALLBACK in that case, but I'm just
assuming it won't happen, so there is no round-trip there :)


> > Now that I've set it as dax I think it actually makes sense as in
> > "there's memory here that points to something linux no longer manages
> > directly, just let it be" and we might benefit from the other exceptions
> > dax has, I'll need to look at what this implies in more detail...
> 
> I think that should be fine, but I don't really know RHEL 7.3 all that
> well ;-)

Good enough for me, tests will tell me what I broke :)


> No problem ... these APIs are relatively new and not necessarily all
> that intuitive.

Looking at a recent vanilla Linux in the evenings and at RHEL's kernel
at work didn't help on my side (there are some fun differences, like the
VM_HUGE_FAULT flag in the vma; now that I understand it was added for
ABI compatibility, it does make sense: in an older module the function
pointer could just have been left uninitialized, and thus be non-NULL
yet not valid).

Pointing me at huge_fault() again definitely did help.


Thanks,
-- 
Dominique

