From: "Thomas Hellström (VMware)" <thomas_os@shipmail.org>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Thomas Hellstrom <thellstrom@vmware.com>
Cc: Linux-MM <linux-mm@kvack.org>,
"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Matthew Wilcox" <willy@infradead.org>,
"Will Deacon" <will.deacon@arm.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Rik van Riel" <riel@surriel.com>,
"Minchan Kim" <minchan@kernel.org>,
"Michal Hocko" <mhocko@suse.com>,
"Huang Ying" <ying.huang@intel.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
"Kirill A . Shutemov" <kirill@shutemov.name>
Subject: Re: [PATCH v3 3/7] mm: Add write-protect and clean utilities for address space ranges
Date: Thu, 3 Oct 2019 20:03:12 +0200 [thread overview]
Message-ID: <a3aebf74-be74-bf15-f016-da9734ba435a@shipmail.org> (raw)
In-Reply-To: <CAHk-=wj5NiFPouYd6zUgY4K7VovOAxQT-xhDRjD6j5hifBWi_g@mail.gmail.com>
On 10/3/19 6:55 PM, Linus Torvalds wrote:
>
> d) Fix the pte walker to do the right thing, then just use separate
> pte walkers in your code
>
> The fix would be those two conceptual changes:
>
> 1) don't split if the walker asks for a pmd_entry (the walker itself
> can then decide to split, of course, but right now no walkers want it
> since there are no pmd _and_ pte walkers, because people who want that
> do the pte walk themselves)
>
> 2) get the proper page table lock if you do walk the pte, since
> otherwise it's racy
>
> Then there won't be any code duplication, because all the duplication
> you now have at the pmd level is literally just workarounds for the
> fact that our current walker has this bug.
I actually already started on d) when Kirill asked me to unify the
pud_entry() and pmd_entry() callbacks.
>
> That "fix the pte walker" would be one preliminary patch that would
> look something like the attached TOTALLY UNTESTED garbage.
>
> I call it "garbage" because I really hope people take it just as what
> it is: "something like this". It compiles for me, and I did try to
> think it through, but I might have missed some big piece of the
> picture when writing that patch.
>
> And yes, this is a much bigger conceptual change for the VM layer, but
> I really think our pagewalk code is actively buggy right now, and is
> forcing users to do bad things because they work around the existing
> limitations.
>
> Hmm? Could some of the core mm people look over that patch?
>
> And yes, I was tempted to move the proper pmd locking into the walker
> too, and do
>
> ptl = pmd_trans_huge_lock(pmd, vma);
> if (ptl) {
>         err = ops->pmd_entry(pmd, addr, next, walk);
>         spin_unlock(ptl);
> ...
>
> but while I think that's the correct thing to do in the long run, that
> would have to be done together with changing all the existing
> pmd_entry users. It would make the pmd_entry _solely_ handle the
> hugepage case, and then you'd have to remove the locking in the
> pmd_entry, and have to make the pte walking be a walker entry. But
> that would _really_ clean things up, and would make things like
> smaps_pte_range() much easier to read, and much more obvious (it would
> be split into a smaps_pmd_range and smaps_pte_range, and the callbacks
> wouldn't need to know about the complex locking).
>
> So I think this is the right direction to move into, but I do want
> people to think about this, and think about that next phase of doing
> the pmd_trans_huge_lock too.
>
> Comments?
>
> Linus
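Just to make sure I read the locking sketch above right: a toy user-space
model of that dispatch might look something like the following. All names
here are hypothetical stand-ins for illustration, not the kernel API; an
int plus asserts plays the role of the page-table lock (ptl).

```c
#include <assert.h>

/* Models the proposal that the walker, not the callback, takes the
 * pmd-level lock around pmd_entry(). */

static int ptl_held;                      /* stand-in for the ptl spinlock */

static void fake_lock(void)   { assert(!ptl_held); ptl_held = 1; }
static void fake_unlock(void) { assert(ptl_held);  ptl_held = 0; }

struct walk_ops {
    int (*pmd_entry)(unsigned long addr, unsigned long next);
};

static int pmd_calls;

/* A sample callback: it can now rely on the walker holding the lock. */
static int count_pmd(unsigned long addr, unsigned long next)
{
    (void)addr;
    (void)next;
    assert(ptl_held);                     /* the invariant the change buys us */
    pmd_calls++;
    return 0;
}

static const struct walk_ops demo_ops = { .pmd_entry = count_pmd };

/* Walk one pmd-sized range: lock, run the pmd callback, unlock. */
static int walk_pmd_range(const struct walk_ops *ops,
                          unsigned long addr, unsigned long next)
{
    int err = 0;

    if (ops->pmd_entry) {
        fake_lock();                      /* ptl = pmd_trans_huge_lock() */
        err = ops->pmd_entry(addr, next);
        fake_unlock();                    /* spin_unlock(ptl) */
    }
    return err;
}
```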
I think if we take the ptl lock outside the callback, we'd need to allow
the callback to unlock, either to do non-atomic things or to avoid
recursive locking if it decides to split in the callback. FWIW, the
pud_entry() call is already made with the lock held, but from what I can
tell the only current implementation happily ignores that.
And if we allow unlocking, or call the callback unlocked, the callback
needs to tell us whether it actually handled the entry or whether we need
to fall back to the next level. Perhaps using a positive
PAGE_WALK_FALLBACK return value? That would allow current implementations
to remain unmodified.
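To sketch what I mean, here is a toy user-space model of that return-value
protocol. Everything here is hypothetical, including PAGE_WALK_FALLBACK
itself; it just illustrates how a positive return from pmd_entry() could
mean "not handled, drop to the pte level", while 0 and negative errno keep
their current meanings so existing callbacks stay unmodified.

```c
#include <assert.h>

#define PAGE_WALK_FALLBACK 1   /* hypothetical positive "not handled" value */

struct walk_ops {
    int (*pmd_entry)(unsigned long addr);
    int (*pte_entry)(unsigned long addr);
};

static int pte_visits;

/* Sample pmd callback: handles non-zero addresses itself, asks the
 * walker to fall back for address 0. */
static int demo_pmd(unsigned long addr)
{
    return addr ? 0 : PAGE_WALK_FALLBACK;
}

static int demo_pte(unsigned long addr)
{
    (void)addr;
    pte_visits++;
    return 0;
}

static const struct walk_ops demo_ops = { demo_pmd, demo_pte };

/* Walk one pmd entry, falling back to a (pretend, single) pte walk
 * only when the callback asks for it. */
static int walk_one_pmd(const struct walk_ops *ops, unsigned long addr)
{
    int err = 0;

    if (ops->pmd_entry) {
        err = ops->pmd_entry(addr);
        if (err < 0)
            return err;                  /* real error: abort the walk */
        if (err != PAGE_WALK_FALLBACK)
            return err;                  /* handled at the pmd level */
    }
    return ops->pte_entry ? ops->pte_entry(addr) : 0;
}
```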
/Thomas