From: Hugh Dickins <hughd@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Rik van Riel <riel@redhat.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Oleg Nesterov <oleg@redhat.com>,
Sasha Levin <sasha.levin@oracle.com>,
Andrey Konovalov <andreyknvl@google.com>,
Dmitry Vyukov <dvyukov@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
linux-mm@kvack.org
Subject: Re: [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked
Date: Mon, 19 Oct 2015 12:17:53 -0700 (PDT) [thread overview]
Message-ID: <alpine.LSU.2.11.1510191204020.4652@eggly.anvils> (raw)
In-Reply-To: <5624E31A.9010202@suse.cz>
On Mon, 19 Oct 2015, Vlastimil Babka wrote:
> On 10/19/2015 01:20 PM, Hugh Dickins wrote:
> > On Mon, 19 Oct 2015, Vlastimil Babka wrote:
> >> On 10/19/2015 06:50 AM, Hugh Dickins wrote:
> >>> KernelThreadSanitizer (ktsan) has shown that the down_read_trylock()
> >>> of mmap_sem in try_to_unmap_one() (when going to set PageMlocked on
> >>> a page found mapped in a VM_LOCKED vma) is ineffective against races
> >>> with exit_mmap()'s munlock_vma_pages_all(), because mmap_sem is not
> >>> held when tearing down an mm.
> >>>
> >>> But that's okay, those races are benign; and although we've believed
> >>
> >> But didn't Kirill show that it's not so benign, and can leak memory?
> >> - http://marc.info/?l=linux-mm&m=144196800325498&w=2
> >
> > Kirill's race was this:
> >
> > CPU0 CPU1
> > exit_mmap()
> > // mmap_sem is *not* taken
> > munlock_vma_pages_all()
> > munlock_vma_pages_range()
> > try_to_unmap_one()
> > down_read_trylock(&vma->vm_mm->mmap_sem))
> > !!(vma->vm_flags & VM_LOCKED) == true
> > vma->vm_flags &= ~VM_LOCKED;
> > <munlock the page>
> > mlock_vma_page(page);
> > // mlocked pages is leaked.
> >
> > Hmm, I pulled that in to say that it looked benign to me, that he was
> > missing all the subsequent "<munlock the page>" which would correct the
> > situation. But now I look at it again, I agree with you both: lacking
> > any relevant locking on CPU1 at that point (it has already given up the
> > pte lock there), the whole of "<munlock the page>" could take place on
> > CPU0, before CPU1 reaches its mlock_vma_page(page), yes.
> >
> > Oh, hold on, no: doesn't page lock prevent that one? CPU1 has the page
> > lock throughout, so CPU0's <munlock the page> cannot complete before
> > CPU1's mlock_vma_page(page). So now I disagree with you again!
>
>
> I think the page lock doesn't help with munlock_vma_pages_range(). If I
> expand the race above:
>
> CPU0 CPU1
>
> exit_mmap()
> // mmap_sem is *not* taken
> munlock_vma_pages_all()
> munlock_vma_pages_range()
> lock_page()
> ...
> try_to_unmap_one()
> down_read_trylock(&vma->vm_mm->mmap_sem))
> !!(vma->vm_flags & VM_LOCKED) == true
> vma->vm_flags &= ~VM_LOCKED;
> __munlock_pagevec_fill
> // this briefly takes pte lock
> __munlock_pagevec()
> // Phase 1
> TestClearPageMlocked(page)
>
> mlock_vma_page(page);
> TestSetPageMlocked(page)
> // page is still mlocked
> ...
> unlock_page()
> // Phase 2
> lock_page()
> if (!__putback_lru_fast_prepare())
> // true, because page_evictable(page) is false due to PageMlocked
> __munlock_isolated_page
> if (page_mapcount(page) > 1)
> try_to_munlock(page);
> // this will not help AFAICS
>
> Now if CPU0 is the last mapper, it will unmap the page anyway
> further in exit_mmap(). If not, it stays mlocked.
>
> The key problem is that page lock doesn't cover the TestClearPageMlocked(page)
> part on CPU0.
Thank you for expanding: your diagram beats my words. Yes, I now agree
with you again - but reserve the right the change my mind an infinite
number of times as we look into this for longer.
You can see why mm/mlock.c is not my favourite source file, and every
improvement to it seems to make it worse. It doesn't help that most of
the functions named "munlock" are about trying to set the mlocked bit.
And while it's there on our screens, let me note that "page_mapcount > 1"
"improvement" of mine is, I believe, less valid in the current multistage
procedure than when I first added it (though perhaps a look back would
prove me just as wrong back then). But it errs on the safe side (never
marking something unevictable when it's evictable) since PageMlocked has
already been cleared, so I think that it's still an optimization well
worth making for the common case.
> Your patch should help AFAICS. If CPU1 does the mlock under pte lock, the
> TestClear... on CPU0 can happen only after that.
> If CPU0 takes pte lock first, then CPU1 must see the VM_LOCKED flag cleared,
> right?
Right - thanks a lot for giving it more thought.
Hugh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2015-10-19 19:18 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-19 4:44 [PATCH 0/12] mm: page migration cleanups, and a little mlock Hugh Dickins
2015-10-19 4:45 ` [PATCH 1/12] mm Documentation: undoc non-linear vmas Hugh Dickins
2015-10-19 9:16 ` Kirill A. Shutemov
2015-11-05 17:29 ` Vlastimil Babka
2015-10-19 4:50 ` [PATCH 2/12] mm: rmap use pte lock not mmap_sem to set PageMlocked Hugh Dickins
2015-10-19 6:23 ` Vlastimil Babka
2015-10-19 11:20 ` Hugh Dickins
2015-10-19 12:33 ` Vlastimil Babka
2015-10-19 19:17 ` Hugh Dickins [this message]
2015-10-19 20:52 ` Vlastimil Babka
2015-10-19 13:13 ` Kirill A. Shutemov
2015-10-19 19:53 ` Hugh Dickins
2015-10-19 20:10 ` Kirill A. Shutemov
2015-10-19 21:25 ` Vlastimil Babka
2015-10-19 21:53 ` Kirill A. Shutemov
2015-10-21 23:26 ` Hugh Dickins
2015-10-29 18:49 ` [PATCH v2 " Hugh Dickins
2015-11-05 17:50 ` Vlastimil Babka
2015-10-19 23:30 ` [PATCH " Davidlohr Bueso
2015-10-19 4:52 ` [PATCH 3/12] mm: page migration fix PageMlocked on migrated pages Hugh Dickins
2015-11-05 18:18 ` Vlastimil Babka
2015-10-19 4:54 ` [PATCH 4/12] mm: rename mem_cgroup_migrate to mem_cgroup_replace_page Hugh Dickins
2015-10-19 12:35 ` Johannes Weiner
2015-12-02 9:33 ` [PATCH] mm: fix kerneldoc on mem_cgroup_replace_page Hugh Dickins
2015-12-02 10:17 ` Michal Hocko
2015-12-02 16:57 ` Johannes Weiner
2015-10-19 4:55 ` [PATCH 5/12] mm: correct a couple of page migration comments Hugh Dickins
2015-10-21 17:53 ` Rafael Aquini
2015-10-19 4:57 ` [PATCH 6/12] mm: page migration use the put_new_page whenever necessary Hugh Dickins
2015-11-05 18:31 ` Vlastimil Babka
2015-11-08 21:17 ` Hugh Dickins
2015-10-19 4:59 ` [PATCH 7/12] mm: page migration trylock newpage at same level as oldpage Hugh Dickins
2015-10-21 17:54 ` Rafael Aquini
2015-10-19 5:01 ` [PATCH 8/12] mm: page migration remove_migration_ptes at lock+unlock level Hugh Dickins
2015-10-19 5:03 ` [PATCH 9/12] mm: simplify page migration's anon_vma comment and flow Hugh Dickins
2015-10-19 5:05 ` [PATCH 10/12] mm: page migration use migration entry for swapcache too Hugh Dickins
2015-10-22 22:35 ` Cyrill Gorcunov
2015-10-19 5:07 ` [PATCH 11/12] mm: page migration avoid touching newpage until no going back Hugh Dickins
2015-10-19 5:11 ` [PATCH 12/12] mm: migrate dirty page without clear_page_dirty_for_io etc Hugh Dickins
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.11.1510191204020.4652@eggly.anvils \
--to=hughd@google.com \
--cc=akpm@linux-foundation.org \
--cc=andreyknvl@google.com \
--cc=cl@linux.com \
--cc=dave@stgolabs.net \
--cc=dvyukov@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=oleg@redhat.com \
--cc=riel@redhat.com \
--cc=sasha.levin@oracle.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox