From: Motohiro Kosaki <Motohiro.Kosaki@us.fujitsu.com>
To: Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Sasha Levin <sasha.levin@oracle.com>,
Wanpeng Li <liwanp@linux.vnet.ibm.com>,
Michel Lespinasse <walken@google.com>,
Bob Liu <bob.liu@oracle.com>, Nick Piggin <npiggin@suse.de>,
Motohiro Kosaki JP <kosaki.motohiro@jp.fujitsu.com>,
Rik van Riel <riel@redhat.com>,
David Rientjes <rientjes@google.com>,
Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan@kernel.org>,
Hugh Dickins <hughd@google.com>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>
Subject: RE: [PATCH] mm/mlock: fix BUG_ON unlocked page for nolinear VMAs
Date: Fri, 10 Jan 2014 09:48:54 -0800 [thread overview]
Message-ID: <6B2BA408B38BA1478B473C31C3D2074E2C386DF586@SV-EXCHANGE1.Corp.FC.LOCAL> (raw)
In-Reply-To: <52CC16DC.9070308@suse.cz>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 068522d..b99c742 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1389,9 +1389,19 @@ static int try_to_unmap_cluster(unsigned long
> cursor, unsigned int *mapcount,
> BUG_ON(!page || PageAnon(page));
>
> if (locked_vma) {
> - mlock_vma_page(page); /* no-op if already
> mlocked */
> - if (page == check_page)
> + if (page == check_page) {
> + /* we know we have check_page locked */
> + mlock_vma_page(page);
> ret = SWAP_MLOCK;
> + } else if (trylock_page(page)) {
> + /*
> + * If we can lock the page, perform mlock.
> + * Otherwise leave the page alone, it will be
> + * eventually encountered again later.
> + */
> + mlock_vma_page(page);
> + unlock_page(page);
> + }
> continue; /* don't unmap */
> }
I audited all related mm code. However I couldn't find any race that it can close.
First off, current munlock code is crazy tricky.
munlock
down_write(mmap_sem)
do_mlock()
mlock_fixup
munlock_vma_pages_range
__munlock_pagevec
spin_lock_irq(zone->lru_lock)
TestClearPageMlocked(page)
del_page_from_lru_list
spin_unlock_irq(zone->lru_lock)
lock_page
__munlock_isolated_page
unlock_page
up_write(mmap_sem)
Look, TestClearPageMlocked(page) is not protected by lock_page. But this is still
safe because Page_mocked mean one or more vma marked VM_LOCKED then we
only need care about turning down final VM_LOCKED. I.e. mmap_sem protect them.
And,
spin_lock_irq(zone->lru_lock)
del_page_from_lru_list
spin_unlock_irq(zone->lru_lock)
This idiom ensures I or someone else isolate the page from lru and isolated pages
will be put back by putback_lru_page in anycase. So, the pages will move the right
lru eventually.
And then, taking page-lock doesn't help to close vs munlock race.
On the other hands, page migration has the following call stack.
some-caller [isolate_lru_page]
unmap_and_move
__unmap_and_move
trylock_page
try_to_unmap
move_to_new_page
migrate_page
migrate_page_copy
unlock_page
The critical part (i.e. migrate_page_copy) is protected by both page isolation and page lock.
Page fault path take lock page and doesn't use page isolation. This is correct.
try_to_unmap_cluster doesn't take page lock, but it ensure the page is isolated. This is correct too.
Plus, do_wp_page() has the following comment. But it is wrong. This lock is necessary to protect against
page migration, but not lru manipulation.
if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
lock_page(old_page); /* LRU manipulation */
munlock_vma_page(old_page);
unlock_page(old_page);
}
But, you know, this is crazy ugly lock design. I'm ok to change the rule to that PG_mlocked must be protected
page lock. If so, I propose to add PageMlocked() like this
} else if (!PageMlocked() && trylock_page(page)) {
/*
* If we can lock the page, perform mlock.
* Otherwise leave the page alone, it will be
* eventually encountered again later.
*/
mlock_vma_page(page);
unlock_page(page);
This is safe. Even if race is happen and vmscan failed to mark PG_mlocked, that's no problem. Next vmscan may
do the right job and will fix the mistake.
Anyway, I'm also sure this is not recent issue. It lived 5 years. It is no reason to rush.
next prev parent reply other threads:[~2014-01-10 17:49 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-12-17 8:05 Wanpeng Li
2013-12-18 3:14 ` Wanpeng Li
2013-12-18 3:16 ` Wanpeng Li
2013-12-18 3:23 ` Wanpeng Li
[not found] ` <20131218032329.GA6044@hacker.(null)>
2013-12-18 3:32 ` Sasha Levin
2013-12-18 4:12 ` Wanpeng Li
2013-12-18 9:11 ` Vlastimil Babka
2013-12-18 9:23 ` Wanpeng Li
[not found] ` <52b1699f.87293c0a.75d1.34d3SMTPIN_ADDED_BROKEN@mx.google.com>
2013-12-18 21:43 ` Andrew Morton
2014-01-03 20:17 ` Sasha Levin
2014-01-03 20:52 ` Linus Torvalds
2014-01-03 23:36 ` Vlastimil Babka
2014-01-03 23:56 ` Andrew Morton
2014-01-04 3:03 ` Sasha Levin
2014-01-04 0:18 ` Linus Torvalds
2014-01-04 8:09 ` Vlastimil Babka
2014-01-05 0:27 ` Wanpeng Li
2014-01-06 16:47 ` Motohiro Kosaki
2014-01-06 22:01 ` KOSAKI Motohiro
2014-01-07 5:27 ` Wanpeng Li
2014-01-07 15:01 ` Vlastimil Babka
2014-01-08 1:06 ` Bob Liu
2014-01-08 2:44 ` Linus Torvalds
2014-01-10 17:48 ` Motohiro Kosaki [this message]
2014-01-13 14:03 ` Vlastimil Babka
2014-01-14 11:05 ` Vlastimil Babka
2014-01-04 3:31 ` Bob Liu
2013-12-18 8:54 ` Vlastimil Babka
2013-12-18 9:01 ` Wanpeng Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6B2BA408B38BA1478B473C31C3D2074E2C386DF586@SV-EXCHANGE1.Corp.FC.LOCAL \
--to=motohiro.kosaki@us.fujitsu.com \
--cc=akpm@linux-foundation.org \
--cc=bob.liu@oracle.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=iamjoonsoo.kim@lge.com \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=liwanp@linux.vnet.ibm.com \
--cc=mgorman@suse.de \
--cc=minchan@kernel.org \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
--cc=rientjes@google.com \
--cc=sasha.levin@oracle.com \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
--cc=walken@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox