Re: [PATCH] hugetlbfs: fix anon huge page migration race

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "HORIGUCHI NAOYA(堀口　直也)" <naoya.horiguchi@nec.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Qian Cai <cai@lca.pm>, Hugh Dickins <hughd@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Christoph Hellwig <hch@infradead.org>,
	Roman Gushchin <guro@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH] hugetlbfs: fix anon huge page migration race
Date: Fri, 13 Nov 2020 05:34:27 +0000	[thread overview]
Message-ID: <20201113053426.GA20236@hori.linux.bs1.fc.nec.co.jp> (raw)
In-Reply-To: <20201105195058.78401-1-mike.kravetz@oracle.com>

On Thu, Nov 05, 2020 at 11:50:58AM -0800, Mike Kravetz wrote:
> Qian Cai reported the following BUG in [1]
> 
> [ 6147.019063][T45242] LTP: starting move_pages12
> [ 6147.475680][T64921] BUG: unable to handle page fault for address: ffffffffffffffe0
> ...
> [ 6147.525866][T64921] RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170
> avc_start_pgoff at mm/interval_tree.c:63
> [ 6147.620914][T64921] Call Trace:
> [ 6147.624078][T64921]  rmap_walk_anon+0x141/0xa30
> rmap_walk_anon at mm/rmap.c:1864
> [ 6147.628639][T64921]  try_to_unmap+0x209/0x2d0
> try_to_unmap at mm/rmap.c:1763
> [ 6147.633026][T64921]  ? rmap_walk_locked+0x140/0x140
> [ 6147.637936][T64921]  ? page_remove_rmap+0x1190/0x1190
> [ 6147.643020][T64921]  ? page_not_mapped+0x10/0x10
> [ 6147.647668][T64921]  ? page_get_anon_vma+0x290/0x290
> [ 6147.652664][T64921]  ? page_mapcount_is_zero+0x10/0x10
> [ 6147.657838][T64921]  ? hugetlb_page_mapping_lock_write+0x97/0x180
> [ 6147.663972][T64921]  migrate_pages+0x1005/0x1fb0
> [ 6147.668617][T64921]  ? remove_migration_pte+0xac0/0xac0
> [ 6147.673875][T64921]  move_pages_and_store_status.isra.47+0xd7/0x1a0
> [ 6147.680181][T64921]  ? migrate_pages+0x1fb0/0x1fb0
> [ 6147.685002][T64921]  __x64_sys_move_pages+0xa5c/0x1100
> [ 6147.690176][T64921]  ? trace_hardirqs_on+0x20/0x1b5
> [ 6147.695084][T64921]  ? move_pages_and_store_status.isra.47+0x1a0/0x1a0
> [ 6147.701653][T64921]  ? rcu_read_lock_sched_held+0xaa/0xd0
> [ 6147.707088][T64921]  ? switch_fpu_return+0x196/0x400
> [ 6147.712083][T64921]  ? lockdep_hardirqs_on_prepare+0x38c/0x550
> [ 6147.717954][T64921]  ? do_syscall_64+0x24/0x310
> [ 6147.722513][T64921]  do_syscall_64+0x5f/0x310
> [ 6147.726897][T64921]  ? trace_hardirqs_off+0x12/0x1a0
> [ 6147.731894][T64921]  ? asm_exc_page_fault+0x8/0x30
> [ 6147.736714][T64921]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Hugh Dickens diagnosed this as a migration bug caused by code introduced
> to use i_mmap_rwsem for pmd sharing synchronization.  Specifically, the
> routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
> flag to try_to_unmap() while holding i_mmap_rwsem.   This is wrong for
> anon pages as the anon_vma_lock should be held in this case.  Further
> analysis suggested that i_mmap_rwsem was not required to he held at all
> when calling try_to_unmap for anon pages as an anon page could never be
> part of a shared pmd mapping.
> 
> Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
> to drop page lock and acquire i_mmap_rwsem is wrong.  There is no way to
> keep mapping valid while dropping page lock.
> 
> This patch does the following:
> - Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
>   calling try_to_unmap.
> - Remove the hacky code in hugetlb_page_mapping_lock_write.  The routine
>   will now simply do a 'trylock' while still holding the page lock.  If
>   the trylock fails, it will return NULL.  This could impact the callers:
>   - migration calling code will receive -EAGAIN and retry up to the hard
>     coded limit (10).
>   - memory error code will treat the page as BUSY.  This will force
>     killing (SIGKILL) instead of SIGBUS any mapping tasks.
>   Do note that this change in behavior only happens when there is a race.
>   None of the standard kernel testing suites actually hit this race, but
>   it is possible.
> 
> [1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
> [2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.anvils/
> 
> Reported-by: Qian Cai <cai@lca.pw>
> Suggested-by: Hugh Dickins <hughd@google.com>
> Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

This approch looks simpler and better than former ones.
Thank you for the update.

Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

     prev parent reply	other threads:[~2020-11-13  5:34 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-05 19:50 Mike Kravetz
2020-11-13  5:34 ` HORIGUCHI NAOYA(堀口　直也) [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201113053426.GA20236@hori.linux.bs1.fc.nec.co.jp \
    --to=naoya.horiguchi@nec.com \
    --cc=akpm@linux-foundation.org \
    --cc=cai@lca.pm \
    --cc=guro@fb.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox