linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@mellanox.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Oleg Nesterov <oleg@redhat.com>, Jann Horn <jannh@google.com>,
	Hugh Dickins <hughd@google.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Peter Xu <peterx@redhat.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 1/1] coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
Date: Mon, 25 Mar 2019 23:53:52 +0000	[thread overview]
Message-ID: <20190325235347.GM9994@mellanox.com> (raw)
In-Reply-To: <20190325224949.11068-1-aarcange@redhat.com>

On Mon, Mar 25, 2019 at 06:49:49PM -0400, Andrea Arcangeli wrote:
> The core dumping code has always run without holding the mmap_sem for
> writing, despite that is the only way to ensure that the entire vma
> layout will not change from under it. Only using some signal
> serialization on the processes belonging to the mm is not nearly
> enough. This was pointed out earlier. For example in Hugh's post from
> Jul 2017:
> 
> https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
> 
> "Not strictly relevant here, but a related note: I was very surprised
> to discover, only quite recently, how handle_mm_fault() may be called
> without down_read(mmap_sem) - when core dumping.  That seems a
> misguided optimization to me, which would also be nice to correct"
> 
> In particular because the growsdown and growsup can move the
> vm_start/vm_end the various loops the core dump does around the vma
> will not be consistent if page faults can happen concurrently.
> 
> Pretty much all users calling mmget_not_zero()/get_task_mm() and then
> taking the mmap_sem had the potential to introduce unexpected side
> effects in the core dumping code.
> 
> Adding mmap_sem for writing around the ->core_dump invocation is a
> viable long term fix, but it requires removing all copy user and page
> faults and to replace them with get_dump_page() for all binary formats
> which is not suitable as a short term fix.
> 
> For the time being this solution manually covers the places that can
> confuse the core dump either by altering the vma layout or the vma
> flags while it runs. Once ->core_dump runs under mmap_sem for writing
> the function mmget_still_valid() can be dropped.
> 
> Allowing mmap_sem protected sections to run in parallel with the
> coredump provides some minor parallelism advantage to the swapoff
> code (which seems to be safe enough by never mangling any vma field
> and can keep doing swapins in parallel to the core dumping) and to
> some other corner case.
> 
> In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
> however the side effect of this same race condition in /proc/pid/mem
> should be reproducible since before commit
> 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 so I couldn't add any other
> "Fixes:" because there's no hash beyond the git genesis commit.
> 
> Because find_extend_vma() is the only location outside of the process
> context that could modify the "mm" structures under mmap_sem for
> reading, by adding the mmget_still_valid() check to it, all other
> cases that take the mmap_sem for reading don't need the new check
> after mmget_not_zero()/get_task_mm(). The expand_stack() in page fault
> context also doesn't need the new check, because all tasks under core
> dumping are frozen.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Reported-by: Jann Horn <jannh@google.com>
> Suggested-by: Oleg Nesterov <oleg@redhat.com>
> Fixes: 86039bd3b4e6 ("userfaultfd: add new syscall to provide memory externalization")
> Cc: stable@kernel.org
> Acked-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> Reviewed-by: Oleg Nesterov <oleg@redhat.com>
> Reviewed-by: Jann Horn <jannh@google.com>
> ---
>  drivers/infiniband/core/uverbs_main.c |  3 +++
>  fs/proc/task_mmu.c                    | 18 ++++++++++++++++++
>  fs/userfaultfd.c                      |  9 +++++++++
>  include/linux/sched/mm.h              | 21 +++++++++++++++++++++
>  mm/mmap.c                             |  7 ++++++-
>  5 files changed, 57 insertions(+), 1 deletion(-)

For the ib part:

Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Jason


  reply	other threads:[~2019-03-25 23:53 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-25 22:49 Andrea Arcangeli
2019-03-25 23:53 ` Jason Gunthorpe [this message]
2019-03-26  8:30 ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190325235347.GM9994@mellanox.com \
    --to=jgg@mellanox.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=oleg@redhat.com \
    --cc=peterx@redhat.com \
    --cc=rppt@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox