From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <20210520122536.1596602-1-agruenba@redhat.com> <20210520122536.1596602-7-agruenba@redhat.com>
 <20210520133015.GC18952@quack2.suse.cz> <CAHc6FU7ESASp+G59d218LekK8+YMBvH9GxbPr-qOVBhzyVmq4Q@mail.gmail.com>
 <20210521152352.GQ18952@quack2.suse.cz>
In-Reply-To: <20210521152352.GQ18952@quack2.suse.cz>
From: Andreas Gruenbacher <agruenba@redhat.com>
Date: Fri, 21 May 2021 17:46:04 +0200
Message-ID: <CAHc6FU6df7cBbjmYOZE35v_FALWRO62cYjg2Y9rY+Hd6x5yeyw@mail.gmail.com>
Subject: Re: [PATCH 6/6] gfs2: Fix mmap + page fault deadlocks (part 2)
To: Jan Kara <jack@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>, Alexander Viro <viro@zeniv.linux.org.uk>, 
	cluster-devel <cluster-devel@redhat.com>, linux-fsdevel <linux-fsdevel@vger.kernel.org>, 
	Linux-MM <linux-mm@kvack.org>
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Fri, May 21, 2021 at 5:23 PM Jan Kara <jack@suse.cz> wrote:
> On Thu 20-05-21 16:07:56, Andreas Gruenbacher wrote:
> > On Thu, May 20, 2021 at 3:30 PM Jan Kara <jack@suse.cz> wrote:
> > > On Thu 20-05-21 14:25:36, Andreas Gruenbacher wrote:
> > > > Now that we handle self-recursion on the inode glock in gfs2_fault and
> > > > gfs2_page_mkwrite, we need to take care of more complex deadlock
> > > > scenarios like the following (example by Jan Kara):
> > > >
> > > > Two independent processes P1, P2. Two files F1, F2, and two mappings M1,
> > > > M2 where M1 is a mapping of F1, M2 is a mapping of F2. Now P1 does DIO
> > > > to F1 with M2 as a buffer, P2 does DIO to F2 with M1 as a buffer. They
> > > > can race like:
> > > >
> > > > P1                                      P2
> > > > read()                                  read()
> > > >   gfs2_file_read_iter()                   gfs2_file_read_iter()
> > > >     gfs2_file_direct_read()                 gfs2_file_direct_read()
> > > >       locks glock of F1                       locks glock of F2
> > > >       iomap_dio_rw()                          iomap_dio_rw()
> > > >         bio_iov_iter_get_pages()                bio_iov_iter_get_pages()
> > > >           <fault in M2>                           <fault in M1>
> > > >             gfs2_fault()                            gfs2_fault()
> > > >               tries to grab glock of F2               tries to grab glock of F1
> > > >
> > > > Those kinds of scenarios are much harder to reproduce than
> > > > self-recursion.
> > > >
> > > > We deal with such situations by using the LM_FLAG_OUTER flag to mark
> > > > "outer" glock taking.  Then, when taking an "inner" glock, we use the
> > > > LM_FLAG_TRY flag so that locking attempts that don't immediately succeed
> > > > will be aborted.  In case of a failed locking attempt, we "unroll" to
> > > > where the "outer" glock was taken, drop the "outer" glock, and fault in
> > > > the first offending user page.  This will re-trigger the "inner" locking
> > > > attempt but without the LM_FLAG_TRY flag.  Once that has happened, we
> > > > re-acquire the "outer" glock and retry the original operation.
> > > >
> > > > Reported-by: Jan Kara <jack@suse.cz>
> > > > Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
> > >
> > > ...
> > >
> > > > diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> > > > index 7d88abb4629b..8b26893f8dc6 100644
> > > > --- a/fs/gfs2/file.c
> > > > +++ b/fs/gfs2/file.c
> > > > @@ -431,21 +431,30 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
> > > >       vm_fault_t ret = VM_FAULT_LOCKED;
> > > >       struct gfs2_holder gh;
> > > >       unsigned int length;
> > > > +     u16 flags = 0;
> > > >       loff_t size;
> > > >       int err;
> > > >
> > > >       sb_start_pagefault(inode->i_sb);
> > > >
> > > > -     gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
> > > > +     if (current_holds_glock())
> > > > +             flags |= LM_FLAG_TRY;
> > > > +
> > > > +     gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, flags, &gh);
> > > >       if (likely(!outer_gh)) {
> > > >               err = gfs2_glock_nq(&gh);
> > > >               if (err) {
> > > >                       ret = block_page_mkwrite_return(err);
> > > > +                     if (err == GLR_TRYFAILED) {
> > > > +                             set_current_needs_retry(true);
> > > > +                             ret = VM_FAULT_SIGBUS;
> > > > +                     }
> > >
> > > I've checked to make sure but do_user_addr_fault() indeed calls do_sigbus()
> > > which raises the SIGBUS signal. So if the application does not ignore
> > > SIGBUS, your retry will be visible to the application and can cause all
> > > sorts of interesting results...
> >
> > I would have noticed that, but no SIGBUS signals were actually
> > delivered. So we probably end up in kernelmode_fixup_or_oops() when in
> > kernel mode, which just does nothing in that case.
>
> Hum, but how would we get there? I don't think fatal_signal_pending() would
> return true yet...

Hmm, right ...

> > > So you probably need to add a new VM_FAULT_
> > > return code that will behave like VM_FAULT_SIGBUS except it will not raise
> > > the signal.
> >
> > A new VM_FAULT_* flag might make the code easier to read, but I don't
> > know if we can have one.
>
> Well, this is kernel-internal API and there's still plenty of space in
> vm_fault_reason.
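
To make the suggestion concrete, here is a minimal user-space sketch of a
fault code that fails the access like SIGBUS but raises no signal. The
enum, the DEMO_FAULT_NOSIG name, the bit values, and the handler are all
made up for illustration; none of this is the kernel's vm_fault_reason or
its arch fault handler.

```c
#include <stdbool.h>

/* Demo-only fault codes; names and bit values are invented for this sketch. */
enum demo_fault_reason {
	DEMO_FAULT_OOM    = 1u << 0,
	DEMO_FAULT_SIGBUS = 1u << 1,
	DEMO_FAULT_NOSIG  = 1u << 2,	/* hypothetical: fail the access, raise nothing */
};

struct demo_outcome {
	bool access_failed;	/* the faulting access does not complete */
	bool signal_raised;	/* a SIGBUS-like signal would be delivered */
};

/* Model of a fault handler deciding what to do with a fault result. */
static struct demo_outcome demo_handle_fault_result(unsigned int fault)
{
	struct demo_outcome out = { false, false };

	if (fault & DEMO_FAULT_NOSIG) {
		/* Same failure as SIGBUS, but nothing visible to the task. */
		out.access_failed = true;
	} else if (fault & DEMO_FAULT_SIGBUS) {
		out.access_failed = true;
		out.signal_raised = true;
	}
	return out;
}
```

In this model, a page_mkwrite-style handler would return the new code
instead of the SIGBUS one on a failed trylock, so the retry stays
invisible to the application.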

That covers the page fault path itself. The other issue is how to
propagate the result out through, for example,
iov_iter_fault_in_readable -> fault_in_pages_readable -> __get_user.
I don't think there's much of a chance of getting an additional error
code out of __get_user and __put_user.
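
A rough user-space model of the problem, assuming the access primitive
can only report 0 or -EFAULT: the reason for the failure is lost on the
way up, so it has to travel through some side channel. The per-thread
flag below stands in for the kind of state that the quoted patch sets
with set_current_needs_retry(); all demo_* names are invented for this
sketch and none of them are kernel functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Demo-only reasons why a single access might fail. */
enum demo_fail { DEMO_OK, DEMO_BAD_ADDRESS, DEMO_LOCK_CONTENDED };

/* Per-thread side channel, standing in for a per-task "needs retry" flag. */
static _Thread_local bool demo_needs_retry;

/* Model of a __get_user-style primitive: the caller only sees 0 or -EFAULT. */
static int demo_get_user(enum demo_fail what)
{
	if (what == DEMO_OK)
		return 0;
	if (what == DEMO_LOCK_CONTENDED)
		demo_needs_retry = true;	/* the reason survives only here */
	return -EFAULT;
}

/* Model of a fault-in helper: touch each page, pass the first error up. */
static int demo_fault_in_readable(const enum demo_fail *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int err = demo_get_user(pages[i]);

		if (err)
			return err;
	}
	return 0;
}

/* Caller: tell "retry" apart from a genuine -EFAULT by checking the flag. */
static bool demo_should_retry(const enum demo_fail *pages, size_t n)
{
	demo_needs_retry = false;
	return demo_fault_in_readable(pages, n) == -EFAULT && demo_needs_retry;
}
```

Both failure cases return the same -EFAULT from the helper; only the
flag lets the outermost caller decide whether to drop its lock and retry.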

Thanks,
Andreas