From: Andreas Gruenbacher
Date: Thu, 20 May 2021 16:07:56 +0200
Subject: Re: [PATCH 6/6] gfs2: Fix mmap + page fault deadlocks (part 2)
To: Jan Kara, Andy Lutomirski
Cc: Alexander Viro, cluster-devel, linux-fsdevel, Linux-MM
References: <20210520122536.1596602-1-agruenba@redhat.com> <20210520122536.1596602-7-agruenba@redhat.com> <20210520133015.GC18952@quack2.suse.cz>
In-Reply-To: <20210520133015.GC18952@quack2.suse.cz>

On Thu, May 20, 2021 at 3:30 PM Jan Kara wrote:
> On Thu 20-05-21 14:25:36, Andreas Gruenbacher wrote:
> > Now that we handle self-recursion on the inode glock in gfs2_fault and
> > gfs2_page_mkwrite, we need to take care of more complex deadlock
> > scenarios like the following (example by Jan Kara):
> >
> > Two independent processes P1, P2. Two files F1, F2, and two mappings M1,
> > M2 where M1 is a mapping of F1, M2 is a mapping of F2. Now P1 does DIO
> > to F1 with M2 as a buffer, P2 does DIO to F2 with M1 as a buffer.
> > They can race like:
> >
> > P1                                  P2
> > read()                              read()
> >   gfs2_file_read_iter()               gfs2_file_read_iter()
> >     gfs2_file_direct_read()             gfs2_file_direct_read()
> >       locks glock of F1                   locks glock of F2
> >       iomap_dio_rw()                      iomap_dio_rw()
> >         bio_iov_iter_get_pages()            bio_iov_iter_get_pages()
> >
> >             gfs2_fault()                        gfs2_fault()
> >               tries to grab glock of F2           tries to grab glock of F1
> >
> > Those kinds of scenarios are much harder to reproduce than
> > self-recursion.
> >
> > We deal with such situations by using the LM_FLAG_OUTER flag to mark
> > "outer" glock taking.  Then, when taking an "inner" glock, we use the
> > LM_FLAG_TRY flag so that locking attempts that don't immediately succeed
> > will be aborted.  In case of a failed locking attempt, we "unroll" to
> > where the "outer" glock was taken, drop the "outer" glock, and fault in
> > the first offending user page.  This will re-trigger the "inner" locking
> > attempt but without the LM_FLAG_TRY flag.  Once that has happened, we
> > re-acquire the "outer" glock and retry the original operation.
> >
> > Reported-by: Jan Kara
> > Signed-off-by: Andreas Gruenbacher
> > ...
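
For readers who want to play with the locking pattern outside the kernel,
here is a minimal user-space sketch of the try-lock, unroll, and retry
scheme from the commit message.  It is not gfs2 code: pthread mutexes stand
in for glocks, and every name in it (glock_f1, glock_f2, struct dio_args,
fault_in_try, fault_in_blocking, direct_read, run_demo) is hypothetical.
The page_present flag is an assumption that models "the page stays resident
after it has been faulted in".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the inode glocks of F1 and F2. */
static pthread_mutex_t glock_f1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t glock_f2 = PTHREAD_MUTEX_INITIALIZER;

struct dio_args {
	pthread_mutex_t *outer;	/* lock of the file being read */
	pthread_mutex_t *inner;	/* lock of the file backing the buffer */
	bool page_present;	/* buffer page already faulted in? */
	int retries;		/* number of unroll-and-retry rounds */
};

/* Fault while the outer lock is held: only a try-lock is allowed. */
static bool fault_in_try(pthread_mutex_t *inner)
{
	if (pthread_mutex_trylock(inner) != 0)
		return false;	/* plays the role of GLR_TRYFAILED */
	pthread_mutex_unlock(inner);
	return true;
}

/* Fault with no outer lock held: blocking cannot deadlock here. */
static void fault_in_blocking(pthread_mutex_t *inner)
{
	pthread_mutex_lock(inner);
	pthread_mutex_unlock(inner);
}

static void *direct_read(void *p)
{
	struct dio_args *a = p;

	for (;;) {
		bool ok;

		pthread_mutex_lock(a->outer);
		ok = a->page_present || fault_in_try(a->inner);
		pthread_mutex_unlock(a->outer);	/* finished, or unrolling */
		if (ok)
			return NULL;

		/* Unrolled: fault the page in without holding the outer lock. */
		a->retries++;
		fault_in_blocking(a->inner);
		a->page_present = true;
	}
}

/* Runs P1 and P2 against each other; returns 0 once both have finished. */
int run_demo(void)
{
	struct dio_args p1 = { &glock_f1, &glock_f2, false, 0 };
	struct dio_args p2 = { &glock_f2, &glock_f1, false, 0 };
	pthread_t t1, t2;

	if (pthread_create(&t1, NULL, direct_read, &p1) != 0)
		return -1;
	if (pthread_create(&t2, NULL, direct_read, &p2) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* Each side needs at most one unroll before its page is resident. */
	assert(p1.retries <= 1 && p2.retries <= 1);
	return 0;
}
```

Calling run_demo() from a main() and building with -pthread should always
terminate, because neither thread ever blocks on one lock while holding the
other.  Replacing fault_in_try() with fault_in_blocking() inside the locked
region recreates the ABBA deadlock from the diagram above.
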
>
> > diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> > index 7d88abb4629b..8b26893f8dc6 100644
> > --- a/fs/gfs2/file.c
> > +++ b/fs/gfs2/file.c
> > @@ -431,21 +431,30 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
> >  	vm_fault_t ret = VM_FAULT_LOCKED;
> >  	struct gfs2_holder gh;
> >  	unsigned int length;
> > +	u16 flags = 0;
> >  	loff_t size;
> >  	int err;
> >
> >  	sb_start_pagefault(inode->i_sb);
> >
> > -	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
> > +	if (current_holds_glock())
> > +		flags |= LM_FLAG_TRY;
> > +
> > +	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, flags, &gh);
> >  	if (likely(!outer_gh)) {
> >  		err = gfs2_glock_nq(&gh);
> >  		if (err) {
> >  			ret = block_page_mkwrite_return(err);
> > +			if (err == GLR_TRYFAILED) {
> > +				set_current_needs_retry(true);
> > +				ret = VM_FAULT_SIGBUS;
> > +			}
>
> I've checked to make sure but do_user_addr_fault() indeed calls do_sigbus()
> which raises the SIGBUS signal. So if the application does not ignore
> SIGBUS, your retry will be visible to the application and can cause all
> sorts of interesting results...

I would have noticed that, but no SIGBUS signals were actually
delivered.  So we probably end up in kernelmode_fixup_or_oops() when
in kernel mode, which just does nothing in that case.

Andy Lutomirski, you've been involved with this, could you please shed
some light?

> So you probably need to add a new VM_FAULT_
> return code that will behave like VM_FAULT_SIGBUS except it will not raise
> the signal.

A new VM_FAULT_* flag might make the code easier to read, but I don't
know if we can have one.

> Otherwise it seems to me your approach should work.

Thanks a lot,
Andreas