Date: Tue, 9 Feb 2021 11:01:15 +0100
From: Miklos Szeredi
To: Vivek Goyal
Cc: Linus Torvalds, Qian Cai, Hugh Dickins, Matthew Wilcox,
    "Kirill A. Shutemov", Linux-MM, Andrew Morton, linux-fsdevel,
    Amir Goldstein
Subject: Re: Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some
 more lock_page work..)
Message-ID: <20210209100115.GB1208880@miu.piliscsaba.redhat.com>
References: <4794a3fa3742a5e84fb0f934944204b55730829b.camel@lca.pw>
 <20201015151606.GA226448@redhat.com>
 <20201015195526.GC226448@redhat.com>
 <20201020204226.GA376497@redhat.com>
 <20201021201249.GB442437@redhat.com>

Hi Vivek,

Here's an updated patch with a header comment containing all the gory
details.  It's basically your patch, but it's missing your S-o-b.  If
you are fine with it, can you please provide one?

The only change I made was to clear PG_uptodate on write error;
otherwise I think the patch is fine.

Splitting the request might cause a performance regression in some
cases (lots of unaligned writes that spill over a page boundary), but
I don't have a better idea at this moment.

Thanks,
Miklos

----
Date: Wed, 21 Oct 2020 16:12:49 -0400
From: Vivek Goyal
Subject: fuse: fix write deadlock

There are two modes for write(2) and friends in fuse:

a) write through (update page cache, send sync WRITE request to
   userspace)

b) buffered write (update page cache, async writeout later)

The write through method kept all the page cache pages locked that
were used for the request.
Keeping more than one page locked is deadlock prone and Qian Cai
demonstrated this with trinity fuzzing.

The reason for keeping the pages locked is that concurrent mapped
reads shouldn't try to pull possibly stale data into the page cache.

For full page writes, the easy way to fix this is to make the cached
page be the authoritative source by marking the page PG_uptodate
immediately.  After this the page can be safely unlocked, since
mapped/cached reads will take the written data from the cache.

Concurrent mapped writes will now cause data in the original WRITE
request to be updated; this however doesn't cause any data
inconsistency and this scenario should be exceedingly rare anyway.

If the WRITE request returns with an error in the above case,
currently the page is not marked uptodate; this means that a
concurrent read will always read consistent data.  After this patch
the page is uptodate between writing to the cache and receiving the
error: there's a window where a cached read will read the wrong data.
While theoretically this could be a regression, it is unlikely to be
one in practice, since this is normal for buffered writes.

In case of a partial page write to an already uptodate page the
locking is also unnecessary, with the above caveats.

Partial write of a not uptodate page still needs to be handled.  One
way would be to read the complete page before doing the write.  This
is not possible, since it might break filesystems that don't expect
any READ requests when the file was opened O_WRONLY.

The other solution is to serialize the synchronous write with reads
from the partial pages.  The easiest way to do this is to keep the
partial pages locked.  The problem is that a write() may involve two
such pages (one head and one tail).  This patch fixes it by only
locking the partial tail page.  If there's a partial head page as
well, then split that off as a separate WRITE request.
Reported-by: Qian Cai
Fixes: ea9b9907b82a ("fuse: implement perform_write")
Cc: # v2.6.26
Signed-off-by: Miklos Szeredi
---
 fs/fuse/file.c   | 25 +++++++++++++++----------
 fs/fuse/fuse_i.h |  1 +
 2 files changed, 16 insertions(+), 10 deletions(-)

--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1117,17 +1117,12 @@ static ssize_t fuse_send_write_pages(str
 	count = ia->write.out.size;
 	for (i = 0; i < ap->num_pages; i++) {
 		struct page *page = ap->pages[i];
+		bool page_locked = ap->page_locked && (i == ap->num_pages - 1);
 
-		if (!err && !offset && count >= PAGE_SIZE)
-			SetPageUptodate(page);
-
-		if (count > PAGE_SIZE - offset)
-			count -= PAGE_SIZE - offset;
-		else
-			count = 0;
-		offset = 0;
-
-		unlock_page(page);
+		if (err)
+			ClearPageUptodate(page);
+		if (page_locked)
+			unlock_page(page);
 		put_page(page);
 	}
 
@@ -1191,6 +1186,16 @@ static ssize_t fuse_fill_write_pages(str
 		if (offset == PAGE_SIZE)
 			offset = 0;
 
+		/* If we copied full page, mark it uptodate */
+		if (tmp == PAGE_SIZE)
+			SetPageUptodate(page);
+
+		if (PageUptodate(page)) {
+			unlock_page(page);
+		} else {
+			ap->page_locked = true;
+			break;
+		}
 		if (!fc->big_writes)
 			break;
 	} while (iov_iter_count(ii) && count < fc->max_write &&
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -277,6 +277,7 @@ struct fuse_args_pages {
 	struct page **pages;
 	struct fuse_page_desc *descs;
 	unsigned int num_pages;
+	bool page_locked;
 };
 
 #define FUSE_ARGS(args) struct fuse_args args = {}