From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 605E2C433E0 for ; Mon, 15 Jun 2020 23:24:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2C7ED20739 for ; Mon, 15 Jun 2020 23:24:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2C7ED20739 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=fromorbit.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9E4EA6B0003; Mon, 15 Jun 2020 19:24:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 996876B0005; Mon, 15 Jun 2020 19:24:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8AB836B0006; Mon, 15 Jun 2020 19:24:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0037.hostedemail.com [216.40.44.37]) by kanga.kvack.org (Postfix) with ESMTP id 6F5DA6B0003 for ; Mon, 15 Jun 2020 19:24:07 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id C982C180AD81A for ; Mon, 15 Jun 2020 23:24:06 +0000 (UTC) X-FDA: 76933026492.20.bird76_0708b6e26dfa Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin20.hostedemail.com (Postfix) with ESMTP id A4F231802A174 for ; Mon, 15 Jun 2020 23:24:06 +0000 (UTC) X-HE-Tag: bird76_0708b6e26dfa X-Filterd-Recvd-Size: 5835 Received: from mail110.syd.optusnet.com.au (mail110.syd.optusnet.com.au [211.29.132.97]) by imf17.hostedemail.com (Postfix) with ESMTP for ; Mon, 15 Jun 2020 23:24:05 +0000 (UTC) Received: from dread.disaster.area (pa49-180-124-177.pa.nsw.optusnet.com.au [49.180.124.177]) by mail110.syd.optusnet.com.au (Postfix) with ESMTPS id C9FC1108579; Tue, 16 Jun 2020 09:24:03 +1000 (AEST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1jkySM-0000zv-Pn; Tue, 16 Jun 2020 09:23:58 +1000 Date: Tue, 16 Jun 2020 09:23:58 +1000 From: Dave Chinner To: Matthew Wilcox Cc: Michal Hocko , Holger =?iso-8859-1?Q?Hoffst=E4tte?= , Yafang Shao , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH v3] xfs: avoid deadlock when trigger memory reclaim in ->writepages Message-ID: <20200615232358.GW2040@dread.disaster.area> References: <1592222181-9832-1-git-send-email-laoar.shao@gmail.com> <20200615145331.GK25296@dhcp22.suse.cz> <20200615150736.GU8681@bombadil.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20200615150736.GU8681@bombadil.infradead.org> User-Agent: Mutt/1.10.1 (2018-07-13) X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.3 cv=W5xGqiek c=1 sm=1 tr=0 a=k3aV/LVJup6ZGWgigO6cSA==:117 a=k3aV/LVJup6ZGWgigO6cSA==:17 a=8nJEP1OIZ-IA:10 a=nTHF0DUjJn0A:10 a=07d9gI8wAAAA:8 a=7-415B0cAAAA:8 a=1qhV6LbZMGmvVgBn4dMA:9 a=XM1BfgJEnEYWeaHb:21 a=yyMHCK8CtyEn-BmN:21 a=wPNLvfGTeEIA:10 a=e2CUPOnPG4QKp8I52DXD:22 a=biEYGPWJfzWAr4FL6Ov7:22 X-Rspamd-Queue-Id: A4F231802A174 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Jun 15, 2020 at 08:07:36AM -0700, Matthew Wilcox wrote: > On Mon, Jun 15, 2020 at 04:53:31PM +0200, Michal Hocko wrote: > > On Mon 15-06-20 16:25:52, Holger Hoffst=E4tte wrote: > > > On 2020-06-15 13:56, Yafang Shao wrote: > > [...] > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > > > index b356118..1ccfbf2 100644 > > > > --- a/fs/xfs/xfs_aops.c > > > > +++ b/fs/xfs/xfs_aops.c > > > > @@ -573,9 +573,21 @@ static inline bool xfs_ioend_needs_workqueue= (struct iomap_ioend *ioend) > > > > struct writeback_control *wbc) > > > > { > > > > struct xfs_writepage_ctx wpc =3D { }; > > > > + unsigned int nofs_flag; > > > > + int ret; > > > > xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); > > > > - return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_= ops); > > > > + > > > > + /* > > > > + * We can allocate memory here while doing writeback on behalf = of > > > > + * memory reclaim. To avoid memory allocation deadlocks set th= e > > > > + * task-wide nofs context for the following operations. > > > > + */ > > > > + nofs_flag =3D memalloc_nofs_save(); > > > > + ret =3D iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback= _ops); > > > > + memalloc_nofs_restore(nofs_flag); > > > > + > > > > + return ret; > > > > } > > > > STATIC int > > > >=20 > > >=20 > > > Not sure if I did something wrong, but while the previous version o= f this patch > > > worked fine, this one gave me (with v2 removed obviously): > > >=20 > > > [ +0.000004] WARNING: CPU: 0 PID: 2811 at fs/iomap/buffered-io.c:1= 544 iomap_do_writepage+0x6b4/0x780 > >=20 > > This corresponds to > > /* > > * Given that we do not allow direct reclaim to call us, we s= hould > > * never be called in a recursive filesystem reclaim context. > > */ > > if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) > > goto redirty; > >=20 > > which effectivelly says that memalloc_nofs_save/restore cannot be use= d > > for that code path. Your stack trace doesn't point to a reclaim path > > which shows that this path is shared and also underlines that this is > > not really an intended use of the api. Please refer to > > Documentation/core-api/gfp_mask-from-fs-io.rst for more details but > > shortly the API should be used at the layer which defines a context > > which shouldn't allow to recurse. E.g. a lock which would be problema= tic > > in the reclaim recursion path. >=20 > I'm concerned that this might not be the correct approach. Generally w= e > have the rule that freeing memory should not require allocating memory > to succeed. Since XFS obviously does need to allocate memory to start > a transaction, this allocation should normally be protected by a mempoo= l > or similar. We've been down this path before about exactly how much memory XFS uses to do extent allocation and why mempools nor any existing mm infrastructure can actually guarantee filesystem transactions forwards progress. e.g: https://lwn.net/Articles/636797/ It's even more problematic now that extent allocation is not jsut one transaction now, but is a series of overlapping linked transactions that update things like refcount and reverse mapping btrees, and not just the freespace and inode block map btrees... > XFS is subtle and has its own rules around memory allocation and > writeback, so I don't want to say this is definitely wrong. I'm just > saying that I have concerns it might not be right. Almost all XFS memory allocations within the writeback path are within transaction contexts - the transaction allocation itself is an exception, but then xfs_trans_alloc() sets PF_MEMALLOC_NOFS itself (used to be PF_FSTRANS, as per my previous comments in this thread). Hence putting the entire ->writepages path under NOFS context won't change anything materially for XFS. -Dave. --=20 Dave Chinner david@fromorbit.com