From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A842FC433E1 for ; Tue, 16 Jun 2020 07:56:11 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 610162074D for ; Tue, 16 Jun 2020 07:56:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 610162074D Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id ECA3B6B0062; Tue, 16 Jun 2020 03:56:10 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id E7B286B006C; Tue, 16 Jun 2020 03:56:10 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D44616B006E; Tue, 16 Jun 2020 03:56:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0225.hostedemail.com [216.40.44.225]) by kanga.kvack.org (Postfix) with ESMTP id BA0E26B0062 for ; Tue, 16 Jun 2020 03:56:10 -0400 (EDT) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 61BAD8248047 for ; Tue, 16 Jun 2020 07:56:10 +0000 (UTC) X-FDA: 76934316900.30.bag35_2c0776f26dfd Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin30.hostedemail.com (Postfix) with ESMTP id 3ACB1180B3AA7 for ; Tue, 16 Jun 2020 07:56:10 +0000 (UTC) X-HE-Tag: bag35_2c0776f26dfd X-Filterd-Recvd-Size: 7663 Received: from mail-ej1-f68.google.com (mail-ej1-f68.google.com [209.85.218.68]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Tue, 16 Jun 2020 07:56:09 +0000 (UTC) Received: by mail-ej1-f68.google.com with SMTP id o15so20424770ejm.12 for ; Tue, 16 Jun 2020 00:56:09 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=dN+mDKrpnDn717I3S9kv2L1TUXLfxVtMTVGOj5fL0pU=; b=rRzWMwyjBB2v2vWOQh0cfpI4jYxY6/pXp0utRvhCc/RIwEfzgRbLNbyMDIrPWA7e2q /gEIm057+erOWeSiI4MllOpP84qT57AAvLM86541i4PDvMmt2x/ETi/PnkkabgAK5FOK UCgsAuprN8dFNjno1r0JJgUJ5I/mhEi4Z/u+4ovGGAVdv4S9zHqxmWXIMeinyu6E4l0S eS6VJwstDYTVm9Itks3cipavTOJaMIK7C4HiguJSUHzgRuvIz65BEIYzzLVqeGFObWKT KFp5wgs3+vHiwMOMVf8YIPdGSrpm2lkF8qZLIgQoM2u6Pi7YT4d6Fb/nlKOjsa1FMDE/ M3LQ== X-Gm-Message-State: AOAM533uCeNHyp3Ml0ut31S8paIA/n/S0JzhPNkuuH9OU//nGAuGCxAm pKS+x10yLBPkTOLexZ1R3wg= X-Google-Smtp-Source: ABdhPJzByjd93nmO3buc4jKoO9YB5XnF7cfyMVfnCbUp6aMudgOe+Uw+L0TpkhUkHA/fWUQfUAmzaw== X-Received: by 2002:a17:906:2e83:: with SMTP id o3mr1576329eji.312.1592294168594; Tue, 16 Jun 2020 00:56:08 -0700 (PDT) Received: from localhost (ip-37-188-174-201.eurotel.cz. [37.188.174.201]) by smtp.gmail.com with ESMTPSA id o5sm10596109eje.66.2020.06.16.00.56.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 16 Jun 2020 00:56:07 -0700 (PDT) Date: Tue, 16 Jun 2020 09:56:06 +0200 From: Michal Hocko To: Dave Chinner Cc: Holger =?iso-8859-1?Q?Hoffst=E4tte?= , Yafang Shao , linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH v3] xfs: avoid deadlock when trigger memory reclaim in ->writepages Message-ID: <20200616075606.GB9499@dhcp22.suse.cz> References: <1592222181-9832-1-git-send-email-laoar.shao@gmail.com> <20200615145331.GK25296@dhcp22.suse.cz> <20200615230605.GV2040@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <20200615230605.GV2040@dread.disaster.area> X-Rspamd-Queue-Id: 3ACB1180B3AA7 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam04 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue 16-06-20 09:06:05, Dave Chinner wrote: > On Mon, Jun 15, 2020 at 04:53:31PM +0200, Michal Hocko wrote: > > On Mon 15-06-20 16:25:52, Holger Hoffst=E4tte wrote: > > > On 2020-06-15 13:56, Yafang Shao wrote: > > [...] > > > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > > > > index b356118..1ccfbf2 100644 > > > > --- a/fs/xfs/xfs_aops.c > > > > +++ b/fs/xfs/xfs_aops.c > > > > @@ -573,9 +573,21 @@ static inline bool xfs_ioend_needs_workqueue= (struct iomap_ioend *ioend) > > > > struct writeback_control *wbc) > > > > { > > > > struct xfs_writepage_ctx wpc =3D { }; > > > > + unsigned int nofs_flag; > > > > + int ret; > > > > xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); > > > > - return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_= ops); > > > > + > > > > + /* > > > > + * We can allocate memory here while doing writeback on behalf = of > > > > + * memory reclaim. To avoid memory allocation deadlocks set th= e > > > > + * task-wide nofs context for the following operations. > > > > + */ > > > > + nofs_flag =3D memalloc_nofs_save(); > > > > + ret =3D iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback= _ops); > > > > + memalloc_nofs_restore(nofs_flag); > > > > + > > > > + return ret; > > > > } > > > > STATIC int > > > >=20 > > >=20 > > > Not sure if I did something wrong, but while the previous version o= f this patch > > > worked fine, this one gave me (with v2 removed obviously): > > >=20 > > > [ +0.000004] WARNING: CPU: 0 PID: 2811 at fs/iomap/buffered-io.c:1= 544 iomap_do_writepage+0x6b4/0x780 > >=20 > > This corresponds to > > /* > > * Given that we do not allow direct reclaim to call us, we s= hould > > * never be called in a recursive filesystem reclaim context. > > */ > > if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) > > goto redirty; > >=20 > > which effectivelly says that memalloc_nofs_save/restore cannot be use= d > > for that code path. >=20 > No it doesn't. Everyone is ignoring the -context- of this code, most > especially the previous 3 lines of code and it's comment: >=20 > /* > * Refuse to write the page out if we are called from reclaim c= ontext. > * > * This avoids stack overflows when called from deeply used sta= cks in > * random callers for direct reclaim or memcg reclaim. We expl= icitly > * allow reclaim from kswapd as the stack usage there is relati= vely low. > * > * This should never happen except in the case of a VM regressi= on so > * warn about it. > */ > if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) =3D= =3D > PF_MEMALLOC)) > goto redirty; >=20 > You will see that we specifically avoid this path from reclaim > context except for kswapd. And kswapd always runs with GFP_KERNEL > context so we allow writeback from it, so it will pass both this > check and the NOFS check above.=20 >=20 > IOws, we can't trigger to the WARN_ON_ONCE(current->flags & > PF_MEMALLOC_NOFS)) check from a memory reclaim context: this > PF_MEMALLOC_NOFS check here is not doing what people think it is. You are right. > History lesson time, eh? >=20 > The recursion protection here -used- to use PF_FSTRANS, not > PF_MEMALLOC_NOFS. See commit 9070733b4efac ("xfs: abstract > PF_FSTRANS to PF_MEMALLOC_NOFS"). This hunk is most instructive > when you look at the comment: >=20 > * Given that we do not allow direct reclaim to call us, we sho= uld > * never be called while in a filesystem transaction. > */ > - if (WARN_ON_ONCE(current->flags & PF_FSTRANS)) > + if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS)) > goto redirty; >=20 > It wasn't for memory allocation recursion protection in XFS - it was > for transaction reservation recursion protection by something trying > to flush data pages while holding a transaction reservation. Doing > this could deadlock the journal because the existing reservation > could prevent the nested reservation for being able to reserve space > in the journal and that is a self-deadlock vector. >=20 > IOWs, this check is not protecting against memory reclaim recursion > bugs at all (that's the previous check I quoted). This check is > protecting against the filesystem calling writepages directly from a > context where it can self-deadlock. Thanks for the clarification. > So what we are seeing here is that the PF_FSTRANS -> > PF_MEMALLOC_NOFS abstraction lost all the actual useful information > about what type of error this check was protecting against. I have to admit that I am not familiar with the xfs code and the PF_TRANS abstraction to PF_MEMALLOC_NOFS was mostly automatic and I relied on xfs maintainers to tell me I was doing something stupid. Now after your explanation it makes more sense that the warning is indeed protecting from a different kind of issue. --=20 Michal Hocko SUSE Labs