From: Yafang Shao
Date: Mon, 22 Jun 2020 20:18:09 +0800
Subject: Re: [PATCH] xfs: reintroduce PF_FSTRANS for transaction reservation recursion protection
To: Dave Chinner
Cc: Michal Hocko, "Darrick J. Wong", Christoph Hellwig, Andrew Morton, Brian Foster, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, Linux MM
References: <1592637174-19657-1-git-send-email-laoar.shao@gmail.com> <20200621230420.GT2005@dread.disaster.area>
In-Reply-To: <20200621230420.GT2005@dread.disaster.area>

On Mon, Jun 22, 2020 at 7:04 AM Dave Chinner wrote:
>
> On Sat, Jun 20, 2020 at 03:12:54AM -0400, Yafang Shao wrote:
> > PF_FSTRANS which is used to avoid transaction reservation recursion, is
> > dropped since commit 9070733b4efa ("xfs: abstract PF_FSTRANS to
> > PF_MEMALLOC_NOFS") and commit 7dea19f9ee63 ("mm: introduce
> > memalloc_nofs_{save,restore} API") and replaced by PF_MEMALLOC_NOFS which
> > means to avoid filesystem reclaim recursion. That change is subtle.
> > Let's take the example of the check of WARN_ON_ONCE(current->flags &
> > PF_MEMALLOC_NOFS) to explain why this abstraction from PF_FSTRANS to
> > PF_MEMALLOC_NOFS is not proper.
> >
> > Below comment is quoted from Dave,
> > > It wasn't for memory allocation recursion protection in XFS - it was for
> > > transaction reservation recursion protection by something trying to flush
> > > data pages while holding a transaction reservation. Doing
> > > this could deadlock the journal because the existing reservation
> > > could prevent the nested reservation from being able to reserve space
> > > in the journal and that is a self-deadlock vector.
> > > IOWs, this check is not protecting against memory reclaim recursion
> > > bugs at all (that's the previous check [1]). This check is
> > > protecting against the filesystem calling writepages directly from a
> > > context where it can self-deadlock.
> > > So what we are seeing here is that the PF_FSTRANS ->
> > > PF_MEMALLOC_NOFS abstraction lost all the actual useful information
> > > about what type of error this check was protecting against.
> >
> > Besides reintroducing PF_FSTRANS, there're some other improvements in this
> > patch,
> > - Remove useless MACRO current_clear_flags_nested(), current_pid() and
> >   current_test_flags().
> > - Remove useless memalloc_nofs_{save, restore} in __kmem_vmalloc()
> >
> > [1]. Below check is to avoid memory reclaim recursion.
> > if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
> >         PF_MEMALLOC))
> >         goto redirty;
> >
> > Cc: Dave Chinner
> > Cc: Michal Hocko
> > Signed-off-by: Yafang Shao
> > ---
> >  fs/iomap/buffered-io.c    |  4 ++--
> >  fs/xfs/kmem.c             |  7 -------
> >  fs/xfs/kmem.h             |  2 +-
> >  fs/xfs/libxfs/xfs_btree.c |  2 +-
> >  fs/xfs/xfs_aops.c         |  4 ++--
> >  fs/xfs/xfs_linux.h        |  4 ----
> >  fs/xfs/xfs_trans.c        | 12 ++++++------
> >  include/linux/sched.h     |  1 +
> >  8 files changed, 13 insertions(+), 23 deletions(-)
> >
> > diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> > index bcfc288..0f1945c 100644
> > --- a/fs/iomap/buffered-io.c
> > +++ b/fs/iomap/buffered-io.c
> > @@ -1500,9 +1500,9 @@ static void iomap_writepage_end_bio(struct bio *bio)
> >
> >         /*
> >          * Given that we do not allow direct reclaim to call us, we should
> > -        * never be called in a recursive filesystem reclaim context.
> > +        * never be called while in a filesystem transaction.
> >          */
> > -       if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> > +       if (WARN_ON_ONCE(current->flags & PF_FSTRANS))
> >                 goto redirty;
>
> This is OK, but the rest of the patch is not.
>
> I did not say "replace all XFS use of GFP_NOFS/KM_NOFS with
> PF_FSTRANS", which is what this patch does. The use of
> PF_MEMALLOC_NOFS within transactions is correct and valid and needs
> to remain. Replacing this with PF_FSTRANS effectively reverts all
> the simplifications and obviously self-documenting code that
> PF_MEMALLOC_NOFS provides us with.
>

Sorry about that, I misunderstood it.
Will correct it in the next version.

> IOWs, PF_MEMALLOC_NOFS is used to indicate that this is a "no
> reclaim recursion" path and so its use remains completely unchanged
> in XFS. PF_FSTRANS is to indicate this is a "no
> transaction recursion" path, which is a different thing and needs
> its own specific annotation.
>

Thanks for the explanation.

> > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > index f136647..9875a23 100644
> > --- a/fs/xfs/kmem.c
> > +++ b/fs/xfs/kmem.c
> > @@ -41,18 +41,11 @@
> >  static void *
> >  __kmem_vmalloc(size_t size, xfs_km_flags_t flags)
> >  {
> > -       unsigned nofs_flag = 0;
> >         void    *ptr;
> >         gfp_t   lflags = kmem_flags_convert(flags);
> >
> > -       if (flags & KM_NOFS)
> > -               nofs_flag = memalloc_nofs_save();
> > -
> >         ptr = __vmalloc(size, lflags);
> >
> > -       if (flags & KM_NOFS)
> > -               memalloc_nofs_restore(nofs_flag);
> > -
>
> This breaks both kmem_alloc_large() and kmem_alloc_io() if they are
> called from an explicit KM_NOFS context. vmalloc() does not respect
> the gfp flags that are passed to it and will always do GFP_KERNEL
> allocations deep down in the page table allocation code, and hence
> we must use memalloc_nofs_save() here if called in a KM_NOFS
> context.
>

I thought kmem_flags_convert() had already checked KM_NOFS, so we
wouldn't need to call memalloc_nofs_save(), but it seems I was wrong.
Thanks for the clarification.
> >         return ptr;
> >  }
> >
> > diff --git a/fs/xfs/kmem.h b/fs/xfs/kmem.h
> > index 34cbcfd..ccc63de 100644
> > --- a/fs/xfs/kmem.h
> > +++ b/fs/xfs/kmem.h
> > @@ -34,7 +34,7 @@
> >         BUG_ON(flags & ~(KM_NOFS | KM_MAYFAIL | KM_ZERO | KM_NOLOCKDEP));
> >
> >         lflags = GFP_KERNEL | __GFP_NOWARN;
> > -       if (flags & KM_NOFS)
> > +       if (current->flags & PF_FSTRANS || flags & KM_NOFS)
> >                 lflags &= ~__GFP_FS;
>
> No. If we are in a transaction context, PF_MEMALLOC_NOFS should be
> set. We got rid of all the PF_FSTRANS checks out of this code by
> moving to PF_MEMALLOC_NOFS, reverting this isn't an improvement.
>

Got it. Thanks.

> >
> >         /*
> > diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
> > index 2d25bab..65d0afe 100644
> > --- a/fs/xfs/libxfs/xfs_btree.c
> > +++ b/fs/xfs/libxfs/xfs_btree.c
> > @@ -2814,7 +2814,7 @@ struct xfs_btree_split_args {
> >         struct xfs_btree_split_args     *args = container_of(work,
> >                                                 struct xfs_btree_split_args, work);
> >         unsigned long           pflags;
> > -       unsigned long           new_pflags = PF_MEMALLOC_NOFS;
> > +       unsigned long           new_pflags = PF_FSTRANS;
>
>         new_pflags = PF_MEMALLOC_NOFS | PF_FSTRANS;
> >
> >         /*
> >          * we are in a transaction context here, but may also be doing work
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index b356118..02733eb 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -62,7 +62,7 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> >          * We hand off the transaction to the completion thread now, so
> >          * clear the flag here.
> >          */
> > -       current_restore_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +       current_restore_flags_nested(&tp->t_pflags, PF_FSTRANS);
>
> current_restore_flags_nested(PF_MEMALLOC_NOFS | PF_FSTRANS);
>

Thanks

> >         return 0;
> >  }
> >
> > @@ -125,7 +125,7 @@ static inline bool xfs_ioend_is_append(struct iomap_ioend *ioend)
> >          * thus we need to mark ourselves as being in a transaction manually.
> >          * Similarly for freeze protection.
> >          */
> > -       current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +       current_set_flags_nested(&tp->t_pflags, PF_FSTRANS);
>
> current_set_flags_nested(PF_MEMALLOC_NOFS | PF_FSTRANS);
>

Thanks

> >         __sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
> >
> >         /* we abort the update if there was an IO error */
> > diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> > index 9f70d2f..ab737fe 100644
> > --- a/fs/xfs/xfs_linux.h
> > +++ b/fs/xfs/xfs_linux.h
> > @@ -102,12 +102,8 @@
> >  #define xfs_cowb_secs          xfs_params.cowb_timer.val
> >
> >  #define current_cpu()          (raw_smp_processor_id())
> > -#define current_pid()          (current->pid)
> > -#define current_test_flags(f)  (current->flags & (f))
> >  #define current_set_flags_nested(sp, f)        \
> >                 (*(sp) = current->flags, current->flags |= (f))
> > -#define current_clear_flags_nested(sp, f)      \
> > -               (*(sp) = current->flags, current->flags &= ~(f))
> >  #define current_restore_flags_nested(sp, f)    \
> >                 (current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))
>
> Separate cleanup patch to remove unrelated definitions, please.
>

Sure, I will.

> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index 3c94e5f..1c1b982 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -153,7 +153,7 @@
> >         bool            rsvd = (tp->t_flags & XFS_TRANS_RESERVE) != 0;
> >
> >         /* Mark this thread as being in a transaction */
> > -       current_set_flags_nested(&tp->t_pflags, PF_MEMALLOC_NOFS);
> > +       current_set_flags_nested(&tp->t_pflags, PF_FSTRANS);
> >
>
> And, again, PF_FSTRANS | PF_MEMALLOC_NOFS through this code.
>

Thanks

--
Thanks
Yafang
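
The review comments on the xfs_aops.c and xfs_trans.c hunks ask for both
flags to be set and restored together. A minimal userspace sketch of that
behaviour follows. It is an illustration only, not kernel code: the flag
values, struct task, the global current_task, and the helpers
in_transaction(), nofs_context() and demo_nested() are assumptions made up
for this example. Only the two nested macros follow the definitions quoted
from fs/xfs/xfs_linux.h.

```c
#include <assert.h>

/* Hypothetical flag values; the real ones live in include/linux/sched.h. */
#define PF_MEMALLOC_NOFS 0x1u   /* "no reclaim recursion" annotation */
#define PF_FSTRANS       0x2u   /* "no transaction recursion" annotation */

/* Hypothetical stand-in for the kernel's per-task structure. */
struct task {
	unsigned int flags;
};

static struct task current_task;
#define current (&current_task)

/* Same shape as the macros quoted from fs/xfs/xfs_linux.h. */
#define current_set_flags_nested(sp, f) \
	(*(sp) = current->flags, current->flags |= (f))
#define current_restore_flags_nested(sp, f) \
	(current->flags = ((current->flags & ~(f)) | (*(sp) & (f))))

/* Mirrors the writeback check: refuse when a transaction is held. */
static inline int in_transaction(void)
{
	return (current->flags & PF_FSTRANS) != 0;
}

/* Mirrors the allocation side: filesystem reclaim must be avoided. */
static inline int nofs_context(void)
{
	return (current->flags & PF_MEMALLOC_NOFS) != 0;
}

/*
 * An outer scope that only forbids reclaim recursion, with a transaction
 * nested inside it. Leaving the transaction clears PF_FSTRANS but keeps
 * the outer PF_MEMALLOC_NOFS state, because restore puts back the saved
 * value of each bit named in the mask.
 */
static inline void demo_nested(void)
{
	unsigned long outer = 0, inner = 0;

	current->flags = 0;
	current_set_flags_nested(&outer, PF_MEMALLOC_NOFS);
	current_set_flags_nested(&inner, PF_MEMALLOC_NOFS | PF_FSTRANS);
	assert(in_transaction() && nofs_context());

	current_restore_flags_nested(&inner, PF_MEMALLOC_NOFS | PF_FSTRANS);
	assert(!in_transaction() && nofs_context());

	current_restore_flags_nested(&outer, PF_MEMALLOC_NOFS);
	assert(current->flags == 0);
}
```

Keeping the two bits separate lets the writeback path test PF_FSTRANS alone
while allocation paths keep relying on PF_MEMALLOC_NOFS, which is the split
described in the review.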