From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 534D7C433F5 for ; Thu, 17 Mar 2022 13:51:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AB1588D0002; Thu, 17 Mar 2022 09:51:49 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A60878D0001; Thu, 17 Mar 2022 09:51:49 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 900948D0002; Thu, 17 Mar 2022 09:51:49 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0190.hostedemail.com [216.40.44.190]) by kanga.kvack.org (Postfix) with ESMTP id 7CDFA8D0001 for ; Thu, 17 Mar 2022 09:51:49 -0400 (EDT) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 3E50A8249980 for ; Thu, 17 Mar 2022 13:51:49 +0000 (UTC) X-FDA: 79254016338.19.246519D Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf24.hostedemail.com (Postfix) with ESMTP id C687C18002E for ; Thu, 17 Mar 2022 13:51:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1647525108; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=iKbILpPpnN7imRoozJjDZNfBKXeUG41cjQ9Ij+ryjFs=; b=JX4CQiur9VENPXm8LctI5LxdHFNWpedw3/pMQpRQED0VmExxTdrBwHZR71RBWSmIhThUDV I/jHOKCuDNwEKGDtufT+qAiAjxh79gPX/Z8fsOLOmFgolEbezHqyvGDSJb3zsT+Xe+wJX1 7UMALL2BFgtsFlSysnkSsmKOUkcDkdU= Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-404-TwYItjf2ND-QCaNV-Nhl3w-1; Thu, 17 Mar 2022 09:51:47 -0400 X-MC-Unique: TwYItjf2ND-QCaNV-Nhl3w-1 Received: by mail-qk1-f198.google.com with SMTP id z10-20020a05620a08ca00b0067d341e82edso3363150qkz.17 for ; Thu, 17 Mar 2022 06:51:47 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=iKbILpPpnN7imRoozJjDZNfBKXeUG41cjQ9Ij+ryjFs=; b=lDCQwMwvQgH0feCBvEU4HwGu4dPOg4iaME3e80Z6YD8x3YzRA2LOOjIjGk9+NFCVup fQXMh/Rw4B7KZH4I8CHJTjfMXtXJUONXrG1+IkNeAxm2bLGFzc3D+HJfGYAmBWMqxVl5 vT3t936HEN0Row54od9vOLQjE+zhHtd0exmaGlkdQ00bziYUgp0xnTViHnyWFuoC/SYg n4X6zUuM/8pG4kpL+HiIQLjCyUsQkZYeSkqkgFMGkBkMGaa3IGE8BVxRHDFGX5wBVfhD 8Lz1teR5EA4eTDL3iJ8fFMdI21LQPJBUkMYmyvzhxk1EH7qATQzKQvRBdM/lOgc0o7gv n3xw== X-Gm-Message-State: AOAM5336W0k/xi5Lx2PQE0UsGgQCkzNrINcYoiG0jiJDoe1UB8Qic1OO bC0Ai4hpObCAfNgnqnd0WAv2bTGAedhLrohpeTvl+pR3Gvzaq1yeu/5YFY6+c9pCiAxIKmSfj9G DRFdLuEJgkFo= X-Received: by 2002:ad4:5961:0:b0:435:a1d7:c243 with SMTP id eq1-20020ad45961000000b00435a1d7c243mr3626404qvb.46.1647525106849; Thu, 17 Mar 2022 06:51:46 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwHAjc6UV88TAeKq6mZK5HSi6uYRrD+GMZuSbCOJEerxC44fv/M9qQ6MPpP+40UZpQszOE3AQ== X-Received: by 2002:ad4:5961:0:b0:435:a1d7:c243 with SMTP id eq1-20020ad45961000000b00435a1d7c243mr3626378qvb.46.1647525106578; Thu, 17 Mar 2022 06:51:46 -0700 (PDT) Received: from bfoster (c-24-61-119-116.hsd1.ma.comcast.net. [24.61.119.116]) by smtp.gmail.com with ESMTPSA id s19-20020a05622a179300b002e1ceeb21d0sm3585520qtk.97.2022.03.17.06.51.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Mar 2022 06:51:46 -0700 (PDT) Date: Thu, 17 Mar 2022 09:51:44 -0400 From: Brian Foster To: Matthew Wilcox Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, Linus Torvalds , Hugh Dickins Subject: Re: writeback completion soft lockup BUG in folio_wake_bit() Message-ID: References: MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Stat-Signature: xwuiimfrd3exgfubzu11pb9agh6fagak Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=JX4CQiur; spf=none (imf24.hostedemail.com: domain of bfoster@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=bfoster@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspam-User: X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: C687C18002E X-HE-Tag: 1647525108-514950 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 16, 2022 at 08:59:39PM +0000, Matthew Wilcox wrote: > On Tue, Mar 15, 2022 at 03:07:10PM -0400, Brian Foster wrote: > > What seems to happen is that the majority of the fsync calls end up > > waiting on writeback of a particular page, the wakeup of the writeback > > bit on that page wakes a task that immediately resets PG_writeback on > > the page such that N other folio_wait_writeback() waiters see the bit > > still set and immediately place themselves back onto the tail of the > > wait queue. Meanwhile the waker task spins in the WQ_FLAG_BOOKMARK loop > > in folio_wake_bit() (backing off the lock for a cycle or so in each > > iteration) only to find the same bunch of tasks in the queue. This > > process repeats for a long enough amount of time to trigger the soft > > lockup warning. I've confirmed this spinning behavior with a tracepoint > > in the bookmark loop that indicates we're stuck for many hundreds of > > thousands of iterations (at least) of this loop when the soft lockup > > warning triggers. > > [...] > > > I've run a few quick experiments to try and corroborate this analysis. > > The problem goes away completely if I either back out the loop change in > > folio_wait_writeback() or bump WAITQUEUE_WALK_BREAK_CNT to something > > like 128 (i.e. greater than the total possible number of waiter tasks in > > this test). I've also played a few games with bookmark behavior mostly > > out of curiosity, but usually end up introducing other problems like > > missed wakeups, etc. > > As I recall, the bookmark hack was introduced in order to handle > lock_page() problems. It wasn't really supposed to handle writeback, > but nobody thought it would cause any harm (and indeed, it didn't at the > time). So how about we only use bookmarks for lock_page(), since > lock_page() usually doesn't have the multiple-waker semantics that > writeback has? > Oh, interesting. I wasn't aware of the tenuous status of the bookmark code. This is indeed much nicer than anything I was playing with. I suspect it will address the problem, but I'll throw it at my test env for a while and follow up.. thanks! Brian > (this is more in the spirit of "minimal patch" -- I think initialising > the bookmark should be moved to folio_unlock()). > > diff --git a/mm/filemap.c b/mm/filemap.c > index b2728eb52407..9ee3c5f1f489 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -1146,26 +1146,28 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync, > return (flags & WQ_FLAG_EXCLUSIVE) != 0; > } > > -static void folio_wake_bit(struct folio *folio, int bit_nr) > +static void folio_wake_bit(struct folio *folio, int bit_nr, > + wait_queue_entry_t *bookmark) > { > wait_queue_head_t *q = folio_waitqueue(folio); > struct wait_page_key key; > unsigned long flags; > - wait_queue_entry_t bookmark; > > key.folio = folio; > key.bit_nr = bit_nr; > key.page_match = 0; > > - bookmark.flags = 0; > - bookmark.private = NULL; > - bookmark.func = NULL; > - INIT_LIST_HEAD(&bookmark.entry); > + if (bookmark) { > + bookmark->flags = 0; > + bookmark->private = NULL; > + bookmark->func = NULL; > + INIT_LIST_HEAD(&bookmark->entry); > + } > > spin_lock_irqsave(&q->lock, flags); > - __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark); > + __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark); > > - while (bookmark.flags & WQ_FLAG_BOOKMARK) { > + while (bookmark && (bookmark->flags & WQ_FLAG_BOOKMARK)) { > /* > * Take a breather from holding the lock, > * allow pages that finish wake up asynchronously > @@ -1175,7 +1177,7 @@ static void folio_wake_bit(struct folio *folio, int bit_nr) > spin_unlock_irqrestore(&q->lock, flags); > cpu_relax(); > spin_lock_irqsave(&q->lock, flags); > - __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark); > + __wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, bookmark); > } > > /* > @@ -1204,7 +1206,7 @@ static void folio_wake(struct folio *folio, int bit) > { > if (!folio_test_waiters(folio)) > return; > - folio_wake_bit(folio, bit); > + folio_wake_bit(folio, bit, NULL); > } > > /* > @@ -1554,12 +1556,15 @@ static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem > */ > void folio_unlock(struct folio *folio) > { > + wait_queue_entry_t bookmark; > + > /* Bit 7 allows x86 to check the byte's sign bit */ > BUILD_BUG_ON(PG_waiters != 7); > BUILD_BUG_ON(PG_locked > 7); > VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio); > + > if (clear_bit_unlock_is_negative_byte(PG_locked, folio_flags(folio, 0))) > - folio_wake_bit(folio, PG_locked); > + folio_wake_bit(folio, PG_locked, &bookmark); > } > EXPORT_SYMBOL(folio_unlock); > > @@ -1578,7 +1583,7 @@ void folio_end_private_2(struct folio *folio) > { > VM_BUG_ON_FOLIO(!folio_test_private_2(folio), folio); > clear_bit_unlock(PG_private_2, folio_flags(folio, 0)); > - folio_wake_bit(folio, PG_private_2); > + folio_wake_bit(folio, PG_private_2, NULL); > folio_put(folio); > } > EXPORT_SYMBOL(folio_end_private_2); >