From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C540C433F5 for ; Thu, 17 Mar 2022 15:32:06 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id ADDF56B0071; Thu, 17 Mar 2022 11:32:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A8C0A8D0002; Thu, 17 Mar 2022 11:32:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 92D048D0001; Thu, 17 Mar 2022 11:32:05 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0107.hostedemail.com [216.40.44.107]) by kanga.kvack.org (Postfix) with ESMTP id 8562C6B0071 for ; Thu, 17 Mar 2022 11:32:05 -0400 (EDT) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 3C760A22EF for ; Thu, 17 Mar 2022 15:32:05 +0000 (UTC) X-FDA: 79254269010.20.0519BDC Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf16.hostedemail.com (Postfix) with ESMTP id 8F80E18002F for ; Thu, 17 Mar 2022 15:32:04 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1647531124; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FxOX+RAN4v4Lf/VXH38aBDUMX82mgpHDsamZJ2eKXbI=; b=jNaDbKmAjwoS9tf4otvnXvKowDaZ+XJ/2a/8n6JYuQHpXYa4PfE+S8i0ds64V2rg5CTI1u ptm+rHuDEBgecHi2VUKybcOxHVsx+DR9hZ8mHMGgYGgNf4x/CRrIrr14+PJe0zrcbXnhpa 11Y7QtM2HG0BAqZ6YY11oXn8LEKjSzs= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-226-LcerrH90MF-Agea0V9zelw-1; Thu, 17 Mar 2022 11:32:02 -0400 X-MC-Unique: LcerrH90MF-Agea0V9zelw-1 Received: by mail-qv1-f72.google.com with SMTP id kj16-20020a056214529000b00435218e0f0dso4253081qvb.3 for ; Thu, 17 Mar 2022 08:32:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=FxOX+RAN4v4Lf/VXH38aBDUMX82mgpHDsamZJ2eKXbI=; b=3MhtXMPhzFbXEsK+ri8Jy+rwmzv1gr3JQ9wiDaSf9s4chwTPryGpQfBsBwXsvu69bb 5FEdsgyds6V/CmLxrM6Z30ocSAyyPVe2HzLYBQcjA5B162KMMV0/C6CGYOWEnxPBT9sy w4Gxb8N48HpVbGzHWdOHyhXrudjjvB2Bug1dMQhldvdn2VOV8h0DhPHcVyJifJyR7e6Y u/0y+0P42PcIEukjpS3iycTcdP50MZdsAK6PDyEn20TsIeWnznE9sRBb6B1r2pPN2qko DC1d0WaRy0Ip3zMOfhtZ3NmGKEGF5W5JTh/c6ErTMHL88DvFGN3HBeieWz96FS8/Sc44 TpCg== X-Gm-Message-State: AOAM533OqjU5FUrSWVrA6vSxoBSxTnWmZa9+y8dRDUAk4bUfRiEE7M1a VH4ZlH7ZjJDE8FMzUKAVCF8FYPo8K747pXgg83QVZyqAjw9Sexw/9b6QF6e3yL83r8LxVbJrHcQ ENFlbec3g4kU= X-Received: by 2002:ac8:5a4f:0:b0:2e1:a7be:2d13 with SMTP id o15-20020ac85a4f000000b002e1a7be2d13mr4151324qta.598.1647531121897; Thu, 17 Mar 2022 08:32:01 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwHBjIab8EPVWX036RD6/D6zYUhmHxCkiYYMe9+aHBCL5A94LdRhl0wHn0trVhn0vS+XUvhGA== X-Received: by 2002:ac8:5a4f:0:b0:2e1:a7be:2d13 with SMTP id o15-20020ac85a4f000000b002e1a7be2d13mr4151286qta.598.1647531121500; Thu, 17 Mar 2022 08:32:01 -0700 (PDT) Received: from bfoster (c-24-61-119-116.hsd1.ma.comcast.net. [24.61.119.116]) by smtp.gmail.com with ESMTPSA id x6-20020ac86b46000000b002e02be9c0easm3432036qts.69.2022.03.17.08.32.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Mar 2022 08:32:01 -0700 (PDT) Date: Thu, 17 Mar 2022 11:31:59 -0400 From: Brian Foster To: Linus Torvalds Cc: Matthew Wilcox , Linux-MM , linux-fsdevel , linux-xfs , Hugh Dickins Subject: Re: writeback completion soft lockup BUG in folio_wake_bit() Message-ID: References: MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 8F80E18002F X-Stat-Signature: n7sn7ysitn6imf3s74ji73o139ak9mbr X-Rspam-User: Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=jNaDbKmA; spf=none (imf16.hostedemail.com: domain of bfoster@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=bfoster@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-HE-Tag: 1647531124-887729 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Mar 16, 2022 at 04:35:10PM -0700, Linus Torvalds wrote: > On Wed, Mar 16, 2022 at 1:59 PM Matthew Wilcox wrote: > > > > As I recall, the bookmark hack was introduced in order to handle > > lock_page() problems. It wasn't really supposed to handle writeback, > > but nobody thought it would cause any harm (and indeed, it didn't at the > > time). So how about we only use bookmarks for lock_page(), since > > lock_page() usually doesn't have the multiple-waker semantics that > > writeback has? > > I was hoping that some of the page lock problems are gone and we could > maybe try to get rid of the bookmarks entirely. > > But the page lock issues only ever showed up on some private > proprietary load and machine, so we never really got confirmation that > they are fixed. There were lots of strong signs to them being related > to the migration page locking, and it may be that the bookmark code is > only hurting these days. > > See for example commit 9a1ea439b16b ("mm: > put_and_wait_on_page_locked() while page is migrated") which doesn't > actually change the *locking* side, but drops the page reference when > waiting for the locked page to be unlocked, which in turn removes a > "loop and try again when migration". And that may have been the real > _fix_ for the problem. > > Because while the bookmark thing avoids the NMI lockup detector firing > due to excessive hold times, the bookmarking also _causes_ that "we > now will see the same page multiple times because we dropped the lock > and somebody re-added it at the end of the queue" issue. Which seems > to be the problem here. > > Ugh. I wish we had some way to test "could we just remove the bookmark > code entirely again". > > Of course, the PG_lock case also works fairly hard to not actually > remove and re-add the lock waiter to the queue, but having an actual > "wait for and get the lock" operation. The writeback bit isn't done > that way. > > I do hate how we had to make folio_wait_writeback{_killable}() use > "while" rather than an "if". It *almost* works with just a "wait for > current writeback", but not quite. See commit c2407cf7d22d ("mm: make > wait_on_page_writeback() wait for multiple pending writebacks") for > why we have to loop. Ugly, ugly. > Right.. In case you missed it in my too long of a description (sorry), I pointed out that this problem seems to manifest most recently as of that commit. In fact looking through the past discussion for that patch, it wouldn't surprise me a ton of this problem is some pathological manifestation of the perf issue that you described here [1]. Indeed most of the waiters in this case come from fsync() -> ... -> __filemap_fdatawait_range(), and your test patch in that email performs a similar sort of trick to skip out of the wake up side (I'm curious if that was ever determined to help?) to things that I was playing with to try and narrow this down. > Because I do think that "while" in the writeback waiting is a problem. > Maybe _the_ problem. > FWIW, Matthew's patch does seem to address this problem. My current test of that patch is past the point where I usually expect to see the soft lockup warning, but I'm going to let it continue to run (and then run it through some xfs regression if the approach is agreeable). Getting back to the loop thing (and seeing Matthew's latest reply wrt to wait_and_set())... If we wanted to go back to non-looping in folio_wait_writeback() to avoid the unserialized wait queue build up or whatever, would it make any sense to lift the looping writeback check to write_cache_pages()? We hold the page lock and have checked PageDirty() by that point, so ISTM that would address the BUG_ON(PageWriteback()) race caused by the delayed/unserialized wakeup without producing the excess wait queue buildup caused by waiters in the __filemap_fdatawait_range() path. Then presumably that "wait for writeback to clear" loop in write_cache_pages() is eventually replaced by the "wait and set writeback" thing when the rest of the fs code is fixed up appropriately. Hm? Of course I haven't tested that so it could be completely bogus, but I can if it makes any sort of sense as an incremental step.. Brian [1] https://lore.kernel.org/linux-mm/CAHk-=wgD9GK5CeHopYmRHoYS9cNuCmDMsc=+MbM_KgJ0KB+=ng@mail.gmail.com/ > Linus >