From: Linus Torvalds
Date: Wed, 25 Nov 2020 13:30:20 -0800
Subject: Re: kernel BUG at fs/ext4/inode.c:LINE!
To: Hugh Dickins
Cc: Matthew Wilcox, Jan Kara, syzbot, Andreas Dilger, Ext4 Developers List, Linux Kernel Mailing List, syzkaller-bugs, "Theodore Ts'o", Linux-MM, Oleg Nesterov, Andrew Morton, "Kirill A. Shutemov", Nicholas Piggin, Alex Shi, Qian Cai, Christoph Hellwig, "Darrick J. Wong", William Kucharski, Jens Axboe, linux-fsdevel, linux-xfs

On Tue, Nov 24, 2020 at 3:24 PM Linus Torvalds wrote:
>
> I've applied your second patch (the smaller one that just takes a ref
> around the critical section). If somebody comes up with some great
> alternative, we can always revisit this.

Hmm.

I'm not sure about "great alternative", but it strikes me that we
*could* move the clearing of the PG_writeback bit _into_
wake_up_page_bit(), under the page waitqueue lock.

IOW, we could make the rule be that the bit isn't actually cleared
before calling wake_up_page() at all, and we'd clear it with something
like

        unsigned long flags = READ_ONCE(page->flags);

        // We can clear PG_writeback directly if PG_waiters isn't set
        while (!(flags & (1ul << PG_waiters))) {
                unsigned long new = flags & ~(1ul << PG_writeback);
                // PG_writeback was already clear??!!?
                if (WARN_ON_ONCE(new == flags))
                        return;
                new = cmpxchg(&page->flags, flags, new);
                if (likely(flags == new))
                        return;
                flags = new;
        }
        // Otherwise, clear the bit at the end - but under the
        // page waitqueue lock - inside wake_up_page_bit()
        return wake_up_page_bit(..);

instead.

That would basically make the bit clearing atomic wrt the PG_waiters
flags - either using that atomic cmpxchg, or by doing it under the
page queue lock so that it's atomic wrt any new waiters.

This seems conceptually like the right thing to do - and it would also
make the (fair) exclusive lock hand-off case atomic too, because the
bit we're waking up on would never be cleared if it gets handed off
directly.

The above is entirely untested crap written in my MUA, and obviously
requires that all callers of wake_up_page() be moved to that new world
order, but I think we only have two cases: unlock_page() and
end_page_writeback().

And unlock_page() already has that
"clear_bit_unlock_is_negative_byte()" special case that is an ugly
special case of PG_waiters atomicity. So we'd get rid of that, because
the cmpxchg loop would be the better model.

I'm not sure I'm willing to write and test the real patch, but it
doesn't look _too_ nasty from just looking at the code. The bookmark
thing makes it important to only actually clear the bit at the end (as
does the handoff case anyway), but the way wake_up_page_bit() is
written, that's actually very straightforward - just after the
while-loop. That's when we've woken up everybody.
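
The lock-free half of the idea above can be modelled in userspace with C11 atomics. This is only a sketch under stated assumptions, not kernel code: the `DEMO_*` bit values, the `demo_result` enum and `demo_clear_writeback_fast()` are hypothetical names invented for illustration, and `atomic_compare_exchange_weak()` stands in for the kernel's `cmpxchg()`. The slow path (clearing the bit under the waitqueue lock after the wakeups) is deliberately left out and only signalled through the return value.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for the kernel's page flag bits. */
#define DEMO_PG_WAITERS   (1ul << 0)
#define DEMO_PG_WRITEBACK (1ul << 1)

/* Outcomes of the fast path (names are illustrative only). */
enum demo_result {
    DEMO_CLEARED,        /* bit cleared atomically, no waiters seen */
    DEMO_ALREADY_CLEAR,  /* bit was not set; the kernel sketch warns here */
    DEMO_NEED_SLOW_PATH  /* waiters present: clear under the queue lock */
};

/*
 * Clear DEMO_PG_WRITEBACK with a compare-and-swap, but only while
 * DEMO_PG_WAITERS is not set. If a waiter shows up between the load
 * and the swap, the swap fails, 'old' is refreshed with the current
 * value, and the loop condition is evaluated again.
 */
static enum demo_result demo_clear_writeback_fast(_Atomic unsigned long *flags)
{
    unsigned long old = atomic_load(flags);

    while (!(old & DEMO_PG_WAITERS)) {
        unsigned long newval = old & ~DEMO_PG_WRITEBACK;

        if (newval == old)
            return DEMO_ALREADY_CLEAR;

        /* On failure (including spurious failure) 'old' is updated. */
        if (atomic_compare_exchange_weak(flags, &old, newval))
            return DEMO_CLEARED;
    }

    return DEMO_NEED_SLOW_PATH;
}
```

A caller would take the waitqueue lock and do the wakeups only when the function returns `DEMO_NEED_SLOW_PATH`. In that case the flag word is left untouched, which matches the rule that the bit is cleared at the very end, after everybody has been woken.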

So I'm sending this idea out to see if somebody can shoot it down, or
even wants to possibly even try to do it..

                 Linus