Date: Fri, 7 Aug 2020 11:41:09 -0700 (PDT)
From: Hugh Dickins
To: Linus Torvalds
Cc: Matthew Wilcox, Hugh Dickins, Oleg Nesterov, Michal Hocko, Linux-MM,
    LKML, Andrew Morton, Tim Chen, Greg KH
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
References: <20200724152424.GC17209@redhat.com>
 <20200725101445.GB3870@redhat.com>
 <20200806180024.GB17456@casper.infradead.org>

On Thu, 6 Aug 2020, Linus Torvalds wrote:
> On Thu, Aug 6, 2020 at 11:00 AM Matthew Wilcox wrote:
> >
> > It wasn't clear to me whether Hugh thought it was an improvement or
> > not that trylock was now less likely to jump the queue.  There're
> > the usual "fair is the opposite of throughput" kind of arguments.

I don't know.  I'm inclined to think that keeping just a small chance
of jumping the queue is probably good; but pay no attention to me,
that's an opinion based on ignorance.

Thanks for mentioning the lucky lock_page(): I was quite wrong to
single out trylock_page() - I suppose its lack of a slow path just
helped it spring to mind more easily.

> Yeah, it could go either way. But on the whole, if the lock bit is
> getting any contention, I think we'd rather have it be fair for
> latency reasons.
>
> That said, I'm not convinced about my patch, and I actually threw it
> away without even testing it (sometimes I keep patches around in my
> private tree for testing, and they can live there for months or even
> years when I wonder if they are worth it, but this time I didn't
> bother to go to the trouble).
>
> If somebody is interested in pursuing this, I think that patch might
> be a good starting point (and it _might_ even work), but it seemed to
> be too subtle to really worry about unless somebody finds an actual
> acute reason for it.

It is an attractive idea, passing the baton from one to the next; and
a logical destination from where you had already arrived.  Maybe
somebody, not me, should pursue it.

I tried to ignore it, but eventually succumbed to temptation, and ran
it overnight at home (my usual tmpfs swapping), and on one machine at
work (fork/exit stress etc.).  It did need one fixup, below: then home
testing went fine.  But the fork/exit stress hit that old familiar
unlock_page wake_up_page_bit softlockup that your existing patch has
appeared to fix.
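Since the baton idea may be easier to see outside mm/filemap.c, here
is a toy userspace sketch of it - invented code, invented names
(handoff_lock, hl_lock and friends), emphatically not the kernel
patch: `locked` stands in for PG_locked, `mu` for q->lock.  An unlock
that finds a waiter queued leaves `locked` set and passes ownership
on, so a trylock arriving between one owner and the next always sees
the lock held; and when there is no taker, unlock must really clear
the flag - the same obligation as the `exclude && !exclusive`
clear_bit() in the fixup at the end of this mail.

#include <pthread.h>
#include <stdbool.h>

struct handoff_lock {
        pthread_mutex_t mu;     /* plays the role of q->lock */
        pthread_cond_t cv;
        bool locked;            /* plays the role of PG_locked */
        int waiters;            /* roughly, PG_waiters */
        int baton;              /* lock was passed on, not dropped */
};

static void hl_lock(struct handoff_lock *l)
{
        pthread_mutex_lock(&l->mu);
        for (;;) {
                if (l->baton) {         /* unlocker passed it to us: */
                        l->baton = 0;   /* `locked` never went clear */
                        break;
                }
                if (!l->locked) {
                        l->locked = true;
                        break;
                }
                l->waiters++;
                pthread_cond_wait(&l->cv, &l->mu);
                l->waiters--;
        }
        pthread_mutex_unlock(&l->mu);
}

static bool hl_trylock(struct handoff_lock *l)
{
        bool got = false;

        pthread_mutex_lock(&l->mu);
        if (!l->locked) {       /* a lock in hand-off still reads locked */
                l->locked = true;
                got = true;
        }
        pthread_mutex_unlock(&l->mu);
        return got;
}

static void hl_unlock(struct handoff_lock *l)
{
        pthread_mutex_lock(&l->mu);
        if (l->waiters) {
                l->baton = 1;                   /* keep `locked` set */
                pthread_cond_signal(&l->cv);    /* wake exactly one */
        } else {
                l->locked = false;      /* no taker: must really clear,
                                         * like exclude && !exclusive */
        }
        pthread_mutex_unlock(&l->mu);
}

One dishonesty in the sketch: pthread condvars do not wake in FIFO
order, so a late hl_lock() caller can still consume the baton ahead of
the thread that was signalled.  The property it does keep is the one
that matters to trylock: `locked` is never observed clear between one
owner and the next.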
I can't afford to investigate further (though might regret that: I
keep wondering if the small dose of unfairness in your existing patch
is enough to kick the test when it otherwise would hang, and whether
this new stricter patch may be pointing to that).

My home testing was, effectively, on top of c6fe44d96fc1 (v5.8 plus
your two patches): I did not pull in the io_uring changes from the
current tree.  In glancing there, I noticed one new and one previous
instance of SetPageWaiters() *after* __add_wait_queue_entry_tail():
are those correct?

> I think the existing patch narrowing the window is good, and it
> clearly didn't hurt throughput (although that was almost certainly
> for other reasons).

Agreed.

> Linus

--- linus/mm/filemap.c  2020-08-07 02:08:13.727709186 -0700
+++ hughd/mm/filemap.c  2020-08-07 02:16:10.960331473 -0700
@@ -1044,7 +1044,7 @@ static int wake_page_function(wait_queue
         return ret;
 }
 
-static int wake_up_page_bit(struct page *page, int bit_nr)
+static void wake_up_page_bit(struct page *page, int bit_nr, bool exclude)
 {
         wait_queue_head_t *q = page_waitqueue(page);
         struct wait_page_key key;
@@ -1096,15 +1096,28 @@ static int wake_up_page_bit(struct page
          * That's okay, it's a rare case. The next waker will clear it.
          */
         }
+
+        /*
+         * If we hoped to pass PG_locked on to the next locker, but found
+         * no exclusive taker, then we must clear it before dropping q->lock.
+         * Otherwise unlock_page() can race trylock_page_bit_common(), and if
+         * PageWaiters was already set from before, then cmpxchg sees no
+         * difference to send it back to wake_up_page_bit().
+         *
+         * We may have already dropped and retaken q->lock on the way here, but
+         * I think this works out because new entries are always added at tail.
+         */
+        if (exclude && !exclusive)
+                clear_bit(bit_nr, &page->flags);
+
         spin_unlock_irqrestore(&q->lock, flags);
-        return exclusive;
 }
 
 static void wake_up_page(struct page *page, int bit)
 {
         if (!PageWaiters(page))
                 return;
-        wake_up_page_bit(page, bit);
+        wake_up_page_bit(page, bit, false);
 }
 
 /*
@@ -1339,8 +1352,8 @@ void unlock_page(struct page *page)
          * spinlock wake_up_page_bit() will do.
          */
         smp_mb__before_atomic();
-        if (wake_up_page_bit(page, PG_locked))
-                return;
+        wake_up_page_bit(page, PG_locked, true);
+        return;
         }
         new = cmpxchg_release(&page->flags, flags, flags & ~(1 << PG_locked));
         if (likely(new == flags))
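And, purely for illustration again, a driver for the toy sketch
earlier in this mail (compile the two fragments together as one file,
cc -pthread toy.c): four threads hand the invented lock around while a
fifth spins on hl_trylock().

#include <stdio.h>

#define ROUNDS 100000

static struct handoff_lock lock = {
        .mu = PTHREAD_MUTEX_INITIALIZER,
        .cv = PTHREAD_COND_INITIALIZER,
};
static long held;       /* protected by the handoff lock itself */

static void *locker(void *arg)
{
        for (int i = 0; i < ROUNDS; i++) {
                hl_lock(&lock);
                held++;
                hl_unlock(&lock);
        }
        return NULL;
}

static void *barger(void *arg)
{
        long *steals = arg;

        for (int i = 0; i < ROUNDS; i++) {
                if (hl_trylock(&lock)) {
                        ++*steals;
                        hl_unlock(&lock);
                }
        }
        return NULL;
}

int main(void)
{
        pthread_t t[4], b;
        long steals = 0;
        int i;

        for (i = 0; i < 4; i++)
                pthread_create(&t[i], NULL, locker, NULL);
        pthread_create(&b, NULL, barger, &steals);
        for (i = 0; i < 4; i++)
                pthread_join(t[i], NULL);
        pthread_join(b, NULL);

        /* held comes out at exactly 4*ROUNDS if mutual exclusion held;
         * steals tends to stay small under contention, since trylock
         * can only win when an unlock found nobody queued. */
        printf("locker sections %ld (expect %d), trylock steals %ld\n",
               held, 4 * ROUNDS, steals);
        return 0;
}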