Date: Fri, 7 Aug 2020 11:41:09 -0700 (PDT)
From: Hugh Dickins
To: Linus Torvalds
Cc: Matthew Wilcox, Hugh Dickins, Oleg Nesterov, Michal Hocko, Linux-MM,
    LKML, Andrew Morton, Tim Chen, Greg KH
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
References: <20200724152424.GC17209@redhat.com>
 <20200725101445.GB3870@redhat.com>
 <20200806180024.GB17456@casper.infradead.org>

On Thu, 6 Aug 2020, Linus Torvalds wrote:
> On Thu, Aug 6, 2020 at 11:00 AM Matthew Wilcox wrote:
> >
> > It wasn't clear to me whether Hugh thought it was an improvement or
> > not that trylock was now less likely to jump the queue.  There're
> > the usual "fair is the opposite of throughput" kind of arguments.

I don't know.  I'm inclined to think that keeping just a small chance
of jumping the queue is probably good; but pay no attention to me,
that's an opinion based on ignorance.

Thanks for mentioning the lucky lock_page(): I was quite wrong to
single out trylock_page() - I suppose its lack of a slow path just
helped it spring to mind more easily.

> Yeah, it could go either way. But on the whole, if the lock bit is
> getting any contention, I think we'd rather have it be fair for
> latency reasons.
>
> That said, I'm not convinced about my patch, and I actually threw it
> away without even testing it (sometimes I keep patches around in my
> private tree for testing, and they can live there for months or even
> years when I wonder if they are worth it, but this time I didn't
> bother to go to the trouble).
>
> If somebody is interested in pursuing this, I think that patch might
> be a good starting point (and it _might_ even work), but it seemed to
> be too subtle to really worry about unless somebody finds an actual
> acute reason for it.

It is an attractive idea, passing the baton from one to the next; and
a logical destination from where you had already arrived.  Maybe
somebody, not me, should pursue it.

I tried to ignore it, but eventually succumbed to temptation, and ran
it overnight at home (my usual tmpfs swapping), and on one machine at
work (fork/exit stress etc.).  It did need one fixup, below: then home
testing went fine.  But the fork/exit stress hit that old familiar
unlock_page wake_up_page_bit softlockup that your existing patch has
appeared to fix.
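Since the baton idea may be easier to see outside mm/filemap.c, here
is a toy userspace sketch of it - invented code, invented names
(handoff_lock, hl_lock and friends), emphatically not the kernel
patch: `locked` stands in for PG_locked, `mu` for q->lock.  An unlock
that finds a waiter queued leaves `locked` set and passes ownership
on, so a trylock arriving between one owner and the next always sees
the lock held; and when there is no taker, unlock must really clear
the flag - the same obligation as the `exclude && !exclusive`
clear_bit() in the fixup at the end of this mail.

#include <pthread.h>
#include <stdbool.h>

struct handoff_lock {
        pthread_mutex_t mu;     /* plays the role of q->lock */
        pthread_cond_t cv;
        bool locked;            /* plays the role of PG_locked */
        int waiters;            /* roughly, PG_waiters */
        int baton;              /* lock was passed on, not dropped */
};

static void hl_lock(struct handoff_lock *l)
{
        pthread_mutex_lock(&l->mu);
        for (;;) {
                if (l->baton) {         /* unlocker passed it to us: */
                        l->baton = 0;   /* `locked` never went clear */
                        break;
                }
                if (!l->locked) {
                        l->locked = true;
                        break;
                }
                l->waiters++;
                pthread_cond_wait(&l->cv, &l->mu);
                l->waiters--;
        }
        pthread_mutex_unlock(&l->mu);
}

static bool hl_trylock(struct handoff_lock *l)
{
        bool got = false;

        pthread_mutex_lock(&l->mu);
        if (!l->locked) {       /* a lock in hand-off still reads locked */
                l->locked = true;
                got = true;
        }
        pthread_mutex_unlock(&l->mu);
        return got;
}

static void hl_unlock(struct handoff_lock *l)
{
        pthread_mutex_lock(&l->mu);
        if (l->waiters) {
                l->baton = 1;                   /* keep `locked` set */
                pthread_cond_signal(&l->cv);    /* wake exactly one */
        } else {
                l->locked = false;      /* no taker: must really clear,
                                         * like exclude && !exclusive */
        }
        pthread_mutex_unlock(&l->mu);
}

One dishonesty in the sketch: pthread condvars do not wake in FIFO
order, so a late hl_lock() caller can still consume the baton ahead of
the thread that was signalled.  The property it does keep is the one
that matters to trylock: `locked` is never observed clear between one
owner and the next.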
I can't afford to investigate further (though might regret that: I
keep wondering if the small dose of unfairness in your existing patch
is enough to kick the test when it otherwise would hang, and whether
this new stricter patch may be pointing to that).

My home testing was, effectively, on top of c6fe44d96fc1 (v5.8 plus
your two patches): I did not pull in the io_uring changes from the
current tree.  In glancing there, I noticed one new and one previous
instance of SetPageWaiters() *after* __add_wait_queue_entry_tail():
are those correct?

> I think the existing patch narrowing the window is good, and it
> clearly didn't hurt throughput (although that was almost certainly
> for other reasons).

Agreed.

> Linus

--- linus/mm/filemap.c  2020-08-07 02:08:13.727709186 -0700
+++ hughd/mm/filemap.c  2020-08-07 02:16:10.960331473 -0700
@@ -1044,7 +1044,7 @@ static int wake_page_function(wait_queue
         return ret;
 }
 
-static int wake_up_page_bit(struct page *page, int bit_nr)
+static void wake_up_page_bit(struct page *page, int bit_nr, bool exclude)
 {
         wait_queue_head_t *q = page_waitqueue(page);
         struct wait_page_key key;
@@ -1096,15 +1096,28 @@ static int wake_up_page_bit(struct page
          * That's okay, it's a rare case. The next waker will clear it.
          */
         }
+
+        /*
+         * If we hoped to pass PG_locked on to the next locker, but found
+         * no exclusive taker, then we must clear it before dropping q->lock.
+         * Otherwise unlock_page() can race trylock_page_bit_common(), and if
+         * PageWaiters was already set from before, then cmpxchg sees no
+         * difference to send it back to wake_up_page_bit().
+         *
+         * We may have already dropped and retaken q->lock on the way here, but
+         * I think this works out because new entries are always added at tail.
+         */
+        if (exclude && !exclusive)
+                clear_bit(bit_nr, &page->flags);
+
         spin_unlock_irqrestore(&q->lock, flags);
-        return exclusive;
 }
 
 static void wake_up_page(struct page *page, int bit)
 {
         if (!PageWaiters(page))
                 return;
-        wake_up_page_bit(page, bit);
+        wake_up_page_bit(page, bit, false);
 }
 
 /*
@@ -1339,8 +1352,8 @@ void unlock_page(struct page *page)
          * spinlock wake_up_page_bit() will do.
          */
         smp_mb__before_atomic();
-        if (wake_up_page_bit(page, PG_locked))
-                return;
+        wake_up_page_bit(page, PG_locked, true);
+        return;
         }
         new = cmpxchg_release(&page->flags, flags, flags & ~(1 << PG_locked));
         if (likely(new == flags))
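And, purely for illustration again, a driver for the toy sketch
earlier in this mail (compile the two fragments together as one file,
cc -pthread toy.c): four threads hand the invented lock around while a
fifth spins on hl_trylock().

#include <stdio.h>

#define ROUNDS 100000

static struct handoff_lock lock = {
        .mu = PTHREAD_MUTEX_INITIALIZER,
        .cv = PTHREAD_COND_INITIALIZER,
};
static long held;       /* protected by the handoff lock itself */

static void *locker(void *arg)
{
        for (int i = 0; i < ROUNDS; i++) {
                hl_lock(&lock);
                held++;
                hl_unlock(&lock);
        }
        return NULL;
}

static void *barger(void *arg)
{
        long *steals = arg;

        for (int i = 0; i < ROUNDS; i++) {
                if (hl_trylock(&lock)) {
                        ++*steals;
                        hl_unlock(&lock);
                }
        }
        return NULL;
}

int main(void)
{
        pthread_t t[4], b;
        long steals = 0;
        int i;

        for (i = 0; i < 4; i++)
                pthread_create(&t[i], NULL, locker, NULL);
        pthread_create(&b, NULL, barger, &steals);
        for (i = 0; i < 4; i++)
                pthread_join(t[i], NULL);
        pthread_join(b, NULL);

        /* held comes out at exactly 4*ROUNDS if mutual exclusion held;
         * steals tends to stay small under contention, since trylock
         * can only win when an unlock found nobody queued. */
        printf("locker sections %ld (expect %d), trylock steals %ld\n",
               held, 4 * ROUNDS, steals);
        return 0;
}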