Date: Sat, 25 Jul 2020 14:19:46 -0700 (PDT)
From: Hugh Dickins
To: Linus Torvalds
cc: Oleg Nesterov, Hugh Dickins, Michal Hocko, Linux-MM, LKML, Andrew Morton, Tim Chen, Michal Hocko
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
References: <20200723124749.GA7428@redhat.com> <20200724152424.GC17209@redhat.com> <20200725101445.GB3870@redhat.com>

On Sat, 25 Jul 2020, Linus Torvalds wrote:
> On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote:
> >
> > Heh. I too thought about this. And just in case, your patch looks correct
> > to me. But I can't really comment this behavioural change. Perhaps it
> > should come in a separate patch?
>
> We could do that. At the same time, I think both parts change how the
> waitqueue works that it might as well just be one "fix page_bit_wait
> waitqueue usage".
>
> But let's wait to see what Hugh's numbers say.

Oh no, no no: sorry for getting your hopes up there, I won't come up with
any numbers more significant than "0 out of 10" machines crashed. I know
it would be *really* useful if I could come up with performance comparisons,
or steer someone else to do so: but I'm sorry, cannot.

Currently it's actually 1 out of 10 machines crashed, for the same
driverland issue seen last time, maybe it's a bad machine; and another
1 out of the 10 machines went AWOL for unknown reasons, but probably
something outside the kernel got confused by the stress. No reason
to suspect your changes at all (but some unanalyzed "failure"s, of
dubious significance, accumulating like last time).

I'm optimistic: nothing has happened to warn us off your changes.

And on Fri, 24 Jul 2020, Linus Torvalds had written:
> So the loads you are running are known to have sensitivity to this
> particular area, and are why you've done your patches to the page wait
> bit code?

Yes. It's a series of nineteen ~hour-long tests, of which about five
exhibited wake_up_page_bit problems in the past, and one has remained
intermittently troublesome that way. Intermittently: usually it does
get through, so getting through yesterday and today won't even tell
us that your changes fixed it - that we shall learn over time later.

Hugh