From: Linus Torvalds
Date: Tue, 22 Aug 2017 12:15:20 -0700
Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk
To: Peter Zijlstra
Cc: "Liang, Kan", Mel Gorman, Mel Gorman, "Kirill A. Shutemov", Tim Chen, Ingo Molnar, Andi Kleen, Andrew Morton, Johannes Weiner, Jan Kara, linux-mm, Linux Kernel Mailing List
In-Reply-To: <20170822185624.GN32112@worktop.programming.kicks-ass.net>

On Tue, Aug 22, 2017 at 11:56 AM, Peter Zijlstra wrote:
>
> Won't we now prematurely terminate the wait when we get a spurious
> wakeup?

I think there's two answers to that:

 (a) do we even care?

 (b) what spurious wakeup?
The "do we even care" question is because wait_on_page_bit by definition isn't really serializing. And I'm not even talking about memory ordering, although that is true too; I'm talking just fundamentally: when we're not locking, by the time wait_on_page_bit() returns to the caller, the bit could obviously have changed again.

So I think wait_on_page_bit() by definition doesn't really guarantee that the bit really is clear. And I don't think we really have cases that matter.

But if we do (say, 'fsync()' waiting on a page for writeback), where would you get spurious wakeups from? They normally happen when we have nested waiting (eg a page fault happens while we have other wait queues active), and I'm not seeing that being an issue here.

That said, I do think we might want to make a "careful" vs "just wait a bit" version of this if the patch works out. The patch is primarily for testing this particular case. I actually think it's probably ok in general, but maybe there really is some special case that could have multiple wakeup sources and needs to see *this* particular one.

(We could perhaps handle that case by checking "is the wait-queue empty now" instead, and just get rid of the re-arming, rather than breaking out of the loop immediately after the io_schedule()).

                Linus