Date: Wed, 25 Sep 2019 18:00:34 +1000
From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds
Cc: Konstantin Khlebnikov, Tejun Heo, linux-fsdevel, Linux-MM,
	Linux Kernel Mailing List, Jens Axboe, Michal Hocko,
	Mel Gorman, Johannes Weiner
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes
Message-ID: <20190925080034.GD804@dread.disaster.area>

On Tue, Sep 24, 2019 at 12:08:04PM -0700, Linus Torvalds wrote:
> On Tue, Sep 24, 2019 at 12:39 AM Dave Chinner wrote:
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:
> >
> > # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes
>
> Our dirty_background stuff is very questionable, but it exists (and
> has those insane defaults) because of various legacy reasons.
That's not what I was asking about. The context is in the previous
lines you didn't quote:

> > > > Is the faster speed reproducible? I don't quite understand why this
> > > > would be.
> > >
> > > Writing to disk simply starts earlier.
> >
> > Stupid question: how is this any different to simply winding down
> > our dirty writeback and throttling thresholds like so:

i.e. I'm asking about the reasons for the performance differential,
not asking for an explanation of what writebehind is.

If the performance differential really is caused by writeback
starting sooner, then winding down dirty_background_bytes should
produce exactly the same performance, because it will start
writeback -much faster-. If it doesn't, then the assertion that the
difference is caused by earlier writeout is questionable and the
code may not actually be doing what is claimed....

Basically, I'm asking for proof that the explanation is correct.

> > to start background writeback when there's 100MB of dirty pages in
> > memory, and then:
> >
> > # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes
>
> The thing is, that also accounts for dirty shared mmap pages. And it
> really will kill some benchmarks that people take very very seriously.

Yes, I know that. I'm not suggesting that we do this, [snip]

> Anyway, the end result of all this is that we have that
> balance_dirty_pages() that is pretty darn complex and I suspect very
> few people understand everything that goes on in that function.

I'd agree with you there - most of the groundwork for the
balance_dirty_pages() IO throttling feedback loop was based on
concepts I developed to solve dirty page writeback thrashing
problems on Irix back in 2003.
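To spell out the comparison being asked for: run the same workload once
with default thresholds and once with them wound down to the 100MB/200MB
values from the earlier message, and compare throughput. A sketch of
that experiment (`run_workload` is a placeholder for the actual
benchmark, and writing the sysctl files needs root, so this just
computes the values and prints the steps):

```shell
#!/bin/sh
# Thresholds from the earlier message: start background writeback at
# ~100MB of dirty pages, throttle page dirtiers at ~200MB.
bg_bytes=$((100 * 1000 * 1000))
dirty_bytes=$((200 * 1000 * 1000))

# Step 1: baseline run with the default vm.dirty_* settings.
echo "run_workload   # baseline, default settings"

# Step 2: wind the thresholds down, re-run, compare throughput.
echo "echo $bg_bytes > /proc/sys/vm/dirty_background_bytes"
echo "echo $dirty_bytes > /proc/sys/vm/dirty_bytes"
echo "run_workload   # tuned run; compare against baseline"
```

If writeback starting sooner is the whole story, the tuned run should
match the write-behind numbers; if it doesn't, something else in the
patch is responsible for the difference.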
The code we have in Linux was written by Fengguang Wu with help
from a lot of people, but the underlying concepts of delegating IO
to dedicated writeback threads that calculate and track page
cleaning rates (BDI writeback rates) and then throttling the
incoming page dirtying rate to the page cleaning rate all came out
of my head....

So, much as it may surprise you, I am one of the few people who do
actually understand how that whole complex mass of accounting and
feedback is supposed to work. :)

> Now, whether write-behind really _does_ help that, or whether it's
> just yet another tweak and complication, I can't actually say.

Neither can I at this point - I lack the data, and that's why I was
asking if there was a perf difference with the existing limits
wound right down. Knowing whether the performance difference is
simply a result of starting writeback IO sooner tells me an awful
lot about what other behaviour is happening as a result of the
changes in this patch.

> But I
> don't think 'dirty_background_bytes' is really an argument against
> write-behind, it's just one knob on the very complex dirty handling we
> have.

I never said it was - I'm just trying to determine whether a one-line
explanation is true or not.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com