From: Linus Torvalds
Date: Tue, 24 Sep 2019 12:08:04 -0700
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file writes
To: Dave Chinner
Cc: Konstantin Khlebnikov, Tejun Heo, linux-fsdevel, Linux-MM,
    Linux Kernel Mailing List, Jens Axboe, Michal Hocko, Mel Gorman,
    Johannes Weiner
In-Reply-To: <20190924073940.GM6636@dread.disaster.area>

On Tue, Sep 24, 2019 at 12:39 AM Dave Chinner wrote:
>
> Stupid question: how is this any different to simply winding down
> our dirty writeback and throttling thresholds like so:
>
> # echo $((100 * 1000 * 1000)) > /proc/sys/vm/dirty_background_bytes

Our dirty_background stuff is very questionable, but it exists (and
has those insane defaults) because of various legacy reasons.

But it probably _shouldn't_ exist any more (except perhaps as a
last-ditch hard limit), and I don't think it really ends up being the
primary throttling any more in many cases.

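For reference, a minimal sketch of how the knobs under discussion can be
inspected on a running system. The four file names under /proc/sys/vm are
the kernel's sysctls; the helper names `read_dirty_knobs` and `describe`
are made up for this example. Each threshold has a `_bytes` and a `_ratio`
form, and the kernel documentation says only one of each pair is in effect
at a time, with the other reading back as 0.

```python
from pathlib import Path

# The global dirty-memory sysctls referred to in this thread.
KNOBS = (
    "dirty_background_ratio",
    "dirty_background_bytes",
    "dirty_ratio",
    "dirty_bytes",
)


def read_dirty_knobs(base="/proc/sys/vm"):
    """Return {knob: int} for every knob that can be read under `base`."""
    values = {}
    for name in KNOBS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable file (e.g. not on Linux): skip it.
            continue
    return values


def describe(values):
    """Say which form (bytes or ratio) is in effect for each threshold."""
    lines = []
    for kind in ("dirty_background", "dirty"):
        nbytes = values.get(kind + "_bytes", 0)
        ratio = values.get(kind + "_ratio", 0)
        if nbytes:
            # A nonzero *_bytes value means the ratio counterpart reads as 0.
            lines.append(f"{kind}: {nbytes} bytes")
        else:
            lines.append(f"{kind}: {ratio}% of available memory")
    return lines


if __name__ == "__main__":
    for line in describe(read_dirty_knobs()):
        print(line)
```

Note that these are global settings; nothing in them distinguishes one
backing device from another.
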
It used to make sense to make it a "percentage of memory" back when we
were talking old machines with 8MB of RAM, and having an appreciable
percentage of memory dirty was "normal".

And we've kept that model and not touched it, because some benchmarks
really want enormous amounts of dirty data (particularly various dirty
shared mappings).

But our default really is fairly crazy and questionable. 10% of memory
being dirty may be ok when you have a small amount of memory, but it's
rather less sane if you have gigs and gigs of RAM.

Of course, SSDs made it work slightly better again, but our
"dirty_background" stuff really is legacy and not very good.

The whole dirty limit when seen as a percentage of memory (which is our
default) is particularly questionable, but even when seen as total
bytes it is bad.

If you have slow filesystems (say, FAT on a USB stick), the limit
should be very different from a fast one (eg XFS on a RAID of proper
SSDs).

So the limit really needs to be per-bdi, not some global ratio or bytes.

As a result we've grown various _other_ heuristics over time, and the
simplistic dirty_background stuff is only a very small part of the
picture these days. To the point of almost being irrelevant in many
situations, I suspect.

> to start background writeback when there's 100MB of dirty pages in
> memory, and then:
>
> # echo $((200 * 1000 * 1000)) > /proc/sys/vm/dirty_bytes

The thing is, that also accounts for dirty shared mmap pages. And it
really will kill some benchmarks that people take very very seriously.

And 200MB is peanuts when you're doing a benchmark on some studly
machine that has a million IOPS, and 200MB of dirty data is nothing.

Yet it's probably much too big when you're on a workstation that still
has rotational media.

And the whole memcg code obviously makes this even more complicated.

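To put rough numbers on the points above, here is a back-of-the-envelope
sketch of how long a given amount of dirty data takes to drain on
different devices. The write speeds are illustrative assumptions, not
measurements, and the function names are made up for this example. The
ratio is applied to total RAM for simplicity, whereas the kernel applies
it to available (free plus reclaimable) memory.

```python
MB = 1000 * 1000  # decimal units, as in the echo commands quoted above
GB = 1000 * MB

# Assumed sustained write speeds in bytes/second (illustrative only).
ASSUMED_WRITE_SPEED = {
    "FAT on a USB stick": 10 * MB,
    "rotational disk": 150 * MB,
    "XFS on an SSD RAID": 3 * GB,
}


def ratio_limit_bytes(ram_bytes, ratio_percent):
    """Dirty threshold implied by a percentage knob, simplified to total RAM."""
    return ram_bytes * ratio_percent // 100


def drain_seconds(dirty_bytes, write_bytes_per_sec):
    """Time to write out dirty_bytes at a steady write_bytes_per_sec."""
    return dirty_bytes / write_bytes_per_sec


if __name__ == "__main__":
    limits = {
        "10% of 8MB RAM": ratio_limit_bytes(8 * MB, 10),
        "10% of 64GB RAM": ratio_limit_bytes(64 * GB, 10),
        "dirty_bytes = 200MB": 200 * MB,
    }
    for label, limit in limits.items():
        for device, speed in ASSUMED_WRITE_SPEED.items():
            secs = drain_seconds(limit, speed)
            print(f"{label:>20} on {device:<20}: {secs:10.2f} s")
```

Under these assumed speeds, the same 200MB limit takes about 20 seconds
to drain on the USB stick and well under a second on the SSD array, which
is the mismatch a single global number cannot express.
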
Anyway, the end result of all this is that we have that
balance_dirty_pages() that is pretty darn complex and I suspect very
few people understand everything that goes on in that function.

So I think that the point of any write-behind logic would be to avoid
triggering the global limits as much as humanly possible - not just
getting the simple cases to write things out more quickly, but to
remove the complex global limit questions from (one) common and fairly
simple case.

Now, whether write-behind really _does_ help that, or whether it's
just yet another tweak and complication, I can't actually say. But I
don't think 'dirty_background_bytes' is really an argument against
write-behind, it's just one knob on the very complex dirty handling we
have.

            Linus