From: Linus Torvalds
Date: Sat, 24 Feb 2024 10:20:14 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
To: Matthew Wilcox
Cc: Luis Chamberlain, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav, Jens Axboe, Dave Chinner, Christoph Hellwig, Chris Mason, Johannes Weiner

On Sat, 24 Feb 2024 at 09:31, Linus Torvalds wrote:
>
> And (one) important part here is "nobody sane does that". So
> benchmarking this is a bit crazy. The code is literally meant for bad
> actors, and what you are benchmarking is the kernel telling you "don't
> do that then".

Side note: one reason why the big hammer approach of "don't do that"
has worked so well is that the few loads that *do* want to do this and
have a valid reason to write large amounts of data in one go are
generally trivially translated to O_DIRECT.
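
A minimal userspace sketch of what "translated to O_DIRECT" can look
like, assuming Linux with glibc. The function names and the 4096-byte
alignment are assumptions made for illustration; the real alignment
requirement depends on the filesystem and the device, and some
filesystems reject O_DIRECT outright.

```c
#define _GNU_SOURCE		/* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment for buffer address and I/O length (not universal). */
#define DIO_ALIGN 4096UL

/* Round n up to the next multiple of align; align must be a power of two. */
size_t round_up_pow2(size_t n, size_t align)
{
	return (n + align - 1) & ~(align - 1);
}

/*
 * Write `total` bytes of a fill pattern to `path` with O_DIRECT, so the
 * data bypasses the page cache. `total` and `chunk` are rounded up to
 * DIO_ALIGN. Returns the number of bytes written, or -1 with errno set.
 */
long long write_image_direct(const char *path, size_t total, size_t chunk)
{
	void *buf;
	long long done = 0;
	int fd, err;

	chunk = round_up_pow2(chunk ? chunk : DIO_ALIGN, DIO_ALIGN);
	total = round_up_pow2(total, DIO_ALIGN);

	/* O_DIRECT generally needs an aligned user buffer. */
	err = posix_memalign(&buf, DIO_ALIGN, chunk);
	if (err) {
		errno = err;
		return -1;
	}
	memset(buf, 0xA5, chunk);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
	if (fd < 0) {
		err = errno;
		free(buf);
		errno = err;
		return -1;
	}

	while ((size_t)done < total) {
		size_t want = total - (size_t)done;
		ssize_t n;

		if (want > chunk)
			want = chunk;
		n = write(fd, buf, want);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			err = errno;
			close(fd);
			free(buf);
			errno = err;
			return -1;
		}
		done += n;
	}

	close(fd);
	free(buf);
	return done;
}
```

Calling write_image_direct("disk.img", 1UL << 30, 1UL << 20) would
stream 1 GiB in 1 MiB writes without dirtying the page cache, so the
dirty throttling discussed here should not come into play for that
data.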

For example, if you actually do things like write disk images etc,
O_DIRECT is lovely and easy - even trivial - to use. You don't even
have to write code for it, you can (and people do) just use 'dd' with
'oflag=direct'.

So even trivial shell scripting has access to the "don't do that then" flag.

In other words, I really think that Luis' benchmark triggers that
kernel "you are doing something actively wrong and stupid" logic. It's
not the kernel trying to optimize writeback. It's the kernel trying to
protect others from stupid loads.

Now, I'm also not saying that you should benchmark this with our
"vm_dirty_bytes" logic disabled. That may indeed help performance on
that benchmark, but you'll just hit other problem spots instead. Once
you fill up lots of memory, other problems become really big and
nasty, so you would then need *other* fixes for those issues.

If somebody really cares about this kind of load, and cannot use
O_DIRECT for some reason ("I actually do want caches 99% of the
time"), I suspect the solution is to have some slightly gentler way to
say "instead of the throttling logic, I want you to start my writeouts
much more synchronously".

IOW, we could have a writer flag that still uses the page cache, but
that instead of that

		balance_dirty_pages_ratelimited(mapping);

in generic_perform_write(), it would actually synchronously *start*
the write, that might work a whole lot better for any load that still
wants to do big streaming writes, but wants to also keep the page
cache component.

               Linus
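
The writer flag described above does not exist. A rough userspace
approximation of "start the writeout right away, but keep the page
cache" is available on Linux through sync_file_range() with
SYNC_FILE_RANGE_WRITE, which initiates writeback for a byte range
without waiting for it to complete. The sketch below assumes Linux with
glibc; the function name and the choice to ignore a failed kick are
assumptions made for illustration, and this is not the in-kernel change
suggested in the message.

```c
#define _GNU_SOURCE		/* sync_file_range() is Linux-specific */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Buffered streaming write: each chunk goes through the page cache as
 * usual, then writeback for just that range is started immediately.
 * Returns the number of bytes written, or -1 with errno set.
 */
long long write_streaming_kick(const char *path, size_t total, size_t chunk)
{
	long long off = 0;
	char *buf;
	int fd, err;

	if (chunk == 0) {
		errno = EINVAL;
		return -1;
	}
	buf = malloc(chunk);
	if (!buf)
		return -1;
	memset(buf, 0x5A, chunk);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		err = errno;
		free(buf);
		errno = err;
		return -1;
	}

	while ((size_t)off < total) {
		size_t want = total - (size_t)off;
		ssize_t n;

		if (want > chunk)
			want = chunk;
		n = write(fd, buf, want);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			err = errno;
			close(fd);
			free(buf);
			errno = err;
			return -1;
		}
		/*
		 * Kick off writeback for the bytes just written. This does
		 * not wait for the I/O. A failure is ignored here (an
		 * assumption of this sketch): the data is still in the page
		 * cache and normal writeback will pick it up later.
		 */
		(void)sync_file_range(fd, off, n, SYNC_FILE_RANGE_WRITE);
		off += n;
	}

	close(fd);
	free(buf);
	return off;
}
```

Note that sync_file_range() gives no durability guarantee and does not
flush metadata, so a caller that needs the data on stable storage still
has to fsync() at the end.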