From: Linus Torvalds
Date: Sat, 24 Feb 2024 15:40:48 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
To: Chris Mason
Cc: Matthew Wilcox, Luis Chamberlain, lsf-pc@lists.linux-foundation.org,
 linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav,
 Jens Axboe, Dave Chinner, Christoph Hellwig, Chris Mason, Johannes Weiner

On Sat, 24 Feb 2024 at 14:58, Chris Mason wrote:
>
> For teams that really want more control over dirty pages with existing APIs,
> I've suggested using sync_file_range periodically.
> It seems to work pretty well, and they can adjust the sizes and
> frequency as needed.

Yes. I've written code like that myself.

That said, that is also fairly close to what the write-behind patches
I pointed at did.

One issue (and maybe that was what killed that write-behind patch) is
that there are *other* benchmarks that are actually slightly more
realistic that do things like "untar a tar-file, do something with it,
and then 'rm -rf' it all again".

And *those* benchmarks behave best when the IO is never ever actually
done at all.

And unlike the "write a terabyte with random IO", those benchmarks
actually approximate a few somewhat real loads (I'm not claiming they
are good, but the "create files, do something, then remove them"
pattern at least _exists_ in real life).

For things like block device writes for a 'mkfs' run, the whole "this
file may be deleted soon, so let's not even start the write in the
first place" behavior doesn't exist, of course. Starting writeback
much more aggressively for those is probably not a bad idea.

> From time to time, our random crud that maintains the system will need a
> lot of memory and kswapd will saturate a core, but this tends to resolve
> itself after 10-20 seconds. Our ultra sensitive workloads would
> complain, but they manage the page cache more explicitly to avoid these
> situations.

You can see these things with slow USB devices with much more obvious
results. Including long spikes of total inactivity if some system
piece ends up doing a "sync" for some reason. It happens. It's very
annoying.

My gut feel is that it happens a lot less these days than it used to,
but I suspect that's at least partly because I don't see the slow USB
devices very much any more.

> Ignoring wildly slow devices, the dirty limits seem to work well enough
> on both big and small systems that I haven't needed to investigate
> issues there as often.

One particular problem point used to be backing devices with wildly
different IO throughput, because I think the speed heuristics don't
necessarily always work all that well, at least initially.

And things like that may partly explain your "filesystems work better
than block devices".

It doesn't necessarily have to be about filesystems vs block devices
per se, and may instead be about things like "on a filesystem, the bdi
throughput numbers have had time to stabilize".

In contrast, with a benchmark that uses some other random device that
doesn't look like a regular disk (whether it's really slow like a bad
USB device, or really fast like pmem), you might see more issues.

And I wouldn't be in the least surprised if that is part of the
situation Luis sees.

                Linus
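
For reference, the "use sync_file_range periodically" approach quoted at
the top of this message can be sketched roughly as follows. This is only
an illustrative sketch in C, not code from anyone on this thread: the
helper name write_behind and its chunk parameter are made up for the
example, and it assumes Linux with glibc, since sync_file_range is
Linux-specific. It starts writeback on each chunk right after writing
it, then waits for the previous chunk, so that only about two chunks
are dirty or under writeback at any time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and parameters are assumptions for this
 * sketch): write len bytes from buf to fd starting at offset 0, in
 * pieces of at most chunk bytes, bounding the amount of dirty data.
 * Returns 0 on success, -1 on error (errno is left as set by the
 * failing call).
 */
static int write_behind(int fd, const char *buf, size_t len, size_t chunk)
{
    off_t off = 0;        /* file offset of the chunk being written */
    off_t prev_off = 0;   /* file offset of the previously written chunk */
    size_t prev_len = 0;  /* length of the previous chunk, 0 if none yet */

    assert(chunk > 0);

    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        size_t done = 0;

        /* pwrite may write less than requested, so loop until done. */
        while (done < n) {
            ssize_t w = pwrite(fd, buf + done, n - done, off + (off_t)done);
            if (w <= 0)
                return -1;
            done += (size_t)w;
        }

        /* Start asynchronous writeback of the chunk just written. */
        if (sync_file_range(fd, off, (off_t)n, SYNC_FILE_RANGE_WRITE) < 0)
            return -1;

        /* Wait for the previous chunk to reach the device. */
        if (prev_len > 0 &&
            sync_file_range(fd, prev_off, (off_t)prev_len,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) < 0)
            return -1;

        prev_off = off;
        prev_len = n;
        off += (off_t)n;
        buf += n;
        len -= n;
    }
    return 0;
}
```

The chunk size is the knob to adjust per workload. Note that
sync_file_range does not write out file metadata, so this limits dirty
page cache but is not a substitute for fsync when durability matters.
It would also be the wrong thing for the "create files, then remove
them" pattern described above, where the best outcome is that the IO
never happens at all.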