Date: Fri, 22 Sep 2023 15:03:25 +1000
From: Dave Chinner
To: Luis Chamberlain
Cc: Pankaj Raghav, Pankaj Raghav, linux-xfs@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, da.gomez@samsung.com,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 willy@infradead.org, djwong@kernel.org, linux-mm@kvack.org,
 chandan.babu@oracle.com, gost.dev@samsung.com, riteshh@linux.ibm.com
Subject: Re: [RFC 00/23] Enable block size > page size in XFS
References: <20230915183848.1018717-1-kernel@pankajraghav.com>
 <806df723-78cf-c7eb-66a6-1442c02126b3@samsung.com>

On Thu, Sep 21, 2023 at 12:18:13AM -0700, Luis Chamberlain wrote:
> On Thu, Sep 21, 2023 at 04:03:56PM +1000, Dave Chinner wrote:
> > On Wed, Sep 20, 2023 at 09:57:56PM -0700, Luis Chamberlain wrote:
> > > On Wed, Sep 20, 2023 at 08:00:12PM -0700, Luis Chamberlain wrote:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=large-block-linus
> > > >
> > > > I haven't tested yet the second branch I pushed though but it applied without any changes
> > > > so it should be good (usual famous last words).
> > >
> > > I have run some preliminary tests on that branch as well above using fsx
> > > with larger LBA formats running them all on the *same* system at the
> > > same time. Kernel is happy.
>
> <-- snip -->
>
> > So I just pulled this, built it and run generic/091 as the very
> > first test on this:
> >
> > # ./run_check.sh --mkfs-opts "-m rmapbt=1 -b size=64k" --run-opts "-s xfs_64k generic/091"
>
> The cover letter for this patch series acknowledged failures in fstests.

But this is a new update, which you said fixed various issues, and
you posted this in direct response to the bug report I gave you.

> For kdevops now, we borrow the same last linux-next baseline:
>
> git grep "generic/091" workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev
> workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_1024.txt:generic/091 # possible regression
> workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_16k.txt:generic/091
> workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_32k.txt:generic/091
> workflows/fstests/expunges/6.6.0-rc2-large-block-linus-nobdev/xfs/unassigned/xfs_reflink_64k_4ks.txt:generic/091
>
> So well, we already know this fails.

*cough*

-You- know it already fails. And you are expecting people who try
the code to somehow know that you've explicitly ignored this fsx
failure, especially after all your words to tell us how much fsx
testing it has passed?

And that's kinda my point - you're effusing about how much fsx
testing this has passed, yet it still fails after just a handful of
ops in generic/091. The dissonance could break windows...

----

Fundamentally, when it comes to data integrity, it is important to
exercise as much of the operational application space as quickly as
possible, as it is that breadth of variation in operations that
flushes out more bugs and helps stabilise the code faster.

Why do you think we talk about the massive test matrix most
filesystems have and how long it takes to iterate so much? It's
because iterating that complex test matrix is how we find all the
whacky, weird bugs in the code.

Concentrating on a single test configuration and running it over and
over again won't find bugs in code it doesn't exercise no matter how
long it is run for. Running such a setup in an automated environment
doesn't mean you get better code coverage, it just means you cover
the same narrow set of corner cases faster and more times. If it
works once, it should work a million times. Iterating it a billion
more times doesn't tell us anything additional, either.

Put simply: performing deep, homogeneous testing on code that has
known data corruption bugs outside the narrow scope of the test case
is not telling us anything useful about the overall state of the
code.

Indeed, turning off failing tests that are critical to validating
the correct operation of the code you are modifying is bad practice.
For code changes like this, all fsx testing in fstests should pass
before you post anything for review - even for an RFC. There is no
point reviewing code that doesn't work properly, nor wasting people's
time by encouraging them to test it when it's clear to you that it's
going to fail in various important ways.

Hence I think your testing is focussing on the wrong things and I
suspect that you've misunderstood what the statements of "we'll need
billions of fsx ops to test this code" that various people have made
really meant. You've elevated running billions of fsx ops to your
primary "it works" gating condition, at the expense of making sure
all the other parts of the filesystem still work correctly.

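A toy sketch of the breadth-versus-depth point above, not a model of
real fsx behaviour: assume each test configuration exercises a fixed
set of code paths, so repeating one configuration adds no new
coverage. The configuration names and path labels are hypothetical
and chosen only for illustration.

```python
# Hypothetical mapping from test configuration to the code paths it
# exercises. None of these names come from fstests or the kernel.
PATHS_BY_CONFIG = {
    "buffered_only": {"buffered_write", "buffered_read"},
    "dio_read_buffered_write": {"buffered_write", "dio_read", "invalidate"},
    "mmap_on": {"buffered_write", "mmap_fault", "page_mkwrite"},
}


def covered_paths(schedule):
    """Return the code paths hit by a schedule of (config, runs) pairs.

    Assumption: a configuration hits all of its paths on the first run,
    so the run count only matters in that it must be non-zero.
    """
    covered = set()
    for config, runs in schedule:
        if runs > 0:
            covered |= PATHS_BY_CONFIG[config]
    return covered


# Every path any configuration can reach.
all_paths = set().union(*PATHS_BY_CONFIG.values())

# Deep: one configuration, a billion runs.
deep = [("buffered_only", 1_000_000_000)]
# Wide: every configuration, far fewer runs each.
wide = [(name, 10_000_000) for name in PATHS_BY_CONFIG]

print(len(covered_paths(deep)), "of", len(all_paths), "paths covered (deep)")
print(len(covered_paths(wide)), "of", len(all_paths), "paths covered (wide)")
```

Under these assumptions the deep schedule never touches the paths
that only the other configurations reach, however many runs it gets,
while the wide schedule reaches all of them with a much smaller total
run count.
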
The reality is that the returns from fsx diminish as the number of
ops goes up. Once you've run the first hundred million fsx ops for a
given operation set, the chance that the next 100M ops will find a
new problem is -greatly- reduced. The vast majority of problems will
be found in the first 10M ops that are run in any given fsx
operation, and few bugs are found beyond the 100M mark. Yes, we
occasionally find one up in the billions, but that's rare and most
definitely not something to focus on when still developing RFC level
code.

Different fsx configurations change the operation set that is run -
mixing DIO reads with buffered writes, turning mmap on and off, using
AIO or io_uring rather than synchronous IO, etc. These all exercise
different code paths and corner cases and have vastly different code
interactions, and that is what we need to cover when developing new
code.

IOWs, we need coverage of the *entire operation space*, not just the
same narrow set of operations run billions of times. A wide focus
requires billions of ops to cover because it requires lots of
different application configurations to be run.

In contrast, there are only three fs configurations that matter:
bs < PS, bs == PS and bs > PS. For example, 16kB, 32kB and 64kB
filesystem configs exercise exactly the same code paths in exactly
the same way (e.g. they all have non-zero minimum folio orders and
only differ by what that order is). Hence running the same test
application configs on these different filesystem configurations
does not actually improve code coverage of the testing at all.
Testing all of them only increases the resources required to test a
change, it does not improve the quality of coverage of the testing
being performed at all....

Hence I'd strongly suggest that, for the next posting of these
changes, you focus on making fstests pass without turning off any
failing tests, and that fsx is run with a wide variety of
configurations (e.g. modify all the fstests cases to run for a
configurable number of ops (e.g. via SOAK_DURATION)). We just don't
care at this point about finding that 1 in 10^15 ops bug because it's
code in development; what we actually care about is that -everything-
works correctly for the vast majority of use cases....

-Dave.
-- 
Dave Chinner
david@fromorbit.com