From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BD134C05027 for ; Fri, 3 Feb 2023 23:53:33 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 257D66B0071; Fri, 3 Feb 2023 18:53:33 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2076B6B0072; Fri, 3 Feb 2023 18:53:33 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A9546B0073; Fri, 3 Feb 2023 18:53:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id EC1486B0071 for ; Fri, 3 Feb 2023 18:53:32 -0500 (EST) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id C2A511A0329 for ; Fri, 3 Feb 2023 23:53:32 +0000 (UTC) X-FDA: 80427635064.15.38AE186 Received: from mail-oo1-f49.google.com (mail-oo1-f49.google.com [209.85.161.49]) by imf05.hostedemail.com (Postfix) with ESMTP id F0A06100004 for ; Fri, 3 Feb 2023 23:53:30 +0000 (UTC) Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="mRN/u3Ma"; spf=pass (imf05.hostedemail.com: domain of hughd@google.com designates 209.85.161.49 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1675468411; a=rsa-sha256; cv=none; b=hRuSk9NkB88CpB76s+ywsNnnhXXcEoXltwEj/5xkrqGxsRIcUbFMaGUbX5nxFBF59qEmOc s6KGQ57FaNfq8f4Y4XpSg6bTuTj6rwAu0sRaPVT4A+/1CGjWB+NS3oeaAy1i1SpvXPr94n OFeiOtEwifyT92FmymYtR5qxsm6j98U= ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="mRN/u3Ma"; spf=pass (imf05.hostedemail.com: domain of hughd@google.com designates 209.85.161.49 as permitted sender) smtp.mailfrom=hughd@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1675468411; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=iyXSbv+D3GNzzmvEn+3B991hNgm4a05r2z3sYrzcrnA=; b=nZIaSVa1NbisZlsRrr4uMn03UWE3ucsDxzvJ/lxy97zlIei6JEDFUU1+w1fhcDdBmExkOv D1EyDk+w4ky+jP1vImattW1cmFUeqXJZIcQcCl5iREi3JMWmheG+fipme6Jh7pV6kLcX3u H+/kS27y1abgLaRce7yaQA6w4h76Yes= Received: by mail-oo1-f49.google.com with SMTP id b10-20020a4a9fca000000b004e6f734c6b4so658131oom.9 for ; Fri, 03 Feb 2023 15:53:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:from:to:cc:subject:date:message-id:reply-to; bh=iyXSbv+D3GNzzmvEn+3B991hNgm4a05r2z3sYrzcrnA=; b=mRN/u3MaOz2aKDgFFtDR1iMJ0EDM16wviXv1ORqlVQqMRi+q5WffP6Vks/Fs/uJqNS /hlS9ae84mSWdzkmkXiK9tSVuEiojTyXdJm0ZfzxJngkhIGiU9BOaCHww2Ax+qXDP1Yo Gbmhwub9Fs6RNbaSfUVh5vBaren+D8dsheBW15fXcIo2HAdqpRRKJZqI2me2ECPEPBYu 0FAhlDbanefuKWNY2ZXMq6Qh7XSUGcoL5cw3kscrjLup0mvxih8udwHFVQTsSN/KFluM jblXZx/6MOOuitaafobca1yVX7W7iEh21A3fWqHAhnPJmh5SMXsKuFG+Po1f/VhTNRIf b7Ng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:references:message-id:in-reply-to:subject:cc:to:from :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=iyXSbv+D3GNzzmvEn+3B991hNgm4a05r2z3sYrzcrnA=; b=nUSWkzNKeQ5dWqcPLev3m5sThwNl+2j7Oq6f7WCu5LLF2oD4Wlc/JMUpwZ3AYZBUNE MMT6SOnkxDzkq3l4BFMtQcrGl4SBdx1jCT81FrXxJZ+SWKjFsPOMuwDyuejJeH2Gf0hm m2TDYu/OPK7lxoaBQoDR2xUxohhENNxT9pPHpfJcqKFXkOXL5dNo8m5VqCpSecWo/h7a EENItVSJAq8ywH4Y/iKWUV3oIVWKOIlk9bfA5R5dt7ib64WGe/B3xQEauCBPBGnsUnYD EawY5nWLT6uyUlbjCaQTQvdvmL4bKeHSLhfypR4wl09YgmvRDSX5N8scI9WYR5iiNVIi WZjw== X-Gm-Message-State: AO0yUKUSeGmw+3/ywu/1QHKj6sVVsXr6T51/jsjXuGuXZQtSlxvd01Gf 1VN2IaMENP7gyQa28qQOJlzCMQ== X-Google-Smtp-Source: AK7set+v9dS3difWtAWx5KcQkWTaB3aqL3GNGkD9e+8RDgCj+V4wRZt+is4UqGbxfS8Dpb+uENa6wA== X-Received: by 2002:a4a:146:0:b0:51a:c9a:ee77 with SMTP id 67-20020a4a0146000000b0051a0c9aee77mr2451507oor.0.1675468409829; Fri, 03 Feb 2023 15:53:29 -0800 (PST) Received: from ripple.attlocal.net (172-10-233-147.lightspeed.sntcca.sbcglobal.net. [172.10.233.147]) by smtp.gmail.com with ESMTPSA id l6-20020a4a2706000000b004a3527e8279sm1560274oof.0.2023.02.03.15.53.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 03 Feb 2023 15:53:29 -0800 (PST) Date: Fri, 3 Feb 2023 15:53:20 -0800 (PST) From: Hugh Dickins X-X-Sender: hugh@ripple.attlocal.net To: Chuck Lever III cc: Trond Myklebust , Benjamin Coddington , Linux NFS Mailing List , Anna Schumaker , Hugh Dickins , Al Viro , Amir Goldstein , linux-fsdevel@vger.kernel.org, linux-mm@kvack.org Subject: Re: git regression failures with v6.2-rc NFS client In-Reply-To: Message-ID: <4dd32d-9ea3-4330-454a-36f1189d599@google.com> References: <9A4A5673-691D-47EC-BC44-C43BE7E50A48@oracle.com> <5FF4061F-108C-4555-A32D-DDBFA80EE4E7@redhat.com> <44CB1E86-60E0-4CF0-9FD4-BB7E446542B7@redhat.com> <1AAC6854-2591-4B21-952A-BC58180B4091@oracle.com> <41813D21-95C8-44E3-BB97-1E9C03CE7FE5@redhat.com> <79261B77-35D0-4E36-AA29-C7BF9FB734CC@oracle.com> <104B6879-5223-485F-B099-767F741EB15B@redhat.com> <966AEC32-A7C9-4B97-A4F7-098AF6EF0067@oracle.com> <545B5AB7-93A6-496E-924E-AE882BF57B72@hammerspace.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="-1463760895-1450651585-1675468409=:1657" X-Rspam-User: X-Rspamd-Queue-Id: F0A06100004 X-Rspamd-Server: rspam01 X-Stat-Signature: 46afhxth9399jnig55bcjaxohip5eech X-HE-Tag: 1675468410-959904 X-HE-Meta: U2FsdGVkX18VMsVNHigSVxKhSYwhQ4zpXnyE/luTq/CK+t/C9lS0L0nzZe4++vaLJq0AmlxB8rtMA4HJzMK5E+Rzx8gjnQ6D1YiaUNy0zirc0fzFIXdyk5xOABEmRLq3oAugmeAq8dR+LOrDe10HqIamJDrqHWNBkguz+GpXIlKBQ6aqfPY8KM2M6NqTEKhNQzxoDoIvor2lxAwaETiNzTL/Op0P80pu2FtWgFeg5JEibFN/5SqVQ+94hraZJHl4xUw0q86ejE7Q8pRaHx7jUTIm7eozHm5lHwp2D/CxAMuxc4j+olJN2mIzREVDnjNQQkDMU2EBC+8rD3XN7VTRbpUNhvuNj3w1HZRDPH5sp/Mfydntj/4Wui6HtVU2+WG4fDwm3QkmSMu8i3Nj7EvnQ9gWLUFqPtKQda0LoXjcMlBYX0O94feAZZwZXy8cTQDCvyVNz67BNWM7mh+StaeTlXoHc2S9YXdVtKpnhh4LptfQFSUoNXuVIvxYCwjK+dTkCkXGIr6NGKa9goYcePuzVja+VDIDIqeFhv1uIBAt/hnuWiCB2Yy9Z8PyQ3Ev6qOXTPmdFIJNiYv36G/5RXCaFT0wSqqjszW/03ekU1fcLHzAP2gLr2EFMxcozyn8QhxGyioec8Cw9JfV/GyVFzN6I6SVVWWT35wevJH9j2zWO7EdZvQ4/vxk00yinTIWvnYzyXgAy/t/OQJcLVrNoJfFmnw1KDUFbF6puYngV8jwcBUQd3Gbtd2DcoN3CbKomLilohdTpe5NnbvrdGa6xYd2dk+TgB+QJf1AkVsR1mDhjQXm71d84wu/XNH9klLOYvaqhcVAzmDwWam/AV96sZQKhfuXShfubx65x9qSNrRrlJMidmGMa1rhaGOVNlsYlRKQWMrW3VslhA4Olw3suNarSkb2n7OKFptZridyUFu7BE1I6/Q8vqCfE9GWZnT26SWeyzqvhGdJm/ayFNl1KVI 3tQnPe2t B1qgTL+Yvkdjh20wPER1UkYkX49Wml+k9yN4XZRDG0ZyLO+Dq5vAXt9UebKVk+tFiSeoczywy7C/uK/I3ual/CC1rsM6DaSFRpQhvzZnz2ScBntkIPXHBFZa4ogsIg5w5/v1uFjJ+oAjMTgMNbhWHC+2t0TB9zc5Rdz6/UANP9AMjnipyUYsaZInObDRoNNHk7PyD7anFrQpzz0RkiHZGnxjfJjzGQszCcJRRCSiPPbkBms+Vn+17/bR5dtCXCWEkpDP/rUiVjdNny2szeuMS0+1o3wDutnaFMaQ7qZZsdgHanVffVJm39pbQDhXMO+qk56PkNmR6iCGfUB8jooJQoGJrvcyIoxZWjeqWeKyRQ0dcaffo7ypkegcnY3F+obqwpcur8KaMdzy5/2Xdmmlw8ze6AejNLHddSbJEHQC6wb/y6/Jsv+tEU4hACluvMjzQHxatClPpkLaJhnuZa4ekbsagz2krFzSkidKIQGrrgVyE2pIFuLkcUi3j/heVkREOMWJO7u+MpsKs30U0NjO573Kpk0eRaonAQZw9eZnlyoo/03pESgq20Lmg/6nHp1FD1GtBKK5TvZia8Ukn0kqHsGddEXXlyHgUNSG25egR4lHlaIOHSXjlIf+01hhdQC6X5XAZqtL0KWDIVaR1MvVJ4plcbVUHpXzIlL++dB3Ai1J/1/FrpbI1CHCuiJahu66XEoDi/nrEhyb7GI/c0lbE3Z8EhPuA8XW9k/zDaPa95HXtBpSDiyPE0EbZpe/2ZVBXKYHO X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. ---1463760895-1450651585-1675468409=:1657 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: QUOTED-PRINTABLE On Fri, 3 Feb 2023, Chuck Lever III wrote: > > On Feb 3, 2023, at 5:26 PM, Trond Myklebust w= rote: > >> On Feb 3, 2023, at 15:25, Chuck Lever III wro= te: > >>> On Feb 3, 2023, at 3:01 PM, Benjamin Coddington = wrote: > >>> On 3 Feb 2023, at 13:03, Chuck Lever III wrote: > >>>> Naive suggestion: Would another option be to add server > >>>> support for the directory verifier? > >>>=20 > >>> I don't think it would resolve this bug, because what can the client = do to > >>> find its place in the directory stream after getting NFS4ERR_NOT_SAME= ? > >>>=20 > >>> Directory verifiers might help another class of bugs, where a linear = walk > >>> through the directory produces duplicate entries.. but I think it onl= y helps > >>> some of those cases. > >>>=20 > >>> Knfsd /could/ use the directory verifier to keep a reference to an op= ened > >>> directory. Perhaps knfsd's open file cache can be used. Then, as lo= ng as > >>> we're doing a linear walk through the directory, the local directory'= s > >>> file->private cursor would continue to be valid to produce consistent= linear > >>> results even if the dentries are removed in between calls to READDIR. > >>>=20 > >>> .. but that's not how the verifier is supposed to be used - rather it= 's > >>> supposed to signal when the client's cookies are invalid, which, for = tmpfs, > >>> would be anytime any changes are made to the directory. > >>=20 > >> Right. Change the verifier whenever a directory is modified. > >>=20 > >> (Make it an export -> callback per filesystem type). > >>=20 > >>>> Well, let's not argue semantics. The optimization exposes > >>>> this (previously known) bug to a wider set of workloads. > >>>=20 > >>> Ok, yes. > >>>=20 > >>>> Please, open some bugs that document it. Otherwise this stuff > >>>> will never get addressed. Can't fix what we don't know about. > >>>>=20 > >>>> I'm not claiming tmpfs is perfect. But the optimization seems > >>>> to make things worse for more workloads. That's always been a > >>>> textbook definition of regression, and a non-starter for > >>>> upstream. > >>>=20 > >>> I guess I can - my impression has been there's no interest in fixing = tmpfs > >>> for use over NFS.. after my last round of work on it I resolved to t= ell any > >>> customers that asked for it to do: > >>>=20 > >>> [root@fedora ~]# modprobe brd rd_size=3D1048576 rd_nr=3D1 > >>> [root@fedora ~]# mkfs.xfs /dev/ram0 > >>>=20 > >>> .. instead. Though, I think that tmpfs is going to have better perfo= rmance. > >>>=20 > >>>>> I spent a great deal of time making points about it and arguing tha= t the > >>>>> client should try to work around them, and ultimately was told exac= tly this: > >>>>> https://lore.kernel.org/linux-nfs/a9411640329ed06dd7306bbdbdf251097= c5e3411.camel@hammerspace.com/ > >>>>=20 > >>>> Ah, well "client should work around them" is going to be a > >>>> losing argument every time. > >>>=20 > >>> Yeah, I agree with that, though at the time I naively thought the iss= ues > >>> could be solved by better validation of changing directories. > >>>=20 > >>> So.. I guess I lost? I wasn't aware of the stable cookie issues full= y > >>> until Trond pointed them out and I compared tmpfs and xfs. At that p= oint, I > >>> saw there's not really much of a path forward for tmpfs, unless we wa= nt to > >>> work pretty hard at it. But why? Any production server wanting perf= ormance > >>> and stability on an NFS export isn't going to use tmpfs on knfsd. Th= ere are > >>> good memory-speed alternatives. > >>=20 > >> Just curious, what are they? I have xfs on a pair of NVMe > >> add-in cards, and it's quite fast. But that's an expensive > >> replacement for tmpfs. > >>=20 > >> My point is, until now, I had thought that tmpfs was a fully > >> supported filesystem type for NFS export. What I'm hearing > >> is that some people don't agree, and worse, it's not been > >> documented anywhere. > >>=20 > >> If we're not going to support tmpfs enough to fix these > >> significant problems, then it should be made unexportable, > >> and that deprecation needs to be properly documented. > >>=20 > >>=20 > >>>>> The optimization you'd like to revert fixes a performance regressio= n on > >>>>> workloads across exports of all filesystems. This is a fix we've h= ad many > >>>>> folks asking for repeatedly. > >>>>=20 > >>>> Does the community agree that tmpfs has never been a first-tier > >>>> filesystem for exporting? That's news to me. I don't recall us > >>>> deprecating support for tmpfs. > >>>=20 > >>> I'm definitely not telling folks to use it as exported from knfsd. I= 'm > >>> instead telling them, "Directory listings are going to be unstable, y= ou'll > >>> see problems." That includes any filesystems with file_operations of > >>> simple_dir_operations. > >>=20 > >>> That said, the whole opendir, getdents, unlink, getdents pattern is m= aybe > >>> fine for a test that can assume it has exclusive rights (writes?) to = a > >>> directory, but also probably insane for anything else that wants to r= eliably > >>> remove the thing, and we'll find that's why `rm -rf` still works. It= does > >>> opendir, getdents, getdents, getdents, unlink, unlink, unlink, .. rmd= ir. > >>> This more closely corresponds to POSIX stating: > >>>=20 > >>> If a file is removed from or added to the directory after the most r= ecent > >>> call to opendir() or rewinddir(), whether a subsequent call to readd= ir() > >>> returns an entry for that file is unspecified. > >>>=20 > >>>=20 > >>>> If an optimization broke ext4, xfs, or btrfs, it would be > >>>> reverted until the situation was properly addressed. I don't > >>>> see why the situation is different for tmpfs just because it > >>>> has more bugs. > >>>=20 > >>> Maybe it isn't? We've yet to hear from any authoritative sources on = this. > >>=20 > >>>>> I hope the maintainers decide not to revert > >>>>> it, and that we can also find a way to make readdir work reliably o= n tmpfs. > >>>>=20 > >>>> IMO the guidelines go the other way: readdir on tmpfs needs to > >>>> be addressed first. Reverting is a last resort, but I don't see > >>>> a fix coming before v6.2. Am I wrong? > >>>=20 > >>> Is your opinion wrong? :) IMO, yes, because this is just another ro= und of > >>> "we don't fix the client for broken server behaviors". > >>=20 > >> In general, we don't fix broken servers on the client, true. > >>=20 > >> In this case, though, this is a regression. A client change > >> broke workloads that were working in v6.1. > >=20 > > No. We=E2=80=99ve had this discussion on the NFS mailing list every tim= e someone invents a new filesystem that they want to export and then claims= that they don=E2=80=99t need stable cookies because the NFS protocol doesn= =E2=80=99t bother to spell that part out. The NFS client cannot and will no= t claim bug-free support for a filesystem that assigns cookie values in any= way that is not 100% repeatable. >=20 > Nor should it. However: >=20 > A. tmpfs is not a new filesystem. >=20 > B. Ben says this is more or less an issue on _all_ filesystems, > but others manage to hide it effectively, likely as a side-effect > of having to deal with slow durable storage. Before the optimization, > tmpfs worked "well enough" as well. >=20 > C. If we don't want to fully support tmpfs, that's fine. But let's > document that properly, please. >=20 > D. If you guys knew beforehand that this change would break tmpfs > exports, there should have been a warning in the patch description. >=20 >=20 > The guidelines about regressions are clear. I don't agree with > leaving the optimization in place while tmpfs is not working well > enough. And actually, these issues in tmpfs should have been > addressed first. There's loads of precedent for that requirement. >=20 > But it appears that as usual I don't have much choice. What I can > do is file some bugs and see if we can address the server side > pieces. >=20 > So far no one has said that the tmpfs cookie problem is irreparable. > I'd much prefer to keep it in NFSD's stable of support. >=20 > https://bugzilla.linux-nfs.org/show_bug.cgi?id=3D405 >=20 > And, if it helps, our server should support directory verifiers. >=20 > https://bugzilla.linux-nfs.org/show_bug.cgi?id=3D404 >=20 >=20 > > Concerning the claim that it was unknown this is a problem with tmpfs, = then see https://marc.info/?l=3Dlinux-kernel&m=3D100369543808129&w=3D2 > > In the years since 2001, we=E2=80=99ve =E2=80=9Cfixed=E2=80=9D the beha= viour of tmpfs by circumventing the reliance on cookies for the case of a l= inear read of a tmpfs directory, but the underlying cookie behaviour is sti= ll the same, and so the NFS behaviour ends up being the same. > >=20 > > The bottom line is that you=E2=80=99ve always been playing the lottery = when mounting tmpfs over NFS. >=20 > I'm not debating the truth of that. I just don't think we should > be making that situation needlessly worse. >=20 > And I would be much more comfortable with this if it appeared in > a man page or on our wiki, or ... I'm sorry, but "some email in > 2001" is not documentation a user should be expected to find. I very much agree with you, Chuck. Making something imperfect significantly worse is called "a regression". And I would expect the (laudable) optimization which introduced that regression to be reverted from 6.2 for now, unless (I imagine not, but have no clue) it can be easily conditionalized somehow on not-tmpfs or not-simple_dir_operations. But that's not my call. What is the likelihood that simple_dir_operations will be enhanced, or a satisfactory complicated_dir_operations added? I can assure you, never by me! If Al or Amir or some dcache-savvy FS folk have time on their hands and an urge to add what's wanted, great: but that surely will not come in 6.2, if ever. More likely that effort would have to come from the NFS(D) end, who will see the benefit. And if there's some little tweak to be made to simple_dir_operations, which will give you the hint you need to handle it better, I expect fsdevel would welcome a patch or two. Hugh ---1463760895-1450651585-1675468409=:1657--