From: Linus Torvalds
Date: Fri, 19 May 2023 10:37:59 -0700
Subject: Re: [PATCH v20 03/32] splice: Make direct_read_splice() limit to eof where appropriate
To: David Howells
Cc: Jens Axboe, Al Viro, Christoph Hellwig, Matthew Wilcox, Jan Kara, Jeff Layton, David Hildenbrand, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton, Christian Brauner, linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
In-Reply-To: <1845768.1684514823@warthog.procyon.org.uk>
References: <20230519074047.1739879-1-dhowells@redhat.com> <20230519074047.1739879-4-dhowells@redhat.com> <1845768.1684514823@warthog.procyon.org.uk>
On Fri, May 19, 2023 at 9:48 AM David Howells wrote:
>
> This is just an optimisation to cut down the amount of bufferage allocated

So the thing is, it's actually very very wrong for some files.

Now, admittedly, those files have other issues too, and it's a design
mistake to begin with, but look at a number of files in /proc.

In particular, look at the regular files that have a size of '0'.
It's quite common indeed. Things like

    /proc/cpuinfo
    /proc/stat
    ...

you can find a ton of them with

    find /proc -type f -size 0

Is it horribly wrong and bad? Yes. I hate it. It means that some really
basic user space tools refuse to work on them, and the tools are 100%
right - this is a kernel misfeature.

Trying to do things like

    less -S /proc/cpuinfo

may or may not work depending on your version of 'less', for example,
because it's entirely reasonable to do something like

    fd = open(..);
    if (!fstat(fd, &st))
         len = st.st_size;

and limit your reads to the size of the file - exactly like your patch does.

Except it fails horribly on those broken /proc files.

I hate it, and I blame myself for the above horror, but it's pretty
much unfixable. We could make them look like named pipes or something,
but that's really ugly and probably would break other things anyway.
And we simply don't know the size ahead of time.

Now, *most* things work, because they just do the whole "read until
EOF". In fact, my current version of 'less' has no problem at all doing
the above thing, and gives the "expected" output.

Also, honestly, I really don't think that it's necessarily a good idea
to splice /proc files, but we actually do have splice wired up to these
because people asked for it:

    fe33850ff798 ("proc: wire up generic_file_splice_read for iter ops")
    4bd6a7353ee1 ("sysctl: Convert to iter interfaces")

so I suspect those things do exist.

> I could just drop it and leave it to userspace for now as the filesystem/block
> layer will stop anyway if it hits the EOF.  Christoph would prefer that I call
> direct_splice_read() from generic_file_splice_read() in all O_DIRECT cases, if
> that's fine with you.
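To make the /proc size problem discussed in this message concrete, here is a minimal user-space sketch, assuming Linux procfs semantics where files such as /proc/cpuinfo report st_size == 0. The helper names read_up_to_st_size() and read_until_eof() are hypothetical, chosen for illustration: the first mirrors the fstat()-based limit quoted above, the second is the "read until EOF" loop that most tools use.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>   /* used by the usage example */
#include <errno.h>
#include <stdio.h>    /* used by the usage example */
#include <string.h>   /* used by the usage example */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read at most st_size bytes, like the fstat()
 * pattern quoted above. On a procfs file that reports st_size == 0
 * this returns 0 without reading anything. */
static ssize_t read_up_to_st_size(int fd, char *buf, size_t cap)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;

    size_t len = st.st_size > 0 ? (size_t)st.st_size : 0;
    if (len > cap)
        len = cap;

    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, buf + total, len - total);
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        if (n < 0)
            return -1;
        if (n == 0)
            break;             /* EOF before st_size was reached */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical helper: ignore st_size and keep reading until read()
 * returns 0 (EOF) or the caller's buffer is full. */
static ssize_t read_until_eof(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return -1;
        if (n == 0)
            break;             /* EOF */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

On an ordinary regular file both helpers return the same byte count. On a descriptor opened from /proc/cpuinfo the first would be expected to return 0 while the second returns the contents (up to the buffer size), which is exactly the difference between tools that trust st_size and tools that read until EOF.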
I guess that's fine, and for O_DIRECT itself it might even make sense
to do the size test.

That said, I doubt it matters: if you use O_DIRECT on a small file,
you only have yourself to blame for doing something stupid.

And if it isn't a small file, then who cares about some small EOF-time
optimization? Nobody.

So I would suggest not doing that optimization at all, because as-is,
it's either pointless or actively broken.

That said, I would *not* hate some kind of special FMODE_SIZELIMIT flag
that allows filesystems to opt in to "limit reads to size".

We already have flags like that: FMODE_UNSIGNED_OFFSET and
'sb->s_maxbytes' are both basically variations on that same theme, and
having another flag to say "limit reads to i_size" wouldn't be wrong.

It's only wrong when it is done mindlessly with S_ISREG().

                Linus
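The opt-in alternative suggested at the end of the message can be modelled in user space to show why it avoids the /proc breakage. This is a sketch under stated assumptions, not kernel code: FMODE_SIZELIMIT is only a proposal in this thread, so MODEL_FMODE_SIZELIMIT, struct model_file and the read_len_*() helpers are hypothetical names that loosely mimic file->f_mode, inode->i_mode and i_size. The sketch contrasts clamping only when a filesystem opts in with clamping every S_ISREG() file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>   /* used by the usage example */
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for the proposed FMODE_SIZELIMIT; no such
 * kernel flag is assumed to exist. */
#define MODEL_FMODE_SIZELIMIT 0x1u

/* User-space model of the few struct file / struct inode fields involved. */
struct model_file {
    unsigned int f_mode;  /* models file->f_mode */
    mode_t i_mode;        /* models inode->i_mode */
    int64_t i_size;       /* models inode->i_size */
};

/* Clamp a read of len bytes at offset pos so it stops at i_size. */
static size_t clamp_to_size(int64_t i_size, int64_t pos, size_t len)
{
    if (pos >= i_size)
        return 0;
    uint64_t left = (uint64_t)(i_size - pos);
    return left < len ? (size_t)left : len;
}

/* Opt-in: only files whose filesystem set the flag get clamped. */
static size_t read_len_opt_in(const struct model_file *f, int64_t pos, size_t len)
{
    if (!(f->f_mode & MODEL_FMODE_SIZELIMIT))
        return len;
    return clamp_to_size(f->i_size, pos, len);
}

/* The variant criticised in the message: clamp every regular file,
 * which turns reads of size-0 procfs files into empty reads. */
static size_t read_len_s_isreg(const struct model_file *f, int64_t pos, size_t len)
{
    if (!S_ISREG(f->i_mode))
        return len;
    return clamp_to_size(f->i_size, pos, len);
}
```

For a procfs-like file (regular mode, i_size of 0, flag not set), read_len_s_isreg() clamps a 4096-byte read to 0 bytes while read_len_opt_in() leaves it untouched; a filesystem that knows its size and sets the flag still gets the EOF clamp.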