From: Linus Torvalds
Date: Thu, 29 Jun 2023 11:42:44 -0700
Subject: Re: [RFC PATCH 0/4] splice: Fix corruption in data spliced to pipe
To: Matt Whitlock
Cc: David Howells, netdev@vger.kernel.org, Matthew Wilcox, Dave Chinner,
    Jens Axboe, linux-fsdevel@kvack.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
References: <20230629155433.4170837-1-dhowells@redhat.com> <4bd92932-c9d2-4cc8-b730-24c749087e39@mattwhitlock.name>

On Thu, 29 Jun 2023 at 11:19, Linus Torvalds wrote:
>
> Now, we also have SPLICE_F_GIFT. [..]
>
> Now, I would actually not disagree with removing that part. It's
> scary.
> But I think we don't really have any users (ok, fuse and some
> random console driver?)

Side note: maybe I should clarify. I have grown to pretty much hate
splice() over the years, just because it's been a constant source of
sorrow in so many ways.

So I'd personally be perfectly ok with just making vmsplice() be
exactly the same as write, and turn all of vmsplice() into just "it's
a read() if the pipe is open for read, and a write if it's open for
writing".

IOW, effectively get rid of vmsplice() entirely, just leaving it as a
legacy name for an interface.

What I *absolutely* don't want to see is to make vmsplice() even more
complicated, and actively slower in the process. Unmapping it from the
source, removing it from the VM, is all just crazy talk.

If you want to be really crazy, I can tell you how to make for some
truly stupendously great benchmarks: make a plain "write()" system
call look up the physical page, check if it's COW'able, and if so,
mark it read-only in the source and steal the page. Now write() has
taken a snapshot of the source, and can use that page for the pipe
buffer as-is. It won't change, because if the user writes to it, the
user will just take a page fault and force a COW.

Then, to complete the thing, make 'read()' of a pipe able to just take
the page, and insert it into the destination VM (it's ok to make it
writable at that point).

You can get *wonderful* performance numbers from benchmarks with that.

I know, because I did exactly that long long ago. So long ago that I
think I had an i486 that had memory throughput measured in megabytes.
And my pipe throughput benchmark got gigabytes per second!

Of course, that benchmark relied entirely on the source of the write()
never actually writing to the page, and the reader never actually
bothering to touch the page. So it was gigabytes on a pretty bad
benchmark. But it was quite impressive.

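As a rough illustration of the copy-on-write scheme described above, here is a toy Python model. It is a userspace simulation only, not kernel code and not the patches mentioned in this message. The names `Page`, `AddressSpace`, `Pipe`, `write_cow`, `read_map` and `cow_demo` are hypothetical, and the model assumes a single page-aligned, page-sized buffer.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size for this toy model


@dataclass
class Page:
    data: bytearray


class AddressSpace:
    """Toy process memory: one page plus a write-protect bit."""

    def __init__(self, payload: bytes):
        self.page = Page(bytearray(payload))
        self.read_only = False
        self.faults = 0  # simulated page faults taken by this process
        self.copies = 0  # simulated page copies paid by this process

    def store(self, offset: int, value: int) -> None:
        if self.read_only:
            # Simulated COW fault: allocate a fresh page, copy the old
            # contents into it, and leave the shared page untouched.
            self.faults += 1
            self.copies += 1
            self.page = Page(bytearray(self.page.data))
            self.read_only = False
        self.page.data[offset] = value


class Pipe:
    """Toy pipe whose buffers are whole pages rather than copied bytes."""

    def __init__(self):
        self.buffers = []

    def write_cow(self, src: AddressSpace) -> None:
        # "Steal" the page: write-protect it in the source and queue
        # the very same page object as the pipe buffer (no copy).
        src.read_only = True
        self.buffers.append(src.page)

    def read_map(self) -> Page:
        # Hand the queued page to the reader without copying it.
        return self.buffers.pop(0)


def cow_demo():
    writer = AddressSpace(b"A" * PAGE_SIZE)
    pipe = Pipe()
    pipe.write_cow(writer)       # snapshot taken, nothing copied yet
    writer.store(0, ord("B"))    # writer reuses its buffer: COW fault
    received = pipe.read_map()
    return (chr(received.data[0]), chr(writer.page.data[0]),
            writer.faults, writer.copies)


if __name__ == "__main__":
    print(cow_demo())
```

In this model the pipe still holds the original snapshot after the writer modifies its buffer, and the writer pays one simulated fault and one page copy the moment it touches the buffer again.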
I don't think those patches ever got posted publicly, because while
very impressive on benchmarks, it obviously was absolutely horrendous
in real life, because in real life the source of the pipe data would
(a) not usually be page-aligned anyway, and (b) even if it was and
triggered this wonderful case, it would then re-use the buffer and
take a COW fault, and now the overhead of faulting, allocating a new
page, copying said page, was obviously higher than just doing all that
in the pipe write() code without any faulting overhead.

But splice() (and vmsplice()) does conceptually come from that kind of
background. It's just that it was never as lovely and as useful as it
promised to be.

So I'd actually be more than happy to just say "let's decommission
splice entirely, just keeping the interfaces alive for backwards
compatibility"

                Linus
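The "vmsplice() as a plain read or write" idea from earlier in this message could look roughly like the userspace sketch below. This is only an illustration in Python of the proposed semantics, not how the kernel implements or would implement vmsplice(). The names `vmsplice_compat` and `shim_demo` are hypothetical, and the sketch assumes the descriptor is a pipe end opened either read-only or write-only.

```python
import fcntl
import os


def vmsplice_compat(fd, buffers):
    """Hypothetical shim: plain readv()/writev() chosen by how fd is open."""
    access = fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_ACCMODE
    if access == os.O_RDONLY:
        # Read end: copy pipe data into the caller's writable buffers.
        return os.readv(fd, buffers)
    if access == os.O_WRONLY:
        # Write end: copy the caller's buffers into the pipe.
        return os.writev(fd, buffers)
    # Simplification for this sketch: other access modes are rejected.
    raise ValueError("expected a pipe end opened read-only or write-only")


def shim_demo():
    read_end, write_end = os.pipe()
    try:
        sent = vmsplice_compat(write_end, [b"hello, ", b"pipe"])
        buf = bytearray(sent)
        got = vmsplice_compat(read_end, [buf])
        return sent, bytes(buf[:got])
    finally:
        os.close(read_end)
        os.close(write_end)


if __name__ == "__main__":
    print(shim_demo())
```

Because the data is copied at call time, later changes to the caller's buffers cannot alter what is already in the pipe, which is the same guarantee an ordinary write() gives.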