From: Linus Torvalds
Date: Wed, 19 Jul 2023 16:48:17 -0700
Subject: Re: [RFC PATCH 1/4] splice: Fix corruption of spliced data after splice() returns
To: Matt Whitlock
Cc: Matthew Wilcox, Miklos Szeredi, David Howells, netdev@vger.kernel.org,
    Dave Chinner, Jens Axboe, linux-fsdevel@kvack.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Christoph Hellwig,
    linux-fsdevel@vger.kernel.org
References: <20230629155433.4170837-1-dhowells@redhat.com>
    <20230629155433.4170837-2-dhowells@redhat.com>
    <6609f1b8-3264-4017-ac3c-84a01ea12690@mattwhitlock.name>

On Wed, 19 Jul 2023 at 16:20, Linus Torvalds wrote:
>
> If you want "one-copy", what you can do is:
>
>  - mmap() the file data (zero copy, not stable yet)
>
>  - use "write()" to write the data to the network. This will copy it
>    to the skbs before the write() call returns and that copy makes it
>    stable.
>
> Alternatively, if you want to be more than a bit odd, you _can_ do the
> zero-copy on the write side, by doing
>
>  - read the file data (one copy, now it's stable)
>
>  - vmsplice() to the kernel buffer (zero copy)
>
>  - splice() to the network (zero copy at least for the good cases)

Actually, I guess technically there's a third way:

 - mmap the input (zero copy)

 - write() to a pipe (one copy)

 - splice() to the network (zero copy)

which doesn't seem to really have any sane use cases, but who knows...
It avoids the user buffer management of the vmsplice() model, and
while you cannot do anything to the data in user space *before* it is
stable (because it only becomes stable as it is copied to the pipe
buffers by the 'write()' system call), you could use "tee()" to
duplicate the now stable stream and perhaps log it or create a
checksum after-the-fact.

Another use-case would be if you want to send the *same* stable stream
to two different network connections, while still only having one
copy. You can't do that with plain splice() - because the data isn't
guaranteed to be stable, and the two network connections might see
different streams. You can't do that with the 'mmap and then
write-to-socket' approach, because the two writes not only copy twice,
they might copy different data.

And while you *can* do it with the "read+vmsplice()" approach, maybe
the "write to pipe() in order to avoid any user space buffer issues"
model is better. And "tee()" avoids the overhead of doing multiple
vmsplice() calls on the same buffer.

I dunno.

What I *am* trying to say is that "splice()" is actually kind of
designed for people to do these kinds of combinations. But very very
few people actually do it.

For example, the "tee()" system call exists, but it is crazy hard to
use, and I'm not sure it has ever actually been used for anything real.

Linus
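
For illustration only, here is a minimal sketch of the "mmap, write() to a pipe, tee(), splice()" combination described above, sending the same stable bytes to two output descriptors with a single copy. The names fanout_stable, drain and CHUNK are hypothetical and not from any existing code. The sketch assumes Linux with glibc, blocking descriptors, and pipes at the default 64 KiB capacity, so that a 16 KiB chunk always fits in an empty pipe and one tee() call can duplicate it in full.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* splice(), tee() */
#include <sys/mman.h>   /* mmap(), munmap() */
#include <sys/types.h>
#include <unistd.h>     /* pipe(), write(), close() */

/* Hypothetical chunk size: small enough to fit in an empty default pipe. */
#define CHUNK ((size_t)16 * 1024)

/* Move exactly n bytes from a pipe to out_fd. Returns 0, or -1 on error. */
static int drain(int pipe_rd, int out_fd, size_t n)
{
    while (n > 0) {
        ssize_t r = splice(pipe_rd, NULL, out_fd, NULL, n, 0);
        if (r <= 0)
            return -1;
        n -= (size_t)r;
    }
    return 0;
}

/*
 * Send the first len bytes of in_fd to both out_a and out_b.
 * The data is copied once, by write() into the pipe; after that copy it
 * is stable, and tee() and splice() only pass pipe buffers around.
 */
int fanout_stable(int in_fd, size_t len, int out_a, int out_b)
{
    int p[2] = { -1, -1 }, q[2] = { -1, -1 }, rc = -1;
    char *map;

    if (len == 0)
        return 0;
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, in_fd, 0); /* zero copy */
    if (map == MAP_FAILED)
        return -1;
    if (pipe(p) < 0 || pipe(q) < 0)
        goto out;

    for (size_t off = 0; off < len; ) {
        size_t chunk = len - off < CHUNK ? len - off : CHUNK;
        assert(chunk <= CHUNK);            /* must fit in an empty pipe */

        ssize_t w = write(p[1], map + off, chunk);  /* the one copy */
        if (w <= 0)
            goto out;
        /* Duplicate without consuming; a short tee() is treated as an
         * error here because simply retrying would re-duplicate data. */
        if (tee(p[0], q[1], (size_t)w, 0) != w)
            goto out;
        if (drain(p[0], out_a, (size_t)w) < 0 ||
            drain(q[0], out_b, (size_t)w) < 0)
            goto out;
        off += (size_t)w;
    }
    rc = 0;
out:
    if (p[0] >= 0) { close(p[0]); close(p[1]); }
    if (q[0] >= 0) { close(q[0]); close(q[1]); }
    munmap(map, len);
    return rc;
}
```

In the scenario from the mail, out_a and out_b would be two connected sockets; the second one could equally be a log file or the input of a checksum process. Whether the final splice() is really zero-copy depends on the destination, as noted above ("at least for the good cases").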