Date: Fri, 2 May 2025 16:21:55 +0200
From: Andrew Lunn
To: David Howells
Cc: David Hildenbrand, John Hubbard, "David S. Miller", Jakub Kicinski,
    willy@infradead.org, netdev@vger.kernel.org, linux-mm@kvack.org
Subject: Re: MSG_ZEROCOPY and the O_DIRECT vs fork() race
Message-ID: <165f5d5b-34f2-40de-b0ec-8c1ca36babe8@lunn.ch>
In-Reply-To: <1021352.1746193306@warthog.procyon.org.uk>
References: <0aa1b4a2-47b2-40a4-ae14-ce2dd457a1f7@lunn.ch>
    <1015189.1746187621@warthog.procyon.org.uk>
    <1021352.1746193306@warthog.procyon.org.uk>

On Fri, May 02, 2025 at 02:41:46PM +0100, David Howells wrote:
> Andrew Lunn wrote:
>
> > > I'm looking into making the sendmsg() code properly handle the 'DIO vs
> > > fork' issue (where pages need pinning rather than refs taken) and also
> > > getting rid of the taking of refs entirely as the page refcount is going
> > > to go away in the relatively near future.
> >
> > Sorry, new to this conversation, and i don't know what you mean by DIO
> > vs fork.
>
> As I understand it, there's a race between O_DIRECT I/O and fork whereby if
> you, say, start a DIO read operation on a page and then fork, the target page
> gets attached to child and a copy made for the parent (because the refcount is
> elevated by the I/O) - and so only the child sees the result. This is made
> more interesting by such as AIO where the parent gets the completion
> notification, but not the data.
>
> Further, a DIO write is then alterable by the child if the DMA has not yet
> happened.
>
> One of the things mm/gup.c does is to work around this issue... However, I
> don't think that MSG_ZEROCOPY handles this - and so zerocopy sendmsg is, I
> think, subject to the same race.

For zerocopy, you probably should be talking to Eric Dumazet and David
Wei.

I don't know too much about this, but from the Ethernet driver's
perspective, I _think_ it has no idea about zerocopy. It is just passed
a skbuf containing data, nothing special about it. Once the interface
says it is on the wire, the driver tells the netdev core it has
finished with the skbuf.

So, I guess your question about CRC is to do with CoW? If the driver
does not touch the data and just DMAs it out, the page could stay
shared between the processes. If it needs to modify the data, for
example to put CRCs into the packet, that write means the page cannot
be shared?

If you have scatter/gather, you can place the headers in kernel memory
and do writes to set the CRCs without touching the userspace data. I
don't know, but I suspect this is how it is done.

There is also an skbuf operation to linearize a packet, which will
allocate a new skbuf big enough to contain the whole packet in a single
segment, and do a memcpy of the fragments. Not what you want for
zerocopy, but if your interface does not have the needed support, there
is not much choice.

	Andrew
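
For illustration only, here is a userspace sketch of the two cases
described above. It is not kernel code and does not use the skbuf API.
It models a packet as a struct iovec array, with a header segment that
stands in for kernel memory and a payload segment that stands in for
the user's pages. The names demo_hdr, demo_csum, demo_build and
demo_linearize are hypothetical, and the additive checksum is only a
stand-in for a real CRC. The point is that the checksum write lands in
the header segment only, while the fallback copies every segment into
one contiguous buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical header carrying a checksum of the payload. */
struct demo_hdr {
	uint32_t csum;
};

/* Simple additive checksum, a stand-in for a real CRC. */
static uint32_t demo_csum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += p[i];
	return sum;
}

/*
 * Scatter/gather case: iov[0] is the header (models kernel memory),
 * iov[1] points at the caller's payload (models user pages).
 * Only the header is written; the payload is read but never modified.
 */
static void demo_build(struct iovec iov[2], struct demo_hdr *hdr,
		       const void *payload, size_t len)
{
	hdr->csum = demo_csum(payload, len);
	iov[0].iov_base = hdr;
	iov[0].iov_len = sizeof(*hdr);
	iov[1].iov_base = (void *)payload;
	iov[1].iov_len = len;
}

/*
 * Fallback case: copy all segments into one contiguous buffer.
 * Returns a malloc()ed buffer the caller must free, or NULL on failure.
 */
static uint8_t *demo_linearize(const struct iovec *iov, int cnt,
			       size_t *out_len)
{
	size_t total = 0, off = 0;
	uint8_t *buf;

	assert(out_len != NULL);
	for (int i = 0; i < cnt; i++)
		total += iov[i].iov_len;

	buf = malloc(total);
	if (!buf)
		return NULL;

	for (int i = 0; i < cnt; i++) {
		memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
		off += iov[i].iov_len;
	}
	*out_len = total;
	return buf;
}
```

In the scatter/gather path the payload pointer is passed through
unchanged, which is what allows the data to go out without a copy. The
linearize path gives up that property in exchange for a single segment.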