From: Linus Torvalds
Date: Wed, 28 Feb 2024 11:09:53 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
To: Kent Overstreet
Cc: Dave Chinner, Luis Chamberlain, lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav, Jens Axboe, Christoph Hellwig, Chris Mason, Johannes Weiner, Matthew Wilcox
References: <4uiwkuqkx3lt7cbqlqchhxjq4pxxb3kdt6foblkkhxxpohlolb@iqhjdbz2oy22>
In-Reply-To: <4uiwkuqkx3lt7cbqlqchhxjq4pxxb3kdt6foblkkhxxpohlolb@iqhjdbz2oy22>

On Wed, 28 Feb 2024 at 10:18, Kent Overstreet wrote:
>
> I think we can keep that guarantee.
>
> The tricky case was -EFAULT from copy_from_user_nofault(), where we have
> to bail out, drop locks, re-fault in the user buffer - and redo the rest
> of the write, this time holding the inode lock.
>
> We can't guarantee that partial writes don't happen, but what we can do
> is restart the write from the beginning, so the partial write gets
> overwritten with a full atomic write.

I think that's a solution that is actually much worse than the thing
it is trying to solve.

Now a concurrent reader can actually see the data change twice or
more. Either because there's another writer that came in in between,
or because of threaded modifications to the source buffer in the first
writer.

So your solution actually makes for noticeably *worse* atomicity
guarantees, not better.

Not the solution. Not at all.

I do think the solution is to just take the inode lock exclusive (when
we have to modify the inode size or the suid/sgid) or shared (to
prevent concurrent i_size modifications), and leave it at that.

And we should probably do a mount flag (for defaults) and an open-time
flag (for specific uses) to let people opt in to this behavior.

              Linus
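
[Editorial sketch, not part of the original message.] The exclusive-versus-shared rule described above can be illustrated with a small userspace model. Everything here is an assumption made for illustration: `model_inode`, `lock_mode` and `write_lock_mode()` are hypothetical names, not kernel APIs, and the proposed mount/open-time opt-in is not modelled. It only shows the decision of which lock mode a buffered write would take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum lock_mode { LOCK_SHARED, LOCK_EXCLUSIVE };

struct model_inode {
    int64_t size;          /* stands in for i_size */
    bool    has_suid_sgid; /* stands in for the setuid/setgid mode bits */
};

/*
 * Pick the inode lock mode for a buffered write of `len` bytes at `pos`.
 *
 * Exclusive: the write has to modify the inode itself, either by
 *            extending the size or by clearing suid/sgid.
 * Shared:    the write stays within the current size, so the lock is
 *            only needed to keep the size from changing underneath it.
 */
static enum lock_mode write_lock_mode(const struct model_inode *inode,
                                      int64_t pos, int64_t len)
{
    assert(pos >= 0 && len >= 0);

    if (inode->has_suid_sgid)
        return LOCK_EXCLUSIVE;
    if (pos + len > inode->size)
        return LOCK_EXCLUSIVE;
    return LOCK_SHARED;
}
```

In this model an overwrite entirely inside the current file size takes the lock shared, so such writes can run concurrently with each other, while an extending write or one on a suid/sgid file serializes against everything else.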