From: Linus Torvalds
Date: Tue, 27 Feb 2024 14:46:11 -0800
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
To: Kent Overstreet
Cc: Dave Chinner, Luis Chamberlain, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav,
    Jens Axboe, Christoph Hellwig, Chris Mason, Johannes Weiner,
    Matthew Wilcox

On Tue, 27 Feb 2024 at 14:21, Kent Overstreet wrote:
>
> ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> just like everyone else.

Not for dio, it doesn't.

> > The real question is how much of userspace will that break, because
> > of implicit assumptions that the kernel has always serialised
> > buffered writes?
>
> What would break?

Well, at least in theory you could have concurrent overlapping writes
of folio-crossing records, and currently you do get the guarantee that
one or the other record is written, but relying just on page locking
would mean that you might get a mix of them at page boundaries.

I'm not sure that such a model would make any sense, but if you
*intend* to break if somebody doesn't do write-to-write exclusion,
that's certainly possible.

The fact that we haven't given the atomicity guarantees wrt reads does
imply that nobody can do this kind of crazy thing, but it's an
implication, not a guarantee.

I really don't think such an odd load is sensible (except for the
special case of O_APPEND records, which definitely is sensible), and
it is certainly solvable.

For example, a purely "local lock" model would be to just lock all
pages in order as you write them, and not unlock the previous page
until you've locked the next one.

That is a really simple model that doesn't require any range locking
or anything like that, because it simply relies on all writes always
being done strictly in file position order.

But you'd have to be very careful with deadlocks anyway in case there
are other cases of multi-page locks. And even without deadlocks, you
might end up having just a lot more lock contention (nested locks can
have *much* worse contention than sequential ones).

There are other models with multi-level locking, but I think we'd like
to try to keep things simple if we change something core like this.

              Linus
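
The "local lock" model described above (lock pages in file position
order, and do not drop the previous page lock until the next one is
held) can be illustrated with a small userspace toy. This is only a
sketch under loud assumptions: `PagedFile`, `write_coupled`, `read` and
`PAGE_SIZE` are hypothetical names invented for illustration, the
"pages" are plain byte arrays guarded by `threading.Lock`, and nothing
here corresponds to actual kernel folio locking or to any existing
filesystem code. It also says nothing about the deadlock and contention
concerns raised above.

```python
import threading

PAGE_SIZE = 4  # hypothetical tiny "page" so that short records cross pages


class PagedFile:
    """Toy in-memory file: a list of pages, each with its own lock."""

    def __init__(self, npages):
        self.pages = [bytearray(PAGE_SIZE) for _ in range(npages)]
        self.locks = [threading.Lock() for _ in range(npages)]

    def write_coupled(self, pos, data):
        """Write data at pos, taking page locks in ascending order and
        releasing a page only after the next page's lock is held."""
        if not data:
            return
        end = pos + len(data)
        first = pos // PAGE_SIZE
        last = (end - 1) // PAGE_SIZE
        held = None
        try:
            for idx in range(first, last + 1):
                self.locks[idx].acquire()        # lock the next page first
                if held is not None:
                    self.locks[held].release()   # only then drop the previous
                held = idx
                start = idx * PAGE_SIZE
                lo = max(pos, start)
                hi = min(end, start + PAGE_SIZE)
                # Copy the part of the record that falls inside this page.
                self.pages[idx][lo - start:hi - start] = data[lo - pos:hi - pos]
        finally:
            if held is not None:
                self.locks[held].release()

    def read(self, pos, n):
        """Unlocked read; only meaningful once all writers have finished."""
        flat = b"".join(bytes(p) for p in self.pages)
        return flat[pos:pos + n]


def race_once():
    """Two threads write overlapping, page-crossing records to the same range."""
    f = PagedFile(npages=4)
    records = (b"A" * 10, b"B" * 10)
    threads = [threading.Thread(target=f.write_coupled, args=(1, rec))
               for rec in records]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return f.read(1, 10)


if __name__ == "__main__":
    print(race_once())
```

Because a writer can never overtake another writer that is ahead of it
on a shared page, whichever writer takes the first shared page lock
first stays first on every later shared page. In this toy the final
contents of the overlapping range are therefore entirely one record or
entirely the other, never a mix at a page boundary.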