Date: Tue, 27 Feb 2024 21:22:26 -0500
From: Kent Overstreet
To: Linus Torvalds
Cc: Dave Chinner, Luis Chamberlain, lsf-pc@lists.linux-foundation.org,
    linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav,
    Jens Axboe, Christoph Hellwig, Chris Mason, Johannes Weiner,
    Matthew Wilcox
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Message-ID: <4rde5abojkj6neokif4j6z4bgkqwztowfiiklpvxramiuhvzjb@ts5af6w4bl4t>

On Tue, Feb 27, 2024 at 02:46:11PM -0800, Linus Torvalds wrote:
> On Tue, 27 Feb 2024 at 14:21, Kent Overstreet wrote:
> >
> > ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> > just like everyone else.
>
> Not for dio, it doesn't.
>
> > > The real question is how much of userspace will that break, because
> > > of implicit assumptions that the kernel has always serialised
> > > buffered writes?
> >
> > What would break?
>
> Well, at least in theory you could have concurrent overlapping writes
> of folio crossing records, and currently you do get the guarantee that
> one or the other record is written, but relying just on page locking
> would mean that you might get a mix of them at page boundaries.
>
> I'm not sure that such a model would make any sense, but if you
> *intend* to break if somebody doesn't do write-to-write exclusion,
> that's certainly possible.
>
> The fact that we haven't given the atomicity guarantees wrt reads does
> imply that nobody can do this kinds of crazy thing, but it's an
> implication, not a guarantee.
>
> I really don't think such an odd load is sensible (except for the
> special case of O_APPEND records, which definitely is sensible), and
> it is certainly solvable.
>
> For example, a purely "local lock" model would be to just lock all
> pages in order as you write them, and not unlock the previous page
> until you've locked the next one.

The code I'm testing locks _all_ the folios we're writing to
simultaneously, and if they can't all be pinned and locked just falls
back to the inode lock.

Which does raise the question of if we've ever attempted to define a
lock ordering on folios. I suspect not, since folio lock doesn't even
seem to have lockdep support.
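
For readers following along, here is a minimal userspace sketch of the
"lock everything or fall back" idea described above. It is not the code
under test and not kernel code. `Folio`, `inode_lock`, `try_lock_all`
and `buffered_write` are hypothetical names, and taking the folios in
ascending index order is an assumption made for illustration (the thread
itself notes that no folio lock ordering appears to be defined). Because
the fast path only ever makes non-blocking attempts and drops everything
on failure, it cannot deadlock whatever order it uses. How the fast path
and the fallback exclude each other in a real filesystem is also left
out here.

```python
import threading


class Folio:
    """Hypothetical stand-in for a page-cache folio: an index, a lock, data."""

    def __init__(self, index):
        self.index = index
        self.lock = threading.Lock()
        self.data = None


# Hypothetical stand-in for the per-inode lock.
inode_lock = threading.Lock()


def try_lock_all(folios):
    """Try to lock every folio without blocking.

    Returns True with all locks held, or False with none held.
    Ascending index order is an assumption, not an established rule.
    """
    held = []
    for folio in sorted(folios, key=lambda f: f.index):
        if not folio.lock.acquire(blocking=False):
            # Roll back so a failed attempt leaves nothing locked.
            for h in reversed(held):
                h.lock.release()
            return False
        held.append(folio)
    return True


def buffered_write(folios, payload):
    """Write payload to every folio and report which path was taken."""
    if try_lock_all(folios):
        try:
            for folio in folios:
                folio.data = payload
        finally:
            for folio in folios:
                folio.lock.release()
        return "folio-locks"

    # Fallback: serialise on the inode lock, then take folios one at a time.
    with inode_lock:
        for folio in sorted(folios, key=lambda f: f.index):
            with folio.lock:
                folio.data = payload
    return "inode-lock"
```

The return value exists only so the sketch can show which path ran. A
write over uncontended folios takes the fast path, and a write that
finds any folio already locked drops what it holds and goes through
`inode_lock` instead.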