Date: Wed, 27 Nov 2024 13:13:54 +0100
From: Christian Brauner
To: Jan Kara
Cc: Mateusz Guzik, Bharata B Rao, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, nikunj@amd.com, willy@infradead.org, vbabka@suse.cz, david@redhat.com, akpm@linux-foundation.org, yuzhao@google.com, axboe@kernel.dk, viro@zeniv.linux.org.uk, joshdon@google.com, clm@meta.com
Subject: Re: [RFC PATCH 0/1] Large folios in block buffered IO path
Message-ID: <20241127-heizperiode-betuchte-edc44ec45f37@brauner>
References: <20241127054737.33351-1-bharata@amd.com> <20241127120235.ejpvpks3fosbzbkr@quack3>
In-Reply-To: <20241127120235.ejpvpks3fosbzbkr@quack3>

On Wed, Nov 27, 2024 at 01:02:35PM +0100, Jan Kara wrote:
> On Wed 27-11-24 07:19:59, Mateusz Guzik wrote:
> > On Wed, Nov 27, 2024 at 7:13 AM Mateusz Guzik wrote:
> > >
> > > On Wed, Nov 27, 2024 at 6:48 AM Bharata B Rao wrote:
> > > >
> > > > Recently we discussed the scalability issues while running large
> > > > instances of FIO with buffered IO option on NVME block devices here:
> > > >
> > > > https://lore.kernel.org/linux-mm/d2841226-e27b-4d3d-a578-63587a3aa4f3@amd.com/
> > > >
> > > > One of the suggestions Chris Mason gave (during private discussions) was
> > > > to enable large folios in block buffered IO path as that could
> > > > improve the scalability problems and improve the lock contention
> > > > scenarios.
> > > >
> > >
> > > I have no basis to comment on the idea.
> > >
> > > However, it is pretty apparent whatever the situation it is being
> > > heavily disfigured by lock contention in blkdev_llseek:
> > >
> > > > perf-lock contention output
> > > > ---------------------------
> > > > The lock contention data doesn't look all that conclusive but for 30% rwmixwrite
> > > > mix it looks like this:
> > > >
> > > > perf-lock contention default
> > > >  contended   total wait     max wait     avg wait         type   caller
> > > >
> > > > 1337359017      64.69 h    769.04 us    174.14 us     spinlock   rwsem_wake.isra.0+0x42
> > > >                         0xffffffff903f60a3  native_queued_spin_lock_slowpath+0x1f3
> > > >                         0xffffffff903f537c  _raw_spin_lock_irqsave+0x5c
> > > >                         0xffffffff8f39e7d2  rwsem_wake.isra.0+0x42
> > > >                         0xffffffff8f39e88f  up_write+0x4f
> > > >                         0xffffffff8f9d598e  blkdev_llseek+0x4e
> > > >                         0xffffffff8f703322  ksys_lseek+0x72
> > > >                         0xffffffff8f7033a8  __x64_sys_lseek+0x18
> > > >                         0xffffffff8f20b983  x64_sys_call+0x1fb3
> > > >    2665573      64.38 h       1.98 s     86.95 ms      rwsem:W   blkdev_llseek+0x31
> > > >                         0xffffffff903f15bc  rwsem_down_write_slowpath+0x36c
> > > >                         0xffffffff903f18fb  down_write+0x5b
> > > >                         0xffffffff8f9d5971  blkdev_llseek+0x31
> > > >                         0xffffffff8f703322  ksys_lseek+0x72
> > > >                         0xffffffff8f7033a8  __x64_sys_lseek+0x18
> > > >                         0xffffffff8f20b983  x64_sys_call+0x1fb3
> > > >                         0xffffffff903dce5e  do_syscall_64+0x7e
> > > >                         0xffffffff9040012b  entry_SYSCALL_64_after_hwframe+0x76
> > >
> > > Admittedly I'm not familiar with this code, but at a quick glance the
> > > lock can be just straight up removed here?
> > >
> > > static loff_t blkdev_llseek(struct file *file, loff_t offset, int whence)
> > > {
> > >         struct inode *bd_inode = bdev_file_inode(file);
> > >         loff_t retval;
> > >
> > >         inode_lock(bd_inode);
> > >         retval = fixed_size_llseek(file, offset, whence,
> > >                                    i_size_read(bd_inode));
> > >         inode_unlock(bd_inode);
> > >         return retval;
> > > }
> > >
> > > At best it stabilizes the size for the duration of the call. Sounds
> > > like it helps nothing since if the size can change, the file offset
> > > will still be altered as if there was no locking?
> > >
> > > Suppose this cannot be avoided to grab the size for whatever reason.
> > >
> > > While the above fio invocation did not work for me, I ran some crapper
> > > which I had in my shell history and according to strace:
> > > [pid 271829] lseek(7, 0, SEEK_SET) = 0
> > > [pid 271829] lseek(7, 0, SEEK_SET) = 0
> > > [pid 271830] lseek(7, 0, SEEK_SET) = 0
> > >
> > > ... the lseeks just rewind to the beginning, *definitely* not needing
> > > to know the size. One would have to check but this is most likely the
> > > case in your test as well.
> > >
> > > And for that there is 0 need to grab the size, and consequently the inode lock.
> >
> > That is to say bare minimum this needs to be benchmarked before/after
> > with the lock removed from the picture, like so:
>
> Yeah, I've noticed this in the locking profiles as well and I agree
> bd_inode locking seems unnecessary here. Even some filesystems (e.g. ext4)
> get away without using inode lock in their llseek handler...

nod. This should be removed.
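
To make the point about the rewinding lseeks concrete, here is a rough
userspace sketch of the offset arithmetic that a fixed-size seek performs.
It is not the kernel implementation: model_fixed_size_llseek() is a
hypothetical helper written for this illustration, it assumes a non-negative
device size, and it ignores overflow handling and the file position locking
the real code has to deal with. What it is meant to show is that a SEEK_SET
to offset 0 gives the same answer for any size, so a size that changes
concurrently cannot alter the outcome of that particular call.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/*
 * Hypothetical userspace model of a fixed-size seek (illustration only,
 * not kernel code). Returns the new position, or -EINVAL if the request
 * is invalid. Assumes size >= 0 and does not guard against overflow.
 */
int64_t model_fixed_size_llseek(int64_t cur_pos, int64_t offset,
                                int whence, int64_t size)
{
    assert(size >= 0);

    switch (whence) {
    case SEEK_SET:
        /* Absolute offset: size only serves as the upper bound below. */
        break;
    case SEEK_CUR:
        /* Relative to the current position. */
        offset += cur_pos;
        break;
    case SEEK_END:
        /* Relative to the end: the only case that adds size itself. */
        offset += size;
        break;
    default:
        return -EINVAL;
    }

    /* The resulting position must lie within [0, size]. */
    if (offset < 0 || offset > size)
        return -EINVAL;
    return offset;
}
```

In this model the size takes part in the result only for SEEK_END and as the
upper bound check, which is consistent with the observation above that the
rewinds seen in strace do not depend on a stable size.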