From: Mateusz Guzik
Date: Thu, 28 Nov 2024 05:22:41 +0100
Subject: Re: [RFC PATCH 0/1] Large folios in block buffered IO path
To: Bharata B Rao
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, nikunj@amd.com,
 willy@infradead.org, vbabka@suse.cz, david@redhat.com,
 akpm@linux-foundation.org, yuzhao@google.com, axboe@kernel.dk,
 viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz,
 joshdon@google.com, clm@meta.com
References: <20241127054737.33351-1-bharata@amd.com>
 <3947869f-90d4-4912-a42f-197147fe64f0@amd.com>
 <5a517b3a-51b2-45d6-bea3-4a64b75dfd30@amd.com>
In-Reply-To: <5a517b3a-51b2-45d6-bea3-4a64b75dfd30@amd.com>

On Thu, Nov 28, 2024 at 5:02 AM Bharata B Rao wrote:
>
> The contention with inode_lock is gone after your above changes.
> The new top 10 contention data looks like this now:
>
>  contended   total wait     max wait     avg wait         type   caller
>
> 2441494015     172.15 h      1.72 ms    253.83 us     spinlock   folio_wait_bit_common+0xd5
>                         0xffffffffadbf60a3  native_queued_spin_lock_slowpath+0x1f3
>                         0xffffffffadbf5d01  _raw_spin_lock_irq+0x51
>                         0xffffffffacdd1905  folio_wait_bit_common+0xd5
>                         0xffffffffacdd2d0a  filemap_get_pages+0x68a
>                         0xffffffffacdd2e73  filemap_read+0x103
>                         0xffffffffad1d67ba  blkdev_read_iter+0x6a
>                         0xffffffffacf06937  vfs_read+0x297
>                         0xffffffffacf07653  ksys_read+0x73
>   25269947       1.58 h      1.72 ms    225.44 us     spinlock   folio_wake_bit+0x62
>                         0xffffffffadbf60a3  native_queued_spin_lock_slowpath+0x1f3
>                         0xffffffffadbf537c  _raw_spin_lock_irqsave+0x5c
>                         0xffffffffacdcf322  folio_wake_bit+0x62
>                         0xffffffffacdd2ca7  filemap_get_pages+0x627
>                         0xffffffffacdd2e73  filemap_read+0x103
>                         0xffffffffad1d67ba  blkdev_read_iter+0x6a
>                         0xffffffffacf06937  vfs_read+0x297
>                         0xffffffffacf07653  ksys_read+0x73
>   44757761       1.05 h      1.55 ms     84.41 us     spinlock   folio_wake_bit+0x62
>                         0xffffffffadbf60a3  native_queued_spin_lock_slowpath+0x1f3
>                         0xffffffffadbf537c  _raw_spin_lock_irqsave+0x5c
>                         0xffffffffacdcf322  folio_wake_bit+0x62
>                         0xffffffffacdcf7bc  folio_end_read+0x2c
>                         0xffffffffacf6d4cf  mpage_read_end_io+0x6f
>                         0xffffffffad1d8abb  bio_endio+0x12b
>                         0xffffffffad1f07bd  blk_mq_end_request_batch+0x12d
>                         0xffffffffc05e4e9b  nvme_pci_complete_batch+0xbb

[snip]

> However a point of concern is that FIO bandwidth comes down drastically
> after the change.
>

Nicely put :)

>                 default                         inode_lock-fix
> rw=30%
> Instance 1      r=55.7GiB/s,w=23.9GiB/s         r=9616MiB/s,w=4121MiB/s
> Instance 2      r=38.5GiB/s,w=16.5GiB/s         r=8482MiB/s,w=3635MiB/s
> Instance 3      r=37.5GiB/s,w=16.1GiB/s         r=8609MiB/s,w=3690MiB/s
> Instance 4      r=37.4GiB/s,w=16.0GiB/s         r=8486MiB/s,w=3637MiB/s
>

This means that the folio waiting stuff has poor scalability, but
without digging into it I have no idea what can be done. The easy way
out would be to speculatively spin before buggering off, but one would
have to check what happens in real workloads -- presumably the lock
owner can be off cpu for a long time (I presume there is no way to
store the owner).

The now-removed lock uses rwsems which behave better when contested
and was pulling contention away from folios, artificially *helping*
performance by having the folio bottleneck be exercised less.

The right thing to do in the long run is still to whack the llseek
lock acquire, but in the light of the above it can probably wait for
better times.

-- 
Mateusz Guzik
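
To put rough numbers on the quoted data, the sketch below recomputes
the average wait per contended acquisition from the "contended" and
"total wait" columns, and the read-bandwidth drop for each FIO
instance. It is only a back-of-the-envelope check: the figures are
copied from the quoted output, the helper names avg_wait_us and
read_slowdown are made up for illustration, and it assumes "avg wait"
is total wait divided by the contended count and that GiB/MiB are the
usual binary units.

```python
# Figures copied from the quoted lock contention output and FIO results.
# (contended count, total wait in hours, reported avg wait in microseconds)
LOCK_ROWS = [
    (2441494015, 172.15, 253.83),  # folio_wait_bit_common+0xd5
    (25269947, 1.58, 225.44),      # folio_wake_bit+0x62 via filemap_get_pages
    (44757761, 1.05, 84.41),       # folio_wake_bit+0x62 via folio_end_read
]

# (instance, read GiB/s with default, read MiB/s with inode_lock-fix)
FIO_READS = [
    (1, 55.7, 9616),
    (2, 38.5, 8482),
    (3, 37.5, 8609),
    (4, 37.4, 8486),
]


def avg_wait_us(contended: int, total_wait_hours: float) -> float:
    """Average wait per contended acquisition, in microseconds.

    Assumption: avg wait = total wait / contended count.
    """
    return total_wait_hours * 3600 * 1e6 / contended


def read_slowdown(before_gib_s: float, after_mib_s: float) -> float:
    """Factor by which read bandwidth dropped (before / after)."""
    return before_gib_s * 1024 / after_mib_s


if __name__ == "__main__":
    for contended, hours, reported in LOCK_ROWS:
        computed = avg_wait_us(contended, hours)
        print(f"{contended:>11} waits: computed {computed:7.2f} us, "
              f"reported {reported:7.2f} us")
    for instance, before, after in FIO_READS:
        print(f"Instance {instance}: reads dropped "
              f"{read_slowdown(before, after):.2f}x")
```

Under those assumptions the recomputed averages agree with the
reported "avg wait" column to within the rounding of the "total wait"
column, and read bandwidth drops by somewhere between roughly 4.5x and
6x across the four instances.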