From: Jens Axboe
To: Bharata B Rao
Cc: bfoster@redhat.com, clm@meta.com, hannes@cmpxchg.org, kirill@shutemov.name, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, willy@infradead.org
Subject: Re: [PATCHSET v6 0/12] Uncached buffered IO
Date: Thu, 12 Dec 2024 08:46:41 -0700
Message-ID: <6e6475c2-7a5f-4742-b4ae-e1eef5f2c508@kernel.dk>
In-Reply-To: <20241210094842.204504-1-bharata@amd.com>
References: <20241203153232.92224-2-axboe@kernel.dk> <20241210094842.204504-1-bharata@amd.com>

On 12/10/24 2:48 AM, Bharata B Rao wrote:
> Hi Jens,
>
> I ran a couple of variants of FIO to check how this patchset affects the
> FIO numbers.
> My other motivation with this patchset is to check if it
> improves the scalability issues that were discussed in [1] and [2].
> But for now, here are some initial numbers.

Thanks for testing!

> Also note that I am using your buffered-uncached.8 branch from
> https://git.kernel.dk/cgit/linux/log/?h=buffered-uncached.8 that has
> changes to enable uncached buffered IO for EXT4 and block devices.
>
> In the below reported numbers,
> 'base' means kernel from buffered-uncached.8 branch and
> 'patched' means kernel from buffered-uncached.8 branch + above shown FIO change
>
> FIO on EXT4 partitions
> ======================
> nvme1n1     259:12   0   3.5T  0 disk
> ├─nvme1n1p1 259:13   0 894.3G  0 part /mnt1
> ├─nvme1n1p2 259:14   0 894.3G  0 part /mnt2
> ├─nvme1n1p3 259:15   0 894.3G  0 part /mnt3
> └─nvme1n1p4 259:16   0 894.1G  0 part /mnt4
>
> fio -directory=/mnt4/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -directory=/mnt3/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -directory=/mnt1/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -directory=/mnt2/ -direct=0 -thread -size=3G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
>
> Four NVME devices are formatted with EXT4 and four parallel FIO instances
> are run on them with the options as shown above.
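The four quoted EXT4 invocations differ only in `-directory`, so a reproduction could be scripted. The following is a minimal Python sketch, not part of the original test setup: the helper names `build_fio_cmd` and `run_parallel` are hypothetical, and the options are copied from the quoted commands rather than tuned. It assumes `fio` is installed and on the PATH.

```python
import subprocess

# Options shared by all four quoted fio invocations (copied verbatim).
COMMON_OPTS = [
    "-direct=0", "-thread", "-size=3G", "-rw=rw", "-rwmixwrite=30",
    "--norandommap", "--randrepeat=0", "-ioengine=pvsync2", "-bs=64k",
    "-numjobs=252", "-runtime=3600", "--time_based", "-group_reporting",
    "-name=mytest",
]


def build_fio_cmd(directory):
    """Return the argv list for one fio instance on the given mount point."""
    return ["fio", f"-directory={directory}"] + COMMON_OPTS


def run_parallel(directories):
    """Start one fio process per directory, then wait for all of them.

    Returns the exit codes in the same order as `directories`.
    """
    procs = [subprocess.Popen(build_fio_cmd(d)) for d in directories]
    return [p.wait() for p in procs]
```

Calling `run_parallel(["/mnt4/", "/mnt3/", "/mnt1/", "/mnt2/"])` would start the four instances together, matching the order of the quoted commands. All processes are started before any is waited on, so the runs overlap for their full duration.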
>
> FIO output looks like this:
>
> base:
>    READ: bw=1233MiB/s (1293MB/s), 1233MiB/s-1233MiB/s (1293MB/s-1293MB/s), io=4335GiB (4654GB), run=3600097-3600097msec
>   WRITE: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=1858GiB (1995GB), run=3600097-3600097msec
>    READ: bw=1248MiB/s (1308MB/s), 1248MiB/s-1248MiB/s (1308MB/s-1308MB/s), io=4387GiB (4710GB), run=3600091-3600091msec
>   WRITE: bw=535MiB/s (561MB/s), 535MiB/s-535MiB/s (561MB/s-561MB/s), io=1880GiB (2019GB), run=3600091-3600091msec
>    READ: bw=1235MiB/s (1294MB/s), 1235MiB/s-1235MiB/s (1294MB/s-1294MB/s), io=4340GiB (4660GB), run=3600094-3600094msec
>   WRITE: bw=529MiB/s (555MB/s), 529MiB/s-529MiB/s (555MB/s-555MB/s), io=1860GiB (1997GB), run=3600094-3600094msec
>    READ: bw=1234MiB/s (1294MB/s), 1234MiB/s-1234MiB/s (1294MB/s-1294MB/s), io=4337GiB (4657GB), run=3600093-3600093msec
>   WRITE: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=1859GiB (1996GB), run=3600093-3600093msec
>
> patched:
>    READ: bw=1400MiB/s (1469MB/s), 1400MiB/s-1400MiB/s (1469MB/s-1469MB/s), io=4924GiB (5287GB), run=3600100-3600100msec
>   WRITE: bw=600MiB/s (629MB/s), 600MiB/s-600MiB/s (629MB/s-629MB/s), io=2110GiB (2266GB), run=3600100-3600100msec
>    READ: bw=1395MiB/s (1463MB/s), 1395MiB/s-1395MiB/s (1463MB/s-1463MB/s), io=4904GiB (5266GB), run=3600148-3600148msec
>   WRITE: bw=598MiB/s (627MB/s), 598MiB/s-598MiB/s (627MB/s-627MB/s), io=2102GiB (2257GB), run=3600148-3600148msec
>    READ: bw=1385MiB/s (1452MB/s), 1385MiB/s-1385MiB/s (1452MB/s-1452MB/s), io=4868GiB (5227GB), run=3600136-3600136msec
>   WRITE: bw=594MiB/s (622MB/s), 594MiB/s-594MiB/s (622MB/s-622MB/s), io=2087GiB (2241GB), run=3600136-3600136msec
>    READ: bw=1376MiB/s (1443MB/s), 1376MiB/s-1376MiB/s (1443MB/s-1443MB/s), io=4837GiB (5194GB), run=3600145-3600145msec
>   WRITE: bw=590MiB/s (618MB/s), 590MiB/s-590MiB/s (618MB/s-618MB/s), io=2073GiB (2226GB), run=3600145-3600145msec
>
> FIO on block devices
> ====================
> nvme1n1     259:12   0   3.5T  0 disk
> ├─nvme1n1p1 259:13   0 894.3G  0 part
> ├─nvme1n1p2 259:14   0 894.3G  0 part
> ├─nvme1n1p3 259:15   0 894.3G  0 part
> └─nvme1n1p4 259:16   0 894.1G  0 part
>
> fio -filename=/dev/nvme1n1p4 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -filename=/dev/nvme1n1p2 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -filename=/dev/nvme1n1p1 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
> fio -filename=/dev/nvme1n1p3 -direct=0 -thread -size=800G -rw=rw -rwmixwrite=30 --norandommap --randrepeat=0 -ioengine=pvsync2 -bs=64k -numjobs=252 -runtime=3600 --time_based -group_reporting -name=mytest
>
> Four instances of FIO are run on four different NVME block devices
> with the options as shown above.
>
> base:
>    READ: bw=8712MiB/s (9135MB/s), 8712MiB/s-8712MiB/s (9135MB/s-9135MB/s), io=29.9TiB (32.9TB), run=3600011-3600011msec
>   WRITE: bw=3734MiB/s (3915MB/s), 3734MiB/s-3734MiB/s (3915MB/s-3915MB/s), io=12.8TiB (14.1TB), run=3600011-3600011msec
>    READ: bw=8727MiB/s (9151MB/s), 8727MiB/s-8727MiB/s (9151MB/s-9151MB/s), io=30.0TiB (32.9TB), run=3600005-3600005msec
>   WRITE: bw=3740MiB/s (3922MB/s), 3740MiB/s-3740MiB/s (3922MB/s-3922MB/s), io=12.8TiB (14.1TB), run=3600005-3600005msec
>    READ: bw=8701MiB/s (9123MB/s), 8701MiB/s-8701MiB/s (9123MB/s-9123MB/s), io=29.9TiB (32.8TB), run=3600004-3600004msec
>   WRITE: bw=3729MiB/s (3910MB/s), 3729MiB/s-3729MiB/s (3910MB/s-3910MB/s), io=12.8TiB (14.1TB), run=3600004-3600004msec
>    READ: bw=8706MiB/s (9128MB/s), 8706MiB/s-8706MiB/s (9128MB/s-9128MB/s), io=29.9TiB (32.9TB), run=3600005-3600005msec
>   WRITE: bw=3731MiB/s (3913MB/s), 3731MiB/s-3731MiB/s (3913MB/s-3913MB/s), io=12.8TiB (14.1TB), run=3600005-3600005msec
>
> patched:
>    READ: bw=1844MiB/s (1933MB/s), 1844MiB/s-1844MiB/s (1933MB/s-1933MB/s), io=6500GiB (6980GB), run=3610641-3610641msec
>   WRITE: bw=790MiB/s (828MB/s), 790MiB/s-790MiB/s (828MB/s-828MB/s), io=2786GiB (2991GB), run=3610642-3610642msec
>    READ: bw=1753MiB/s (1838MB/s), 1753MiB/s-1753MiB/s (1838MB/s-1838MB/s), io=6235GiB (6695GB), run=3641973-3641973msec
>   WRITE: bw=751MiB/s (788MB/s), 751MiB/s-751MiB/s (788MB/s-788MB/s), io=2672GiB (2869GB), run=3641969-3641969msec
>    READ: bw=1078MiB/s (1130MB/s), 1078MiB/s-1078MiB/s (1130MB/s-1130MB/s), io=3788GiB (4068GB), run=3600007-3600007msec
>   WRITE: bw=462MiB/s (484MB/s), 462MiB/s-462MiB/s (484MB/s-484MB/s), io=1624GiB (1743GB), run=3600007-3600007msec
>    READ: bw=1752MiB/s (1838MB/s), 1752MiB/s-1752MiB/s (1838MB/s-1838MB/s), io=6234GiB (6694GB), run=3642657-3642657msec
>   WRITE: bw=751MiB/s (788MB/s), 751MiB/s-751MiB/s (788MB/s-788MB/s), io=2672GiB (2869GB), run=3642622-3642622msec
>
> While FIO on FS shows improvement, FIO on block shows numbers going down.
> Is this expected or am I missing enabling anything else for the block option?

The fs side looks as expected, that's a nice win. For the bdev side, I
deliberately did not post the bdev patch for enabling uncached buffered
IO on a raw block device, as it's just a hack for testing. It currently
needs punting similar to dirtying of pages, and it's not optimized at
all. We really need the raw bdev ops moving fully to iomap and not being
dependent on buffer_heads for this to pan out, so the most likely
outcome here is that raw bdev uncached buffered IO will not really be
supported until the time that someone (probably Christoph) does that
work.

I don't think this is a big issue, can't imagine buffered IO on raw
block devices being THAT interesting of a use case.

-- 
Jens Axboe
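To put rough numbers on "nice win" and "numbers going down", the per-instance bandwidths from the quoted fio summaries can be averaged and compared. This is only an illustrative Python sketch: the bandwidth figures are copied from the quoted output, while the dictionary layout and the `pct_change` helper are assumptions made for this example. Averaging the four instances is one simple summary and not something the thread itself computes.

```python
from statistics import mean

# Per-instance bandwidths in MiB/s, copied from the quoted fio summaries.
EXT4 = {
    "base":    {"READ": [1233, 1248, 1235, 1234], "WRITE": [529, 535, 529, 529]},
    "patched": {"READ": [1400, 1395, 1385, 1376], "WRITE": [600, 598, 594, 590]},
}
BDEV = {
    "base":    {"READ": [8712, 8727, 8701, 8706], "WRITE": [3734, 3740, 3729, 3731]},
    "patched": {"READ": [1844, 1753, 1078, 1752], "WRITE": [790, 751, 462, 751]},
}


def pct_change(results, direction):
    """Percent change of mean patched bandwidth relative to mean base."""
    base = mean(results["base"][direction])
    patched = mean(results["patched"][direction])
    return 100.0 * (patched - base) / base


for name, results in (("ext4", EXT4), ("bdev", BDEV)):
    for direction in ("READ", "WRITE"):
        print(f"{name} {direction}: {pct_change(results, direction):+.1f}%")
```

With these figures the EXT4 runs come out at roughly a 12% gain for both reads and writes, while the raw block device runs drop by roughly 82%, which matches the direction of the results discussed above.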