Date: Mon, 27 Jan 2025 14:36:53 -0500
From: Andres Freund <andres@anarazel.de>
To: David Hildenbrand
Cc: Matthew Wilcox, linux-mm@kvack.org, linux-block@vger.kernel.org,
	Muchun Song, Jane Chu
Subject: Re: Direct I/O performance problems with 1GB pages
In-Reply-To: <4a75d25f-bcb9-42b6-aa9e-1e63e4be98e3@redhat.com>
Hi,

On 2025-01-27 20:20:41 +0100, David Hildenbrand wrote:
> On 27.01.25 18:25, Andres Freund wrote:
> > On 2025-01-27 15:09:23 +0100, David Hildenbrand wrote:
> > Unfortunately for the VMs with those disks I don't have access to
> > hardware performance counters :(.
> >
> > > Maybe there is a link to the report you could share, thanks.
> >
> > A profile of the "original" case where I hit this, without the patch
> > that Willy linked to:
> >
> > Note this is a profile *not* using hardware perf counters, thus likely
> > to be rather skewed:
> > https://gist.github.com/anarazel/304aa6b81d05feb3f4990b467d02dabc
> > (this was on Debian Sid's 6.12.6)
> >
> > Without the patch I achieved ~18GB/s with 1GB pages and ~35GB/s with
> > 2MB pages.
>
> Out of interest, did you ever compare it to 4k?

I didn't. Postgres will always do at least 8kB (unless compiled with
non-default settings). But I also don't think I tested just doing 8kB on
that VM. I doubt I'd have gotten close to the max, even with 2MB huge
pages. At least not without block-layer-level merging of IOs.

If it's particularly interesting, I can bring a similar VM up and run that
comparison.

> > This time it's actual hardware perf counters...
> >
> > Relevant details about the c2c report, excerpted from IRC:
> >
> > andres | willy: Looking at a bit more detail into the c2c report, it
> >          looks like the dirtying is due to folio->_pincount and
> >          folio->_refcount in about equal measure and folio->flags being
> >          modified in gup_fast_fallback(). The modifications then,
> >          unsurprisingly, cause a lot of cache misses for reads (like in
> >          bio_set_pages_dirty() and bio_check_pages_dirty()).
> >
> > willy  | andres: that makes perfect sense, thanks
> > willy  | really, the only way to fix that is to split it up
> > willy  | and either we can split it per-cpu or per-physical-address-range
>
> As discussed, even better is "not repeatedly pinning/unpinning" at all :)

Indeed ;)

> I'm curious, are multiple processes involved, or is this all within a
> single process?

In the test case here multiple processes are involved; I was testing a
parallel sequential scan with a high limit on the parallelism.

There are cases in which a fair bit of read IO is done from a single
process (e.g. to pre-warm the buffer pool after a restart, which is
currently done by a single process), but it's more common for high
throughput to happen across multiple processes. With modern drives a
single task won't be able to execute non-trivial queries at full disk
speed.

Greetings,

Andres Freund