From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 27 Jan 2025 13:21:24 -0500
From: Andres Freund
To: David Hildenbrand
Cc: Matthew Wilcox, linux-mm@kvack.org, linux-block@vger.kernel.org,
	Muchun Song, Jane Chu, Jens Axboe
Subject: Re: Direct I/O performance problems with 1GB pages
Message-ID: <6ulkhmnl4rot5vrywoxvoewko7vbgkhypcwxjccghdu26kwsx5@bnseuzrsedte>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi,

On 2025-01-27 17:09:57 +0100, David
Hildenbrand wrote:
> > Andres shared some gists, but I don't want to send those to a
> > mailing list without permission.  Here's the kernel part of the
> > perf report:
> >
> >     14.04%  postgres  [kernel.kallsyms]  [k] try_grab_folio_fast
> >             |
> >             --14.04%--try_grab_folio_fast
> >                       gup_fast_fallback
> >                       |
> >                        --13.85%--iov_iter_extract_pages
> >                                  bio_iov_iter_get_pages
> >                                  iomap_dio_bio_iter
> >                                  __iomap_dio_rw
> >                                  iomap_dio_rw
> >                                  xfs_file_dio_read
> >                                  xfs_file_read_iter
> >                                  __io_read
> >                                  io_read
> >                                  io_issue_sqe
> >                                  io_submit_sqes
> >                                  __do_sys_io_uring_enter
> >                                  do_syscall_64
> >
> > Now, since postgres is using io_uring, perhaps there could be a path
> > which registers the memory with the iouring (doing the refcount/pincount
> > dance once), and then use that pinned memory for each I/O.  Maybe that
> > already exists; I'm not keeping up with io_uring development and I can't
> > seem to find any documentation on what things like io_provide_buffers()
> > actually do.

Worth noting that we'll not always use io_uring. Partially for portability to
other platforms, partially because it turns out that io_uring is disabled in
enough environments that we can't rely on it.

The generic fallback implementation is a pool of worker processes connected
via shared memory. The worker process approach did run into this issue, fwiw.

That's not to say that a legit answer to this scalability issue can't be "use
fixed bufs with io_uring", just wanted to give context.

> That's precisely what io-uring fixed buffers do :)

I looked at using them at some point - unfortunately it seems that there is
just {READ,WRITE}_FIXED not {READV,WRITEV}_FIXED. It's *exceedingly* common
for us to do reads/writes where source/target buffers aren't wholly
contiguous. Thus - unless I am misunderstanding something, entirely plausible
- using fixed buffers would unfortunately increase the number of IOs
noticeably.

Should have sent an email about that...

I guess we could add some heuristic to use _FIXED if it doesn't require
splitting an IO into too many sub-IOs. But that seems pretty gnarly.

I dimly recall that I also ran into some issues around using fixed buffers
as a non-root user. It might just be the accounting of registered buffers as
mlocked memory and the difficulty of configuring that across distributions.
But I unfortunately don't remember any details anymore.

Greetings,

Andres Freund
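
For illustration only, the "use _FIXED unless it splits the IO too much"
heuristic mentioned above could look roughly like the sketch below. It is not
code from PostgreSQL or the kernel, and it does not call io_uring. The names
`coalesce`, `plan_io` and `MAX_SUB_IOS` are hypothetical, the threshold of 2 is
an arbitrary placeholder, and the sketch assumes every segment already lies
inside a single registered buffer region.

```python
from typing import List, Tuple

# (start address, length in bytes) of one piece of the source/target memory.
Segment = Tuple[int, int]

# Hypothetical threshold: the most sub-IOs we accept before preferring one
# ordinary vectored IO. Placeholder value, not taken from the email.
MAX_SUB_IOS = 2


def coalesce(segments: List[Segment]) -> List[Segment]:
    """Merge segments that are adjacent in memory, keeping IO order."""
    merged: List[Segment] = []
    for addr, length in segments:
        if length == 0:
            continue  # empty segments contribute nothing to the IO
        if merged and merged[-1][0] + merged[-1][1] == addr:
            prev_addr, prev_len = merged[-1]
            merged[-1] = (prev_addr, prev_len + length)
        else:
            merged.append((addr, length))
    return merged


def plan_io(segments: List[Segment],
            max_sub_ios: int = MAX_SUB_IOS) -> Tuple[str, List[Segment]]:
    """Pick "fixed" (one fixed-buffer IO per contiguous run) or "vectored".

    Each contiguous run would need its own fixed-buffer IO, so the number of
    runs is the number of sub-IOs the fixed path would cost.
    """
    runs = coalesce(segments)
    if len(runs) <= max_sub_ios:
        return ("fixed", runs)
    return ("vectored", list(segments))


if __name__ == "__main__":
    # Two adjacent 8 kB buffers collapse into one run: the fixed path fits.
    print(plan_io([(0, 8192), (8192, 8192)]))
    # Three scattered buffers exceed the threshold: keep one vectored IO.
    print(plan_io([(0, 8192), (16384, 8192), (32768, 8192)]))
```

The sketch only counts contiguous runs. A real implementation would also have
to weigh the per-IO submission cost against the pinning cost seen in the
profile, which is the part that makes the heuristic gnarly.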