From: Kent Overstreet
To: Matthew Wilcox
Cc: Linus Torvalds, Luis Chamberlain, lsf-pc@lists.linux-foundation.org,
 linux-fsdevel@vger.kernel.org, linux-mm, Daniel Gomez, Pankaj Raghav,
 Jens Axboe, Dave Chinner, Christoph Hellwig, Chris Mason, Johannes Weiner
Subject: Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO
Date: Sun, 25 Feb 2024 12:32:36 -0500
Message-ID: <2impeknscqi2dg3ik6woohow26wjlfnv4oaevuqa7o2uyc3ppz@pwpnppp54jnh>

On
Sun, Feb 25, 2024 at 01:10:39PM +0000, Matthew Wilcox wrote:
> On Sun, Feb 25, 2024 at 12:18:23AM -0500, Kent Overstreet wrote:
> > Before large folios, we had people very much bottlenecked by 4k page
> > overhead on sequential IO; my customer/sponsor was one of them.
> >
> > Factor of 2 or 3, IIRC; it was _bad_. And when you looked at the
> > profiles and looked at the filemap.c code it wasn't hard to see why;
> > we'd walk a radix tree, do an atomic op (get the page), then do a 4k
> > usercopy... hence the work I did to break up
> > generic_file_buffered_read() and vectorize it, which was a huge
> > improvement.
>
> There's also the small random 64 byte read case that we haven't optimised
> for yet. That also bottlenecks on the page refcount atomic op.
>
> The proposed solution to that was double-copy; look up the page without
> bumping its refcount, copy to a buffer, look up the page again to be
> sure it's still there, copy from the buffer to userspace.
>
> Except that can go wrong under really unlikely circumstances. Look up the
> page, page gets freed, page gets reallocated to slab, we copy sensitive
> data from it, page gets freed again, page gets reallocated to the same
> spot in the file (!), lookup says "yup the same page is there".
> We'd need a seqcount or something to be sure the page hasn't moved.

yes, generation numbers are the standard solution to ABA...
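
To make the generation-number idea concrete, here is a minimal user-space sketch in C11. It is not kernel code and not a proposed patch: `struct slot`, `slot_replace()`, `slot_read()` and the other names are hypothetical stand-ins for a page-cache slot, and a single writer is assumed. The begin/retry split only loosely mirrors the shape of the kernel's `read_seqcount_begin()` / `read_seqcount_retry()`. The reader takes no refcount; it copies into a bounce buffer and discards the copy if the generation moved, so a slot that is reused and later holds the same contents again (the ABA case above) is still detected.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SLOT_SIZE 64

/* Hypothetical stand-in for one page-cache slot; not a kernel structure. */
struct slot {
    atomic_uint gen;               /* even: stable, odd: writer active */
    atomic_uchar data[SLOT_SIZE];
};

static void slot_init(struct slot *s)
{
    atomic_init(&s->gen, 0);
    for (size_t i = 0; i < SLOT_SIZE; i++)
        atomic_init(&s->data[i], 0);
}

/* Writer side (single writer assumed): every reuse bumps the generation. */
static void slot_replace(struct slot *s, const unsigned char *src, size_t len)
{
    assert(len <= SLOT_SIZE);
    unsigned g = atomic_load_explicit(&s->gen, memory_order_relaxed);
    atomic_store_explicit(&s->gen, g + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    for (size_t i = 0; i < len; i++)
        atomic_store_explicit(&s->data[i], src[i], memory_order_relaxed);
    atomic_store_explicit(&s->gen, g + 2, memory_order_release);
}

/* Reader step 1: sample the generation before touching the data. */
static unsigned slot_read_begin(struct slot *s)
{
    return atomic_load_explicit(&s->gen, memory_order_acquire);
}

/* Reader step 2: copy into a private bounce buffer, no refcount taken. */
static void slot_copy(struct slot *s, unsigned char *bounce, size_t len)
{
    assert(len <= SLOT_SIZE);
    for (size_t i = 0; i < len; i++)
        bounce[i] = atomic_load_explicit(&s->data[i], memory_order_relaxed);
}

/* Reader step 3: true if the bounce buffer must be thrown away. */
static bool slot_read_retry(struct slot *s, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return (start & 1u) ||
           atomic_load_explicit(&s->gen, memory_order_relaxed) != start;
}

/* Full lockless read: only validated bytes are ever copied to 'out'. */
static void slot_read(struct slot *s, unsigned char *out, size_t len)
{
    unsigned char bounce[SLOT_SIZE];
    unsigned start;

    do {
        start = slot_read_begin(s);
        slot_copy(s, bounce, len);
    } while (slot_read_retry(s, start));

    for (size_t i = 0; i < len; i++)
        out[i] = bounce[i];
}
```

The point of the sketch is that validation compares a counter rather than identity: "the same page is at the same spot" can be true after two reuses, but the generation cannot return to its old value without wrapping. The cost is a counter update on every reuse, and where such a counter would live in the real page cache is left open here.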