Date: Wed, 1 Mar 2023 04:35:48 +0000
From: Matthew Wilcox <willy@infradead.org>
To: Theodore Ts'o
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations

On Tue, Feb 28, 2023 at 10:52:15PM -0500, Theodore Ts'o wrote:
> For example, most cloud storage devices are doing read-ahead to try to
> anticipate read requests from the VM.  This can interfere with the
> read-ahead being done by the guest kernel, so it would help to be able
> to tell the cloud storage device whether a particular read request
> stems from read-ahead or not.  At the moment, as Matthew Wilcox has
> pointed out, we use the read-ahead code path for synchronous buffered
> reads, so plumbing this information so it can be passed through
> multiple levels of the mm, fs, and block layers will probably be
> needed.

This shouldn't be _too_ painful.  For example, the NVMe driver already
does the right thing:

	if (req->cmd_flags & (REQ_FAILFAST_DEV | REQ_RAHEAD))
		control |= NVME_RW_LR;

	if (req->cmd_flags & REQ_RAHEAD)
		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;

(LR is Limited Retry; FREQ_PREFETCH is "Speculative read.  The command
is part of a prefetch operation.")

The only problem is that the readahead code doesn't tell the filesystem
whether the request is sync or async.  This should be a simple matter
of adding a new 'bool async' to the readahead_control and then setting
REQ_RAHEAD based on that, rather than on whether the request came in
through readahead() or read_folio() (e.g. see mpage_readahead()).
A rough sketch of that plumbing is at the end of this mail.

Another thing to fix is that SCSI doesn't do anything with the
REQ_RAHEAD flag, so I presume T10 has some work to do (maybe they could
borrow the Access Frequency field from NVMe, since that was what the
drive vendors told us they wanted; maybe they changed their minds
since).
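
Roughly, the 'bool async' plumbing could look like the sketch below.
This is only a sketch, not something that compiles today: the 'async'
field does not exist, the struct layout is abbreviated from
include/linux/pagemap.h, and exactly where the flag gets set is an
assumption about how the change might be wired up.

	/* include/linux/pagemap.h -- sketch: add a flag the fs can consult */
	struct readahead_control {
		struct file *file;
		struct address_space *mapping;
		struct file_ra_state *ra;
		bool async;	/* new: true if no reader is waiting on this I/O */
		/* private fields (_index, _nr_pages, _batch_count) unchanged */
	};

	/* mm/readahead.c would set it at the two entry points, roughly:
	 * page_cache_sync_ra():  ractl->async = false;
	 * page_cache_async_ra(): ractl->async = true;
	 */

	/* fs/mpage.c: mpage_readahead() currently hard-codes
	 * .is_readahead = true, which is what makes do_mpage_readpage()
	 * put REQ_RAHEAD on the bio.  With the new field it could do: */
	struct mpage_readpage_args args = {
		.get_block = get_block,
		.is_readahead = rac->async,	/* REQ_RAHEAD only when speculative */
	};

Then the existing NVMe code quoted above would only mark genuinely
speculative reads as prefetch, and a sync buffered read that happens to
go through the readahead path would no longer be tagged.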