Date: Tue, 28 Feb 2023 22:52:15 -0500
From: "Theodore Ts'o"
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: [LSF/MM/BPF TOPIC] Cloud storage optimizations

Emulated block devices offered by cloud VMs can provide functionality to guest kernels and applications that traditionally has not been available to users of consumer-grade HDDs and SSDs.

For example, today it’s possible to create a block device in Google’s Persistent Disk with a 16k physical sector size, which promises that aligned 16k writes will be atomic. With NVMe, it is possible for a storage device to promise this without requiring read-modify-write updates for sub-16k writes. All that is necessary are some changes in the block layer so that the kernel does not inadvertently tear a write request when splitting a bio because it is too large (perhaps because it got merged with some other request, and then it gets split at an inconvenient boundary).

There are also more interesting, advanced optimizations that might be possible. For example, Jens had observed that passing hints identifying journaling writes (either from file systems or databases) could be potentially useful. Unfortunately most common storage devices have not supported write hints, and support for write hints was ripped out last year. That can be easily reversed, but there are some other interesting related subjects that are very much suited for LSF/MM.

For example, most cloud storage devices are doing read-ahead to try to anticipate read requests from the VM. This can interfere with the read-ahead being done by the guest kernel. So it would be useful to be able to tell the cloud storage device whether a particular read request stems from read-ahead or not. As Matthew Wilcox has pointed out, we currently use the read-ahead code path for synchronous buffered reads. So plumbing this information so that it can be passed through multiple levels of the mm, fs, and block layers will probably be needed.
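
To make the write-tearing concern for 16k atomic writes a bit more concrete, here is a rough userspace sketch in C of what "not splitting at an inconvenient boundary" could mean. This is not kernel code and not a proposal for the actual block layer change: the helper names (is_unit_aligned, safe_split_len) and the byte-based interface are assumptions made up for illustration. The idea is only that when an oversized request has to be split, the split point gets rounded down to a multiple of the atomic unit, so that no single 16k unit ends up straddling two fragments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): true if the byte range
 * [off, off + len) starts on an atomic-unit boundary and covers a
 * whole number of atomic units.
 */
static bool is_unit_aligned(uint64_t off, uint64_t len, uint64_t unit)
{
	assert(unit != 0);
	return len != 0 && off % unit == 0 && len % unit == 0;
}

/*
 * Hypothetical helper (not a kernel API): pick the length in bytes of
 * the first fragment when a request at byte offset `off` with length
 * `len` exceeds the per-request limit `max_len`.
 *
 * The split point is rounded down to a multiple of `unit`, so a split
 * never lands in the middle of an atomic unit.
 *
 * Returns `len` if no split is needed, and 0 if no unit-aligned split
 * point exists within `max_len` bytes of `off`.
 */
static uint64_t safe_split_len(uint64_t off, uint64_t len,
			       uint64_t max_len, uint64_t unit)
{
	assert(unit != 0);

	if (len <= max_len)
		return len;

	uint64_t end = off + max_len;              /* furthest allowed split */
	uint64_t aligned_end = end - (end % unit); /* round down to boundary */

	if (aligned_end <= off)
		return 0;

	return aligned_end - off;
}
```

With a 16k unit and a (made-up) 40k limit, a 64k write at offset 0 would be split after 32k rather than after 40k, so the third 16k unit is not torn. A real implementation would work in sectors on bios and would have to account for the device's queue limits, which this sketch ignores.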