From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01B17D32D81 for ; Tue, 12 Nov 2024 18:48:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 88A776B00A8; Tue, 12 Nov 2024 13:48:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 839FF6B00F1; Tue, 12 Nov 2024 13:48:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6DB536B00FE; Tue, 12 Nov 2024 13:48:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 4386A6B00A8 for ; Tue, 12 Nov 2024 13:48:03 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 02195140494 for ; Tue, 12 Nov 2024 18:48:02 +0000 (UTC) X-FDA: 82778326638.05.EF8F109 Received: from mail-oa1-f52.google.com (mail-oa1-f52.google.com [209.85.160.52]) by imf13.hostedemail.com (Postfix) with ESMTP id 85C932001C for ; Tue, 12 Nov 2024 18:47:17 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=kernel-dk.20230601.gappssmtp.com header.s=20230601 header.b=ncYLoqWk; dmarc=none; spf=pass (imf13.hostedemail.com: domain of axboe@kernel.dk designates 209.85.160.52 as permitted sender) smtp.mailfrom=axboe@kernel.dk ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731437152; a=rsa-sha256; cv=none; b=YfSDvP3M9WHisH0Bq+kIo1UXe3qnlJbmc4T84rQMRwuCrz/8MpMXnU5J6ymKK4bLoATlb3 hquwnuPYaX5z4+lsufkJOHCsMt08ycS8DJ3/JrS8K4ZmT8mC4O0purHDwi0+VbMVRVRJVR tI58+hFIgyhl1b1R1ficFl3EGFpObhw= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=kernel-dk.20230601.gappssmtp.com header.s=20230601 header.b=ncYLoqWk; dmarc=none; spf=pass (imf13.hostedemail.com: domain of axboe@kernel.dk designates 209.85.160.52 as permitted sender) smtp.mailfrom=axboe@kernel.dk ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731437152; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NEKWtbV5azCN1LS/TrOFInpSTESMK9RCRhSvphgR+zE=; b=c4TlTvSnu5ih7Br/Sg09HjD+ZL+ILrAIcxAliii3cZKSOsIIg78zGq/tSM5mnqHx5u7av9 ZWZH/0/uei/hM21ko3YrkeIQIjmr7GSySOB7y6oGuk8FJegt2yjFqJKgFUySQG7N/g7X21 x6JvrLB8gb+xfPjr60V7Kja3/TH0Sdc= Received: by mail-oa1-f52.google.com with SMTP id 586e51a60fabf-288c7567f5dso2680223fac.1 for ; Tue, 12 Nov 2024 10:48:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1731437279; x=1732042079; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=NEKWtbV5azCN1LS/TrOFInpSTESMK9RCRhSvphgR+zE=; b=ncYLoqWkw+mgoWRyUpfuif22IiUUsPUV90z/FrsLel/DAjD2a0kTHztfEknZ3SV07D GAYeBlrgvj5MdJGHUQhvoKmCdUwALPW97XOWonSO4kGBJvSwEHtqf/7jsE0+7Bn/X6Jc ym5BuHkJa/7hi/wCjxg+uIuN+cAXTKL6POuVe4Kxz+lWejcOJuEViBnJYk2rQk9tG3E8 qI/sUDfESAFar+fVAoxXK3QZCqQwV6wFktdXMEF4NuzIIqxqk+TP1zjKx3+jy2UGT5cI A6YPxDG95vcK0o+QC7bTSbA3Xdfd5EE3b5j1nD0/gTsHgGR00NjqeddHBmlCP0d78bxj Lzag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731437279; x=1732042079; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=NEKWtbV5azCN1LS/TrOFInpSTESMK9RCRhSvphgR+zE=; b=czqB2UGhoe2B7nrOcyRn8gNUrwkqI3zPUHw3+aCQHQjWDud3QADzVKh4EO4tNNb4o9 vzi5mBdcx+QSeM+XMalPyD+LE7SN6JuMaxaagQd91zqN0/bI2VZn4fEdFiMEZdzRaO3d tKRuZvwUzmGkYdkEvcjdVkQszQ6YTsipsGiE+gkkiDS9HWXYLiYZuieDiLOT4Qt/XcRT nQSLxNsT8KCx/Loje/loGdzzex4skh2ECVln9ulxr25Cwmv2R5k+cAR6P+bzCmYUyKPn qCLdqaNXBXRclgFjGAfDL1ZSHirEvzW71N+KcNR5ycRLL7dGQiS9u/omJRutyhhTFtnN hX2Q== X-Gm-Message-State: AOJu0YyjL3gJNAHxgkR+VSwsD0X9ONl5GwGkJidMIVsGDH8RF9OJbyhb zNruDFjj6yg/r7lOFg2YTueDZCnWzfOJM9OyhK/gNc1DuaOia7k3Y6xHP38QLAw= X-Google-Smtp-Source: AGHT+IEbUQIAv8rwvNy1b5k48spMrO6qHPjvfMhK8WTkm9KlJBHwjMMwhaZ9S+HNwCgrodi6AerXxg== X-Received: by 2002:a05:6870:20a:b0:25a:eca3:6b5e with SMTP id 586e51a60fabf-2956004793fmr13932152fac.9.1731437279605; Tue, 12 Nov 2024 10:47:59 -0800 (PST) Received: from [192.168.1.116] ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-295e8eb8c53sm14182fac.8.2024.11.12.10.47.58 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Nov 2024 10:47:59 -0800 (PST) Message-ID: <58ebc5a8-941b-4c3d-a3b2-3985d7eeea30@kernel.dk> Date: Tue, 12 Nov 2024 11:47:57 -0700 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 12/16] ext4: add RWF_UNCACHED write support To: Brian Foster Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org, willy@infradead.org, kirill@shutemov.name, linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org References: <20241111234842.2024180-1-axboe@kernel.dk> <20241111234842.2024180-13-axboe@kernel.dk> Content-Language: en-US From: Jens Axboe In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: 85C932001C X-Stat-Signature: tn85gq7b416ye1akjxybhmh8ybnkrcp4 X-Rspam-User: X-Rspamd-Server: rspam05 X-HE-Tag: 1731437237-303778 X-HE-Meta: U2FsdGVkX1+5Wy/hdTzXM4xRWBNsMec3adUis27xZZspvVKsOA76HcK9xhDLQBAckxBJPeyh3OQZdxPri8oH3HrOzhA0pRA7Wjd0ka04VkNweFTdLRiptQ17DursdLvglp/Ob5F4C5lbu9EHfMO3YGJtDErx4/5llTQlyRSfMZMOdL/43v4hXw1N620/WHnfJIF4Jyih5xA/YB4LDuSeZ7wSv9qpgU1e4racmirs/jY5xsldsLPONzKEZIh6Re7M4UH3WTFYdzJj0/KgYoxmtJxyc8VQp4JwwxVSyTi3mUeizty5PS2OeG0ftB7nR6qX+jE1v9kjGluS+pi4eZFH8PXzmybrBt6DaZ66l+0+hGNt0XtUO6ZmJvKcdVEKZ63nmSK/8BWcBWrZWohSZ23zp/Lv156ISMdVh1Iq5FFGIHmJgZN+NsIorrY0J+/SrVqWjV7hFsRo7GoLAeKLmmJcDPD2jvVnatLGd3hHGhzV4of6BnbTfEaJeRjNw7r3NzTnWMnDX1mTCL1MvPJ+4aIIqs0l39pJaeNWKuBHIc8z6TiH4jlD3i/CIJ8x5H53XItbY8FRKqa/T5yVMTfzCCbzT4lgqpTzKwWfhJhZfrIXsIvxyvHEBP2UaZMu7m08ss1cfEbresHSm7viOhT4surMhgdhFVJOJsTl3Xbcay6C9iloTIagFPtHz/1oXpLf9/d81WdVWkr9YCCZdI5c2is4fOeu3JBgnBcLG1An0dti9Rgkm4CpoVvWWdGWYhUJ1z8ZW6R8ixGW3xwpGVrlECmoqxVTa9Y2Nq+tiD2RAebnTavScXr6EqpyQkFEouK9HgUs/vuKwcsm0c+MjIVh+7MQPjNCj7z4yEIrTU7asXu9c0yV7qiThzO26vhOaLeWWni13v2JXFsPbXTXXruzx5mTiMb80SOAtN+AYX/njgjtnUzw6nIv85Y7T6JFtYZI0bN2YKWOR9pIWdMufS5m9N5 yo9wShMS Ocw/uIwCM/hS4MMrpqidvoGuwYUhMMIbk0xr+XZ6UNF0wTvYgGY1Vyf+j6TqZS0ZZRfO78jRuG3S0CFs4CP809Gi3F5k3G6owawaoE/9owo3OP2w6vu/uHWsUTPVNJ/K+ir+dfqnQp90YFu+BQ2qKFjs4swDHHlJDtBuNgaNwU3Ai1KIzbGU/zjtRCg2fVElx0J8mNXtv0srzJxiBeqYluN+Z2cGe8TLC3hZNl3y7PJ6cwqqCGk0WAIb1Mcg8C010PMnCpI36ZuvmAM1upahyE1OuTmUOfrHDJygzbi6J615tZLB1uannVXDxWRsSi5ZGFiCO6ug2WTCfMwb2s6esu6SIwAmRK515uGjV6MJBp9joXpC5DyyD5/Ux+tVSTFTnqm2P+epVXEoBVmRnonMjWykC3A== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 11/12/24 11:11 AM, Brian Foster wrote: > On Tue, Nov 12, 2024 at 10:13:12AM -0700, Jens Axboe wrote: >> On 11/12/24 9:36 AM, Brian Foster wrote: >>> On Mon, Nov 11, 2024 at 04:37:39PM -0700, Jens Axboe wrote: >>>> IOCB_UNCACHED IO needs to prune writeback regions on IO completion, >>>> and hence need the worker punt that ext4 also does for unwritten >>>> extents. Add an io_end flag to manage that. >>>> >>>> If foliop is set to foliop_uncached in ext4_write_begin(), then set >>>> FGP_UNCACHED so that __filemap_get_folio() will mark newly created >>>> folios as uncached. That in turn will make writeback completion drop >>>> these ranges from the page cache. >>>> >>>> Now that ext4 supports both uncached reads and writes, add the fop_flag >>>> FOP_UNCACHED to enable it. >>>> >>>> Signed-off-by: Jens Axboe >>>> --- >>>> fs/ext4/ext4.h | 1 + >>>> fs/ext4/file.c | 2 +- >>>> fs/ext4/inline.c | 7 ++++++- >>>> fs/ext4/inode.c | 18 ++++++++++++++++-- >>>> fs/ext4/page-io.c | 28 ++++++++++++++++------------ >>>> 5 files changed, 40 insertions(+), 16 deletions(-) >>>> >>> ... >>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>> index 54bdd4884fe6..afae3ab64c9e 100644 >>>> --- a/fs/ext4/inode.c >>>> +++ b/fs/ext4/inode.c >>>> @@ -1138,6 +1138,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, >>>> int ret, needed_blocks; >>>> handle_t *handle; >>>> int retries = 0; >>>> + fgf_t fgp_flags; >>>> struct folio *folio; >>>> pgoff_t index; >>>> unsigned from, to; >>>> @@ -1164,6 +1165,15 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, >>>> return 0; >>>> } >>>> >>>> + /* >>>> + * Set FGP_WRITEBEGIN, and FGP_UNCACHED if foliop contains >>>> + * foliop_uncached. That's how generic_perform_write() informs us >>>> + * that this is an uncached write. >>>> + */ >>>> + fgp_flags = FGP_WRITEBEGIN; >>>> + if (*foliop == foliop_uncached) >>>> + fgp_flags |= FGP_UNCACHED; >>>> + >>>> /* >>>> * __filemap_get_folio() can take a long time if the >>>> * system is thrashing due to memory pressure, or if the folio >>>> @@ -1172,7 +1182,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, >>>> * the folio (if needed) without using GFP_NOFS. >>>> */ >>>> retry_grab: >>>> - folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN, >>>> + folio = __filemap_get_folio(mapping, index, fgp_flags, >>>> mapping_gfp_mask(mapping)); >>>> if (IS_ERR(folio)) >>>> return PTR_ERR(folio); >>> >>> JFYI, I notice that ext4 cycles the folio lock here in this path and >>> thus follows up with a couple checks presumably to accommodate that. One >>> is whether i_mapping has changed, which I assume means uncached state >>> would have been handled/cleared externally somewhere..? I.e., if an >>> uncached folio is somehow truncated/freed without ever having been >>> written back? >>> >>> The next is a folio_wait_stable() call "in case writeback began ..." >>> It's not immediately clear to me if that is possible here, but taking >>> that at face value, is it an issue if we were to create an uncached >>> folio, drop the folio lock, then have some other task dirty and >>> writeback the folio (due to a sync write or something), then have >>> writeback completion invalidate the folio before we relock it here? >> >> I don't either of those are an issue. The UNCACHED flag will only be set >> on a newly created folio, it does not get inherited for folios that >> already exist. >> > > Right.. but what I was wondering for that latter case is if the folio is > created here by ext4, so uncached is set before it is unlocked. > > On second look I guess the uncached completion invalidation should clear > mapping and thus trigger the retry logic here. That seems reasonable > enough, but is it still possible to race with writeback? > > Maybe this is a better way to ask.. what happens if a write completes to > an uncached folio that is already under writeback? For example, uncached > write 1 completes, submits for writeback and returns to userspace. Then > write 2 begins and redirties the same folio before the uncached > writeback completes. > > If I follow correctly, if write 2 is also uncached, it eventually blocks > in writeback submission (folio_prepare_writeback() -> > folio_wait_writeback()). It looks like folio lock is held there, so > presumably that would bypass the completion time invalidation in > folio_end_uncached(). But what if write 2 was not uncached or perhaps > writeback completion won the race for folio lock vs. the write side > (between locking the folio for dirtying and later for writeback > submission)? Does anything prevent invalidation of the folio before the > second write is submitted for writeback? > > IOW, I'm wondering if the uncached completion time invalidation also > needs a folio dirty check..? Ah ok, I see what you mean. If the folio is dirty, the unmapping will fail. But I guess with the recent change, we'll actually unmap it first. I'll add the folio dirty check, thanks! -- Jens Axboe