From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EC79D42BBE for ; Tue, 12 Nov 2024 18:10:34 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B103F8D0005; Tue, 12 Nov 2024 13:10:33 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id AC0428D0001; Tue, 12 Nov 2024 13:10:33 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 95FFC8D0005; Tue, 12 Nov 2024 13:10:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 75E598D0001 for ; Tue, 12 Nov 2024 13:10:33 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 1FF5416053F for ; Tue, 12 Nov 2024 18:10:33 +0000 (UTC) X-FDA: 82778232306.04.D4F093C Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf01.hostedemail.com (Postfix) with ESMTP id 46ED340018 for ; Tue, 12 Nov 2024 18:09:58 +0000 (UTC) Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=PcCYW0w9; spf=pass (imf01.hostedemail.com: domain of bfoster@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=bfoster@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1731434976; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=qSj0SdB8QSc7Mjd9L/4LWSQMYeFV5iHP2hv8TZtArto=; b=vjSI6FxpuIvmAEW5kVUkCI00RBbcVFxYc4ARf7XwpqBAaXo44XIoPAZZJ4b8gbaxL5NjOI euQG8uBL2jcVuOygAm4Pv7Xj6t2hx0Uhbo2FXTLnbJSHljppft+brhiYU1BmhTqeB1/Hj+ EhCoYov+t3PMZF/+WPZAjEQZRB6Th5k= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1731434976; a=rsa-sha256; cv=none; b=DEcs1XU4ojuXoshO0GiMzsx9gXnmjKbmLJOoLT62Rw0LD31rmk+sEFdNoYyK/+qTJ7R/bb Yr4qACR8XI62t5FcZfGgGGhNuxWo9FGhKrXOfWiNNohUMHtasxe3+vyjKv6V7SGNk4yTXD k5SSJPVScC/RmnASPWuosv7t2gaJS3s= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=PcCYW0w9; spf=pass (imf01.hostedemail.com: domain of bfoster@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=bfoster@redhat.com; dmarc=pass (policy=none) header.from=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1731435030; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=qSj0SdB8QSc7Mjd9L/4LWSQMYeFV5iHP2hv8TZtArto=; b=PcCYW0w9RBRf6pKRusZhpOSr70pl0TE4to6d8KyIvV+hkLFFlCOl2Bla5quayV9APoSu79 cPxJ4dyEJanBHQfn1CsvMf11M3X+p0eh7XtUg3fCLeFSW8B/SVqzlp463Xq5K65NkzVjKj XQsnxAmH9Vaxw52OlgAMMikKCj5zieY= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-266--KDSIWVmPMGLF4ReAom0aQ-1; Tue, 12 Nov 2024 13:10:24 -0500 X-MC-Unique: -KDSIWVmPMGLF4ReAom0aQ-1 X-Mimecast-MFC-AGG-ID: -KDSIWVmPMGLF4ReAom0aQ Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 042A91955BCF; Tue, 12 Nov 2024 18:10:22 +0000 (UTC) Received: from bfoster (unknown [10.22.80.120]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 84FA61955F40; Tue, 12 Nov 2024 18:10:19 +0000 (UTC) Date: Tue, 12 Nov 2024 13:11:52 -0500 From: Brian Foster To: Jens Axboe Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, hannes@cmpxchg.org, clm@meta.com, linux-kernel@vger.kernel.org, willy@infradead.org, kirill@shutemov.name, linux-btrfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org Subject: Re: [PATCH 12/16] ext4: add RWF_UNCACHED write support Message-ID: References: <20241111234842.2024180-1-axboe@kernel.dk> <20241111234842.2024180-13-axboe@kernel.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 X-Stat-Signature: o3sxqn5beajio9rre96pm7hnskkbmuaq X-Rspam-User: X-Rspamd-Queue-Id: 46ED340018 X-Rspamd-Server: rspam02 X-HE-Tag: 1731434998-251927 X-HE-Meta: U2FsdGVkX1/ZuEVGMMSgAsGVeOmw7J3NwREJjFMVlOrdAUVpHwi8t1IHtUNGzcyMZcJC8IGG67GVxQfaytFY/fuuqZ3NGQLeUJKdx172hOA1h04vSI+fdLBr0JgJAFAikUthTBsKaCNoCQZKhTnT23an46obVqIclolsESPxJQLWr7/WdbJbO+yfZshDk4UbeL3XihscEKBgHn2GRlEzdwOLowApuWL2F67IY23wDj+qiyqt/G1VVx9fprdjr41LAkmTFWddhqOoD3dsPeRG8jqrkO2nRhKwmj71AC0Hkm//Gy8oNm3uBxM1Vg7gKlkNQOn/QBBn2TMLWHUiJ6eV9nJ/AuRzh7paBw/kbhp1ycdubRj9GxPmbhu/ZOjLgbtFZWNAImknR+uCq/qNiCduecz3RGcITaN9KvOgvvuEqo2UO9BfCPiuoYDxtYSF86xMOTF5cF727F95RCtreYKQ7xUbGtfLMu3OUvsM0VZfe/bSHNJxNbHBpiTq192gaMnV2UJ1TKGqnq0n8FgSVjudeVTlBANF9pC4H1dBNcWyt9q4S+oIYQAFk/OZ0/syavJdGgs7fn9KQPH6r6VTLRmnwZ96gfBZ9BwwezusTozqg6e6i4xVAT/I+Kd1tv7X7jp3rKhsL6F4H6nkYyZ/QSID+jtez4pKcoy8ork2mG+AJF15E2ejiNNDrs3SM4Ivr6CSETEAWvR/W/X7/kRRs2oYvk0zazHdRO+hWzCfFvo8zv28OLg7DyM7o00T1Oymecf52sv+ubeBv7Qf7qOhg7s5q4mv/4YHw/jVLzyTg94UIZkJsjkvwQ9U84+2Msr3ZI9UfSsTkL86YgoOhs4GwcEV9POWkaauU/MpPa2TXSIPVXGrC+nQpz0ismFYxDthgzg/7it0S9YfVvWi/ai9Yc0MTg0vtQUhaGoC/3EH3rhJIvdbjVKR1r6euGdn773tbnEvkmYrMI00XdQdhkKo5+e vW518M9W 51evpX/oYbBzhLVxxYvTf6/O37mcVHxdXOL1B7Q91VlIZstA+0UiQqqBY+UddS+2C+AhKNXkXruY8hbN111cwsY+VyvHynJITiF+VH/ZHQaLZIV2xdyw8aZO3Iu7XHHPCTNde7cvP7czi8vB4UXAjxLIKVwg41NQepqtKyTK/b8CS2RXdbYg6xWZ8Po6bXS4w1uvbqfuIY1oVIoQcm6er/y1uxBOzZJOIZHDwQRoOgeu/lglHJc5cUDv6LM5OY6p8kCqiWfF19v6Brdu61Xm+3gUBbU2WtTywJhX9m1t038ZJUH2SPTW+PConBkOYEKYmJN4USDYZZKNObrsBwYCF/YQ6PGNLxubNCzD9aK1PcXDCdjfWH+PR4UQ1Qfa1YLSUqqUo2U43dxe7RDDdXjRM8+GjTA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Nov 12, 2024 at 10:13:12AM -0700, Jens Axboe wrote: > On 11/12/24 9:36 AM, Brian Foster wrote: > > On Mon, Nov 11, 2024 at 04:37:39PM -0700, Jens Axboe wrote: > >> IOCB_UNCACHED IO needs to prune writeback regions on IO completion, > >> and hence need the worker punt that ext4 also does for unwritten > >> extents. Add an io_end flag to manage that. > >> > >> If foliop is set to foliop_uncached in ext4_write_begin(), then set > >> FGP_UNCACHED so that __filemap_get_folio() will mark newly created > >> folios as uncached. That in turn will make writeback completion drop > >> these ranges from the page cache. > >> > >> Now that ext4 supports both uncached reads and writes, add the fop_flag > >> FOP_UNCACHED to enable it. > >> > >> Signed-off-by: Jens Axboe > >> --- > >> fs/ext4/ext4.h | 1 + > >> fs/ext4/file.c | 2 +- > >> fs/ext4/inline.c | 7 ++++++- > >> fs/ext4/inode.c | 18 ++++++++++++++++-- > >> fs/ext4/page-io.c | 28 ++++++++++++++++------------ > >> 5 files changed, 40 insertions(+), 16 deletions(-) > >> > > ... > >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > >> index 54bdd4884fe6..afae3ab64c9e 100644 > >> --- a/fs/ext4/inode.c > >> +++ b/fs/ext4/inode.c > >> @@ -1138,6 +1138,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, > >> int ret, needed_blocks; > >> handle_t *handle; > >> int retries = 0; > >> + fgf_t fgp_flags; > >> struct folio *folio; > >> pgoff_t index; > >> unsigned from, to; > >> @@ -1164,6 +1165,15 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, > >> return 0; > >> } > >> > >> + /* > >> + * Set FGP_WRITEBEGIN, and FGP_UNCACHED if foliop contains > >> + * foliop_uncached. That's how generic_perform_write() informs us > >> + * that this is an uncached write. > >> + */ > >> + fgp_flags = FGP_WRITEBEGIN; > >> + if (*foliop == foliop_uncached) > >> + fgp_flags |= FGP_UNCACHED; > >> + > >> /* > >> * __filemap_get_folio() can take a long time if the > >> * system is thrashing due to memory pressure, or if the folio > >> @@ -1172,7 +1182,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, > >> * the folio (if needed) without using GFP_NOFS. > >> */ > >> retry_grab: > >> - folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN, > >> + folio = __filemap_get_folio(mapping, index, fgp_flags, > >> mapping_gfp_mask(mapping)); > >> if (IS_ERR(folio)) > >> return PTR_ERR(folio); > > > > JFYI, I notice that ext4 cycles the folio lock here in this path and > > thus follows up with a couple checks presumably to accommodate that. One > > is whether i_mapping has changed, which I assume means uncached state > > would have been handled/cleared externally somewhere..? I.e., if an > > uncached folio is somehow truncated/freed without ever having been > > written back? > > > > The next is a folio_wait_stable() call "in case writeback began ..." > > It's not immediately clear to me if that is possible here, but taking > > that at face value, is it an issue if we were to create an uncached > > folio, drop the folio lock, then have some other task dirty and > > writeback the folio (due to a sync write or something), then have > > writeback completion invalidate the folio before we relock it here? > > I don't either of those are an issue. The UNCACHED flag will only be set > on a newly created folio, it does not get inherited for folios that > already exist. > Right.. but what I was wondering for that latter case is if the folio is created here by ext4, so uncached is set before it is unlocked. On second look I guess the uncached completion invalidation should clear mapping and thus trigger the retry logic here. That seems reasonable enough, but is it still possible to race with writeback? Maybe this is a better way to ask.. what happens if a write completes to an uncached folio that is already under writeback? For example, uncached write 1 completes, submits for writeback and returns to userspace. Then write 2 begins and redirties the same folio before the uncached writeback completes. If I follow correctly, if write 2 is also uncached, it eventually blocks in writeback submission (folio_prepare_writeback() -> folio_wait_writeback()). It looks like folio lock is held there, so presumably that would bypass the completion time invalidation in folio_end_uncached(). But what if write 2 was not uncached or perhaps writeback completion won the race for folio lock vs. the write side (between locking the folio for dirtying and later for writeback submission)? Does anything prevent invalidation of the folio before the second write is submitted for writeback? IOW, I'm wondering if the uncached completion time invalidation also needs a folio dirty check..? Brian > -- > Jens Axboe >