From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C539EC4332F for ; Mon, 7 Nov 2022 16:33:47 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3A0E48E0005; Mon, 7 Nov 2022 11:33:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 350D98E0002; Mon, 7 Nov 2022 11:33:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1F25B8E0005; Mon, 7 Nov 2022 11:33:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 0ECDD8E0002 for ; Mon, 7 Nov 2022 11:33:47 -0500 (EST) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id E03531603B5 for ; Mon, 7 Nov 2022 16:33:46 +0000 (UTC) X-FDA: 80107192452.28.26FB63D Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf20.hostedemail.com (Postfix) with ESMTP id 87CFC1C000C for ; Mon, 7 Nov 2022 16:33:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1667838824; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=qPvbymbxNjLT5RHEfpMnv1QvB9Cg4g/BMoB0bjE1LrI=; b=ZQwrwNEmZHs5i+yBti1ohhrQLKSePhQ3sHq31MrabzpmYg92vqlfS4Ozxh6glT08G9/naA x2nFfllciBppkK1uNfUHo8BZaP+0DoE03YPao+nKGoLeLLNWV5g/WGX1+wesFopp0hGsC1 mooKjhWe5jYszV6Ij4T+NqZQssxMSqY= Received: from mail-qv1-f69.google.com (mail-qv1-f69.google.com [209.85.219.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-249-6e0s6yIdN8-Yl0cm__X4hw-1; Mon, 07 Nov 2022 11:33:43 -0500 X-MC-Unique: 6e0s6yIdN8-Yl0cm__X4hw-1 Received: by mail-qv1-f69.google.com with SMTP id l6-20020ad44446000000b004bb60364075so7937589qvt.13 for ; Mon, 07 Nov 2022 08:33:43 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=qPvbymbxNjLT5RHEfpMnv1QvB9Cg4g/BMoB0bjE1LrI=; b=ikfGaoijfyRyatX5qF6EdOPREyouSIiWhvODnWtOZIoxjR2CWtQRjwRzSkUUPAs74V UoBFTEAgX3JTZj207qevQbdH5af1nQego3hiVvRWl4mK9If5P0Qe1NX8LvqtzZY/TxMm jLpd18rAYQFWS8oKwOOJVq4mLqMNy91UfTU10j0GrXTCS347h91qoa0kUk2ehvS4VQHi Gnv5UtoOkI1V6qPOrF/BAIV9LpsraumYfIQNuKb15MeAT4T6r8erv6EqivHBSYIyN2lz HUelHJwsO9V9DgByqMDZzjFiznTnHhTmnZNVCCfVDT5GbyEvAwbwWHjIlCX0h/s4m3Ok I/jA== X-Gm-Message-State: ACrzQf1uCFM4PxkDrbK0RsMT6BeWigNiImnsHFYnCsbnecVgo1cz9Qd6 iP/mt7EMAq1QHRJ8tYBtTLqEx1m7OCstnSlP0q1ddtPfju9PTJskgFIkMSEVo0htY6mzxvy8HzU iHUlKFfL01sk= X-Received: by 2002:a05:620a:2492:b0:6fa:4014:20c6 with SMTP id i18-20020a05620a249200b006fa401420c6mr26624750qkn.129.1667838823070; Mon, 07 Nov 2022 08:33:43 -0800 (PST) X-Google-Smtp-Source: AMsMyM5b/Sf0d8OhG0tQsiY0dTKXv4cziiuHsAYlddxaiHsn+GmO7KYDakqc8+y0jowU38spb+CWWw== X-Received: by 2002:a05:620a:2492:b0:6fa:4014:20c6 with SMTP id i18-20020a05620a249200b006fa401420c6mr26624723qkn.129.1667838822685; Mon, 07 Nov 2022 08:33:42 -0800 (PST) Received: from bfoster (c-24-61-119-116.hsd1.ma.comcast.net. [24.61.119.116]) by smtp.gmail.com with ESMTPSA id y22-20020a05620a44d600b006f9c2be0b4bsm7235119qkp.135.2022.11.07.08.33.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 07 Nov 2022 08:33:42 -0800 (PST) Date: Mon, 7 Nov 2022 11:33:46 -0500 From: Brian Foster To: Andrew Morton Cc: linux-mm@kvack.org Subject: Re: [PATCH] filemap: skip range writeback if end offset precedes start Message-ID: References: <20221028125428.976549-1-bfoster@redhat.com> <20221030215100.f5e7116e848077c96d86591d@linux-foundation.org> MIME-Version: 1.0 In-Reply-To: <20221030215100.f5e7116e848077c96d86591d@linux-foundation.org> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ZQwrwNEm; spf=pass (imf20.hostedemail.com: domain of bfoster@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=bfoster@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1667838825; a=rsa-sha256; cv=none; b=eR1IAx4SQVL+uSNlrUcyuZvQVZD2DQXSG9Zk0YZhuTgakyvMjgRxyLctZOy33AV9QhrCLB yIthYPyNwjyi4SSXn2IHxmpraGliDVC8+VvDyGmCNE98XH/RG/iaGn/4P7JYPZYLqJMvTN Jwp4Qdmw5GnHIEYD6eYvb/uJQNz7o1Q= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1667838825; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=qPvbymbxNjLT5RHEfpMnv1QvB9Cg4g/BMoB0bjE1LrI=; b=Z9nEg5XLPuNNjR6JaqW7kvbvDWz6Kojt87Q/i9Pc+yhS7xvB9y4xJ5Te761G94qd7x4b6I CJyZLmJHdhYW/feQnliWNPYwHObKfwPBaYM/wzZzacMsKXf8sji9GoYVtlmETZOLMluQHI odMj3ZNm5oKy6W0m2tLD3o6nyNKhIIw= X-Stat-Signature: s7bax97azarfzarhucoe8icr467k3s57 X-Rspamd-Queue-Id: 87CFC1C000C Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=ZQwrwNEm; spf=pass (imf20.hostedemail.com: domain of bfoster@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=bfoster@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspam-User: X-Rspamd-Server: rspam06 X-HE-Tag: 1667838825-686848 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sun, Oct 30, 2022 at 09:51:00PM -0700, Andrew Morton wrote: > On Fri, 28 Oct 2022 08:54:28 -0400 Brian Foster wrote: > > > A call to file[map]_write_and_wait_range() with an end offset that > > precedes the start offset but happens to land in the same page can > > trigger writeback submission but fail to wait on the submitted page. > > Writeback submission occurs because __filemap_fdatawrite_range() > > passes both offsets down into write_cache_pages(), which rounds down > > to page indexes before it starts processing writeback. > > __filemap_fdatawait_range() immediately returns if the specified end > > offset precedes the start offset, however. > > > > I suspect these checks are primarily intended to handle overflow > > conditions. I happened to notice this behavior when investigating an > > unrelated problem and observed that a filemap_write_and_wait_range() > > call with unexpected parameters had seemingly unpredictable latency. > > That latency turned out to be the submission path occasionally > > waiting on writeback state of the page (i.e. from > > write_cache_pages()) before issuing the currently requested > > writepage and then unconditionally failing to wait on the latter via > > __filemap_fdatawait_range(). > > > > This could probably be reasonably fixed to either elide the > > submission, as this patch does, or modify the fdatawait path to > > check the page indexes instead of the unaligned offsets. After > > poking around a bit, it seemed more consistent with various other > > filemap interfaces to check the offsets in the write path and return > > if the end offset is not >= the start. For example, > > filemap_range_has_page() and filemap_range_has_writeback() both > > include similar byte granularity checks. > > > > ... > > > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -418,6 +418,9 @@ int __filemap_fdatawrite_range(struct address_space *mapping, loff_t start, > > .range_end = end, > > }; > > > > + if (end < start) > > + return 0; > > + > > return filemap_fdatawrite_wbc(mapping, &wbc); > > } > Hi Andrew, Sorry for the delay.. > Is there any way in which this condition can be triggered from > userspace? Or from any non-buggy kernelspace? > Hmm.. good question. Making a quick pass through the callers of __filemap_fdatawrite_range(), I see the following situations: - sync_file_range() includes a higher level end < start check that results in skipping the operation. This appears to cover the sync_file_range() family of syscalls. - generic_fadvise(..., POSIX_FADV_DONTNEED) - sets end = -1 if less than start. This essentially converts the range to cover through the end of the file. - filemap_fdatawrite_range() and file[map]_write_and_wait_range() are called from quite a few places with non-determistic inputs. It wouldn't surprise me a ton if somehow it were possible for some of these callers to do the end < start thing based on unsanitized input or buggy logic, they may just not care depending on if they wait or not. For example, gfs2_fsync() calls filemap_fdata[write|wait]_range() separately, so in theory if called with end < start, the write could submit but the wait could skip similar to having called filemap_write_and_wait_range() (assuming the conditional whole file write in that path doesn't trigger) if the offsets land in the same page. > Should we have a WARN_ON() in there to detect this? > That might make sense. I suppose it depends on what expected behavior is. It certainly doesn't make much sense to write and then not wait from write_and_wait() variants, so callers would probably want to know about that. It might be hard to really audit all callsites to determine whether anybody actually relies on the "end is before start but lands on the same page" behavior for the write only case. We could alternatively change fdatawait to compare the shifted page indexes to match fdatawrite behavior, but that all seems a bit fragile because of 1. the various higher level byte granularity checks and 2. I doubt anybody actually checks for whether the range crosses a page boundary, which is the difference between skipping just the wait or the write as well. So I dunno, I could see various combinations of changes being considered reasonable. Perhaps a good starting point would be to wrap the check in this patch with a WARN_ON_ONCE() and let it soak in -next for a while? That would avoid excessive noise from repetitive callers [1] but still allow those callsites to be identified/fixed. If there is some really weird fdatawrite-only caller that conflicts, the change could always be loosened up from there (as unlikely as that seems).. Hm? Brian [1] The use case that identified this problem is a wonky call from the XFS truncate path from a workload that makes this truncate call repeatedly. A WARN_ON_ONCE() would have most definitely been useful IMO, but an unconditional warning would spam the logs in this particular case.