From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9E84BC433EF for ; Tue, 1 Feb 2022 02:01:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AE3166B010C; Mon, 31 Jan 2022 21:01:28 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id A92366B010D; Mon, 31 Jan 2022 21:01:28 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 959866B010E; Mon, 31 Jan 2022 21:01:28 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0157.hostedemail.com [216.40.44.157]) by kanga.kvack.org (Postfix) with ESMTP id 86A9E6B010C for ; Mon, 31 Jan 2022 21:01:28 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 3F4B3998E9 for ; Tue, 1 Feb 2022 02:01:28 +0000 (UTC) X-FDA: 79092559056.14.9231AAB Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf02.hostedemail.com (Postfix) with ESMTP id 009D580006 for ; Tue, 1 Feb 2022 02:01:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=Sm7Yc1CtQBIVZ/GRlT8gqFP2ojSk7fSCO2XvuJwXk+w=; b=aLmub9E/3RVA1+xFe6GB5IgwFa KonTAXz5S6wqGq94yXuiVfawT+IlIZKim6M+ABrd3VJMZAYL6q2I1HShpV2MCjDcBHomp1cU10R7t 44i1b0YiK/p8Whr8CpU7+kHI8CMB8jW1qMCSlLKrFBlKbZx0ggC+mLwWOoA8aWPH1rMghEQtCOtj0 FyMMeZrqM1Jzy77Qhx13Uybrr7yZJJfFJINGoqCRfceMUsqi9x5huOhgbAg7Tnb1bRRoiD7CgJa4d R+dr5qTA+tFPH1/vEboBzowgBYxSw62CD53N+coERAPBHeLHW/uuAvM7YlIPaKL4LkY2BUyZufqSf 29wVeJKQ==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1nEiTo-00BAtw-Pj; Tue, 01 Feb 2022 02:01:12 +0000 Date: Tue, 1 Feb 2022 02:01:12 +0000 From: Matthew Wilcox To: NeilBrown Cc: Andrew Morton , Jeff Layton , Ilya Dryomov , Miklos Szeredi , Trond Myklebust , Anna Schumaker , linux-mm@kvack.org, linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org, ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/3] fuse: remove reliance on bdi congestion Message-ID: References: <164360127045.4233.2606812444285122570.stgit@noble.brown> <164360183348.4233.761031466326833349.stgit@noble.brown> <164360446180.18996.6767388833611575467@noble.neil.brown.name> <164367002370.18996.7242801209611375112@noble.neil.brown.name> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <164367002370.18996.7242801209611375112@noble.neil.brown.name> X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 009D580006 X-Rspam-User: nil Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b="aLmub9E/"; dmarc=none; spf=none (imf02.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org X-Stat-Signature: 1mc14r34de6jgcr8q11qffidexewbewe X-HE-Tag: 1643680886-366501 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Feb 01, 2022 at 10:00:23AM +1100, NeilBrown wrote: > On Tue, 01 Feb 2022, Matthew Wilcox wrote: > > On Mon, Jan 31, 2022 at 03:47:41PM +1100, NeilBrown wrote: > > > On Mon, 31 Jan 2022, Matthew Wilcox wrote: > > > > > +++ b/fs/fuse/file.c > > > > > @@ -958,6 +958,8 @@ static void fuse_readahead(struct readahead_control *rac) > > > > > > > > > > if (fuse_is_bad(inode)) > > > > > return; > > > > > + if (fc->num_background >= fc->congestion_threshold) > > > > > + return; > > > > > > > > This seems like a bad idea to me. If we don't even start reads on > > > > readahead pages, they'll get ->readpage called on them one at a time > > > > and the reading thread will block. It's going to lead to some nasty > > > > performance problems, exactly when you don't want them. Better to > > > > queue the reads internally and wait for congestion to ease before > > > > submitting the read. > > > > > > > > > > Isn't that exactly what happens now? page_cache_async_ra() sees that > > > inode_read_congested() returns true, so it doesn't start readahead. > > > ??? > > > > It's rather different. Imagine the readahead window has expanded to > > 256kB (64 pages). Today, we see congestion and don't do anything. > > That means we miss the async readahed opportunity, find a missing > > page and end up calling into page_cache_sync_ra(), by which time > > we may or may not be congested. > > > > If the inode_read_congested() in page_cache_async_ra() is removed and > > the patch above is added to replace it, we'll allocate those 64 pages and > > add them to the page cache. But then we'll return without starting IO. > > When we hit one of those !uptodate pages, we'll call ->readpage on it, > > but we won't do anything to the other 63 pages. So we'll go through a > > protracted slow period of sending 64 reads, one at a time, whether or > > not congestion has eased. Then we'll hit a missing page and proceed > > to the sync ra case as above. > > Hmmm... where is all this documented? > The entry for readahead in vfs.rst says: > > If the filesystem decides to stop attempting I/O before reaching the > end of the readahead window, it can simply return. > > but you are saying that if it simply returns, it'll most likely just get > called again. So maybe it shouldn't say that? That's not what I'm saying at all. I'm saying that if ->readahead fails to read the page, ->readpage will be called to read the page (if it's actually accessed). > What do other filesystems do? > ext4 sets REQ_RAHEAD, but otherwise just pushes ahead and submits all > requests. btrfs seems much the same. > xfs uses iomp_readahead .. which again sets REQ_RAHEAD but otherwise > just does a normal read. > > The effect of REQ_RAHEAD seems to be primarily to avoid retries on > failure. > > So it seems that core read-ahead code it not set up to expect readahead > to fail, though it is (begrudgingly) permitted. Well, yes. The vast majority of reads don't fail. > The current inode_read_congested() test in page_cache_async_ra() seems > to be just delaying the inevitable (and in fairness, the comment does > say "Defer...."). Maybe just blocking on the congestion is an equally > good way to delay it... I don't think we should _block_ for an async read request. We're in the context of a process which has read a different page. Maybe what we need is a readahead_abandon() call that removes the just-added pages from the page cache, so we fall back to doing a sync readahead? > I note that ->readahead isn't told if the read-ahead is async or not, so > my patch will drop sync read-ahead on congestion, which the current code > doesn't do. Now that we have a readahead_control, it's simple to add that information to it. > So maybe this congestion tracking really is useful, and we really want > to keep it. > > I really would like to see that high-level documentation!! I've done my best to add documentation. There's more than before I started.