Date: Thu, 10 Mar 2022 12:13:25 +0000
From: Filipe Manana
To: Andreas Gruenbacher
Cc: Linus Torvalds, Filipe Manana, Catalin Marinas, David Hildenbrand,
 Alexander Viro, linux-s390, Linux-MM, linux-fsdevel, linux-btrfs
Subject: Re: Buffered I/O broken on s390x with page faults disabled (gfs2)
References: <20220309184238.1583093-1-agruenba@redhat.com>

On Wed, Mar 09, 2022 at 10:08:32PM +0100, Andreas Gruenbacher wrote:
> On Wed, Mar 9, 2022 at 8:08 PM Linus Torvalds wrote:
> > On Wed, Mar 9, 2022 at 10:42 AM Andreas Gruenbacher wrote:
> > > With a large enough buffer, a simple malloc() will return unmapped
> > > pages, and reading into such a buffer will result in fault-in. So
> > > page faults during read() are actually pretty normal, and it's not
> > > the user's fault.
> >
> > Agreed. But that wasn't the case here:
> >
> > > In my test case, the buffer was pre-initialized with memset() to
> > > avoid those kinds of page faults, which meant that the page faults
> > > in gfs2_file_read_iter() only started to happen when we were out of
> > > memory. But that's not the common case.
> >
> > Exactly. I do not think this is a case that we should - or need to -
> > optimize for.
> >
> > And doing too much pre-faulting is actually counter-productive.
> >
> > > * Get rid of max_size: it really makes no sense to second-guess
> > > what the caller needs.
> >
> > It's not about "what the caller needs". It's literally about latency
> > issues. If you can force a busy loop in kernel space by having one
> > unmapped page and then doing a 2GB read(), that's a *PROBLEM*.
> >
> > Now, we can try this thing, because I think we end up having other
> > size limitations in the IO subsystem that mean the filesystem won't
> > actually do that, but the moment I hear somebody talk about
> > latencies, that max_size goes back.
>
> Thanks, this puts fault_in_safe_writeable() in line with
> fault_in_readable() and fault_in_writeable().
>
> There are currently two users of
> fault_in_safe_writeable()/fault_in_iov_iter_writeable(): gfs2 and
> btrfs. In gfs2, we cap the size at BIO_MAX_VECS pages (256). I don't
> see an explicit cap in btrfs; adding Filipe.
>
> Andreas

In btrfs, buffered writes are capped (in btrfs_buffered_write()), but
for buffered reads we have no control over the fault-in size, since we
use filemap_read(). Direct IO has no cap either: we try to fault in
everything that's left. However, we do keep track of whether we are
making progress, and if we aren't, we fall back to the buffered IO
path, which prevents infinite or very long loops. The retry logic looks
roughly like the sketch below.
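A simplified sketch of that retry logic (not the actual btrfs code;
do_one_dio_read() and fall_back_to_buffered() are made-up stand-ins for
the real submission and fallback paths):

static ssize_t dio_read_retry_loop(struct kiocb *iocb, struct iov_iter *to)
{
	ssize_t read = 0;
	size_t prev_left = SIZE_MAX;

	while (iov_iter_count(to) > 0) {
		/*
		 * Made-up helper: submits direct IO, advances the iterator
		 * by however much completed, and returns -EFAULT when it
		 * runs into an unmapped page in the user buffer.
		 */
		ssize_t ret = do_one_dio_read(iocb, to);

		if (ret > 0)
			read += ret;
		if (ret != -EFAULT)
			return read ?: ret;

		/*
		 * If we made no progress at all since the last fault-in
		 * attempt, fall back to buffered IO rather than spinning
		 * on fault-in forever.
		 */
		if (iov_iter_count(to) == prev_left)
			return fall_back_to_buffered(iocb, to, read);
		prev_left = iov_iter_count(to);

		/*
		 * Note: uncapped -- this tries to fault in everything
		 * that is left, no matter how large.
		 */
		fault_in_iov_iter_writeable(to, iov_iter_count(to));
	}
	return read;
}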
There's really no good reason not to cap how much we try to fault in on
the direct IO paths. We should do it, since the lack of a cap probably
has a negative performance impact for very large direct IO reads and
writes. The cap itself would be a trivial change, sketched below.

Thanks.
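For completeness, a minimal version of that cap, replacing the uncapped
fault-in call in the sketch above (the 256 pages just mirror gfs2's
BIO_MAX_VECS cap; the right value for btrfs would need measuring):

	/* Illustrative cap, mirroring gfs2's 256-page (BIO_MAX_VECS) limit. */
	size_t max_size = 256 * PAGE_SIZE;

	fault_in_iov_iter_writeable(to, min_t(size_t, iov_iter_count(to), max_size));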