From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ADD55C4363A for ; Tue, 20 Oct 2020 11:21:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 7A84222404 for ; Tue, 20 Oct 2020 11:21:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="UrYIw+Uo" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7A84222404 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 717B46B005C; Tue, 20 Oct 2020 07:21:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 69EF26B0062; Tue, 20 Oct 2020 07:21:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 58C286B0068; Tue, 20 Oct 2020 07:21:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0062.hostedemail.com [216.40.44.62]) by kanga.kvack.org (Postfix) with ESMTP id 22D376B005C for ; Tue, 20 Oct 2020 07:21:47 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id BE38B180AD802 for ; Tue, 20 Oct 2020 11:21:46 +0000 (UTC) X-FDA: 77392063812.26.spot22_01137ed2723f Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id A3FB41804B654 for ; Tue, 20 Oct 2020 11:21:46 +0000 (UTC) X-HE-Tag: spot22_01137ed2723f X-Filterd-Recvd-Size: 4237 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf04.hostedemail.com (Postfix) with ESMTP for ; Tue, 20 Oct 2020 11:21:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=tSS1MuZ8AFbkW+ypKAMNnv6YlEacISP3BShZ3dh/KFQ=; b=UrYIw+UoNzUIduQWr+Vr0lA53w +Cic3Xur8MBbfGAFjra5xdsoOviIc1j3gA/WUG5RUFHxFVo0sRUHNHhk1+w46kip7KYZy8q216ogu Lrd6UmvNEcWmQ2vxAZwRnLi2ZlSEExY+eNkAR942XEQ6ZXgVrrViPyS2TGhT7EYKK4KMg90QTj/ud ER2FY5jiFgtlaATtoiuhSUTPGLw+Axd1ZG6seTYpp70FCzkOMKE7UR4VtE8gvKx11BCnI7K8DmBax XHDjIYzpZaMmw6AUQ2RWw9qL2qxEf35fdGMi/UU+e+QxUHG4F7H0CXg4sGWHmHsqq9juPpeVq5eJK U/CMYL1Q==; Received: from willy by casper.infradead.org with local (Exim 4.92.3 #3 (Red Hat Linux)) id 1kUphy-0001Fq-Ny; Tue, 20 Oct 2020 11:21:38 +0000 Date: Tue, 20 Oct 2020 12:21:38 +0100 From: Matthew Wilcox To: Dave Chinner Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org Subject: Re: Splitting a THP beyond EOF Message-ID: <20201020112138.GZ20115@casper.infradead.org> References: <20201020014357.GW20115@casper.infradead.org> <20201020045928.GO7391@dread.disaster.area> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20201020045928.GO7391@dread.disaster.area> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Oct 20, 2020 at 03:59:28PM +1100, Dave Chinner wrote: > On Tue, Oct 20, 2020 at 02:43:57AM +0100, Matthew Wilcox wrote: > > This is a weird one ... which is good because it means the obvious > > ones have been fixed and now I'm just tripping over the weird cases. > > And fortunately, xfstests exercises the weird cases. > > > > 1. The file is 0x3d000 bytes long. > > 2. A readahead allocates an order-2 THP for 0x3c000-0x3ffff > > 3. We simulate a read error for 0x3c000-0x3cfff > > 4. Userspace writes to 0x3d697 to 0x3dfaa > > So this is a write() beyond EOF, yes? > > If yes, then we first go through this path: > > xfs_file_buffered_aio_write() > xfs_file_aio_write_checks() > iomap_zero_range(isize, pos - isize) > > To zero the region between the current EOF and where the new write > starts. i.e. from 0x3d000 to 0x3d696. Yes. That calls iomap_write_begin() which calls iomap_split_page() which is where we run into trouble. I elided the exact path from the description of the problem. > > 5. iomap_write_begin() gets the 0x3c page, sees it's THP and !Uptodate > > so it calls iomap_split_page() (passing page 0x3d) > > Splitting the page because it's !Uptodate seems rather drastic to > me. Why does it need to split the page here? Because we can't handle Dirty, !Uptodate THPs in the truncate path. Previous discussion: https://lore.kernel.org/linux-mm/20200821144021.GV17456@casper.infradead.org/ The current assumption is that a !Uptodate THP is due to a read error, and so the sensible thing to do is split it and handle read errors at a single-page level. I've been playing around with creating THPs in the write path, and that offers a different pathway to creating Dirty, !Uptodate THPs, so this may also change at some point. I'd like to get what I have merged and then figure out how to make this better. > Also, this concerns me: if we are exposing the cached EOF page via > mmap, it needs to contain only zeroes in the region beyond EOF so > that we don't expose stale data to userspace. Hence when a THP that > contains EOF is instantiated, we have to ensure that the region > beyond EOF is compeltely zeroed. It then follows that if we read all > the data in that THP up to EOF, then the page is actually up to > date... We do that in iomap_readpage_actor(). Had the readahead I/O not "failed", we'd've had an Uptodate THP which straddled EOF. I dumped the page and its uptodate bits are 0xfff0 (1kB block size filesystem, the three pages 0x3d-0x3f are uptodate because they were zeroed and 0x3c is !uptodate because the I/O failed).