Date: Wed, 23 Feb 2022 14:33:19 +0000
From: Matthew Wilcox
To: Vlastimil Babka
Cc: "stable@vger.kernel.org", Miaohe Lin, Christoph Hellwig, Jan Kara,
	Takashi Iwai, "linux-mm@kvack.org", patches@lists.linux.dev, LKML,
	"Kirill A. Shutemov"
Subject: Re: read() data corruption with CONFIG_READ_ONLY_THP_FOR_FS=y

On Wed, Feb 23, 2022 at 02:54:43PM +0100, Vlastimil Babka wrote:
> we have found a bug involving CONFIG_READ_ONLY_THP_FOR_FS=y, introduced in
> 5.12 by cbd59c48ae2b ("mm/filemap: use head pages in
> generic_file_buffered_read")
> and apparently fixed in 5.17-rc1 by 6b24ca4a1a8d ("mm: Use multi-index
> entries in the page cache")
> The latter commit is part of folio rework so likely not stable material, so
> it would be nice to have a small fix for e.g. 5.15 LTS. Preferably from
> someone who understands xarray :)

[...]

> I've hacked some printk on top 5.16 (attached debug.patch)
> which gives this output:
>
> i=0 page=ffffea0004340000 page_offset=0 uoff=0 bytes=2097152
> i=1 page=ffffea0004340000 page_offset=0 uoff=0 bytes=2097152
> i=2 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
> i=3 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
> i=4 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
> i=5 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
> i=6 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
> i=7 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
> i=8 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
> i=9 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
> i=10 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
> i=11 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
> i=12 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
> i=13 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
> i=14 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
>
> It seems filemap_get_read_batch() should be returning pages ffffea0004340000
> and ffffea0004470000 consecutively in the pvec, but returns the first one 8
> times, so it's read twice and then the rest is just skipped over as it's
> beyond the requested read size.
>
> I suspect these lines:
>   xas.xa_index = head->index + thp_nr_pages(head) - 1;
>   xas.xa_offset = (xas.xa_index >> xas.xa_shift) & XA_CHUNK_MASK;
>
> commit 6b24ca4a1a8d changes those to xas_advance() (introduced one patch
> earlier), so some self-contained fix should be possible for prior kernels?
> But I don't understand xarray well enough.

I figured it out!

In v5.15 (indeed, everything before commit 6b24ca4a1a8d), an order-9
page is stored in 512 consecutive slots.  The XArray stores 64 entries
per level.  So what happens is we start looking at index 0 and we walk
down to the bottom of the tree and find the THP at index 0.

		xas.xa_index = head->index + thp_nr_pages(head) - 1;
		xas.xa_offset = (xas.xa_index >> xas.xa_shift) & XA_CHUNK_MASK;

So we've advanced xas.xa_index to 511, but advanced xas.xa_offset to 63.
Then we call xas_next() which calls __xas_next(), which moves us along to
array index 64 while we think we're looking at index 512.

We could make __xas_next() more resistant to this kind of abuse (by
extracting the correct offset in the parent node from xa_index), but as
you say, we're looking for a small fix for LTS.  I suggest this will
probably do the right thing:

+++ b/mm/filemap.c
@@ -2354,8 +2354,7 @@ static void filemap_get_read_batch(struct address_space *mapping,
 			break;
 		if (PageReadahead(head))
 			break;
-		xas.xa_index = head->index + thp_nr_pages(head) - 1;
-		xas.xa_offset = (xas.xa_index >> xas.xa_shift) & XA_CHUNK_MASK;
+		xas_set(&xas, head->index + thp_nr_pages(head) - 1);
 		continue;
 put_page:
 		put_page(head);

but I'll start trying the reproducer now.