Re: read() data corruption with CONFIG_READ_ONLY_THP_FOR_FS=y

From: Vlastimil Babka <vbabka@suse.cz>
To: Matthew Wilcox <willy@infradead.org>
Cc: "stable@vger.kernel.org" <stable@vger.kernel.org>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Takashi Iwai <tiwai@suse.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	patches@lists.linux.dev, LKML <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: read() data corruption with CONFIG_READ_ONLY_THP_FOR_FS=y
Date: Wed, 23 Feb 2022 18:07:03 +0100
Message-ID: <87b056da-80e8-17f5-41ae-9fb493a7f0da@suse.cz>
In-Reply-To: <YhZFr+kXIJFgiMaf@casper.infradead.org>

On 2/23/22 15:33, Matthew Wilcox wrote:
> On Wed, Feb 23, 2022 at 02:54:43PM +0100, Vlastimil Babka wrote:
>> we have found a bug involving CONFIG_READ_ONLY_THP_FOR_FS=y, introduced in
>> 5.12 by cbd59c48ae2b ("mm/filemap: use head pages in
>> generic_file_buffered_read")
>> and apparently fixed in 5.17-rc1 by 6b24ca4a1a8d ("mm: Use multi-index
>> entries in the page cache")
>> The latter commit is part of the folio rework so likely not stable material, so
>> it would be nice to have a small fix for e.g. 5.15 LTS. Preferably from
>> someone who understands xarray :)
> 
> [...]
> 
>> I've hacked some printk on top of 5.16 (attached debug.patch)
>> which gives this output:
>> 
>> i=0 page=ffffea0004340000 page_offset=0 uoff=0 bytes=2097152
>> i=1 page=ffffea0004340000 page_offset=0 uoff=0 bytes=2097152
>> i=2 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
>> i=3 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
>> i=4 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
>> i=5 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
>> i=6 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
>> i=7 page=ffffea0004340000 page_offset=0 uoff=0 bytes=0
>> i=8 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
>> i=9 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
>> i=10 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
>> i=11 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
>> i=12 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
>> i=13 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
>> i=14 page=ffffea0004470000 page_offset=2097152 uoff=0 bytes=0
>> 
>> It seems filemap_get_read_batch() should be returning pages ffffea0004340000
>> and ffffea0004470000 consecutively in the pvec, but returns the first one 8
>> times, so it's read twice and then the rest is just skipped over as it's
>> beyond the requested read size.
>> 
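To make the debug output above easier to follow, here is a rough userspace
model of the copy loop. It is only a sketch: toy_read(), struct toy_copy and
THP_SIZE are made-up names, the read is assumed to start at file position 0
and end at 4 MiB (which is what the bytes= values suggest), and the real loop
in mm/filemap.c does more than this.

```c
#include <stddef.h>

/* Assumed huge page size, matching the 2097152-byte copies in the log. */
#define THP_SIZE (2UL * 1024 * 1024)

/* One record per batch entry (hypothetical type, for illustration only). */
struct toy_copy {
	unsigned long pos;     /* file position the reader is at */
	unsigned long source;  /* file position the copied data really came from */
	unsigned long bytes;   /* bytes copied from this batch entry */
};

/*
 * Walk a batch of pages, each described only by its offset in the file.
 * The amount copied depends on the reader's position, not on which page
 * the batch handed back, so a duplicated page is copied again.
 */
static size_t toy_read(const unsigned long *page_offsets, size_t n,
		       unsigned long end, struct toy_copy *out)
{
	unsigned long pos = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long in_page = pos & (THP_SIZE - 1);
		unsigned long left = end - pos;
		unsigned long room = THP_SIZE - in_page;

		out[i].pos = pos;
		out[i].source = page_offsets[i] + in_page;
		out[i].bytes = left < room ? left : room;
		pos += out[i].bytes;
	}
	return i;
}
```

Feeding it eight entries at offset 0 followed by seven at offset 2097152, with
end = 4194304, gives two 2 MiB copies and then zeroes. The second copy is
taken at position 2097152 but sourced from offset 0, i.e. the first 2 MiB of
the file is returned twice.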
>> I suspect these lines:
>>   xas.xa_index = head->index + thp_nr_pages(head) - 1;
>>   xas.xa_offset = (xas.xa_index >> xas.xa_shift) & XA_CHUNK_MASK;
>> 
>> commit 6b24ca4a1a8d changes those to xas_advance() (introduced one patch
>> earlier), so some self-contained fix should be possible for prior kernels?
>> But I don't understand xarray well enough.
> 
> I figured it out!
> 
> In v5.15 (indeed, everything before commit 6b24ca4a1a8d), an order-9
> page is stored in 512 consecutive slots.  The XArray stores 64 entries
> per level.  So what happens is we start looking at index 0 and we walk
> down to the bottom of the tree and find the THP at index 0.
> 
>                 xas.xa_index = head->index + thp_nr_pages(head) - 1;
>                 xas.xa_offset = (xas.xa_index >> xas.xa_shift) & XA_CHUNK_MASK;
> 
> So we've advanced xas.xa_index to 511, but advanced xas.xa_offset to 63.
> Then we call xas_next() which calls __xas_next(), which moves us along to
> array index 64 while we think we're looking at index 512.
> 
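For anyone who wants to check the arithmetic, a small sketch with the numbers
from this explanation. CHUNK_SHIFT stands in for the default XA_CHUNK_SHIFT
of 6; stale_offset(), leaf_of() and index_at() are made-up helper names, not
kernel functions.

```c
/* 64 slots per node, as in the explanation above (default XA_CHUNK_SHIFT). */
#define CHUNK_SHIFT 6
#define CHUNK_SIZE  (1UL << CHUNK_SHIFT)
#define CHUNK_MASK  (CHUNK_SIZE - 1)

/* The slot offset the two removed lines compute; shift is 0 in a leaf node. */
static unsigned long stale_offset(unsigned long index, unsigned int shift)
{
	return (index >> shift) & CHUNK_MASK;
}

/* Which 64-slot leaf really holds the given index. */
static unsigned long leaf_of(unsigned long index)
{
	return index >> CHUNK_SHIFT;
}

/* The index that slot `offset` of leaf number `leaf` actually refers to. */
static unsigned long index_at(unsigned long leaf, unsigned long offset)
{
	return leaf * CHUNK_SIZE + offset;
}
```

stale_offset(511, 0) is 63 and leaf_of(511) is 7, but the cursor never left
leaf 0, so it is really sitting on index_at(0, 63) == 63. One step forward
from there is index 64, not 512.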
> We could make __xas_next() more resistant to this kind of abuse (by
> extracting the correct offset in the parent node from xa_index), but
> as you say, we're looking for a small fix for LTS.  I suggest this
> will probably do the right thing:

Great!

Just so others are aware: the final fix is here:
https://lore.kernel.org/all/20220223155918.927140-1-willy@infradead.org/

> +++ b/mm/filemap.c
> @@ -2354,8 +2354,7 @@ static void filemap_get_read_batch(struct address_space *mapping,
>                         break;
>                 if (PageReadahead(head))
>                         break;
> -               xas.xa_index = head->index + thp_nr_pages(head) - 1;
> -               xas.xa_offset = (xas.xa_index >> xas.xa_shift) & XA_CHUNK_MASK;
> +               xas_set(&xas, head->index + thp_nr_pages(head) - 1);
>                 continue;
>  put_page:
>                 put_page(head);
> 
> but I'll start trying the reproducer now.
> 
>
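As far as I understand it, the one-liner works because xas_set() stores the
new index and marks the state for restart, so the next step walks down from
the root again instead of trusting a stale node/offset pair. A toy userspace
model of the difference follows; struct toy_cursor and the toy_*() functions
are invented for illustration and are not the XArray implementation.

```c
#define CHUNK_SHIFT 6
#define CHUNK_SIZE  (1UL << CHUNK_SHIFT)
#define CHUNK_MASK  (CHUNK_SIZE - 1)

/* Toy cursor: `leaf` stands in for xa_node; a negative value means restart. */
struct toy_cursor {
	unsigned long index;   /* index the caller believes it is at */
	long leaf;             /* which 64-slot leaf the cursor points into */
	unsigned long offset;  /* slot within that leaf */
};

/* The removed lines: index and offset are updated, the leaf is not. */
static void toy_skip_stale(struct toy_cursor *c, unsigned long index)
{
	c->index = index;
	c->offset = index & CHUNK_MASK;
}

/* The replacement, in spirit: record the index and force a fresh walk. */
static void toy_set(struct toy_cursor *c, unsigned long index)
{
	c->index = index;
	c->leaf = -1;
}

/* Step forward; return the index of the slot that is actually examined. */
static unsigned long toy_next(struct toy_cursor *c)
{
	c->index++;
	if (c->leaf < 0) {
		/* Restart: derive leaf and offset from the index itself. */
		c->leaf = (long)(c->index >> CHUNK_SHIFT);
		c->offset = c->index & CHUNK_MASK;
	} else if (++c->offset == CHUNK_SIZE) {
		/* Ran off the end of this leaf: move to the neighbouring one. */
		c->leaf++;
		c->offset = 0;
	}
	return (unsigned long)c->leaf * CHUNK_SIZE + c->offset;
}
```

Starting from a cursor at index 0 in leaf 0, toy_skip_stale(&c, 511) followed
by toy_next(&c) examines slot 64 while c.index says 512. With toy_set(&c, 511)
instead, toy_next(&c) examines slot 512.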
