Re: [PATCHv2 1/1] mm: fix unproperly folio_put by changing API in read_pages

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: "zhaoyang.huang" <zhaoyang.huang@unisoc.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	NeilBrown <neilb@suse.de>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Zhaoyang Huang <huangzhaoyang@gmail.com>,
	steve.kang@unisoc.com
Cc: Matthew Wilcox <willy@infradead.org>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: [PATCHv2 1/1] mm: fix unproperly folio_put by changing API in read_pages
Date: Tue, 2 Apr 2024 14:58:50 +0200	[thread overview]
Message-ID: <736b982a-57c9-441a-812c-87cdee2e096e@redhat.com> (raw)
In-Reply-To: <20240401081734.1433755-1-zhaoyang.huang@unisoc.com>

On 01.04.24 10:17, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> An VM_BUG_ON in step 9 of [1] could happen as the refcnt is dropped
> unproperly during the procedure of read_pages()->readahead_folio->folio_put.
> This is introduced by commit 9fd472af84ab ("mm: improve cleanup when
> ->readpages doesn't process all pages")'.
> 
> key steps of[1] in brief:
> 2'. Thread_truncate get folio to its local fbatch by find_get_entry in step 2
> 7'. Last refcnt remained which is not as expect as from alloc_pages
>      but from thread_truncate's local fbatch in step 7
> 8'. Thread_reclaim succeed to isolate the folio by the wrong refcnt(not
>      the value but meaning) in step 8
> 9'. Thread_truncate hit the VM_BUG_ON in step 9
> 
> [1]
> Thread_readahead:
> 0. folio = filemap_alloc_folio(gfp_mask, 0);
>         (refcount 1: alloc_pages)
> 1. ret = filemap_add_folio(mapping, folio, index + i, gfp_mask);
>         (refcount 2: alloc_pages, page_cache)
> 
> Thread_truncate:
> 2. folio = find_get_entries(&fbatch_truncate);
>         (refcount 3: alloc_pages, page_cache, fbatch_truncate))
> 
> Thread_readahead:
> 3. Then we call read_pages()
>         First we call ->readahead() which for some reason stops early.
> 4. Then we call readahead_folio() which calls folio_put()
>         (refcount 2: page_cache, fbatch_truncate)
> 5. Then we call folio_get()
>         (refcount 3: page_cache, fbatch_truncate, read_pages_temp)
> 6. Then we call filemap_remove_folio()
>         (refcount 2: fbatch_truncate, read_pages_temp)
> 7. Then we call folio_unlock() and folio_put()
>         (refcount 1: fbatch_truncate)
> 
> Thread_reclaim:
> 8. collect the page from LRU and call shrink_inactive_list->isolate_lru_folios
>          shrink_inactive_list
>          {
>              isolate_lru_folios
>              {
>                 if (!folio_test_lru(folio)) //false
>                     bail out;
>                 if (!folio_try_get(folio)) //false
>                     bail out;
>              }
>           }
>           (refcount 2: fbatch_truncate, reclaim_isolate)
> 
> 9. call shrink_folio_list->__remove_mapping
>          shrink_folio_list()
>          {
>              folio_try_lock(folio);
>              __remove_mapping()
>              {
>                  if (!folio_ref_freeze(2)) //false
>                      bail out;
>              }
>              folio_unlock(folio)
>              list_add(folio, free_folios);
>          }
>          (folio has refcount 0)
> 
> Thread_truncate:
> 10. Thread_truncate will hit the refcnt VM_BUG_ON(refcnt == 0) in
> release_pages->folio_put_testzero
>          truncate_inode_pages_range
>          {
>              folio = find_get_entries(&fbatch_truncate);
>              truncate_inode_pages(&fbatch_truncate);
>              folio_fbatch_release(&fbatch_truncate);
>              {
>                  folio_put_testzero(folio); //VM_BUG_ON here
>              }
>          }
> 
> fix: commit 9fd472af84ab ("mm: improve cleanup when ->readpages doesn't process all pages")'

Something that would help here is an actual reproducer that triggersthis 
issue.

To me, it's unclear at this point if we are talking about an actual 
issue or a theoretical issue?

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2024-04-02 12:59 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-01  8:17 zhaoyang.huang
2024-04-02  0:34 ` Matthew Wilcox
2024-04-02  6:33   ` Zhaoyang Huang
2024-04-02 12:58 ` David Hildenbrand [this message]
2024-04-03  5:50   ` Zhaoyang Huang
2024-04-03  8:01     ` David Hildenbrand
2024-04-03 11:08       ` Zhaoyang Huang
2024-04-03 11:47         ` David Hildenbrand
2024-04-04  9:15           ` Zhaoyang Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=736b982a-57c9-441a-812c-87cdee2e096e@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=hch@infradead.org \
    --cc=huangzhaoyang@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=neilb@suse.de \
    --cc=steve.kang@unisoc.com \
    --cc=willy@infradead.org \
    --cc=zhaoyang.huang@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox