linux-mm.kvack.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: David Stevens <stevensd@chromium.org>
Cc: linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Yang Shi <shy828301@gmail.com>,
	David Hildenbrand <david@redhat.com>,
	Jiaqi Yan <jiaqiyan@google.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6 4/4] mm/khugepaged: maintain page cache uptodate flag
Date: Tue, 4 Apr 2023 17:21:31 -0400
Message-ID: <ZCyU26ZLSKqLj+kA@x1n>
In-Reply-To: <20230404120117.2562166-5-stevensd@google.com>

On Tue, Apr 04, 2023 at 09:01:17PM +0900, David Stevens wrote:
> From: David Stevens <stevensd@chromium.org>
> 
> Make sure that collapse_file doesn't interfere with checking the
> uptodate flag in the page cache by only inserting hpage into the page
> cache after it has been updated and marked uptodate. This is achieved by
> simply not replacing present pages with hpage when iterating over the
> target range.
> 
> The present pages are already locked, so replacing them with the locked
> hpage before the collapse is finalized is unnecessary. However, it is
> necessary to stop freezing the present pages after validating them,
> since leaving long-term frozen pages in the page cache can lead to
> deadlocks. Simply checking the reference count is sufficient to ensure
> that there are no long-term references hanging around that the
> collapse would break. Similar to hpage, there is no reason that the
> present pages actually need to be frozen in addition to being locked.
> 
> This fixes a race where folio_seek_hole_data would mistake hpage for
> an fallocated but unwritten page. This race is visible to userspace via
> data temporarily disappearing from SEEK_DATA/SEEK_HOLE. This also fixes
> a similar race where pages could temporarily disappear from mincore.
> 
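
For anyone who wants to poke at the userspace-visible side of this, below
is a minimal sketch of a probe, not a reproducer from this series.  It
only asks whether a range is reported as data by lseek().  The helper
names (range_is_all_data, demo_fd_with_data) are made up for illustration,
and the fallback SEEK_DATA/SEEK_HOLE values are the Linux ones.  To
actually hit the race one would presumably have to call the probe in a
loop on a shmem file while khugepaged collapses it, which this sketch
does not do.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallbacks in case the headers hide them; these are the Linux values. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Hypothetical helper: returns 1 if [off, off + len) is reported as
 * entirely data, 0 if a hole is reported anywhere inside it, and -1 on
 * an unexpected lseek() error.
 */
static int range_is_all_data(int fd, off_t off, off_t len)
{
	off_t data, hole;

	assert(len > 0);

	data = lseek(fd, off, SEEK_DATA);
	if (data < 0)
		return errno == ENXIO ? 0 : -1;	/* ENXIO: no data at or after off */
	if (data != off)
		return 0;			/* range starts inside a hole */

	hole = lseek(fd, off, SEEK_HOLE);
	if (hole < 0)
		return -1;
	return hole >= off + len;		/* first hole is past the range */
}

/*
 * Hypothetical demo helper: an unlinked temporary file with n bytes of
 * non-zero data written at offset 0.  Returns the fd, or -1 on failure.
 */
static int demo_fd_with_data(size_t n)
{
	char buf[4096];
	FILE *f = tmpfile();
	int fd;

	if (!f)
		return -1;
	fd = fileno(f);
	memset(buf, 'x', sizeof(buf));
	while (n > 0) {
		size_t chunk = n < sizeof(buf) ? n : sizeof(buf);

		if (write(fd, buf, chunk) != (ssize_t)chunk)
			return -1;
		n -= chunk;
	}
	return fd;
}
```

On a fully written file, range_is_all_data(fd, 0, size) should return 1.
As I read the commit message, the bug would show up as a transient 0 for
a range that was written and never punched.
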
> Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
> Signed-off-by: David Stevens <stevensd@chromium.org>
> ---
>  mm/khugepaged.c | 79 ++++++++++++++++++-------------------------------
>  1 file changed, 29 insertions(+), 50 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 7679551e9540..a19aa140fd52 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1855,17 +1855,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
>   *
>   * Basic scheme is simple, details are more complex:
>   *  - allocate and lock a new huge page;
> - *  - scan page cache replacing old pages with the new one
> + *  - scan page cache, locking old pages
>   *    + swap/gup in pages if necessary;
> - *    + keep old pages around in case rollback is required;
> + *  - copy data to new page
> + *  - handle shmem holes
> + *    + re-validate that holes weren't filled by someone else
> + *    + check for userfaultfd

PS: some of these changes may belong in the previous patch, but there is
no need to repost just for this.  Only worth fixing if there'll be a new
version anyway.

>   *  - finalize updates to the page cache;
>   *  - if replacing succeeds:
> - *    + copy data over;
> - *    + free old pages;
>   *    + unlock huge page;
> + *    + free old pages;
>   *  - if replacing failed;
> - *    + put all pages back and unfreeze them;
> - *    + restore gaps in the page cache;
> + *    + unlock old pages
>   *    + unlock and free huge page;
>   */
>  static int collapse_file(struct mm_struct *mm, unsigned long addr,
> @@ -1913,12 +1914,6 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  		}
>  	} while (1);
>  
> -	/*
> -	 * At this point the hpage is locked and not up-to-date.
> -	 * It's safe to insert it into the page cache, because nobody would
> -	 * be able to map it or use it in another way until we unlock it.
> -	 */
> -
>  	xas_set(&xas, start);
>  	for (index = start; index < end; index++) {
>  		page = xas_next(&xas);
> @@ -2076,12 +2071,16 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
>  		VM_BUG_ON_PAGE(page != xas_load(&xas), page);
>  
>  		/*
> -		 * The page is expected to have page_count() == 3:
> +		 * We control three references to the page:
>  		 *  - we hold a pin on it;
>  		 *  - one reference from page cache;
>  		 *  - one from isolate_lru_page;
> +		 * If those are the only references, then any new usage of the
> +		 * page will have to fetch it from the page cache. That requires
> +		 * locking the page to handle truncate, so any new usage will be
> +		 * blocked until we unlock page after collapse/during rollback.
>  		 */
> -		if (!page_ref_freeze(page, 3)) {
> +		if (page_count(page) != 3) {
>  			result = SCAN_PAGE_COUNT;
>  			xas_unlock_irq(&xas);
>  			putback_lru_page(page);

Personally I don't see anything wrong with this change as a way to resolve
the deadlock.  E.g. a fast-gup race right before unmapping the pgtables
seems fine: either we bail out here because the refcount is >3, or fast-gup
bails out when it rechecks the pte and sees it changed.  Both outcomes look
fine here.
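
To spell out the difference between the two checks, here is a toy
userspace model, only a sketch under my own assumptions and not kernel
code.  struct toy_page and the toy_* helpers are made-up names.  The model
assumes page_ref_freeze() behaves like a compare-and-swap from the expected
count to zero, and that speculative lookups take a reference only when the
count is non-zero (in the spirit of get_page_unless_zero()).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcount in struct page. */
struct toy_page {
	atomic_int refcount;
};

/*
 * Models the old check: atomically swap expected -> 0.  While the count
 * is 0, speculative getters fail, so they have to retry or give up.
 */
static bool toy_ref_freeze(struct toy_page *p, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(&p->refcount, &old, 0);
}

/* Restores the count of a previously frozen page. */
static void toy_ref_unfreeze(struct toy_page *p, int count)
{
	assert(atomic_load(&p->refcount) == 0);	/* must be frozen */
	atomic_store(&p->refcount, count);
}

/* Models the patched check: observe the count without modifying it. */
static bool toy_ref_check(struct toy_page *p, int expected)
{
	return atomic_load(&p->refcount) == expected;
}

/* Models a speculative reference: increment unless the count is zero. */
static bool toy_try_get(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}
```

In this model an extra reference taken before either check makes both of
them fail, which is the ">3 refcounts" bail-out.  The difference is only in
what happens afterwards: a frozen page rejects toy_try_get() until it is
unfrozen, while a merely checked page still accepts it, so the patch relies
on the page lock (and fast-gup's pte recheck) rather than the frozen count
to keep new users out.
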

So far it looks good to me, but that may not mean much given my history of
overlooking things.  It would be good to hear from Hugh and others.

-- 
Peter Xu


