From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 9 May 2022 17:06:32 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: stable@vger.kernel.org, Minchan Kim <minchan@kernel.org>, Nitin Gupta
 <ngupta@vflare.org>, Sergey Senozhatsky <senozhatsky@chromium.org>,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] zsmalloc: Fix races between asynchronous zspage free
 and page migration
Message-Id: <20220509170632.fec2f56ad9f640329330b9de@linux-foundation.org>
In-Reply-To: <20220509024703.243847-1-sultan@kerneltoast.com>
References: <20220509024703.243847-1-sultan@kerneltoast.com>
X-Mailer: Sylpheed 3.7.0 (GTK+ 2.24.33; x86_64-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
List-ID: <linux-mm.kvack.org>

On Sun,  8 May 2022 19:47:02 -0700 Sultan Alsawaf <sultan@kerneltoast.com> wrote:

> From: Sultan Alsawaf <sultan@kerneltoast.com>
> 
> The asynchronous zspage free worker tries to lock a zspage's entire page
> list without defending against page migration. Since pages which haven't
> yet been locked can concurrently migrate off the zspage page list while
> lock_zspage() churns away, lock_zspage() can suffer from a few different
> lethal races. It can lock a page which no longer belongs to the zspage and
> unsafely dereference page_private(), it can unsafely dereference a torn
> pointer to the next page (since there's a data race), and it can observe a
> spurious NULL pointer to the next page and thus not lock all of the
> zspage's pages (since a single page migration will reconstruct the entire
> page list, and create_page_chain() unconditionally zeroes out each list
> pointer in the process).
> 
> Fix the races by using migrate_read_lock() in lock_zspage() to synchronize
> with page migration.
> 
> --- a/mm/zsmalloc.c
> +++ b/mm/zsmalloc.c
> @@ -1718,11 +1718,40 @@ static enum fullness_group putback_zspage(struct size_class *class,
>   */
>  static void lock_zspage(struct zspage *zspage)
>  {
> -	struct page *page = get_first_page(zspage);
> +	struct page *curr_page, *page;
>  
> -	do {
> -		lock_page(page);
> -	} while ((page = get_next_page(page)) != NULL);
> +	/*
> +	 * Pages we haven't locked yet can be migrated off the list while we're
> +	 * trying to lock them, so we need to be careful and only attempt to
> +	 * lock each page under migrate_read_lock(). Otherwise, the page we lock
> +	 * may no longer belong to the zspage. This means that we may wait for
> +	 * the wrong page to unlock, so we must take a reference to the page
> +	 * prior to waiting for it to unlock outside migrate_read_lock().
> +	 */
> +	while (1) {
> +		migrate_read_lock(zspage);
> +		page = get_first_page(zspage);
> +		if (trylock_page(page))
> +			break;
> +		get_page(page);
> +		migrate_read_unlock(zspage);
> +		wait_on_page_locked(page);

Why not simply use lock_page() here?  The get_page() alone won't protect
against all of the dire consequences you have identified, will it?

> +		put_page(page);
> +	}
> +
> +	curr_page = page;
> +	while ((page = get_next_page(curr_page))) {
> +		if (trylock_page(page)) {
> +			curr_page = page;
> +		} else {
> +			get_page(page);
> +			migrate_read_unlock(zspage);
> +			wait_on_page_locked(page);

Ditto here: why not simply lock_page()?

> +			put_page(page);
> +			migrate_read_lock(zspage);
> +		}
> +	}
> +	migrate_read_unlock(zspage);
>  }
>  
>  static int zs_init_fs_context(struct fs_context *fc)