Date: Wed, 17 Feb 2021 16:00:11 +0100
From: Michal Hocko
To: Oscar Salvador
Cc: Andrew Morton, Mike Kravetz, David Hildenbrand, Muchun Song,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages
References: <20210217100816.28860-1-osalvador@suse.de>
 <20210217100816.28860-2-osalvador@suse.de>
In-Reply-To: <20210217100816.28860-2-osalvador@suse.de>
On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
[...]
> +static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
> +{
> +	gfp_t gfp_mask = htlb_alloc_mask(h);
> +	nodemask_t *nmask = &node_states[N_MEMORY];
> +	struct page *new_page;
> +	bool ret = false;
> +	int nid;
> +
> +	spin_lock(&hugetlb_lock);
> +	/*
> +	 * Check one more time to make race-window smaller.
> +	 */
> +	if (!PageHuge(page)) {
> +		/*
> +		 * Dissolved from under our feet.
> +		 */
> +		spin_unlock(&hugetlb_lock);
> +		return true;
> +	}

Is this really necessary? dissolve_free_huge_page will take care of
this, and the race window you are covering is really tiny (see the
first sketch at the end of this mail).

> +
> +	nid = page_to_nid(page);
> +	spin_unlock(&hugetlb_lock);
> +
> +	/*
> +	 * Before dissolving the page, we need to allocate a new one,
> +	 * so the pool remains stable.
> +	 */
> +	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL);

Wrt. the fallback to other zones: I hadn't realized that the primary
use case is a form of memory offlining (virtio-mem). I am not yet sure
what the proper behavior is in that case, but if breaking hugetlb
pools, similar to the normal hotplug operation, is viable, then this
needs a special mode. We do not want a random alloc_contig_range user
to do the same. So for starters I would go with __GFP_THISNODE here
(see the second sketch at the end of this mail).

> +	if (new_page) {
> +		/*
> +		 * Ok, we got a new free hugepage to replace this one. Try to
> +		 * dissolve the old page.
> +		 */
> +		if (!dissolve_free_huge_page(page)) {
> +			ret = true;
> +		} else if (dissolve_free_huge_page(new_page)) {
> +			/*
> +			 * Seems the old page could not be dissolved, so try to
> +			 * dissolve the freshly allocated page. If that fails
> +			 * too, let us count the new page as a surplus. Doing so
> +			 * allows the pool to be re-balanced when pages are freed
> +			 * instead of enqueued again.
> +			 */
> +			spin_lock(&hugetlb_lock);
> +			h->surplus_huge_pages++;
> +			h->surplus_huge_pages_node[nid]++;
> +			spin_unlock(&hugetlb_lock);
> +		}
> +		/*
> +		 * Free it into the hugepage allocator
> +		 */
> +		put_page(new_page);
> +	}
> +
> +	return ret;
> +}
> +
> +bool isolate_or_dissolve_huge_page(struct page *page)
> +{
> +	struct hstate *h = NULL;
> +	struct page *head;
> +	bool ret = false;
> +
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(page)) {
> +		head = compound_head(page);
> +		h = page_hstate(head);
> +	}
> +	spin_unlock(&hugetlb_lock);
> +
> +	if (!h)
> +		/*
> +		 * The page might have been dissolved from under our feet.
> +		 * If that is the case, return success as if we dissolved it
> +		 * ourselves.
> +		 */
> +		return true;

Nit: I would put the comment above the condition for both cases; it
reads more easily that way, at least without { } (see the last sketch
at the end of this mail).

> +
> +	if (hstate_is_gigantic(h))
> +		/*
> +		 * Fence off gigantic pages as there is a cyclic dependency
> +		 * between alloc_contig_range and them.
> +		 */
> +		return ret;
> +
> +	if (!page_count(head) && alloc_and_dissolve_huge_page(h, head))
> +		ret = true;
> +
> +	return ret;
> +}
> +
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  			     unsigned long addr, int avoid_reserve)
>  {

Other than that I haven't noticed any surprises.
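To make the race-window point concrete, here is a minimal untested
sketch of the function entry without the extra recheck. It assumes
dissolve_free_huge_page returns success for a page that has already
been dissolved (which is what "will take care of this" refers to
above), and that an unlocked page_to_nid is good enough for a node
hint here:

	static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
	{
		gfp_t gfp_mask = htlb_alloc_mask(h);
		nodemask_t *nmask = &node_states[N_MEMORY];
		struct page *new_page;
		bool ret = false;
		int nid = page_to_nid(page);

		/*
		 * Allocate the replacement first so the pool stays stable.
		 * No PageHuge recheck here: dissolve_free_huge_page does
		 * that under hugetlb_lock and reports success if somebody
		 * has dissolved the page from under us in the meantime.
		 */
		new_page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL);
		...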
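For the __GFP_THISNODE point, this is roughly what I have in mind
(untested; whether alloc_fresh_huge_page is happy with a NULL nodemask
when the node is pinned is an assumption on my side):

	/*
	 * Stay on the node of the page we are about to dissolve.
	 * Without __GFP_THISNODE a random alloc_contig_range user
	 * could silently drain hugetlb pages towards other nodes;
	 * only an explicit hotplug-like mode should be allowed to
	 * do that.
	 */
	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);

The nmask local then becomes unneeded.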
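And the comment placement nit spelled out, same content, just moved
above the conditions:

	/*
	 * The page might have been dissolved from under our feet.
	 * If that is the case, return success as if we dissolved it
	 * ourselves.
	 */
	if (!h)
		return true;

	/*
	 * Fence off gigantic pages as there is a cyclic dependency
	 * between alloc_contig_range and them.
	 */
	if (hstate_is_gigantic(h))
		return ret;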
--
Michal Hocko
SUSE Labs