From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 12 Jan 2021 11:06:02 +0100
From: Michal Hocko
To: Muchun Song
Cc: Mike Kravetz, Andrew Morton, Naoya Horiguchi, Andi Kleen,
	Linux Memory Management List, LKML
Subject: Re: [External] Re: [PATCH v3 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page
Message-ID: <20210112100602.GL22493@dhcp22.suse.cz>
References: <20210110124017.86750-1-songmuchun@bytedance.com>
 <20210110124017.86750-5-songmuchun@bytedance.com>
 <20210112083335.GH22493@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

On Tue 12-01-21 17:51:05, Muchun Song wrote:
> On Tue, Jan 12, 2021 at 4:33 PM Michal Hocko wrote:
> >
> > On Mon 11-01-21 17:20:51, Mike Kravetz wrote:
> > > On 1/10/21 4:40 AM, Muchun Song wrote:
> > > > There is a race between dissolve_free_huge_page() and put_page(),
> > > > and the race window is quite small. Theoretically, we should return
> > > > -EBUSY when we encounter this race. In fact, we have a chance to
> > > > dissolve the page successfully if we retry, because the race window
> > > > is quite small. Seizing this opportunity is an optimization that
> > > > increases the success rate of dissolving the page.
> > > >
> > > > If we free a HugeTLB page from a non-task context, the free is
> > > > deferred through a workqueue. In this case, we need to flush the
> > > > work.
> > > >
> > > > dissolve_free_huge_page() can be called from memory hotplug; the
> > > > caller aims to return the HugeTLB page to the buddy allocator so
> > > > that it can unplug the page successfully.
> > > >
> > > > Signed-off-by: Muchun Song
> > > > ---
> > > >  mm/hugetlb.c | 26 +++++++++++++++++++++-----
> > > >  1 file changed, 21 insertions(+), 5 deletions(-)
> > >
> > > I am unsure about the need for this patch. The code is OK; there are
> > > no issues with the code.
> > >
> > > As mentioned in the commit message, this is an optimization and could
> > > potentially cause a memory offline operation to succeed instead of
> > > fail. However, we are very unlikely to ever exercise this code.
> > > Adding an optimization that is unlikely to be exercised is certainly
> > > questionable.
> > >
> > > Memory offline is the only code that could benefit from this
> > > optimization. As someone with more memory offline user experience,
> > > what is your opinion, Michal?
> >
> > I am not a great fan of optimizations without any data to back them
> > up. I do not see any sign that this code has actually been tested and
> > the condition triggered.
>
> The race window is quite small. I triggered it only once on my server,
> and then the kernel panicked. So I sent this patch series to fix some
> bugs.

Memory hotplug shouldn't panic when this race happens. Are you sure you
have seen a race that is directly related to this patch?

> > Besides that, I have requested an explanation of why blocking on the
> > WQ is safe, and that hasn't happened.
>
> I have looked at all the callers of dissolve_free_huge_page(); some
> callers hold the page lock (via lock_page). The others are also in a
> sleepable context.
>
> So I think that blocking on the WQ is safe. Right?

I have asked you to explicitly write down your reasoning so that we can
double check it. A dependency on workqueue progress is much more complex
than any other lock because there is no guarantee that the WQ will make
forward progress (all workers might be stuck, new workers might not be
able to be created, etc.).
-- 
Michal Hocko
SUSE Labs
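
For context, a minimal sketch of the retry the commit message describes
(this is not the posted diff itself). The helper name
dissolve_free_huge_page_retry() is hypothetical; dissolve_free_huge_page(),
flush_work() and cond_resched() are existing kernel APIs, and
free_hpage_work is the work item hugetlb uses to defer frees from non-task
context (it is static to mm/hugetlb.c, so the sketch assumes it lives in
that file):

  /*
   * Sketch only, assuming dissolve_free_huge_page() returns -EAGAIN when
   * it loses the small race with put_page(): the last reference has been
   * dropped but the page has not yet made it back to the hugetlb free
   * list. The free may have been handed to the free_hpage_work workqueue
   * item (non-task context), so flush it before retrying.
   */
  static int dissolve_free_huge_page_retry(struct page *page)
  {
          int rc;

  retry:
          rc = dissolve_free_huge_page(page);
          if (rc == -EAGAIN) {
                  /* Wait for a deferred free from non-task context. */
                  flush_work(&free_hpage_work);
                  cond_resched();
                  goto retry;
          }
          return rc;
  }

As the discussion above notes, blocking on flush_work() here is only safe
if every caller can sleep, and it makes the dissolve path depend on the
workqueue making forward progress.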