From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2C31C433E0 for ; Tue, 12 Jan 2021 10:50:04 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2479323109 for ; Tue, 12 Jan 2021 10:50:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2479323109 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 7CA7D6B0288; Tue, 12 Jan 2021 05:50:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 77AD38D0091; Tue, 12 Jan 2021 05:50:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 68F898D0090; Tue, 12 Jan 2021 05:50:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0201.hostedemail.com [216.40.44.201]) by kanga.kvack.org (Postfix) with ESMTP id 4B5CD6B0288 for ; Tue, 12 Jan 2021 05:50:03 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 19A84181AEF0B for ; Tue, 12 Jan 2021 10:50:03 +0000 (UTC) X-FDA: 77696803086.03.fear95_620999c27514 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin03.hostedemail.com (Postfix) with ESMTP id E9C8428A4E8 for ; Tue, 12 Jan 2021 10:50:02 +0000 (UTC) X-HE-Tag: fear95_620999c27514 X-Filterd-Recvd-Size: 7179 Received: from mail-pj1-f42.google.com (mail-pj1-f42.google.com [209.85.216.42]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Tue, 12 Jan 2021 10:50:02 +0000 (UTC) Received: by mail-pj1-f42.google.com with SMTP id u4so1402076pjn.4 for ; Tue, 12 Jan 2021 02:50:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=j6hOZRhr6lM1qFIJTkdoya6lHwDoxk8ZO60XJ7q+5ys=; b=vi+crJBdaq2Sz6WZnM5jf9qWYETFaFFQMhnrk5Zi18ejf3zZGhiBBZXRbMhYgnIKfp ABV31ETb3qIp9zyDU9K02NSUcS+I0WFHCqxM2zrxmySLIN7Z11DaWiN1F4bUrhsiqDC+ 2fn7qbNzs41a03EJXq5vvBQMjpKGjjt37sddYaYrXfRTNDhTbhyAvFHk5X3PsOzgIKai TwEf4sDpKQrwPKnkgItUwshQD5/NYC9wGuTFcp4lgpmH+lfMFhmrDowLfv3A8DhPaCMW GtZnf/aLpufpLNP0UbWukc6ClsAAOAg3aGEGhoN9RNpZnUgRr1SNPyVYe0vUogWdBk4h W1AQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=j6hOZRhr6lM1qFIJTkdoya6lHwDoxk8ZO60XJ7q+5ys=; b=uSeCW0GtRHZCHznFF1reQZOXeMdQEBuDCkA5sm2/isKXpwZr6gJSl8LJptRHsUgPRl lWHDGVNMjL9QgaedcxshspSp3QFdNkB9YpjVdI076aM6bSQ/sZGnd5fa5CPZQqksFvLG Zd4dNTSLpNU0O/GgMusJ/iq1BHVHZ5aVyEid6HcsNiod5sFJjRsVWXmvkA83SOFBrdz8 IxKahiZleOFevwSAGbpAxGl2Z24Sk3w6Ec0B5HYsVon4kA4t6o0zMvi8FwxKnhfaTq36 fWsDacB8ITtKyYDIaXrXGdaL4SMV/UqZCL8naCceiAN1uKQWNu02e6SW0QQUML1fd4oF OmnQ== X-Gm-Message-State: AOAM533WHszMm+ELBZErgyh5Y02eVmTqsEtvt6gOpkF/CEAc4r5iOArq S97vX9ZTwYWPPD3ToAlqni2xBKpZ8E6HMPoB7IIUHUL0ltL6jY7XFGU= X-Google-Smtp-Source: ABdhPJwYg/d8Ey5S+NTEOtv5UjN8J2XLJeVXg45ipl3li1Kcb9t0pPL0IRxxqWfrG1PuGGR9+o0w94TB7OGPxCTUoAo= X-Received: by 2002:a17:902:8503:b029:dc:44f:62d8 with SMTP id bj3-20020a1709028503b02900dc044f62d8mr4197756plb.34.1610448600939; Tue, 12 Jan 2021 02:50:00 -0800 (PST) MIME-Version: 1.0 References: <20210110124017.86750-1-songmuchun@bytedance.com> <20210110124017.86750-5-songmuchun@bytedance.com> <20210112083335.GH22493@dhcp22.suse.cz> <20210112100602.GL22493@dhcp22.suse.cz> In-Reply-To: <20210112100602.GL22493@dhcp22.suse.cz> From: Muchun Song Date: Tue, 12 Jan 2021 18:49:17 +0800 Message-ID: Subject: Re: [External] Re: [PATCH v3 4/6] mm: hugetlb: add return -EAGAIN for dissolve_free_huge_page To: Michal Hocko Cc: Mike Kravetz , Andrew Morton , Naoya Horiguchi , Andi Kleen , Linux Memory Management List , LKML Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Jan 12, 2021 at 6:06 PM Michal Hocko wrote: > > On Tue 12-01-21 17:51:05, Muchun Song wrote: > > On Tue, Jan 12, 2021 at 4:33 PM Michal Hocko wrote: > > > > > > On Mon 11-01-21 17:20:51, Mike Kravetz wrote: > > > > On 1/10/21 4:40 AM, Muchun Song wrote: > > > > > There is a race between dissolve_free_huge_page() and put_page(), > > > > > and the race window is quite small. Theoretically, we should return > > > > > -EBUSY when we encounter this race. In fact, we have a chance to > > > > > successfully dissolve the page if we do a retry. Because the race > > > > > window is quite small. If we seize this opportunity, it is an > > > > > optimization for increasing the success rate of dissolving page. > > > > > > > > > > If we free a HugeTLB page from a non-task context, it is deferred > > > > > through a workqueue. In this case, we need to flush the work. > > > > > > > > > > The dissolve_free_huge_page() can be called from memory hotplug, > > > > > the caller aims to free the HugeTLB page to the buddy allocator > > > > > so that the caller can unplug the page successfully. > > > > > > > > > > Signed-off-by: Muchun Song > > > > > --- > > > > > mm/hugetlb.c | 26 +++++++++++++++++++++----- > > > > > 1 file changed, 21 insertions(+), 5 deletions(-) > > > > > > > > I am unsure about the need for this patch. The code is OK, there are no > > > > issues with the code. > > > > > > > > As mentioned in the commit message, this is an optimization and could > > > > potentially cause a memory offline operation to succeed instead of fail. > > > > However, we are very unlikely to ever exercise this code. Adding an > > > > optimization that is unlikely to be exercised is certainly questionable. > > > > > > > > Memory offline is the only code that could benefit from this optimization. > > > > As someone with more memory offline user experience, what is your opinion > > > > Michal? > > > > > > I am not a great fun of optimizations without any data to back them up. > > > I do not see any sign this code has been actually tested and the > > > condition triggered. > > > > This race is quite small. I only trigger this only once on my server. > > And then the kernel panic. So I sent this patch series to fix some > > bugs. > > Memory hotplug shouldn't panic when this race happens. Are you sure you > have seen a race that is directly related to this patch? I mean the panic is fixed by: [PATCH v3 3/6] mm: hugetlb: fix a race between freeing and dissolving the page > > > > Besides that I have requested to have an explanation of why blocking on > > > the WQ is safe and that hasn't happened. > > > > I have seen all the caller of dissolve_free_huge_page, some caller is under > > page lock (via lock_page). Others are also under a sleep context. > > > > So I think that blocking on the WQ is safe. Right? > > I have requested to explicitly write your thinking why this is safe so > that we can double check it. Dependency on a work queue progress is much > more complex than any other locks because there is no guarantee that WQ > will make forward progress (all workers might be stuck, new workers not > able to be created etc.). OK. I know about your concern. How about setting the page as temporary when hitting this race? int dissolve_free_huge_page(struct page *page) { @@ -1793,8 +1794,10 @@ int dissolve_free_huge_page(struct page *page) * We should make sure that the page is already on the free list * when it is dissolved. */ - if (unlikely(!PageHugeFreed(head))) + if (unlikely(!PageHugeFreed(head))) { + SetPageHugeTemporary(page) goto out; + } Setting the page as temporary and just return -EBUSY (do not flush the work). __free_huge_page() will free it to the buddy allocator later. > -- > Michal Hocko > SUSE Labs