From: Muchun Song <songmuchun@bytedance.com>
Date: Mon, 21 Dec 2020 19:07:18 +0800
Subject: Re: [External] Re: [PATCH v10 04/11] mm/hugetlb: Defer freeing of HugeTLB pages
To: Oscar Salvador
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
 bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
 luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
 paulmck@kernel.org, mchehab+huawei@kernel.org,
 pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
 anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
 Matthew Wilcox, Michal Hocko, "Song Bao Hua (Barry Song)",
 David Hildenbrand, naoya.horiguchi@nec.com, Xiongchun duan,
 linux-doc@vger.kernel.org, LKML, Linux Memory Management List, linux-fsdevel
In-Reply-To: <20201221102703.GA15804@linux>
References: <20201217121303.13386-1-songmuchun@bytedance.com>
 <20201217121303.13386-5-songmuchun@bytedance.com>
 <20201221102703.GA15804@linux>

On Mon, Dec 21, 2020 at 6:27 PM Oscar Salvador wrote:
>
> On Thu, Dec 17, 2020 at 08:12:56PM +0800, Muchun Song wrote:
> > In the subsequent patch, we will allocate the vmemmap pages when freeing
> > HugeTLB pages. But update_and_free_page() is called from a non-task
> > context (and holds hugetlb_lock), so we defer the actual freeing to a
> > workqueue to avoid allocating the vmemmap pages with GFP_ATOMIC.
>
> I think we would benefit from a more complete changelog; at least I had
> to stare at the code for a while in order to grasp what we are trying
> to do and the reasons behind it.

OK. Will do.

> > +static void __free_hugepage(struct hstate *h, struct page *page);
> > +
> > +/*
> > + * As update_and_free_page() is called from a non-task context (and holds
> > + * hugetlb_lock), we can defer the actual freeing to a workqueue to avoid
> > + * using GFP_ATOMIC to allocate a lot of vmemmap pages.
>
> The above implies that update_and_free_page() is __always__ called from a
> non-task context, but is that really the case?

IIUC, that is always the case here.

> > +static void update_hpage_vmemmap_workfn(struct work_struct *work)
> >  {
> > -	int i;
> > +	struct llist_node *node;
> > +	struct page *page;
> >
> > +	node = llist_del_all(&hpage_update_freelist);
> > +
> > +	while (node) {
> > +		page = container_of((struct address_space **)node,
> > +				    struct page, mapping);
> > +		node = node->next;
> > +		page->mapping = NULL;
> > +		__free_hugepage(page_hstate(page), page);
> > +
> > +		cond_resched();
> > +	}
> > +}
> > +static DECLARE_WORK(hpage_update_work, update_hpage_vmemmap_workfn);
>
> I wonder if this should be moved to hugetlb_vmemmap.c.

Maybe. I can give it a try.
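For reference, here is the deferral pattern above pulled together as a
minimal, self-contained sketch. The producer helper defer_free_hugepage()
is hypothetical and only shows how a page would be queued from atomic
context; the rest mirrors the quoted diff:

#include <linux/hugetlb.h>
#include <linux/llist.h>
#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/workqueue.h>

static void __free_hugepage(struct hstate *h, struct page *page);

/* Pages waiting to be freed; llist_add() is safe from atomic context. */
static LLIST_HEAD(hpage_update_freelist);

static void update_hpage_vmemmap_workfn(struct work_struct *work)
{
	struct llist_node *node;
	struct page *page;

	/* Atomically take ownership of the whole pending list. */
	node = llist_del_all(&hpage_update_freelist);

	while (node) {
		/*
		 * page->mapping is reused as the llist_node while the page
		 * sits on the freelist, so map it back to the page and
		 * clear it before freeing.
		 */
		page = container_of((struct address_space **)node,
				    struct page, mapping);
		node = node->next;
		page->mapping = NULL;
		__free_hugepage(page_hstate(page), page);

		cond_resched();
	}
}
static DECLARE_WORK(hpage_update_work, update_hpage_vmemmap_workfn);

/*
 * Hypothetical producer: runs with hugetlb_lock held, possibly in
 * softirq context, so it must not sleep or allocate.
 */
static void defer_free_hugepage(struct page *page)
{
	/*
	 * llist_add() returns true only if the list was empty, so the
	 * work item is scheduled once per batch, not once per page.
	 */
	if (llist_add((struct llist_node *)&page->mapping,
		      &hpage_update_freelist))
		schedule_work(&hpage_update_work);
}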
> > +/*
> > + * This is where the call to allocate vmemmap pages will be inserted.
> > + */
>
> I think this should go in the changelog.

OK. Will do.

> > +static void __free_hugepage(struct hstate *h, struct page *page)
> > +{
> > +	int i;
> > +
> >  	for (i = 0; i < pages_per_huge_page(h); i++) {
> >  		page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
> >  				1 << PG_referenced | 1 << PG_dirty |
> > @@ -1313,13 +1377,17 @@ static void update_and_free_page(struct hstate *h, struct page *page)
> >  	set_page_refcounted(page);
> >  	if (hstate_is_gigantic(h)) {
> >  		/*
> > -		 * Temporarily drop the hugetlb_lock, because
> > -		 * we might block in free_gigantic_page().
> > +		 * Temporarily drop the hugetlb_lock only when this type of
> > +		 * HugeTLB page does not support vmemmap optimization (which
> > +		 * context do not hold the hugetlb_lock), because we might
> > +		 * block in free_gigantic_page().
>
> "
> /*
>  * Temporarily drop the hugetlb_lock, because we might block
>  * in free_gigantic_page(). Only drop it in case the vmemmap
>  * optimization is disabled, since that context does not hold
>  * the lock.
>  */
> " ?

Thanks a lot.

>
> Oscar Salvador
> SUSE L3

--
Yours,
Muchun
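The locking rule settled on above, sketched in context: only the path that
enters with hugetlb_lock held (vmemmap optimization disabled) drops and
retakes the lock around the blocking call, while the workqueue path runs
lockless. This assumes free_vmemmap_pages_per_hpage() is the series' helper
returning nonzero when the optimization applies to an hstate; it is a
sketch of the quoted hunk, not the exact patch:

static void __free_hugepage(struct hstate *h, struct page *page)
{
	/* ... page flags and refcount reset as in the quoted hunk ... */

	if (hstate_is_gigantic(h)) {
		/*
		 * Temporarily drop the hugetlb_lock, because we might
		 * block in free_gigantic_page(). Only drop it in case
		 * the vmemmap optimization is disabled, since that
		 * context (the workqueue) does not hold the lock.
		 *
		 * free_vmemmap_pages_per_hpage() is the assumed helper
		 * from this series.
		 */
		if (!free_vmemmap_pages_per_hpage(h))
			spin_unlock(&hugetlb_lock);
		destroy_compound_gigantic_page(page, huge_page_order(h));
		free_gigantic_page(page, huge_page_order(h));
		if (!free_vmemmap_pages_per_hpage(h))
			spin_lock(&hugetlb_lock);
	} else {
		__free_pages(page, huge_page_order(h));
	}
}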