Date: Wed, 17 Feb 2021 16:00:11 +0100
From: Michal Hocko
To: Oscar Salvador
Cc: Andrew Morton, Mike Kravetz, David Hildenbrand, Muchun Song,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages
References: <20210217100816.28860-1-osalvador@suse.de>
 <20210217100816.28860-2-osalvador@suse.de>
In-Reply-To: <20210217100816.28860-2-osalvador@suse.de>
On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
[...]
> +static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
> +{
> +	gfp_t gfp_mask = htlb_alloc_mask(h);
> +	nodemask_t *nmask = &node_states[N_MEMORY];
> +	struct page *new_page;
> +	bool ret = false;
> +	int nid;
> +
> +	spin_lock(&hugetlb_lock);
> +	/*
> +	 * Check one more time to make race-window smaller.
> +	 */
> +	if (!PageHuge(page)) {
> +		/*
> +		 * Dissolved from under our feet.
> +		 */
> +		spin_unlock(&hugetlb_lock);
> +		return true;
> +	}

Is this really necessary? dissolve_free_huge_page will take care of
this, and the race window you are covering is really tiny (see the
first sketch at the end of this mail).

> +
> +	nid = page_to_nid(page);
> +	spin_unlock(&hugetlb_lock);
> +
> +	/*
> +	 * Before dissolving the page, we need to allocate a new one,
> +	 * so the pool remains stable.
> +	 */
> +	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL);

Wrt. the fallback to other zones: I hadn't realized that the primary
use case is a form of memory offlining (virtio-mem). I am not yet sure
what the proper behavior is in that case, but if breaking hugetlb
pools, similar to the normal hotplug operation, is viable, then this
needs a special mode. We do not want a random alloc_contig_range user
to do the same. So for starters I would go with __GFP_THISNODE here
(see the second sketch at the end of this mail).

> +	if (new_page) {
> +		/*
> +		 * Ok, we got a new free hugepage to replace this one. Try to
> +		 * dissolve the old page.
> +		 */
> +		if (!dissolve_free_huge_page(page)) {
> +			ret = true;
> +		} else if (dissolve_free_huge_page(new_page)) {
> +			/*
> +			 * Seems the old page could not be dissolved, so try to
> +			 * dissolve the freshly allocated page. If that fails
> +			 * too, let us count the new page as a surplus. Doing so
> +			 * allows the pool to be re-balanced when pages are freed
> +			 * instead of enqueued again.
> +			 */
> +			spin_lock(&hugetlb_lock);
> +			h->surplus_huge_pages++;
> +			h->surplus_huge_pages_node[nid]++;
> +			spin_unlock(&hugetlb_lock);
> +		}
> +		/*
> +		 * Free it into the hugepage allocator
> +		 */
> +		put_page(new_page);
> +	}
> +
> +	return ret;
> +}
> +
> +bool isolate_or_dissolve_huge_page(struct page *page)
> +{
> +	struct hstate *h = NULL;
> +	struct page *head;
> +	bool ret = false;
> +
> +	spin_lock(&hugetlb_lock);
> +	if (PageHuge(page)) {
> +		head = compound_head(page);
> +		h = page_hstate(head);
> +	}
> +	spin_unlock(&hugetlb_lock);
> +
> +	if (!h)
> +		/*
> +		 * The page might have been dissolved from under our feet.
> +		 * If that is the case, return success as if we dissolved it
> +		 * ourselves.
> +		 */
> +		return true;

Nit: I would put the comment above the condition for both cases; it
reads more easily that way, at least without { } (see the last sketch
at the end of this mail).

> +
> +	if (hstate_is_gigantic(h))
> +		/*
> +		 * Fence off gigantic pages as there is a cyclic dependency
> +		 * between alloc_contig_range and them.
> +		 */
> +		return ret;
> +
> +	if (!page_count(head) && alloc_and_dissolve_huge_page(h, head))
> +		ret = true;
> +
> +	return ret;
> +}
> +
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>  			     unsigned long addr, int avoid_reserve)
>  {

Other than that I haven't noticed any surprises.
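To make the race-window point concrete, here is a minimal untested
sketch of the function entry without the extra recheck. It assumes
dissolve_free_huge_page returns success for a page that has already
been dissolved (which is what "will take care of this" refers to
above), and that an unlocked page_to_nid is good enough for a node
hint here:

	static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
	{
		gfp_t gfp_mask = htlb_alloc_mask(h);
		nodemask_t *nmask = &node_states[N_MEMORY];
		struct page *new_page;
		bool ret = false;
		int nid = page_to_nid(page);

		/*
		 * Allocate the replacement first so the pool stays stable.
		 * No PageHuge recheck here: dissolve_free_huge_page does
		 * that under hugetlb_lock and reports success if somebody
		 * has dissolved the page from under us in the meantime.
		 */
		new_page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL);
		...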
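For the __GFP_THISNODE point, this is roughly what I have in mind
(untested; whether alloc_fresh_huge_page is happy with a NULL nodemask
when the node is pinned is an assumption on my side):

	/*
	 * Stay on the node of the page we are about to dissolve.
	 * Without __GFP_THISNODE a random alloc_contig_range user
	 * could silently drain hugetlb pages towards other nodes;
	 * only an explicit hotplug-like mode should be allowed to
	 * do that.
	 */
	gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;

	new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);

The nmask local then becomes unneeded.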
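And the comment placement nit spelled out, same content, just moved
above the conditions:

	/*
	 * The page might have been dissolved from under our feet.
	 * If that is the case, return success as if we dissolved it
	 * ourselves.
	 */
	if (!h)
		return true;

	/*
	 * Fence off gigantic pages as there is a cyclic dependency
	 * between alloc_contig_range and them.
	 */
	if (hstate_is_gigantic(h))
		return ret;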
--
Michal Hocko
SUSE Labs