From: Muchun Song
To: Mike Kravetz
Cc: Linux MM, linux-kernel@vger.kernel.org, Muchun Song, Joao Martins,
    Matthew Wilcox, Michal Hocko, Peter Xu, Miaohe Lin, Oscar Salvador,
    Naoya Horiguchi, Vlastimil Babka, Andrew Morton
Subject: Re: [PATCH v3] hugetlb: freeze allocated pages before creating hugetlb pages
Date: Thu, 22 Sep 2022 13:39:55 +0800
Message-Id: <5248E364-6EA7-4CEA-AD16-9792424D5D5B@linux.dev>
In-Reply-To: <20220921202702.106069-1-mike.kravetz@oracle.com>
References: <20220921202702.106069-1-mike.kravetz@oracle.com>

> On Sep 22, 2022, at 04:27, Mike Kravetz wrote:
>
> When creating hugetlb pages, the hugetlb code must first allocate
> contiguous pages from a low level allocator such as buddy, cma or
> memblock. The pages returned from these low level allocators are
> ref counted. This creates potential issues with other code taking
> speculative references on these pages before they can be transformed to
> a hugetlb page. This issue has been addressed with methods and code
> such as that provided in [1].
>
> Recent discussions about vmemmap freeing [2] have indicated that it
> would be beneficial to freeze all sub pages, including the head page
> of pages returned from low level allocators before converting to a
> hugetlb page. This helps avoid races if we want to replace the page
> containing vmemmap for the head page.
>
> There have been proposals to change at least the buddy allocator to
> return frozen pages as described at [3]. If such a change is made, it
> can be employed by the hugetlb code. However, as mentioned above
> hugetlb uses several low level allocators so each would need to be
> modified to return frozen pages. For now, we can manually freeze the
> returned pages. This is done in two places:
> 1) alloc_buddy_huge_page, only the returned head page is ref counted.
>    We freeze the head page, retrying once in the VERY rare case where
>    there may be an inflated ref count.
> 2) prep_compound_gigantic_page, for gigantic pages the current code
>    freezes all pages except the head page. New code will simply freeze
>    the head page as well.
>
> In a few other places, code checks for inflated ref counts on newly
> allocated hugetlb pages. With the modifications to freeze after
> allocating, this code can be removed.
>
> After hugetlb pages are freshly allocated, they are often added to the
> hugetlb free lists. Since these pages were previously ref counted, this
> was done via put_page() which would end up calling the hugetlb
> destructor: free_huge_page. With changes to freeze pages, we simply
> call free_huge_page directly to add the pages to the free list.
>
> In a few other places, freshly allocated hugetlb pages were immediately
> put into use, and the expectation was they were already ref counted. In
> these cases, we must manually ref count the page.
>
> [1] https://lore.kernel.org/linux-mm/20210622021423.154662-3-mike.kravetz@oracle.com/
> [2] https://lore.kernel.org/linux-mm/20220802180309.19340-1-joao.m.martins@oracle.com/
> [3] https://lore.kernel.org/linux-mm/20220809171854.3725722-1-willy@infradead.org/
>
> Signed-off-by: Mike Kravetz

Thanks Mike.

Reviewed-by: Muchun Song
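
For readers following the thread, the "freeze the head page, retrying once" step from the quoted patch description can be modeled in userspace roughly as below. This is only a sketch, not the kernel code from the patch: struct fake_page, fake_ref_freeze(), alloc_frozen() and the demo allocator are hypothetical names, C11 atomics stand in for the kernel's page ref count helpers, and it assumes that "retrying" means asking the allocator for another page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the ref count is modeled. */
struct fake_page {
    atomic_int refcount;
};

/*
 * Freeze: atomically replace the expected count with 0.
 * Returns false and leaves the count untouched if the count differs from
 * `expected`, for example because of a speculative reference.
 */
static bool fake_ref_freeze(struct fake_page *page, int expected)
{
    int old = expected;
    return atomic_compare_exchange_strong(&page->refcount, &old, 0);
}

/* Hypothetical allocator callback: returns a ref counted page or NULL. */
typedef struct fake_page *(*fake_alloc_fn)(void *ctx);

/*
 * Allocate a page and freeze it, retrying the allocation once if the
 * first page has an inflated count. Returns NULL if both attempts fail.
 */
static struct fake_page *alloc_frozen(fake_alloc_fn alloc, void *ctx)
{
    assert(alloc != NULL);
    for (int attempt = 0; attempt < 2; attempt++) {
        struct fake_page *page = alloc(ctx);
        if (page == NULL)
            return NULL;
        if (fake_ref_freeze(page, 1))
            return page; /* count is now 0: frozen */
        /* Inflated count: a real allocator would release the page here. */
    }
    return NULL;
}

/* Demo allocator state: two pages handed out in order. */
struct demo_ctx {
    struct fake_page pages[2];
    int next;
};

static struct fake_page *demo_alloc(void *ctx)
{
    struct demo_ctx *d = ctx;
    if (d->next >= 2)
        return NULL;
    return &d->pages[d->next++];
}

/* Returns 1 if the first page is rejected and the retry freezes the second. */
static int demo_retry_succeeds(void)
{
    struct demo_ctx d;
    d.next = 0;
    atomic_init(&d.pages[0].refcount, 2); /* inflated: extra reference */
    atomic_init(&d.pages[1].refcount, 1); /* normal freshly allocated page */

    struct fake_page *page = alloc_frozen(demo_alloc, &d);
    return page == &d.pages[1] &&
           atomic_load(&page->refcount) == 0 &&
           atomic_load(&d.pages[0].refcount) == 2;
}
```

The compare-and-exchange is what makes the freeze safe in this model: it either moves the count from exactly 1 to 0 or changes nothing, so a page with an unexpected extra reference is never treated as frozen.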