From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28F07C433DB for ; Wed, 6 Jan 2021 06:06:12 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 70DFD22CA2 for ; Wed, 6 Jan 2021 06:06:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 70DFD22CA2 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bytedance.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B845D8D00EA; Wed, 6 Jan 2021 01:06:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B333A8D0090; Wed, 6 Jan 2021 01:06:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9FBE18D00EA; Wed, 6 Jan 2021 01:06:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0200.hostedemail.com [216.40.44.200]) by kanga.kvack.org (Postfix) with ESMTP id 8A3828D0090 for ; Wed, 6 Jan 2021 01:06:10 -0500 (EST) Received: from smtpin02.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 54725180AD811 for ; Wed, 6 Jan 2021 06:06:10 +0000 (UTC) X-FDA: 77674314900.02.level28_49105c3274df Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin02.hostedemail.com (Postfix) with ESMTP id 3A65D10097AA2 for ; Wed, 6 Jan 2021 06:06:10 +0000 (UTC) X-HE-Tag: level28_49105c3274df X-Filterd-Recvd-Size: 7122 Received: from mail-pf1-f178.google.com (mail-pf1-f178.google.com [209.85.210.178]) by imf14.hostedemail.com (Postfix) with ESMTP for ; Wed, 6 Jan 2021 06:06:09 +0000 (UTC) Received: by mail-pf1-f178.google.com with SMTP id h186so1133550pfe.0 for ; Tue, 05 Jan 2021 22:06:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Hu3XTCtN8+YlWQXGEuBkVBXbO4gwbn3nfLwvPmsRKvE=; b=xTJyxzSUznkoxx6Uwj7tYJINTN1kUnIuZP6iOJsl/uKjWgk4rnAG9ZZSRjNzCAgUv6 hiCJttv3gwWCIZD5+0KM+FYT4QQCJyxShxktYpHj+YF+LYamgobBy/iVIxD6NrF3JYkd aTShqLeRcIVOISgA1wHRoMPfAv/MlG4jed2sNcKbocw+5S9sDh7Bclb8ACNh6JaV/heP BRVhn601+nGrWOHVnTqxMkYkGhg+1tVHu4/+75VffA/TUiTD/EwUiudJp2AUlp8BCB/s lrWA3aVjdl+nPn9mi224ztrpJlv7weRwPr/TDF40GbMjeF6F9FFL/sYMi4+S167lD4r4 6CVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Hu3XTCtN8+YlWQXGEuBkVBXbO4gwbn3nfLwvPmsRKvE=; b=PR/+hkT/oYfWmVIhhgcpR+OAVuziC2f8PjVXr0UudWUv1ek9R6JfVlKvGOdLEME6Qg wm4s2pOr2UPbAQBs+U7oHghQe1PHS75CLh/F+6/98QGdklny6QCCE18SFH6rrCHVKANs +qUIWMWmKUR0UCPdodTfSfOX1aRBlGphtXE7QliZhjwXH0kcmrjDO6Zcypv2asdYBX6G lTuumUAZcx4W9fvctvONMLdWeVoK5AABSUZoGfDwA8ELCw2o8ACYP05AtfmvAsw/Zt0/ btggW/jtI+xhmQ6okaifDvJXZBnCvFCvXE+1FanDSwjjthQuHbFP47gCk1qiwbhgGicB TqDg== X-Gm-Message-State: AOAM532R6XHzTEJnL7x+7FwbZCA6Q8/M4voD2zq5SemQ+b+8qaNoBhBE GLMuD1WB0JrkslhGDNcCNX96OMgZQ+XqRsBqHSJxLg== X-Google-Smtp-Source: ABdhPJw7cRMbBCXW5SN511upyvlIXqdnjQhMIt6wAKK3G2xGPCLv6v5Vt2RI6LGcqhI16TbiNv3xihfQCaXgH/6x570= X-Received: by 2002:a63:50a:: with SMTP id 10mr2810393pgf.273.1609913168372; Tue, 05 Jan 2021 22:06:08 -0800 (PST) MIME-Version: 1.0 References: <20210104065843.5658-1-songmuchun@bytedance.com> <20210104065843.5658-3-songmuchun@bytedance.com> <19e24147-f3cd-c6d2-dd0a-57e6182d60d8@oracle.com> In-Reply-To: <19e24147-f3cd-c6d2-dd0a-57e6182d60d8@oracle.com> From: Muchun Song Date: Wed, 6 Jan 2021 14:05:30 +0800 Message-ID: Subject: Re: [External] Re: [PATCH 3/6] mm: hugetlb: fix a race between freeing and dissolving the page To: Mike Kravetz Cc: Andrew Morton , Naoya Horiguchi , Andi Kleen , mhocko@suse.cz, Linux Memory Management List , LKML Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Jan 6, 2021 at 7:22 AM Mike Kravetz wrote: > > On 1/4/21 6:55 PM, Muchun Song wrote: > > On Tue, Jan 5, 2021 at 8:02 AM Mike Kravetz wrote: > >> > >> On 1/3/21 10:58 PM, Muchun Song wrote: > >>> There is a race condition between __free_huge_page() > >>> and dissolve_free_huge_page(). > >>> > >>> CPU0: CPU1: > >>> > >>> // page_count(page) == 1 > >>> put_page(page) > >>> __free_huge_page(page) > >>> dissolve_free_huge_page(page) > >>> spin_lock(&hugetlb_lock) > >>> // PageHuge(page) && !page_count(page) > >>> update_and_free_page(page) > >>> // page is freed to the buddy > >>> spin_unlock(&hugetlb_lock) > >>> spin_lock(&hugetlb_lock) > >>> clear_page_huge_active(page) > >>> enqueue_huge_page(page) > >>> // It is wrong, the page is already freed > >>> spin_unlock(&hugetlb_lock) > >>> > >>> The race windows is between put_page() and spin_lock() which > >>> is in the __free_huge_page(). > >>> > >>> We should make sure that the page is already on the free list > >>> when it is dissolved. > >>> > >>> Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") > >>> Signed-off-by: Muchun Song > >>> --- > >>> mm/hugetlb.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ > >>> 1 file changed, 48 insertions(+) > >>> > >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c > >>> index 1f3bf1710b66..72608008f8b4 100644 > >>> --- a/mm/hugetlb.c > >>> +++ b/mm/hugetlb.c > >>> @@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock); > >>> static int num_fault_mutexes; > >>> struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp; > >>> > >>> +static inline bool PageHugeFreed(struct page *head) > >>> +{ > >>> + return page_private(head) == -1UL; > >> > >> return page_private(head + 4) == -1UL; > >> > >>> +} > >>> + > >>> +static inline void SetPageHugeFreed(struct page *head) > >>> +{ > >>> + set_page_private(head + 4, -1UL); > >>> +} > >>> + > >>> +static inline void ClearPageHugeFreed(struct page *head) > >>> +{ > >>> + set_page_private(head + 4, 0); > >>> +} > >> > >> It is unfortunate that we can not use some existing value like > >> page_huge_active() to determine if dissolve_free_huge_page() should > >> proceed with freeing the page to buddy. If the existing check, > >> > >> if (!page_count(page)) { > >> > >> was changed to > >> > >> if (!page_count(page) && !page_huge_active(page)) { > >> > >> the race window would be shrunk. However, the most straight forward > >> way to fully close the window is with the approach taken here. > > > > I also thought about this fix. But this is not enough. Because > > we just call put_page to free the HugeTLB page without > > setting activeness in some place (e.g. error handling > > routines). > > > > If we use page_huge_active, we should set activeness > > before put_page. But we cannot guarantee this. > > Just FYI, > I went back and explored the option of doing set_page_huge_active > when a page was put on the active list and clear_page_huge_active > when put on the free list. This would be much like what you are > doing with PageHugeFreed. Commit bcc54222309c which added page_huge_active > implied that this was possible. Then I remembered a race fixed in > cb6acd01e2e4 that required delaying the call to set_page_huge_active > in hugetlb_no_page. So, such a scheme would not work. Sounds like a tortuous story. :) > > Also, > It seems we could use head[3].mapping for PageHugeFreed ? Not much > of an advantage. It does not add another tail page needed to store > page metadata. And, this fits within the already defined > HUGETLB_CGROUP_MIN_ORDER. It is fine to me. Will do. Thanks. > -- > Mike Kravetz