From: Muchun Song
Date: Wed, 21 Apr 2021 16:41:10 +0800
Subject: Re: [External] Re: [PATCH] mm: hugetlb: fix a race between memory-failure/soft_offline and gather_surplus_pages
To: Oscar Salvador
Cc: Michal Hocko, Mike Kravetz, Andrew Morton, Linux Memory Management List, LKML, Naoya Horiguchi
References: <20210421060259.67554-1-songmuchun@bytedance.com> <20210421082103.GE22456@linux>
In-Reply-To: <20210421082103.GE22456@linux>

On Wed, Apr 21, 2021 at 4:21 PM Oscar Salvador wrote:
>
> On Wed, Apr 21, 2021 at 04:15:00PM +0800, Muchun Song wrote:
> > > The hwpoison side of this looks really suspicious to me. It shouldn't
> > > really touch the reference count of hugetlb pages without being very
> > > careful (and having hugetlb_lock held). What would happen if the
> > > reference count was increased after the page has been enqueued into the
> > > pool? This can just blow up later.
> >
> > If the page has been enqueued into the pool, then the page can be
> > allocated to other users. The page reference count will be reset to
> > 1 in the dequeue_huge_page_node_exact(). Then memory-failure
> > will free the page because of put_page(). This is wrong. Because
> > there is another user.
>
> Note that dequeue_huge_page_node_exact() will not hand over any pages
> which are poisoned, so in this case it will not be allocated.

But soft offline does not set the page's hwpoison flag before
__get_hwpoison_page(), so the page can still be allocated. Right?

> But it is true that we might need hugetlb lock, this needs some more
> thought.
>
> I will have a look.
>
> --
> Oscar Salvador
> SUSE L3
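
For readers following the thread, the interleaving discussed above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: `Page`, `HugePool`, `put_page`, and `simulate` are hypothetical stand-ins, and the model only encodes the behaviour described in this message (the dequeue path skips poisoned pages and resets the reference count to 1, while the offline path takes and later drops a reference).

```python
# Toy userspace model of the interleaving described in the thread.
# NOT kernel code: every class and function here is a hypothetical stand-in.
from dataclasses import dataclass, field


@dataclass
class Page:
    refcount: int = 0                 # 0 while the page sits free in the pool
    hwpoison: bool = False            # True once the page is marked poisoned
    freed_while_in_use: bool = False  # set when the bug is observed


@dataclass
class HugePool:
    free_pages: list = field(default_factory=list)

    def enqueue(self, page):
        self.free_pages.append(page)

    def dequeue(self):
        # As described in the thread: poisoned pages are never handed out,
        # and the refcount of the page that is handed out is reset to 1.
        for page in self.free_pages:
            if not page.hwpoison:
                self.free_pages.remove(page)
                page.refcount = 1
                return page
        return None


def put_page(page, users):
    # Drop one reference; record the bug if the count reaches zero
    # while another user still owns the page.
    page.refcount -= 1
    if page.refcount == 0 and users > 0:
        page.freed_while_in_use = True


def simulate(mark_poison_first):
    pool = HugePool()
    page = Page()
    pool.enqueue(page)

    if mark_poison_first:
        page.hwpoison = True
    page.refcount += 1           # offline path takes a reference on a pooled page

    user_page = pool.dequeue()   # another user allocates from the pool
    users = 1 if user_page is page else 0

    put_page(page, users)        # offline path drops its reference
    return users, page.freed_while_in_use


print(simulate(mark_poison_first=False))  # (1, True)
print(simulate(mark_poison_first=True))   # (0, False)
```

In this model, the first call corresponds to the soft offline case raised in the reply (no hwpoison flag yet, so the page is handed out and then freed under its new user), and the second to the case Oscar describes (the poisoned page is skipped by the dequeue path).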