From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DE8E2C43334 for ; Mon, 18 Jul 2022 06:18:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 56F636B0089; Mon, 18 Jul 2022 02:18:13 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4F7D76B008A; Mon, 18 Jul 2022 02:18:13 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 39C966B008C; Mon, 18 Jul 2022 02:18:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 242076B0089 for ; Mon, 18 Jul 2022 02:18:13 -0400 (EDT) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id E014E33C7E for ; Mon, 18 Jul 2022 06:18:12 +0000 (UTC) X-FDA: 79699215624.04.7359504 Received: from alexa-out-sd-02.qualcomm.com (alexa-out-sd-02.qualcomm.com [199.106.114.39]) by imf29.hostedemail.com (Postfix) with ESMTP id 40F4012004B for ; Mon, 18 Jul 2022 06:18:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; i=@quicinc.com; q=dns/txt; s=qcdkim; t=1658125092; x=1689661092; h=date:from:to:cc:subject:message-id:references: mime-version:in-reply-to; bh=eG2wLKtE7ob+1Kxa4ilGbg0ocyZHYcoie/+7T0DieRM=; b=LBMbdtQm96P5hCSw7jQ6HITlNpF4U2PAOZtIb++Rnu2k5ehj+/43JgLy awWgM6NVbrNu3HZVKGHeKvD8gvhR9GHsbluJnti4TWZ8qDbLl2TzF2bja EeVABnhw+ZxTycq1P0LZcyQlqnGSmIO6KXWbiRu6+ckhH0WQyE+H4lY19 Q=; Received: from unknown (HELO ironmsg05-sd.qualcomm.com) ([10.53.140.145]) by alexa-out-sd-02.qualcomm.com with ESMTP; 17 Jul 2022 23:18:11 -0700 X-QCInternal: smtphost Received: from nasanex01c.na.qualcomm.com ([10.47.97.222]) by ironmsg05-sd.qualcomm.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Jul 2022 23:18:10 -0700 Received: from nalasex01a.na.qualcomm.com (10.47.209.196) by nasanex01c.na.qualcomm.com (10.47.97.222) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Sun, 17 Jul 2022 23:18:10 -0700 Received: from hu-pkondeti-hyd.qualcomm.com (10.80.80.8) by nalasex01a.na.qualcomm.com (10.47.209.196) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.22; Sun, 17 Jul 2022 23:11:24 -0700 Date: Mon, 18 Jul 2022 11:41:20 +0530 From: Pavan Kondeti To: Charan Teja Kalla CC: , , , , , , , , , , , , Subject: Re: [PATCH] mm: fix use-after free of page_ext after race with memory-offline Message-ID: <20220718061120.GA8922@hu-pkondeti-hyd.qualcomm.com> References: <1657810063-28938-1-git-send-email-quic_charante@quicinc.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <1657810063-28938-1-git-send-email-quic_charante@quicinc.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Originating-IP: [10.80.80.8] X-ClientProxiedBy: nasanex01a.na.qualcomm.com (10.52.223.231) To nalasex01a.na.qualcomm.com (10.47.209.196) ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1658125092; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=eG2wLKtE7ob+1Kxa4ilGbg0ocyZHYcoie/+7T0DieRM=; b=DtJBe4nm7BAIZIMRILKQHZLpqr/9NjOQbj12VBsU+vgW0h7CWWQ/mmJC/kXMuHK03xKyeS /WVNzN7vw8Xrnje0c5N8VFr+QaJmw+acgdbbikKI/bQXuJvii6A14tQa0xgmY7F+/EsjK+ yyGv5E78sI2AjKcXBtppw9X27O/n2YY= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1658125092; a=rsa-sha256; cv=none; b=SRBtmXP4DW10jUftqoF1DiTro63skX9K/kptsg0ztkhM3AKWdh4tnBHo8tF9Fcn8J7Ngeo W/N7PMuWzfOlt05Zqdlun4DzQII+LhK0bf8KZXn1VCOrZrK4i45a4Fg7K2tOEwIeMZsNQ/ nV1DDLKKCSaIifSzILgRs214VciZOgM= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=quicinc.com header.s=qcdkim header.b=LBMbdtQm; dmarc=pass (policy=none) header.from=quicinc.com; spf=pass (imf29.hostedemail.com: domain of quic_pkondeti@quicinc.com designates 199.106.114.39 as permitted sender) smtp.mailfrom=quic_pkondeti@quicinc.com X-Stat-Signature: mrp93fgqhebcksntdkr3xp1rc5hh8ffp X-Rspam-User: X-Rspamd-Queue-Id: 40F4012004B X-Rspamd-Server: rspam10 Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=quicinc.com header.s=qcdkim header.b=LBMbdtQm; dmarc=pass (policy=none) header.from=quicinc.com; spf=pass (imf29.hostedemail.com: domain of quic_pkondeti@quicinc.com designates 199.106.114.39 as permitted sender) smtp.mailfrom=quic_pkondeti@quicinc.com X-HE-Tag: 1658125092-670082 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Charan, On Thu, Jul 14, 2022 at 08:17:43PM +0530, Charan Teja Kalla wrote: > The below is one path where race between page_ext and offline of the > respective memory blocks will cause use-after-free on the access of > page_ext structure. > > process1 process2 > --------- --------- > a)doing /proc/page_owner doing memory offline > through offline_pages. > > b)PageBuddy check is failed > thus proceed to get the > page_owner information > through page_ext access. > page_ext = lookup_page_ext(page); > > migrate_pages(); > ................. > Since all pages are successfully > migrated as part of the offline > operation,send MEM_OFFLINE notification > where for page_ext it calls: > offline_page_ext()--> > __free_page_ext()--> > free_page_ext()--> > vfree(ms->page_ext) > mem_section->page_ext = NULL > > c) Check for the PAGE_EXT flags > in the page_ext->flags access > results into the use-after-free(leading > to the translation faults). > > As mentioned above, there is really no synchronization between page_ext > access and its freeing in the memory_offline. > > The memory offline steps(roughly) on a memory block is as below: > 1) Isolate all the pages > 2) while(1) > try free the pages to buddy.(->free_list[MIGRATE_ISOLATE]) > 3) delete the pages from this buddy list. > 4) Then free page_ext.(Note: The struct page is still alive as it is > freed only during hot remove of the memory which frees the memmap, which > steps the user might not perform). > > This design leads to the state where struct page is alive but the struct > page_ext is freed, where the later is ideally part of the former which > just representing the page_flags. This seems to be a wrong design where > 'struct page' as a whole is not accessible(Thanks to Minchan for > pointing this out). > > The above mentioned race is just one example __but the problem persists > in the other paths too involving page_ext->flags access(eg: > page_is_idle())__. Since offline waits till the last reference on the > page goes down i.e. any path that took the refcount on the page can make > the memory offline operation to wait. Eg: In the migrate_pages() > operation, we do take the extra refcount on the pages that are under > migration and then we do copy page_owner by accessing page_ext. For > > Fix those paths where offline races with page_ext access by maintaining > synchronization with rcu lock. > > Thanks to David Hildenbrand for his views/suggestions on the initial > discussion[1]. > > [1] https://lore.kernel.org/linux-mm/59edde13-4167-8550-86f0-11fc67882107@quicinc.com/ > > Signed-off-by: Charan Teja Kalla > --- > include/linux/page_ext.h | 19 +++++++++++++++++++ > include/linux/page_idle.h | 40 ++++++++++++++++++++++++++++++---------- > mm/page_ext.c | 3 ++- > mm/page_owner.c | 18 +++++++++++++++--- > mm/page_table_check.c | 10 +++++++--- > 5 files changed, 73 insertions(+), 17 deletions(-) > > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index fabb2e1..df5d353 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -64,6 +64,25 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr) > return next; > } > > +static inline struct page_ext *get_page_ext(struct page *page) > +{ > + struct page_ext *page_ext; > + > + rcu_read_lock(); > + page_ext = lookup_page_ext(page); > + if (!page_ext) { > + rcu_read_unlock(); > + return NULL; > + } > + > + return page_ext; > +} > + > +static inline void put_page_ext(void) > +{ > + rcu_read_unlock(); > +} > + Would it be a harm if we make lookup_page_ext() completely a private function? Or is there any codepath that have the benefit of calling lookup_page_ext() without going through get_page_ext()? If that is the case, we should add RCU lockdep check inside lookup_page_ext() to make sure that this function is called with RCUs. Thanks, Pavan