Date: Mon, 1 Aug 2022 10:27:24 +0200
From: Michal Hocko
To: Charan Teja Kalla
Cc: akpm@linux-foundation.org, david@redhat.com,
	quic_pkondeti@quicinc.com, pasha.tatashin@soleen.com,
	sjpark@amazon.de, sieberf@amazon.com, shakeelb@google.com,
	dhowells@redhat.com, willy@infradead.org, minchan@kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH V2] mm: fix use-after free of page_ext after race with memory-offline
References: <1658931303-17024-1-git-send-email-quic_charante@quicinc.com>
	<6b646ff2-b6f6-052e-f3f4-3bf05243f049@quicinc.com>
In-Reply-To: <6b646ff2-b6f6-052e-f3f4-3bf05243f049@quicinc.com>

On Fri 29-07-22 21:17:44, Charan Teja Kalla wrote:
> Thanks Michal for the reviews!!
>
> On 7/28/2022 8:07 PM, Michal Hocko wrote:
> >> FAQ's:
> >> Q) Should page_ext_[get|put]() needs to be used for every page_ext
> >> access?
> >> A) NO, the synchronization is really not needed in all the paths of
> >> accessing page_ext. One case is where extra refcount is taken on a
> >> page for which memory block, this pages falls into, offline operation is
> >> being performed. This extra refcount makes the offline operation not to
> >> succeed hence the freeing of page_ext. Another case is where the page
> >> is already being freed and we do reset its page_owner.
> > This is just subtlety and something that can get misunderstood over
> > time.
> > Moreover there is no documentation explaining the difference.
> > What is the reason to have these two different APIs in the first place.
> > RCU read side is almost zero cost. So what is the point?
> Currently not all the places where page_ext is being used is put under
> the rcu_lock. I just used rcu lock in the places where it is possible to
> have the use-after-free of page_ext. You recommend to use rcu lock while
> using with page_ext in all the places?

Yes. Using locking inconsistently just begs for future problems. There
should be a very good reason to use a lockless approach in some paths,
and that would be where the locking overhead is not really acceptable or
where the locking cannot be used for other reasons. The RCU read lock
has essentially zero overhead, so the only reason would be that the
critical section needs to sleep. Is any of that the case?

If there is a real need for a lockless variant then I would propose
adding __page_ext_get/put, which would be lockless, clearly document the
contexts in which they can be used, and enforce those conditions (e.g.
the reference count assumption).

> My only point here is since there may be a non-atomic context exist
> across page_ext_get/put() and If users are sure that this page's
> page_ext will not be freed by parallel offline operation, they need not
> get the rcu lock.

Existing users are probably easy to check, but think about the future.
Most developers (even a large part of the MM community) are not deeply
familiar with memory hotplug. Not to mention that people do not tend to
follow development in that area, and assumptions might change.

[...]
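
For illustration only, the split proposed above could look roughly like
the userspace sketch below. It is not kernel code and not taken from the
patch: rcu_read_lock()/rcu_read_unlock() are modeled by a plain counter,
and struct page, its refcount field and its ext pointer are hypothetical
stand-ins. Only the names page_ext_get/put and __page_ext_get come from
this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumption): a counter models RCU read-side nesting. */
static int rcu_depth;
static void rcu_read_lock(void)   { rcu_depth++; }
static void rcu_read_unlock(void) { assert(rcu_depth > 0); rcu_depth--; }

struct page_ext { unsigned long flags; };

/* Hypothetical, heavily reduced page: not the kernel's struct page. */
struct page {
	int refcount;           /* assumed to pin the memory block when > 0 */
	struct page_ext *ext;   /* NULL once page_ext has been freed */
};

/* Default accessor: always enters the read-side critical section. */
static struct page_ext *page_ext_get(struct page *page)
{
	rcu_read_lock();
	if (!page->ext) {
		/* Nothing to protect, so do not leave the lock held. */
		rcu_read_unlock();
		return NULL;
	}
	return page->ext;
}

/* Pairs with a successful page_ext_get(). */
static void page_ext_put(struct page_ext *ext)
{
	if (ext)
		rcu_read_unlock();
}

/*
 * Lockless variant: only valid while the caller holds a page reference,
 * and that condition is checked instead of merely documented.
 */
static struct page_ext *__page_ext_get(struct page *page)
{
	if (page->refcount <= 0)   /* kernel code would WARN_ON_ONCE() here */
		return NULL;
	return page->ext;
}
```

The point of the sketch is that the locked pair is the default, while
the lockless helper refuses to hand out a pointer when its documented
precondition does not hold.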
> >> @@ -298,9 +300,26 @@ static void __free_page_ext(unsigned long pfn)
> >>  	ms = __pfn_to_section(pfn);
> >>  	if (!ms || !ms->page_ext)
> >>  		return;
> >> -	base = get_entry(ms->page_ext, pfn);
> >> +
> >> +	base = READ_ONCE(ms->page_ext);
> >> +	if (page_ext_invalid(base))
> >> +		base = (void *)base - PAGE_EXT_INVALID;
> > All page_ext accesses should use the same fetched pointer including the
> > ms->page_ext check. Also page_ext_invalid _must_ be true here otherwise
> > something bad is going on so I would go with
> > 	if (WARN_ON_ONCE(!page_ext_invalid(base)))
> > 		return;
> > 	base = (void *)base - PAGE_EXT_INVALID;
>
> The roll back operation in the online_page_ext(), where we free the
> allocated page_ext's, will not have the PAGE_EXT_INVALID flag thus
> WARN() may not work here. no?

Wouldn't ms->page_ext be NULL in that case?

-- 
Michal Hocko
SUSE Labs
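
As a rough userspace illustration of the flow suggested in the review
comment above (fetch the pointer once, do the NULL check and the invalid
check on that same value, then strip the tag), consider the sketch
below. The value of PAGE_EXT_INVALID, the reduced struct mem_section,
the helper name page_ext_base_to_free() and the warned out-parameter
standing in for WARN_ON_ONCE() are all assumptions made for the example;
whether the rollback path really sees a NULL pointer is the open
question in this thread and is not settled by the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding: the low pointer bit marks a page_ext that is being freed. */
#define PAGE_EXT_INVALID 0x1UL

/* Reduced to the single field used here; not the kernel definition. */
struct mem_section { void *page_ext; };

static int page_ext_invalid(const void *p)
{
	return ((uintptr_t)p & PAGE_EXT_INVALID) == PAGE_EXT_INVALID;
}

/*
 * Hypothetical helper: returns the untagged base that should be freed,
 * or NULL when there is nothing to free or the pointer is in an
 * unexpected state. *warned is set where kernel code would warn.
 */
static void *page_ext_base_to_free(struct mem_section *ms, int *warned)
{
	void *base;

	if (!ms)
		return NULL;

	base = ms->page_ext;   /* single fetch; kernel code would use READ_ONCE() */
	if (!base)
		return NULL;

	if (!page_ext_invalid(base)) {
		*warned = 1;       /* stands in for WARN_ON_ONCE() */
		return NULL;
	}

	/* Strip the tag to recover the real allocation address. */
	return (void *)((uintptr_t)base - PAGE_EXT_INVALID);
}
```

Every decision is made on the one local copy of the pointer, so a
concurrent update of ms->page_ext cannot make the NULL check and the tag
check disagree.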