Date: Wed, 20 Jul 2022 17:22:58 +0200
From: Michal Hocko
To: Charan Teja Kalla
Cc: akpm@linux-foundation.org, pasha.tatashin@soleen.com, sjpark@amazon.de, sieberf@amazon.com, shakeelb@google.com, dhowells@redhat.com, willy@infradead.org, vbabka@suse.cz, david@redhat.com, minchan@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, "iamjoonsoo.kim@lge.com", Pavan Kondeti
Subject: Re: [PATCH] mm: fix use-after free of page_ext after race with memory-offline
References: <1657810063-28938-1-git-send-email-quic_charante@quicinc.com>
On Wed 20-07-22 20:38:58, Charan Teja Kalla wrote:
> Thanks Michal here!!
> 
> On 7/19/2022 9:13 PM, Michal Hocko wrote:
> >>>>>> diff --git a/mm/page_ext.c b/mm/page_ext.c
> >>>>>> index 3dc715d..5ccd3ee 100644
> >>>>>> --- a/mm/page_ext.c
> >>>>>> +++ b/mm/page_ext.c
> >>>>>> @@ -299,8 +299,9 @@ static void __free_page_ext(unsigned long pfn)
> >>>>>>  	if (!ms || !ms->page_ext)
> >>>>>>  		return;
> >>>>>>  	base = get_entry(ms->page_ext, pfn);
> >>>>>> -	free_page_ext(base);
> >>>>>>  	ms->page_ext = NULL;
> >>>>>> +	synchronize_rcu();
> >>>>>> +	free_page_ext(base);
> >>>>>>  }
> >>>>> So you are imposing the RCU grace period for each page_ext! This can get
> >>>>> really expensive. Have you tried to measure the effect?
> >>> I was wrong here! This is for each memory section, which is not as
> >>> terrible as every single page_ext. This can still be quite a lot of
> >>> memory sections in a single memory block (e.g. on ppc memory sections
> >>> are ridiculously small).
> >>>
> >> On ARM64, I see that the minimum a section size can go is 128MB. I
> >> think 16MB is the section size on ppc. Any inputs on how frequently
> >> the offline/online operation is done on this ppc arch?
> > I have seen several reports where 16MB sections were used on PPC LPARs
> > of a non-trivial size. My usual answer to that is that this is mostly a
> > self-inflicted injury, but I am told that for reasons I cannot
> > understand this is not easy to change. So, reasonable or not, this is
> > not all that uncommon in PPC land.
> >
> > We definitely shouldn't optimize for those setups, but we shouldn't
> > make them suffer even more either. Besides that, it seems that a single
> > synchronize_rcu per offline operation should be doable.
> 
> I too feel it is doable, but the code might need to traverse the
> sections of a memory block twice.

Yes, this is to be expected unless you really want to optimize that
further with some global flag, which is probably not worth it.
Traversing the sequence of sections is not an expensive operation
compared to doing an expensive operation in each iteration.

> 1) With synchronize_rcu() called for each memory section of a memblock:
> ---------------------------------------------------------------------
> for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
> 	__free_page_ext(pfn):
> 		ms->page_ext = NULL
> 		synchronize_rcu();	// Called on every section.
> 		free_page_ext();	// Free the page_ext.
>
> 2) With a single synchronize_rcu() for each offlined block:
> -------------------------------------------------------
> for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) {
> 	__free_page_ext(pfn):
> 		ms->page_ext = NULL;

You want WRITE_ONCE here.

> }
> synchronize_rcu();	// Call synchronize_rcu() just once.
> for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
> 	free_page_ext();	// Free the page_ext.

Yes, this is what I really meant, but the thing is that you need the
pointer to know what to free. One way to handle that would be to not
clear the pointer but to make it visibly invalid for later checks.
E.g. set the lowest bit and check for that rather than for NULL.

> Any better code you have in mind here, please?
>
> But since there are few sections, the overhead of traversing them will
> definitely be less than that of a synchronize_rcu() per memsection.
>
> But I really doubt there will be a real impact from making sync_rcu per
> section because (as David also mentioned, and you also corrected it, I
> think) the concern here is for ppc, where the min memblock size is 256M
> with a section size of 16M, so a single offline operation on that block
> can end up making 16 sync_rcu() calls. Should we really optimize this
> case here? If yes, I can go with approach 2) mentioned above. Sorry if
> I am really undermining the problem here.

I would prefer a single rcu barrier solution. Unless this turns into
monster complicated stuff. Which I hope we can avoid completely by
simply not doing the separate lifetime as well... Making any
assumptions about the number of sections that are handled here is just
a ticket to get issues later. Let's not put land mines ahead of us
please.
-- 
Michal Hocko
SUSE Labs
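For concreteness, below is a minimal, untested sketch of the two-pass,
single-grace-period scheme discussed in this thread, using the "set the
lowest bit" idea so the second pass still knows what to free. The
helper names (PAGE_EXT_INVALID, page_ext_invalid(),
__invalidate_page_ext()) and the simplified offline_page_ext()
signature are illustrative assumptions only, not what a final patch
necessarily looks like; the existing free_page_ext() and get_entry()
helpers from mm/page_ext.c are assumed unchanged, and the lookup side
would need matching READ_ONCE() checks under rcu_read_lock().

/* Illustrative sketch only: lowest bit marks a to-be-freed page_ext. */
#define PAGE_EXT_INVALID	(0x1)

static bool page_ext_invalid(struct page_ext *page_ext)
{
	return !page_ext || ((unsigned long)page_ext & PAGE_EXT_INVALID);
}

/* Pass 1 helper: make the section's page_ext visibly invalid. */
static void __invalidate_page_ext(unsigned long pfn)
{
	struct mem_section *ms = __pfn_to_section(pfn);

	if (!ms || !ms->page_ext)
		return;
	WRITE_ONCE(ms->page_ext,
		   (struct page_ext *)((unsigned long)ms->page_ext |
				       PAGE_EXT_INVALID));
}

/* Pass 2 helper: strip the invalid bit and free the storage. */
static void __free_page_ext(unsigned long pfn)
{
	struct mem_section *ms = __pfn_to_section(pfn);
	struct page_ext *base;

	if (!ms || !ms->page_ext)
		return;
	base = (void *)((unsigned long)ms->page_ext & ~PAGE_EXT_INVALID);
	free_page_ext(get_entry(base, pfn));
	ms->page_ext = NULL;
}

/* Signature simplified for illustration. */
static void offline_page_ext(unsigned long start_pfn, unsigned long nr_pages)
{
	unsigned long start, end, pfn;

	start = SECTION_ALIGN_DOWN(start_pfn);
	end = SECTION_ALIGN_UP(start_pfn + nr_pages);

	/* Pass 1: invalidate every section's page_ext pointer. */
	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
		__invalidate_page_ext(pfn);

	/* One grace period for the whole offlined block. */
	synchronize_rcu();

	/* Pass 2: now it is safe to actually free the page_ext storage. */
	for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION)
		__free_page_ext(pfn);
}

/*
 * A reader (e.g. lookup_page_ext()) would then do roughly:
 *
 *	rcu_read_lock();
 *	page_ext = READ_ONCE(ms->page_ext);
 *	if (page_ext_invalid(page_ext)) {
 *		rcu_read_unlock();
 *		return NULL;
 *	}
 *	...
 */

The point of the low bit is that the second pass still has the original
pointer (minus the bit) to hand to free_page_ext(), which is exactly
what is lost when the first pass simply writes NULL.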