From: "Huang, Ying"
To: Baolin Wang
Subject: Re: [PATCH] mm: migrate: fix getting incorrect page mapping during page migration
In-Reply-To: (Baolin Wang's message of "Fri, 15 Dec 2023 20:07:52 +0800")
Date: Mon, 18 Dec 2023 13:19:41 +0800
Message-ID: <87wmtcnl3m.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

Baolin Wang writes:

> When running stress-ng testing, we found below kernel crash after a few hours:
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> pc : dentry_name+0xd8/0x224
> lr : pointer+0x22c/0x370
> sp : ffff800025f134c0
> ......
> Call trace:
>  dentry_name+0xd8/0x224
>  pointer+0x22c/0x370
>  vsnprintf+0x1ec/0x730
>  vscnprintf+0x2c/0x60
>  vprintk_store+0x70/0x234
>  vprintk_emit+0xe0/0x24c
>  vprintk_default+0x3c/0x44
>  vprintk_func+0x84/0x2d0
>  printk+0x64/0x88
>  __dump_page+0x52c/0x530
>  dump_page+0x14/0x20
>  set_migratetype_isolate+0x110/0x224
>  start_isolate_page_range+0xc4/0x20c
>  offline_pages+0x124/0x474
>  memory_block_offline+0x44/0xf4
>  memory_subsys_offline+0x3c/0x70
>  device_offline+0xf0/0x120
> ......
>
> After analyzing the vmcore, I found this issue is caused by page migration.
> The scenario is that, one thread is doing page migration, and we will use the
> target page's ->mapping field to save 'anon_vma' pointer between page unmap and
> page move, and now the target page is locked and refcount is 1.
>
> Currently, there is another stress-ng thread performing memory hotplug,
> attempting to offline the target page that is being migrated. It discovers that
> the refcount of this target page is 1, preventing the offline operation, thus
> proceeding to dump the page. However, page_mapping() of the target page may
> return an incorrect file mapping to crash the system in dump_mapping(), since
> the target page->mapping only saves 'anon_vma' pointer without setting
> PAGE_MAPPING_ANON flag.
>
> There are several ways to fix this issue:
> (1) Setting the PAGE_MAPPING_ANON flag for target page's ->mapping when saving
> 'anon_vma', but this can confuse PageAnon() for PFN walkers, since the target
> page has not built mappings yet.
> (2) Getting the page lock to call page_mapping() in __dump_page() to avoid crashing
> the system, however, there are still some PFN walkers that call page_mapping()
> without holding the page lock, such as compaction.
> (3) Using target page->private field to save the 'anon_vma' pointer and 2 bits
> page state, just as page->mapping records an anonymous page, which can remove
> the page_mapping() impact for PFN walkers and also seems a simple way.
>
> So I choose option 3 to fix this issue, and this can also fix other potential
> issues for PFN walkers, such as compaction.
>
> Fixes: 64c8902ed441 ("migrate_pages: split unmap_and_move() to _unmap() and _move()")
> Signed-off-by: Baolin Wang

Good catch!  Thanks!

Reviewed-by: "Huang, Ying"

> ---
>  mm/migrate.c | 27 ++++++++++-----------------
>  1 file changed, 10 insertions(+), 17 deletions(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 397f2a6e34cb..bad3039d165e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1025,38 +1025,31 @@ static int move_to_new_folio(struct folio *dst, struct folio *src,
>  }
>
>  /*
> - * To record some information during migration, we use some unused
> - * fields (mapping and private) of struct folio of the newly allocated
> - * destination folio.  This is safe because nobody is using them
> - * except us.
> + * To record some information during migration, we use unused private
> + * field of struct folio of the newly allocated destination folio.
> + * This is safe because nobody is using it except us.
>   */
> -union migration_ptr {
> -	struct anon_vma *anon_vma;
> -	struct address_space *mapping;
> -};
> -
>  enum {
>  	PAGE_WAS_MAPPED = BIT(0),
>  	PAGE_WAS_MLOCKED = BIT(1),
> +	PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED,
>  };
>
>  static void __migrate_folio_record(struct folio *dst,
> -				   unsigned long old_page_state,
> +				   int old_page_state,
>  				   struct anon_vma *anon_vma)
>  {
> -	union migration_ptr ptr = { .anon_vma = anon_vma };
> -	dst->mapping = ptr.mapping;
> -	dst->private = (void *)old_page_state;
> +	dst->private = (void *)anon_vma + old_page_state;
>  }
>
>  static void __migrate_folio_extract(struct folio *dst,
>  				    int *old_page_state,
>  				    struct anon_vma **anon_vmap)
>  {
> -	union migration_ptr ptr = { .mapping = dst->mapping };
> -	*anon_vmap = ptr.anon_vma;
> -	*old_page_state = (unsigned long)dst->private;
> -	dst->mapping = NULL;
> +	unsigned long private = (unsigned long)dst->private;
> +
> +	*anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
> +	*old_page_state = private & PAGE_OLD_STATES;
>  	dst->private = NULL;
>  }
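
For readers following along, the record/extract pairing in the hunk above can be mimicked in userspace. What follows is only a rough sketch under stated assumptions, not the kernel code: struct fake_folio, struct fake_anon_vma, record(), extract() and roundtrip_ok() are hypothetical stand-ins, uintptr_t replaces the kernel's unsigned long, and the sketch uses '|' where the patch uses '+' (the two agree as long as the low two bits of the pointer are clear). It relies on the pointed-to object being at least 4-byte aligned, which the static assertion checks for the stand-in type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_WAS_MAPPED, PAGE_WAS_MLOCKED, PAGE_OLD_STATES. */
enum {
	WAS_MAPPED  = 1 << 0,
	WAS_MLOCKED = 1 << 1,
	OLD_STATES  = WAS_MAPPED | WAS_MLOCKED,
};

/* Hypothetical stand-in for struct anon_vma; the long member gives alignment >= 4. */
struct fake_anon_vma {
	long refcount;
};

/* The low two bits of any pointer to this type must be free for the state. */
_Static_assert(_Alignof(struct fake_anon_vma) > OLD_STATES,
	       "need two spare low bits in the pointer");

/* Hypothetical stand-in for the destination folio; only ->private matters here. */
struct fake_folio {
	void *private;
};

/* Pack the pointer and the two state bits into one word. */
static void record(struct fake_folio *dst, int old_page_state,
		   struct fake_anon_vma *anon_vma)
{
	assert((old_page_state & ~OLD_STATES) == 0);
	assert(((uintptr_t)anon_vma & OLD_STATES) == 0);
	dst->private = (void *)((uintptr_t)anon_vma | (uintptr_t)old_page_state);
}

/* Split the word back into pointer and state, then clear the field. */
static void extract(struct fake_folio *dst, int *old_page_state,
		    struct fake_anon_vma **anon_vmap)
{
	uintptr_t private = (uintptr_t)dst->private;

	*anon_vmap = (struct fake_anon_vma *)(private & ~(uintptr_t)OLD_STATES);
	*old_page_state = (int)(private & OLD_STATES);
	dst->private = NULL;
}

/* Returns 1 if both values survive a record/extract round trip. */
static int roundtrip_ok(struct fake_anon_vma *anon_vma, int state)
{
	struct fake_folio dst = { NULL };
	struct fake_anon_vma *out_vma;
	int out_state;

	record(&dst, state, anon_vma);
	extract(&dst, &out_state, &out_vma);
	return out_vma == anon_vma && out_state == state && dst.private == NULL;
}
```

Calling roundtrip_ok() with a NULL pointer also works in this sketch, since a NULL pointer has its low bits clear on the usual platforms and only the state bits end up stored.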