Subject: Re: bug: move_pages(2) does not udpate "status" if no pages are moved
To: Yang Shi, Felix Abecassis
CC: Linux MM, Andrew Morton
From: John Hubbard
Message-ID: <217bc4ba-6c3b-4067-9ba8-bf4e2eceb1e2@nvidia.com>
Date: Wed, 4 Dec 2019 16:03:39 -0800

On 12/4/19 12:17 PM, Yang Shi wrote:
> On Wed, Dec 4, 2019 at 11:01 AM Felix Abecassis wrote:
>>
>> Hello all,
>>
>> On kernel 5.3, when using the move_pages syscall (wrapped by libnuma) and all
>> pages happen to be on the right node already, this function returns 0 but the
>> "status" array is not updated. This array potentially contains garbage values
>> (e.g. from malloc(3)), and I don't see a way to detect this.
>>
>> Looking at the kernel code, we are probably exiting do_pages_move here:
>> out_flush:
>>         if (list_empty(&pagelist))
>>                 return err;
>
> May you please give the below patch a try? I just did build test.
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index a8f87cb..f2f1279 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1517,7 +1517,8 @@ static int do_move_pages_to_node(struct mm_struct *mm,
>   * the target node
>   */
>  static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> -               int node, struct list_head *pagelist, bool migrate_all)
> +               int node, struct list_head *pagelist, bool migrate_all,
> +               int __user *status, int start)
>  {
>         struct vm_area_struct *vma;
>         struct page *page;
> @@ -1543,8 +1544,10 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
>                 goto out;
>
>         err = 0;
> -       if (page_to_nid(page) == node)
> +       if (page_to_nid(page) == node) {
> +               err = store_status(status, start, node, 1);
>                 goto out_putpage;
> +       }
>
>         err = -EACCES;
>         if (page_mapcount(page) > 1 && !migrate_all)
> @@ -1639,7 +1642,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>                          * report them via status
>                          */
>                         err = add_page_for_migration(mm, addr, current_node,
> -                               &pagelist, flags & MPOL_MF_MOVE_ALL);
> +                               &pagelist, flags & MPOL_MF_MOVE_ALL, status,
> +                               i);
> +
>                         if (!err)
>                                 continue;
>

Hi Yang,

The patch looks correct, and I *think* the following lockdep report is a
pre-existing problem, but it happened with your patch applied to today's
linux.git (commit aedc0650f9135f3b92b39cbed1a8fe98d8088825), using the
unmodified version of Felix's test program:

============================================
WARNING: possible recursive locking detected
5.4.0-hubbard-github+ #552 Not tainted
--------------------------------------------
move_pages_bug/1286 is trying to acquire lock:
ffff8882a365ab18 (&mm->mmap_sem#2){++++}, at: __might_fault+0x3e/0x90

but task is already holding lock:
ffff8882a365ab18 (&mm->mmap_sem#2){++++}, at: do_pages_move+0x129/0x6a0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&mm->mmap_sem#2);
  lock(&mm->mmap_sem#2);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by move_pages_bug/1286:
 #0: ffff8882a365ab18 (&mm->mmap_sem#2){++++}, at: do_pages_move+0x129/0x6a0

stack backtrace:
CPU: 6 PID: 1286 Comm: move_pages_bug Not tainted 5.4.0-hubbard-github+ #552
Hardware name: ASUS X299-A/PRIME X299-A, BIOS 2002 09/25/2019
Call Trace:
 dump_stack+0x71/0xa0
 validate_chain.cold+0x122/0x15f
 ? find_held_lock+0x2b/0x80
 __lock_acquire+0x39c/0x790
 lock_acquire+0x95/0x190
 ? __might_fault+0x3e/0x90
 __might_fault+0x68/0x90
 ? __might_fault+0x3e/0x90
 do_pages_move+0x2c4/0x6a0
 kernel_move_pages+0x1f5/0x3e0
 ? do_syscall_64+0x1c/0x230
 __x64_sys_move_pages+0x25/0x30
 do_syscall_64+0x5a/0x230
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7efd42f581ad
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f08
RSP: 002b:00007ffffb207c78 EFLAGS: 00000216 ORIG_RAX: 0000000000000117
RAX: ffffffffffffffda RBX: 0000556eb240cd28 RCX: 00007efd42f581ad
RDX: 0000556eb240ccf0 RSI: 0000000000000008 RDI: 0000000000000000
RBP: 00007ffffb207d10 R08: 0000556eb240cd70 R09: 0000000000000002
R10: 0000556eb240cd40 R11: 0000000000000216 R12: 0000556eb04b70a0
R13: 00007ffffb207df0 R14: 0000000000000000 R15: 0000000000000000

thanks,
-- 
John Hubbard
NVIDIA
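
For readers following the thread: until a kernel fix is available, userspace could plausibly detect the unwritten "status" entries Felix describes by pre-filling the array with a sentinel before calling move_pages(2). The sketch below is only an illustration of that idea, not something proposed in this thread. The names STATUS_UNTOUCHED, status_prefill and status_count_untouched are hypothetical, and the choice of INT_MIN assumes the kernel only ever stores node ids (>= 0) or negative errno values in "status".

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical sentinel (an assumption of this sketch): INT_MIN is neither a
 * valid node id (>= 0) nor a plausible negative errno, so an entry that still
 * holds it after the call was presumably never written by the kernel.
 */
#define STATUS_UNTOUCHED INT_MIN

/* Fill status[] with the sentinel before calling move_pages(2). */
static void status_prefill(int *status, size_t count)
{
    for (size_t i = 0; i < count; i++)
        status[i] = STATUS_UNTOUCHED;
}

/* Count the entries that still hold the sentinel after the call returned. */
static size_t status_count_untouched(const int *status, size_t count)
{
    size_t untouched = 0;

    for (size_t i = 0; i < count; i++)
        if (status[i] == STATUS_UNTOUCHED)
            untouched++;
    return untouched;
}
```

A caller would run status_prefill(status, count), then move_pages(pid, count, pages, nodes, status, flags), and treat a zero return combined with a non-zero status_count_untouched(status, count) as "the kernel did not report these pages" rather than trusting the array contents.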