From: Yang Shi
Date: Wed, 4 Dec 2019 16:40:00 -0800
Subject: Re: bug: move_pages(2) does not udpate "status" if no pages are moved
To: John Hubbard
Cc: Felix Abecassis, Linux MM, Andrew Morton
References: <217bc4ba-6c3b-4067-9ba8-bf4e2eceb1e2@nvidia.com>
In-Reply-To: <217bc4ba-6c3b-4067-9ba8-bf4e2eceb1e2@nvidia.com>

On Wed, Dec 4, 2019 at 4:03 PM John Hubbard wrote:
>
> On 12/4/19 12:17 PM, Yang Shi wrote:
> > On Wed, Dec 4, 2019 at 11:01 AM Felix Abecassis wrote:
> >>
> >> Hello all,
> >>
> >> On kernel 5.3, when using the move_pages syscall (wrapped by libnuma) and all
> >> pages happen to be on the right node already, this function returns 0 but the
> >> "status" array is not updated. This array potentially contains garbage values
> >> (e.g. from malloc(3)), and I don't see a way to detect this.
> >>
> >> Looking at the kernel code, we are probably exiting do_pages_move here:
> >> out_flush:
> >>         if (list_empty(&pagelist))
> >>                 return err;
> >
> > May you please give the below patch a try? I just did build test.
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index a8f87cb..f2f1279 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1517,7 +1517,8 @@ static int do_move_pages_to_node(struct mm_struct *mm,
> >   * the target node
> >   */
> >  static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> > -		int node, struct list_head *pagelist, bool migrate_all)
> > +		int node, struct list_head *pagelist, bool migrate_all,
> > +		int __user *status, int start)
> >  {
> >  	struct vm_area_struct *vma;
> >  	struct page *page;
> > @@ -1543,8 +1544,10 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> >  		goto out;
> >
> >  	err = 0;
> > -	if (page_to_nid(page) == node)
> > +	if (page_to_nid(page) == node) {
> > +		err = store_status(status, start, node, 1);
> >  		goto out_putpage;
> > +	}
> >
> >  	err = -EACCES;
> >  	if (page_mapcount(page) > 1 && !migrate_all)
> > @@ -1639,7 +1642,9 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> >  		 * report them via status
> >  		 */
> >  		err = add_page_for_migration(mm, addr, current_node,
> > -				&pagelist, flags & MPOL_MF_MOVE_ALL);
> > +				&pagelist, flags & MPOL_MF_MOVE_ALL, status,
> > +				i);
> > +
> >  		if (!err)
> >  			continue;
> >
>
> Hi Yang,
>
> The patch looks correct, and I *think* the following lockdep report
> is a pre-existing problem, but it happened with your patch applied to today's
> linux.git (commit aedc0650f9135f3b92b39cbed1a8fe98d8088825), using the
> unmodified version of Felix's test program:

Thanks for catching this. It looks like it is caused by my patch. The patch
calls store_status() in add_page_for_migration(), which runs with mmap_sem
held. store_status() writes to the user-space status array, so it may fault
and try to take mmap_sem again, which matches the __might_fault entry in the
report below. I will take a deeper look.
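To make the locking problem concrete, here is a toy user-space model in C. It
is only a sketch: every model_* name is hypothetical and nothing in it is a
kernel API. The "lock" is a counter that records an acquisition attempted
while it is already held, which stands in for what lockdep flags. The second
variant shows one possible way around it (decide under the lock, write the
status only after dropping it); this thread does not establish that the
eventual fix takes this shape.

```c
/* Toy user-space model; the model_* names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stddef.h>

struct model_lock {
	int held;		/* current nesting depth */
	int recursive_attempts;	/* acquisitions made while already held */
};

static void model_lock_acquire(struct model_lock *l)
{
	if (l->held > 0)
		l->recursive_attempts++;	/* the case lockdep reports */
	l->held++;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held > 0);
	l->held--;
}

/* Stand-in for a store to user memory: the fault path may need the lock. */
static void model_store_status(struct model_lock *l, int *status,
			       size_t i, int value)
{
	model_lock_acquire(l);
	status[i] = value;
	model_lock_release(l);
}

/* Shape of the quoted patch: store while the lock is still held. */
static int model_store_under_lock(struct model_lock *l, const int *page_node,
				  int target, int *status, size_t n)
{
	size_t i;

	model_lock_acquire(l);
	for (i = 0; i < n; i++)
		if (page_node[i] == target)
			model_store_status(l, status, i, target);
	model_lock_release(l);
	return l->recursive_attempts;
}

/* Alternative: decide under the lock, store only after dropping it. */
static int model_store_after_unlock(struct model_lock *l, const int *page_node,
				    int target, int *status, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int already_there;

		model_lock_acquire(l);
		already_there = (page_node[i] == target);
		model_lock_release(l);

		if (already_there)
			model_store_status(l, status, i, target);
	}
	return l->recursive_attempts;
}
```

Both variants return the number of recursive acquisition attempts seen by the
model lock, so a caller can compare them on the same input.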
>
> ============================================
> WARNING: possible recursive locking detected
> 5.4.0-hubbard-github+ #552 Not tainted
> --------------------------------------------
> move_pages_bug/1286 is trying to acquire lock:
> ffff8882a365ab18 (&mm->mmap_sem#2){++++}, at: __might_fault+0x3e/0x90
>
> but task is already holding lock:
> ffff8882a365ab18 (&mm->mmap_sem#2){++++}, at: do_pages_move+0x129/0x6a0
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&mm->mmap_sem#2);
>   lock(&mm->mmap_sem#2);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 1 lock held by move_pages_bug/1286:
>  #0: ffff8882a365ab18 (&mm->mmap_sem#2){++++}, at: do_pages_move+0x129/0x6a0
>
> stack backtrace:
> CPU: 6 PID: 1286 Comm: move_pages_bug Not tainted 5.4.0-hubbard-github+ #552
> Hardware name: ASUS X299-A/PRIME X299-A, BIOS 2002 09/25/2019
> Call Trace:
>  dump_stack+0x71/0xa0
>  validate_chain.cold+0x122/0x15f
>  ? find_held_lock+0x2b/0x80
>  __lock_acquire+0x39c/0x790
>  lock_acquire+0x95/0x190
>  ? __might_fault+0x3e/0x90
>  __might_fault+0x68/0x90
>  ? __might_fault+0x3e/0x90
>  do_pages_move+0x2c4/0x6a0
>  kernel_move_pages+0x1f5/0x3e0
>  ? do_syscall_64+0x1c/0x230
>  __x64_sys_move_pages+0x25/0x30
>  do_syscall_64+0x5a/0x230
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7efd42f581ad
> Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f08
> RSP: 002b:00007ffffb207c78 EFLAGS: 00000216 ORIG_RAX: 0000000000000117
> RAX: ffffffffffffffda RBX: 0000556eb240cd28 RCX: 00007efd42f581ad
> RDX: 0000556eb240ccf0 RSI: 0000000000000008 RDI: 0000000000000000
> RBP: 00007ffffb207d10 R08: 0000556eb240cd70 R09: 0000000000000002
> R10: 0000556eb240cd40 R11: 0000000000000216 R12: 0000556eb04b70a0
> R13: 00007ffffb207df0 R14: 0000000000000000 R15: 0000000000000000
>
>
> thanks,
> --
> John Hubbard
> NVIDIA
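
On the original report's point that there is no way to detect an untouched
"status" array: until the kernel side is settled, a caller could pre-fill the
array with a sentinel before calling move_pages(2) and check for it afterwards.
The sketch below assumes that status entries written by the kernel are either
a node ID (non-negative) or a negative errno value, so INT_MIN can never be a
genuine result. STATUS_UNSET and both helper names are hypothetical, and the
move_pages(2) call itself is left out so the block stays self-contained.

```c
/* Caller-side sketch; STATUS_UNSET and the helpers are hypothetical names. */
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumed never to be written by the kernel (not a node ID, not -errno). */
#define STATUS_UNSET INT_MIN

/* Mark every entry as "not written yet" before calling move_pages(2). */
static void status_prefill(int *status, size_t count)
{
	size_t i;

	assert(status != NULL || count == 0);
	for (i = 0; i < count; i++)
		status[i] = STATUS_UNSET;
}

/* After the call returns, count the entries the kernel left untouched. */
static size_t status_count_unset(const int *status, size_t count)
{
	size_t i, unset = 0;

	for (i = 0; i < count; i++)
		if (status[i] == STATUS_UNSET)
			unset++;
	return unset;
}
```

A caller would run status_prefill() on the array, call move_pages(2), and
treat a non-zero status_count_unset() after a successful return as "the kernel
did not report on these pages".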