From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.4 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 68752C2D0BF for ; Thu, 5 Dec 2019 05:47:16 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id F09452077B for ; Thu, 5 Dec 2019 05:47:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=nvidia.com header.i=@nvidia.com header.b="sPG3P7lQ" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F09452077B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=nvidia.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 614316B0E96; Thu, 5 Dec 2019 00:47:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5C4ED6B0E97; Thu, 5 Dec 2019 00:47:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4B3336B0E98; Thu, 5 Dec 2019 00:47:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0133.hostedemail.com [216.40.44.133]) by kanga.kvack.org (Postfix) with ESMTP id 35C3E6B0E96 for ; Thu, 5 Dec 2019 00:47:15 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with SMTP id C9F4A180AD80F for ; Thu, 5 Dec 2019 05:47:14 +0000 (UTC) X-FDA: 76230004788.18.floor80_795f80a7e6f29 X-HE-Tag: floor80_795f80a7e6f29 X-Filterd-Recvd-Size: 9920 Received: from hqemgate16.nvidia.com (hqemgate16.nvidia.com [216.228.121.65]) by imf22.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Dec 2019 05:47:13 +0000 (UTC) Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate16.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 04 Dec 2019 21:47:16 -0800 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 04 Dec 2019 21:47:12 -0800 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 04 Dec 2019 21:47:12 -0800 Received: from [10.2.163.157] (172.20.13.39) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 5 Dec 2019 05:47:11 +0000 Subject: Re: [v2 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node To: Yang Shi , , , , , , CC: , , References: <1575519678-86510-1-git-send-email-yang.shi@linux.alibaba.com> X-Nvconfidentiality: public From: John Hubbard Message-ID: Date: Wed, 4 Dec 2019 21:44:22 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.2 MIME-Version: 1.0 In-Reply-To: <1575519678-86510-1-git-send-email-yang.shi@linux.alibaba.com> X-Originating-IP: [172.20.13.39] X-ClientProxiedBy: HQMAIL107.nvidia.com (172.20.187.13) To HQMAIL107.nvidia.com (172.20.187.13) Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1575524836; bh=3BzUfhn2L2EFnKI0pmf+h4twdu2lT7xDJGJSbSIlTSk=; h=X-PGP-Universal:Subject:To:CC:References:X-Nvconfidentiality:From: Message-ID:Date:User-Agent:MIME-Version:In-Reply-To: X-Originating-IP:X-ClientProxiedBy:Content-Type:Content-Language: Content-Transfer-Encoding; b=sPG3P7lQrHMdzhkufix7Ov4qDOGYBHDxCVVz7lyYgqHXbXyHQHY/o9tY1myCgUyK1 KUL1R27VMJxzT0f7Qzy60Cnt9d5FMt5xlnJ2zeNT7iuWn8YrlQ4cLLVT/+hot5iHfS mnlbO887v7yrEDobZZkkwISpx6Xt0IXJGzBrquVjMN3dLnef9HJ0HMFqeQKxmGkbrm sevaJ4HMeymYkcEZx3TJTAeZ/uHJHR2aO2LpY8Mr+x2wfzY30JnaVK/mI2QnatNVpV 8gRnd2r/uYnshCL6Ro/KeBvD8tgei1cRR2nwVizmSLoex/+ZzRmvFRmoWokjl1WZIb vyYsWwtNRRGwQ== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 12/4/19 8:21 PM, Yang Shi wrote: > Felix Abecassis reports move_pages() would return random status if the > pages are already on the target node by the below test program: > > ---8<--- This is correct correct code, so: Reviewed-by: John Hubbard ...with a few nitpicky notes about comments, below, that might help: > > int main(void) > { > const long node_id = 1; > const long page_size = sysconf(_SC_PAGESIZE); > const int64_t num_pages = 8; > > unsigned long nodemask = 1 << node_id; > long ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask)); > if (ret < 0) > return (EXIT_FAILURE); > > void **pages = malloc(sizeof(void*) * num_pages); > for (int i = 0; i < num_pages; ++i) { > pages[i] = mmap(NULL, page_size, PROT_WRITE | PROT_READ, > MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS, > -1, 0); > if (pages[i] == MAP_FAILED) > return (EXIT_FAILURE); > } > > ret = set_mempolicy(MPOL_DEFAULT, NULL, 0); > if (ret < 0) > return (EXIT_FAILURE); > > int *nodes = malloc(sizeof(int) * num_pages); > int *status = malloc(sizeof(int) * num_pages); > for (int i = 0; i < num_pages; ++i) { > nodes[i] = node_id; > status[i] = 0xd0; /* simulate garbage values */ > } > > ret = move_pages(0, num_pages, pages, nodes, status, MPOL_MF_MOVE); > printf("move_pages: %ld\n", ret); > for (int i = 0; i < num_pages; ++i) > printf("status[%d] = %d\n", i, status[i]); > } > ---8<--- > > Then running the program would return nonsense status values: > $ ./move_pages_bug > move_pages: 0 > status[0] = 208 > status[1] = 208 > status[2] = 208 > status[3] = 208 > status[4] = 208 > status[5] = 208 > status[6] = 208 > status[7] = 208 > > This is because the status is not set if the page is already on the > target node, but move_pages() should return valid status as long as it > succeeds. The valid status may be errno or node id. > > We can't simply initialize status array to zero since the pages may be > not on node 0. Fix it by updating status with node id which the page is > already on. And, it looks we have to update the status inside > add_page_for_migration() since the page struct is not available outside > it. > > Make add_page_for_migration() return 1 if store_status() is failed in > order to not mix up the status value since -EFAULT is also a valid > status. > > Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move") > Reported-by: Felix Abecassis > Tested-by: Felix Abecassis > Cc: John Hubbard > Cc: Michal Hocko > Cc: Christoph Lameter > Cc: Vlastimil Babka > Cc: Mel Gorman > Cc: 4.17+ > Signed-off-by: Yang Shi > --- > v2: *Correted the return value when add_page_for_migration() returns 1. > > John noticed another return value inconsistency between the implementation and > the manpage. The manpage says it should return -ENOENT if the page is already > on the target node, but it doesn't. It looks the original code didn't return > -ENOENT either, I'm not sure if this is a document issue or not. Anyway this > is another issue, once we confirm it we can fix it later. > > mm/migrate.c | 36 ++++++++++++++++++++++++++++++------ > 1 file changed, 30 insertions(+), 6 deletions(-) > > diff --git a/mm/migrate.c b/mm/migrate.c > index a8f87cb..f1090a0 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -1512,17 +1512,21 @@ static int do_move_pages_to_node(struct mm_struct *mm, > /* > * Resolves the given address to a struct page, isolates it from the LRU and > * puts it to the given pagelist. > - * Returns -errno if the page cannot be found/isolated or 0 when it has been > - * queued or the page doesn't need to be migrated because it is already on > - * the target node > + * Returns: > + * errno - if the page cannot be found/isolated > + * 0 - when it has been queued or the page doesn't need to be migrated > + * because it is already on the target node > + * 1 - if store_status() is failed I recommend this wording instead: * Returns: * errno - if the page cannot be found/isolated * 0 - when it has been queued or the page doesn't need to be migrated * because it is already on the target node * 1 - The page doesn't need to be migrated because it is already on the * target node. However, attempting to store the node ID in the status * array failed. Unlike other failures in this function, this case * needs to turn into a fatal failure in the calling function. > */ > static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, > - int node, struct list_head *pagelist, bool migrate_all) > + int node, struct list_head *pagelist, bool migrate_all, > + int __user *status, int start) > { > struct vm_area_struct *vma; > struct page *page; > unsigned int follflags; > int err; > + bool same_node = false; > > down_read(&mm->mmap_sem); > err = -EFAULT; > @@ -1543,8 +1547,10 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, > goto out; > > err = 0; > - if (page_to_nid(page) == node) > + if (page_to_nid(page) == node) { > + same_node = true; > goto out_putpage; > + } > > err = -EACCES; > if (page_mapcount(page) > 1 && !migrate_all) > @@ -1578,6 +1584,16 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, > put_page(page); > out: > up_read(&mm->mmap_sem); > + > + /* > + * Must call store_status() after releasing mmap_sem since put_user > + * need acquire mmap_sem too, otherwise potential deadlock may exist. > + */ > + if (same_node) { > + if (store_status(status, start, node, 1)) > + err = 1; > + } > + > return err; > } > > @@ -1639,10 +1655,18 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes, > * report them via status > */ Let's change the comment above add_page_for_migration(), to read: /* * Most errors in the page lookup or isolation are not fatal * and we simply report them via the status array. However, * positive error values are fatal. */ > err = add_page_for_migration(mm, addr, current_node, > - &pagelist, flags & MPOL_MF_MOVE_ALL); > + &pagelist, flags & MPOL_MF_MOVE_ALL, status, > + i); > + > if (!err) > continue; > > + /* store_status() failed in add_page_for_migration() */ ...and let's replace the above line, with the following: /* * Most errors in the page lookup or isolation are not fatal * and we simply report them via the status array. However, * positive error values are fatal. */ > + if (err > 0) { > + err = -EFAULT; > + goto out_flush; > + } > + > err = store_status(status, i, err, 1); > if (err) > goto out_flush; > And with that, I think the comments help a little bit more, in reading through the code. thanks, -- John Hubbard NVIDIA