From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY, USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2BB4C43603 for ; Fri, 6 Dec 2019 01:12:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 632892077B for ; Fri, 6 Dec 2019 01:12:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 632892077B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B8BB36B1324; Thu, 5 Dec 2019 20:12:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B3D166B1325; Thu, 5 Dec 2019 20:12:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A044F6B1326; Thu, 5 Dec 2019 20:12:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0131.hostedemail.com [216.40.44.131]) by kanga.kvack.org (Postfix) with ESMTP id 874C66B1324 for ; Thu, 5 Dec 2019 20:12:07 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with SMTP id 5BC2D583B for ; Fri, 6 Dec 2019 01:12:07 +0000 (UTC) X-FDA: 76232940294.12.wing25_69bd4b729045c X-HE-Tag: wing25_69bd4b729045c X-Filterd-Recvd-Size: 5081 Received: from out30-54.freemail.mail.aliyun.com (out30-54.freemail.mail.aliyun.com [115.124.30.54]) by imf09.hostedemail.com (Postfix) with ESMTP for ; Fri, 6 Dec 2019 01:12:06 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R911e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e07488;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=11;SR=0;TI=SMTPD_---0Tk4mbZG_1575594719; Received: from US-143344MP.local(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0Tk4mbZG_1575594719) by smtp.aliyun-inc.com(127.0.0.1); Fri, 06 Dec 2019 09:12:02 +0800 Subject: Re: [v3 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node To: Qian Cai , John Hubbard Cc: fabecassis@nvidia.com, mhocko@suse.com, cl@linux.com, vbabka@suse.cz, mgorman@techsingularity.net, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org References: <22b5bfde-45be-95bd-5c98-2ab13302c107@nvidia.com> From: Yang Shi Message-ID: <9588b886-7ced-6502-67f0-0eb623045903@linux.alibaba.com> Date: Thu, 5 Dec 2019 17:11:58 -0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 12/5/19 4:19 PM, Qian Cai wrote: > >> On Dec 5, 2019, at 7:04 PM, John Hubbard wrote: >> >> Felix's code is not random test code. It's code he wrote and he expect= ed it to work. > Sure, but could he show a bit detail if the kernel will start to behavi= or as expected like what was written in the manpage to return ENOENT in t= his case, why is it not acceptable? i.e., why is it important to depend o= n the broken behavior? I think I found the root cause. It did return -ENOENT when the syscall=20 was introduced (my first impression was wrong), but the behavior was=20 changed since 2.6.28 by commit e78bbfa82624 ("mm: stop returning -ENOENT=20 from sys_move_pages() if nothing got migrated"). The full commit log is=20 as below: Author: Brice Goglin Date:=C2=A0=C2=A0 Sat Oct 18 20:27:15 2008 -0700 =C2=A0=C2=A0=C2=A0 mm: stop returning -ENOENT from sys_move_pages() if n= othing got=20 migrated =C2=A0=C2=A0=C2=A0 A patchset reworking sys_move_pages().=C2=A0 It remov= es the possibly large =C2=A0=C2=A0=C2=A0 vmalloc by using multiple chunks when migrating large= buffers. It also =C2=A0=C2=A0=C2=A0 dramatically increases the throughput for large buffe= rs since the=20 lookup =C2=A0=C2=A0=C2=A0 in new_page_node() is now limited to a single chunk, = causing the=20 quadratic =C2=A0=C2=A0=C2=A0 complexity to have a much slower impact.=C2=A0 There = is no need to use any =C2=A0=C2=A0=C2=A0 radix-tree-like structure to improve this lookup. =C2=A0=C2=A0=C2=A0 sys_move_pages() duration on a 4-quadcore-opteron 234= 7HE (1.9Gz), =C2=A0=C2=A0=C2=A0 migrating between nodes #2 and #3: =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 lengt= h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 move_pages (us)=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 move_pages+patch (us) =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 4kB=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 126= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 98 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 40kB=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 198=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 168 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 400kB= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 963=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 937 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 4MB=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 125= 03=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 11930 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 40MB=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 246867=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 11848 =C2=A0=C2=A0=C2=A0 Patches #1 and #4 are the important ones: =C2=A0=C2=A0=C2=A0 1) stop returning -ENOENT from sys_move_pages() if no= thing got migrated =C2=A0=C2=A0=C2=A0 2) don't vmalloc a huge page_to_node array for do_pag= es_stat() =C2=A0=C2=A0=C2=A0 3) extract do_pages_move() out of sys_move_pages() =C2=A0=C2=A0=C2=A0 4) rework do_pages_move() to work on page_sized chunk= s =C2=A0=C2=A0=C2=A0 5) move_pages: no need to set pp->page to ZERO_PAGE(0= ) by default =C2=A0=C2=A0=C2=A0 This patch: =C2=A0=C2=A0=C2=A0 There is no point in returning -ENOENT from sys_move_= pages() if all=20 pages =C2=A0=C2=A0=C2=A0 were already on the right node, while we return 0 if = only 1 page=20 was not. =C2=A0=C2=A0=C2=A0 Most application don't know where their pages are all= ocated, so=20 it's not =C2=A0=C2=A0=C2=A0 an error to try to migrate them anyway. =C2=A0=C2=A0=C2=A0 Just return 0 and let the status array in user-space = be checked if the =C2=A0=C2=A0=C2=A0 application needs details. =C2=A0=C2=A0=C2=A0 It will make the upcoming chunked-move_pages() suppor= t much easier. =C2=A0=C2=A0=C2=A0 Signed-off-by: Brice Goglin =C2=A0=C2=A0=C2=A0 Acked-by: Christoph Lameter =C2=A0=C2=A0=C2=A0 Cc: Nick Piggin =C2=A0=C2=A0=C2=A0 Signed-off-by: Andrew Morton =C2=A0=C2=A0=C2=A0 Signed-off-by: Linus Torvalds So the behavior was changed in kernel intentionally but never reflected=20 in the manpage. I will propose a patch to fix the document.