From: Felix Abecassis
CC: Andrew Morton
Subject: bug: move_pages(2) does not update "status" if no pages are moved
Date: Wed, 4 Dec 2019 11:01:20 -0800

Hello all,

On kernel 5.3, when using the move_pages syscall (wrapped by libnuma) and all
pages happen to be on the right node already, the call returns 0 but the
"status" array is not updated. This array potentially contains garbage values
(e.g. from malloc(3)), and I don't see a way to detect this.

Looking at the kernel code, we are probably exiting do_pages_move here:

out_flush:
    if (list_empty(&pagelist))
        return err;

Here is a sample C program to reproduce the problem:

/* $ gcc move_pages_bug.c -lnuma -o move_pages_bug */
#define _DEFAULT_SOURCE
#include <numaif.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const long node_id = 1;
    const long page_size = sysconf(_SC_PAGESIZE);
    const int64_t num_pages = 8;

    /* Bind allocations to node_id so every page is faulted in there. */
    unsigned long nodemask =  1 << node_id;
    long ret = set_mempolicy(MPOL_BIND, &nodemask, sizeof(nodemask));
    if (ret < 0)
        return (EXIT_FAILURE);

    void **pages = malloc(sizeof(void*) * num_pages);
    for (int i = 0; i < num_pages; ++i) {
        pages[i] = mmap(NULL, page_size, PROT_WRITE | PROT_READ,
                        MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS,
                        -1, 0);
        if (pages[i] == MAP_FAILED)
            return (EXIT_FAILURE);
    }

    ret = set_mempolicy(MPOL_DEFAULT, NULL, 0);
    if (ret < 0)
        return (EXIT_FAILURE);

    /* Ask to move every page to the node it is already on. */
    int *nodes = malloc(sizeof(int) * num_pages);
    int *status = malloc(sizeof(int) * num_pages);
    for (int i = 0; i < num_pages; ++i) {
        nodes[i] = node_id;
        status[i] = 0xd0; /* simulate garbage values */
    }

    ret = move_pages(0, num_pages, pages, nodes, status, MPOL_MF_MOVE);
    printf("move_pages: %ld\n", ret);
    for (int i = 0; i < num_pages; ++i)
        printf("status[%d] = %d\n", i, status[i]);
}

And here is a sample output showing the "garbage" values:

$ ./move_pages_bug
move_pages: 0
status[0] = 208
status[1] = 208
status[2] = 208
status[3] = 208
status[4] = 208
status[5] = 208
status[6] = 208
status[7] = 208

Note that passing NULL as the "nodes" argument works as expected here.

Also, it seems that it's the last "run-length" of pages on the right node(s)
that triggers this problem, e.g. if I add "nodes[0] = nodes[1] = 0", then the
output becomes:

$ ./move_pages_bug
move_pages: 0
status[0] = 0
status[1] = 0
status[2] = 208
status[3] = 208
status[4] = 208
status[5] = 208
status[6] = 208
status[7] = 208

And with just "nodes[7] = 0;", the first run-length of pages gets assigned
correctly:

$ ./move_pages_bug
move_pages: 0
status[0] = 1
status[1] = 1
status[2] = 1
status[3] = 1
status[4] = 1
status[5] = 1
status[6] = 1
status[7] = 0

Thank you,
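
P.S. Until the kernel behavior is settled, one possible caller-side mitigation
is to pre-fill "status" with a sentinel and check afterwards which entries were
left unwritten. The sketch below is only an illustration under assumptions: the
names STATUS_UNTOUCHED, status_prefill and status_count_untouched are
hypothetical, and INT_MIN is assumed to be safe as a sentinel because status
entries are expected to hold either a node number (>= 0) or a negated errno
value.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical sentinel: assumed never to be written by move_pages(2),
 * since status entries hold a node number (>= 0) or a negated errno. */
#define STATUS_UNTOUCHED INT_MIN

/* Pre-fill status[] with the sentinel before calling move_pages(2). */
static void status_prefill(int *status, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        status[i] = STATUS_UNTOUCHED;
}

/* Count the entries that still hold the sentinel, i.e. the entries
 * the call did not write. */
static size_t status_count_untouched(const int *status, size_t count)
{
    size_t untouched = 0;

    for (size_t i = 0; i < count; ++i)
        if (status[i] == STATUS_UNTOUCHED)
            ++untouched;
    return untouched;
}
```

In the reproducer above, status_prefill(status, num_pages) would replace the
0xd0 initialization, and a non-zero status_count_untouched(status, num_pages)
after move_pages() would flag the entries that were skipped. Since passing NULL
as "nodes" works as expected, a second query-only call could then be used to
fill in those entries.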