From: Chintan Pandya
To: "Huang, Ying"
CC: Michal Hocko, Prathu Baronia, "akpm@linux-foundation.org",
    "linux-mm@kvack.org", "gregkh@linuxfoundation.org",
    "gthelen@google.com", "jack@suse.cz", Ken Lin, Gasine Xu
Subject: RE: [RFC] mm/memory.c: Optimizing THP zeroing routine for !HIGHMEM cases
Date: Sat, 11 Apr 2020 15:40:01 +0000
In-Reply-To: <87lfn390db.fsf@yhuang-dev.intel.com>

> > Generally, many architectures are optimized for serial loads, be it
> > initialization or access, as it is simplest form of prediction. Any
> > random access pattern would kill that pre-fetching. And for now, I
> > suspect that to be the case here. Probably, we can run more tests to
> > confirm this part.
>
> Please prove your theory with test.  Better to test x86 too.

I wrote the userspace test code below.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define SZ_1M 0x100000
#define SZ_4K 0x1000
#define NUM 100

int main (void)
{
    /* char * rather than void *, so the pointer arithmetic is standard C */
    char *p;
    char *q;
    char *r;
    unsigned long total_pages, total_size;
    unsigned long i, j;
    struct timeval t0, t1, t2, t3;
    long elapsed;

    printf ("Hello World\n");

    total_size = NUM * SZ_1M;
    total_pages = NUM * (SZ_1M / SZ_4K);

    p = malloc (total_size);
    q = malloc (total_size);
    r = malloc (total_size);
    if (!p || !q || !r) {
        fprintf (stderr, "malloc failed\n");
        return 1;
    }

    /* So that all pages get allocated */
    memset (r, 0xa, total_size);
    memset (q, 0xa, total_size);
    memset (p, 0xa, total_size);

    gettimeofday (&t0, NULL);

    /* One shot memset */
    memset (r, 0xd, total_size);

    gettimeofday (&t1, NULL);

    /* traverse in forward order */
    for (j = 0; j < total_pages; j++) {
        memset (q + (j * SZ_4K), 0xc, SZ_4K);
    }

    gettimeofday (&t2, NULL);

    /* traverse in reverse order */
    for (i = 0; i < total_pages; i++) {
        memset (p + total_size - (i + 1) * SZ_4K, 0xb, SZ_4K);
    }

    gettimeofday (&t3, NULL);

    free (p);
    free (q);
    free (r);

    /* Results time */
    elapsed = ((t1.tv_sec - t0.tv_sec) * 1000000) + (t1.tv_usec - t0.tv_usec);
    printf ("One shot: %ld micro seconds\n", elapsed);

    elapsed = ((t2.tv_sec - t1.tv_sec) * 1000000) + (t2.tv_usec - t1.tv_usec);
    printf ("Forward order: %ld micro seconds\n", elapsed);

    elapsed = ((t3.tv_sec - t2.tv_sec) * 1000000) + (t3.tv_usec - t2.tv_usec);
    printf ("Reverse order: %ld micro seconds\n", elapsed);

    return 0;
}

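One possible refinement of the timing in the test above, offered only as a sketch: gettimeofday() reports wall-clock time, which can jump if the system clock is adjusted during a run, whereas clock_gettime() with CLOCK_MONOTONIC cannot go backwards. The helper names elapsed_us and time_fill are hypothetical and not part of the original test. The page-by-page loop mirrors the forward and reverse traversals from the test, with the direction selected by a flag.

```c
/* Request the POSIX clock_gettime() declaration under strict -std= modes. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: microseconds elapsed between two timestamps. */
static long long elapsed_us(const struct timespec *start,
                            const struct timespec *end)
{
    long long sec = (long long)end->tv_sec - (long long)start->tv_sec;
    long long nsec = (long long)end->tv_nsec - (long long)start->tv_nsec;

    /* Combine into nanoseconds first so a negative nsec part is handled. */
    return (sec * 1000000000LL + nsec) / 1000;
}

/*
 * Hypothetical helper: fill `size` bytes of `buf` one `page` at a time,
 * in forward order (reverse == 0) or reverse order (reverse != 0), and
 * return the elapsed time in microseconds.
 */
static long long time_fill(char *buf, size_t size, size_t page,
                           int value, int reverse)
{
    struct timespec t0, t1;
    size_t pages = size / page;
    size_t i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < pages; i++) {
        size_t idx = reverse ? pages - 1 - i : i;

        memset(buf + idx * page, value, page);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return elapsed_us(&t0, &t1);
}
```

With the buffers from the test, the forward pass would be time_fill (q, total_size, SZ_4K, 0xc, 0) and the reverse pass time_fill (p, total_size, SZ_4K, 0xb, 1). Timing each pattern in a separate process would additionally keep one pass from warming caches for the next, though that is a suggestion rather than something the measurements above were checked against.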
------------------------------------------------------------------------

Results for ARM64 target (SM8150, CPU0 & 6 are online, running at max
frequency)
All numbers are the mean of 100 iterations. Variation is negligible.
- Oneshot : 3389.26 us
- Forward : 8876.16 us
- Reverse : 18157.6 us

Results for x86-64 (Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, only CPU 0
at max frequency)
All numbers are the mean of 100 iterations. Variation is negligible.
- Oneshot : 3203.49 us
- Forward : 5766.46 us
- Reverse : 5187.86 us

To conclude, I observed optimized serial writes on the ARM processor:
the forward traversal is about twice as fast as the reverse one. But
strangely, memset in reverse order performs better than forward order
quite consistently across multiple x86 machines. I don't have much
insight into x86, so to clarify, I would like to restrict my previous
suspicion to ARM only.

>
> Best Regards,
> Huang, Ying
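The timings reported above may be easier to compare as write bandwidth and as a slowdown relative to the one-shot memset. A minimal sketch of that arithmetic follows, assuming each pass writes the full 100 MiB buffer (NUM * SZ_1M in the test). The function names mib_per_sec, slowdown and print_row are hypothetical.

```c
#include <stdio.h>

/* Bytes written per pass in the test: NUM * SZ_1M = 100 MiB. */
#define TOTAL_BYTES (100.0 * 1024.0 * 1024.0)

/* Hypothetical helper: bandwidth in MiB/s for `bytes` written in `usec`. */
static double mib_per_sec(double bytes, double usec)
{
    return (bytes / (1024.0 * 1024.0)) / (usec / 1e6);
}

/* Hypothetical helper: how many times slower a pass is than a baseline. */
static double slowdown(double usec, double baseline_usec)
{
    return usec / baseline_usec;
}

/* Hypothetical helper: print one result row for a reported timing. */
static void print_row(const char *label, double usec, double oneshot_usec)
{
    printf("%-8s %10.2f us %10.1f MiB/s  %.2fx of one-shot time\n",
           label, usec, mib_per_sec(TOTAL_BYTES, usec),
           slowdown(usec, oneshot_usec));
}
```

For example, print_row("Reverse", 18157.6, 3389.26) formats the ARM64 reverse-order figure against the ARM64 one-shot figure. Passing the x86-64 forward time as the baseline for the x86-64 reverse time gives a ratio below 1, which matches the observation that reverse order was faster there.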