From: Yang Shi
Date: Wed, 3 Jan 2024 17:32:56 -0800
Subject: Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
To: "Yin, Fengwei"
Cc: kernel test robot, Rik van Riel, oe-lkp@lists.linux.dev, lkp@intel.com, Linux Memory Management List, Andrew Morton, Matthew Wilcox, Christopher Lameter, ying.huang@intel.com, feng.tang@intel.com
In-Reply-To: <988d265a-29a0-4252-9bdc-c47659e336c3@intel.com>
References: <202312192310.56367035-oliver.sang@intel.com> <5753c5cb-62e3-42e6-bf04-b12b4c77b259@intel.com> <988d265a-29a0-4252-9bdc-c47659e336c3@intel.com>

On Thu, Dec 21, 2023 at 5:13 PM Yin, Fengwei wrote:
>
>
>
> On 12/22/2023 2:11 AM, Yang Shi wrote:
> > On Thu, Dec 21, 2023 at 5:40 AM Yin, Fengwei wrote:
> >>
> >>
> >>
> >> On 12/21/2023 8:58 AM, Yin Fengwei wrote:
> >>> But what I am not sure was whether it's worthy to do such kind of change
> >>> as the regression only is seen obviously in micro-benchmark. No evidence
> >>> showed the other regressions in this report is related with madvise. At
> >>> least from the perf statistics. Need to check more on stream/ramspeed.
> >>> Thanks.
> >>
> >> With debugging patch (filter out the stack mapping from THP aligned),
> >> the result of stream can be restored to around 2%:
> >>
> >> commit:
> >>    30749e6fbb3d391a7939ac347e9612afe8c26e94
> >>    1111d46b5cbad57486e7a3fab75888accac2f072
> >>    89f60532d82b9ecd39303a74589f76e4758f176f  -> 1111d46b5cbad with debugging patch
> >>
> >> 30749e6fbb3d391a 1111d46b5cbad57486e7a3fab75 89f60532d82b9ecd39303a74589
> >> ---------------- --------------------------- ---------------------------
> >>     350993           -15.6%     296081 ±  2%      -1.5%     345689        stream.add_bandwidth_MBps
> >>     349830           -16.1%     293492 ±  2%      -2.3%     341860 ±  2%  stream.add_bandwidth_MBps_harmonicMean
> >>     333973           -20.5%     265439 ±  3%      -1.7%     328403        stream.copy_bandwidth_MBps
> >>     332930           -21.7%     260548 ±  3%      -2.5%     324711 ±  2%  stream.copy_bandwidth_MBps_harmonicMean
> >>     302788           -16.2%     253817 ±  2%      -1.4%     298421        stream.scale_bandwidth_MBps
> >>     302157           -17.1%     250577 ±  2%      -2.0%     296054        stream.scale_bandwidth_MBps_harmonicMean
> >>     339047           -12.1%     298061            -1.4%     334206        stream.triad_bandwidth_MBps
> >>     338186           -12.4%     296218            -2.0%     331469        stream.triad_bandwidth_MBps_harmonicMean
> >>
> >>
> >> The regression of ramspeed is still there.
> >
> > Thanks for the debugging patch and the test. If no one has objection
> > to honor MAP_STACK, I'm going to come up with a more formal patch.
> > Even though thp_get_unmapped_area() is not called for MAP_STACK, stack
> > area still may be allocated at 2M aligned address theoretically. And
> > it may be worse with multi-sized THP, for 1M.
> Right. Filtering out MAP_STACK can't make sure no THP for stack. Just
> reduce the possibility of using THP for stack.

Can you please help test the below patch?

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 40d94411d492..dc7048824be8 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
        return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
               _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
               _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
+              _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
               arch_calc_vm_flag_bits(flags);
 }

But I can't reproduce the pthread regression on my aarch64 VM. It might
be due to the guard stack (the 64K guard stack is 2M aligned, and the
8M stack right next to it starts at 2M + 64K). But I can see the stack
area is not THP eligible anymore with this patch. See:

fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
Size:               8192 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac nh

The "nh" flag is set.

> >
> > Do you have any instructions regarding how to run ramspeed? Anyway I
> > may not have time debug it until after holidays.
> 0Day leverages phoronix-test-suite to run ramspeed. So I don't have
> direct answer here.
>
> I suppose we could check the configuration of ramspeed in
> phoronix-test-suite to understand what's the build options and command
> options to run ramspeed:
> https://openbenchmarking.org/test/pts/ramspeed

Downloaded the test suite. It looks like phoronix just runs tests 3
(int) and 6 (float). They basically do 4 sub tests to benchmark memory
bandwidth:

 * copy
 * scale copy
 * add copy
 * triad copy

The source buffer is initialized (page fault is triggered), but the
destination area is not.
So the page fault + page clear time is counted in the result. Clearing
a huge page may take a little more time than clearing base pages. But I
didn't see a noticeable regression on my aarch64 VM either. Anyway, I
suppose such tests should be run with THP off.

>
>
> Regards
> Yin, Fengwei
>
> >
> >>
> >>
> >> Regards
> >> Yin, Fengwei
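
For reference, here is a minimal sketch of the four sub tests described above (copy, scale, add, triad). It is not taken from the ramspeed sources: the helper names (alloc_untouched, free_buf, k_copy, k_scale, k_add, k_triad) are hypothetical, and the kernels follow the usual STREAM-style definitions, which is an assumption about what ramspeed does internally. The point it illustrates is that the destination buffer comes straight from mmap() and is never touched before the kernel runs, so every first store into it takes a page fault and a page clear inside the measured loop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: anonymous mapping whose pages are left untouched.
 * The first write to each page faults, and the kernel zero-fills either
 * a base page or, when THP applies, a whole huge page. */
static double *alloc_untouched(size_t n)
{
    assert(n > 0);
    void *p = mmap(NULL, n * sizeof(double), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static void free_buf(double *p, size_t n)
{
    if (p)
        munmap(p, n * sizeof(double));
}

/* copy: dst[i] = src[i] */
static void k_copy(double *dst, const double *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* scale: dst[i] = s * src[i] */
static void k_scale(double *dst, const double *src, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = s * src[i];
}

/* add: dst[i] = a[i] + b[i] */
static void k_add(double *dst, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* triad: dst[i] = a[i] + s * b[i] */
static void k_triad(double *dst, const double *a, const double *b,
                    double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + s * b[i];
}
```

To mimic the benchmark, initialize the source buffers first (that takes their page faults up front), then time one kernel writing into a fresh alloc_untouched() destination. Timing a second pass over the same destination would leave the fault and clear cost out, which is one way to see how much of the result it accounts for.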