From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Huang, Ying"
To: kernel test robot
Cc: Rik van Riel, Andrew Morton, Yang Shi, Matthew Wilcox
Subject: Re: [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression
References: <202210181535.7144dd15-yujie.liu@intel.com>
Date: Wed, 19 Oct 2022 10:05:50 +0800
In-Reply-To: <202210181535.7144dd15-yujie.liu@intel.com> (kernel test robot's message of "Tue, 18 Oct 2022 16:44:59 +0800")
Message-ID: <87edv4r2ip.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Yujie,

>      32528  48%    +147.6%      80547  38%  numa-meminfo.node0.AnonHugePages
>      92821  23%     +59.3%     147839  28%  numa-meminfo.node0.AnonPages

Far more anonymous pages are allocated than with the parent commit. This is
expected, because THPs rather than normal pages are now allocated for aligned
memory areas.

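To make the alignment point concrete, here is a rough sketch of the arithmetic. It assumes x86-64 sizes (4 KiB base pages, 2 MiB PMD-mapped THPs), THP enabled for the mapping, and a successful huge page allocation. The function names and the example addresses are hypothetical and are not taken from the kernel or from the report.

```python
PAGE_SIZE = 4 * 1024               # assumed base page size (x86-64)
HPAGE_PMD_SIZE = 2 * 1024 * 1024   # assumed PMD-mapped THP size (x86-64)


def align_up(addr, align):
    """Round addr up to the next multiple of align (a power of two)."""
    return (addr + align - 1) & ~(align - 1)


def align_down(addr, align):
    """Round addr down to a multiple of align (a power of two)."""
    return addr & ~(align - 1)


def thp_slots(start, length):
    """Count whole 2 MiB-aligned slots inside [start, start + length)."""
    first = align_up(start, HPAGE_PMD_SIZE)
    last = align_down(start + length, HPAGE_PMD_SIZE)
    return max(0, (last - first) // HPAGE_PMD_SIZE)


def pages_cleared_on_first_touch(start, length):
    """Base pages zeroed when the first byte of the mapping is written.

    A huge page can back that address only if the 2 MiB slot containing
    it lies entirely inside the mapping, i.e. the start is 2 MiB aligned
    and the mapping is at least 2 MiB long.
    """
    if start % HPAGE_PMD_SIZE == 0 and length >= HPAGE_PMD_SIZE:
        return HPAGE_PMD_SIZE // PAGE_SIZE
    return 1
```

For a 128 MiB mapping, a start address one base page past a 2 MiB boundary means the first write clears a single 4 KiB page, while a 2 MiB-aligned start means the same write clears a whole huge page, i.e. 512 base pages' worth of memory.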
>      95.23      -79.8      15.41   6%  perf-profile.calltrace.cycles-pp.__munmap
>      95.08      -79.7      15.40   6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
>      95.02      -79.6      15.39   6%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.96      -79.6      15.37   6%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.95      -79.6      15.37   6%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.86      -79.5      15.35   6%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      94.38      -79.2      15.22   6%  perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>      42.74      -42.7       0.00       perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
>      42.74      -42.7       0.00       perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
>      42.72      -42.7       0.00       perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
>      41.84      -41.8       0.00       perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
>      41.70      -41.7       0.00       perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
>      41.62      -41.6       0.00       perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
>      41.55      -41.6       0.00       perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
>      41.52      -41.5       0.00       perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
>      41.28      -41.3       0.00       perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush

In the parent commit, most CPU cycles are spent contending on the LRU lock.

>       0.00       +4.8       4.82   7%  perf-profile.calltrace.cycles-pp.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>       0.00       +4.9       4.88   7%  perf-profile.calltrace.cycles-pp.zap_huge_pmd.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
>       0.00       +8.2       8.22   8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist
>       0.00       +8.2       8.23   8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
>       0.00       +8.3       8.35   8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages
>       0.00       +8.3       8.35   8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush
>       0.00       +8.4       8.37   8%  perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
>       0.00       +9.6       9.60   6%  perf-profile.calltrace.cycles-pp.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
>       0.00      +65.5      65.48   2%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>       0.00      +72.5      72.51   2%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault

With the commit, most CPU cycles are spent clearing huge pages. This is
expected: we allocate more pages, so we need more cycles to clear them.

Checking the source code of the test case (will-it-scale/malloc1), I found
that it allocates some memory with malloc() and then frees it.

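The allocate-then-release pattern described above can be sketched roughly as follows. This is not the will-it-scale source. It stands in for malloc()/free() with a private anonymous mmap()/munmap() pair, on the assumption (suggested by the munmap-dominated profile) that the allocator serves the request by mapping and unmapping memory. The 128 MiB size and the function name are hypothetical, and the sketch is Linux/Unix only.

```python
import mmap

SIZE = 128 * 1024 * 1024  # hypothetical allocation size, not from the report


def map_unmap_loop(iterations, size=SIZE):
    """Map and immediately unmap private anonymous memory, untouched.

    Returns the number of completed iterations, similar in spirit to a
    per-process operation counter.
    """
    done = 0
    for _ in range(iterations):
        # Private anonymous mapping, roughly what a large malloc() does.
        buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
        # Release it without ever reading or writing the buffer.
        buf.close()
        done += 1
    return done
```

Calling map_unmap_loop(1000) runs the loop a thousand times. Because this sketch never touches the buffer, it only illustrates the shape of the workload and will not reproduce the page clearing seen in the profile above.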
In the parent commit, because the virtual memory address isn't aligned to 2M,
normal pages are allocated. With the commit, THPs are allocated, so there is
more page clearing and less LRU lock contention.

I think this is the expected behavior of the commit. And the pattern in the
test case isn't common (malloc() then free() without accessing the allocated
memory). So this regression isn't important. We can just ignore it.

Best Regards,
Huang, Ying