From: "Huang, Ying"
To: Donet Tom
Cc: Andrew Morton,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Aneesh Kumar,
    Michal Hocko, Dave Hansen, Mel Gorman, Feng Tang, Andrea Arcangeli,
    Peter Zijlstra, Ingo Molnar, Rik van Riel, Johannes Weiner,
    Matthew Wilcox, Vlastimil Babka, Dan Williams, Hugh Dickins,
    Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH v2 0/2] Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: (Donet Tom's message of "Fri, 8 Mar 2024 09:15:36 -0600")
Date: Mon, 11 Mar 2024 09:45:08 +0800
Message-ID: <87zfv54k4b.fsf@yhuang6-desk2.ccr.corp.intel.com>

Donet Tom writes:

> This patchset is to optimize the cross-socket memory access with
> MPOL_PREFERRED_MANY policy.
>
> To test this patch we ran the following test on a 3 node system.
>  Node 0 - 2GB  - Tier 1
>  Node 1 - 11GB - Tier 1
>  Node 6 - 10GB - Tier 2
>
> Below changes are made to memcached to set the memory policy,
> It select Node0 and Node1 as preferred nodes.
>
>    #include
>    #include
>
>    unsigned long nodemask;
>    int ret;
>
>    nodemask = 0x03;
>    ret = set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
>                        &nodemask, 10);
>    /* If MPOL_F_NUMA_BALANCING isn't supported,
>     * fall back to MPOL_PREFERRED_MANY */
>    if (ret < 0 && errno == EINVAL){
>        printf("set mem policy normal\n");
>        ret = set_mempolicy(MPOL_PREFERRED_MANY, &nodemask, 10);
>    }
>    if (ret < 0) {
>        perror("Failed to call set_mempolicy");
>        exit(-1);
>    }
>
> Test Procedure:
> ===============
> 1. Make sure memory tiring and demotion are enabled.

Nit picking.  s/tiring/tiering/

--
Best Regards,
Huang, Ying

> 2. Start memcached.
>
>    # ./memcached -b 100000 -m 204800 -u root -c 1000000 -t 7
>        -d -s "/tmp/memcached.sock"
>
> 3. Run memtier_benchmark to store 3200000 keys.
>
>    #./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary
>      --threads=1 --pipeline=1 --ratio=1:0 --key-pattern=S:S --key-minimum=1
>      --key-maximum=3200000 -n allkeys -c 1 -R -x 1 -d 1024
>
> 4. Start a memory eater on node 0 and 1. This will demote all memcached
>    pages to node 6.
> 5. Make sure all the memcached pages got demoted to lower tier by reading
>    /proc//numa_maps.
>
>    # cat /proc/2771/numa_maps
>    ---
>    default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
>    default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
>    ---
>
> 6. Kill memory eater.
> 7. Read the pgpromote_success counter.
> 8. Start reading the keys by running memtier_benchmark.
>
>    #./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary
>      --pipeline=1 --distinct-client-seed --ratio=0:3 --key-pattern=R:R
>      --key-minimum=1 --key-maximum=3200000 -n allkeys
>      --threads=64 -c 1 -R -x 6
>
> 9. Read the pgpromote_success counter.
>
> Test Results:
> =============
> Without Patch
> ------------------
> 1. pgpromote_success before test
>    Node 0:  pgpromote_success 11
>    Node 1:  pgpromote_success 140974
>
>    pgpromote_success after test
>    Node 0:  pgpromote_success 11
>    Node 1:  pgpromote_success 140974
>
> 2. Memtier-benchmark result.
>    AGGREGATED AVERAGE RESULTS (6 runs)
>    ==================================================================
>    Type     Ops/sec   Hits/sec  Misses/sec  Avg. Latency  p50 Latency
>    ------------------------------------------------------------------
>    Sets        0.00        ---         ---           ---          ---
>    Gets   305792.03  305791.93        0.10       0.18949      0.16700
>    Waits       0.00        ---         ---           ---          ---
>    Totals 305792.03  305791.93        0.10       0.18949      0.16700
>
>    ======================================
>    p99 Latency  p99.9 Latency  KB/sec
>    -------------------------------------
>    ---          ---            0.00
>    0.44700      1.71100        11542.69
>    ---          ---            ---
>    0.44700      1.71100        11542.69
>
> With Patch
> ---------------
> 1. pgpromote_success before test
>    Node 0:  pgpromote_success 5
>    Node 1:  pgpromote_success 89386
>
>    pgpromote_success after test
>    Node 0:  pgpromote_success 57895
>    Node 1:  pgpromote_success 141463
>
> 2. Memtier-benchmark result.
>    AGGREGATED AVERAGE RESULTS (6 runs)
>    ====================================================================
>    Type     Ops/sec   Hits/sec  Misses/sec  Avg. Latency  p50 Latency
>    --------------------------------------------------------------------
>    Sets        0.00        ---         ---           ---          ---
>    Gets   521942.24  521942.07        0.17       0.11459      0.10300
>    Waits       0.00        ---         ---           ---          ---
>    Totals 521942.24  521942.07        0.17       0.11459      0.10300
>
>    =======================================
>    p99 Latency  p99.9 Latency  KB/sec
>    ---------------------------------------
>    ---          ---            0.00
>    0.23100      0.31900        19701.68
>    ---          ---            ---
>    0.23100      0.31900        19701.68
>
>
> Test Result Analysis:
> =====================
> 1. With patch we could observe pages are getting promoted.
> 2. Memtier-benchmark results shows that, with the patch,
>    performance has increased more than 50%.
>
>    Ops/sec without fix - 305792.03
>    Ops/sec with fix    - 521942.24
>
> Changes:
> v2:
> - Rebased on latest upstream (v6.8-rc7)
> - Used 'numa_node_id()' to get the current execution node ID, Added
>   'lockdep_assert_held' to make sure that the 'mpol_misplaced()' is
>   called with ptl held.
> - The migration condition has been updated; now, migration will only
>   occur if the execution node is present in the policy nodemask.
>
> -v1: https://lore.kernel.org/linux-mm/9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com/#t
>
>
> Donet Tom (2):
>   mm/mempolicy: Use numa_node_id() instead of cpu_to_node()
>   mm/numa_balancing:Allow migrate on protnone reference with
>     MPOL_PREFERRED_MANY policy
>
>  include/linux/mempolicy.h |  5 +++--
>  mm/huge_memory.c          |  2 +-
>  mm/internal.h             |  2 +-
>  mm/memory.c               |  8 +++++---
>  mm/mempolicy.c            | 34 ++++++++++++++++++++++++++--------
>  5 files changed, 36 insertions(+), 15 deletions(-)
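
For steps 7 and 9 in the quoted procedure, one possible way to read the
per-node counter is sketched below. This is only an illustration, not part
of the patchset: the helper names are hypothetical, and the sysfs path
/sys/devices/system/node/node<N>/vmstat is an assumption about where the
test kernel exposes pgpromote_success.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value of counter `name` from vmstat-style text
 * ("name value" per line), or -1 if the counter is not present.
 * Hypothetical helper, for illustration only.
 */
long long parse_vmstat_counter(const char *text, const char *name)
{
    size_t name_len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Match the whole counter name followed by a space. */
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ') {
            long long value;

            if (sscanf(line + name_len, "%lld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;    /* step past the newline */
    }
    return -1;
}

/*
 * Read counter `name` for NUMA node `node`.
 * ASSUMPTION: per-node counters are exposed in
 * /sys/devices/system/node/node<N>/vmstat on the kernel under test.
 * Returns -1 if the file cannot be read or the counter is missing.
 */
long long read_node_counter(int node, const char *name)
{
    char path[64];
    char buf[16384];
    size_t n;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/vmstat", node);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_vmstat_counter(buf, name);
}
```

Calling read_node_counter(0, "pgpromote_success") before step 8 and again
after it, then subtracting the two values, would give the number of pages
promoted to node 0 during the read phase.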