From: "Huang, Ying"
To: Donet Tom
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Aneesh Kumar, Michal Hocko, Dave Hansen, Mel Gorman, Feng Tang,
    Andrea Arcangeli, Peter Zijlstra, Ingo Molnar, Rik van Riel,
    Johannes Weiner, Matthew Wilcox, Vlastimil Babka, Dan Williams,
    Hugh Dickins, Kefeng Wang, Suren Baghdasaryan
Subject: Re: [PATCH v4 0/2] Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
In-Reply-To: (Donet Tom's message of "Mon, 25 Mar 2024 09:24:12 -0500")
Date: Tue, 26 Mar 2024 10:38:00 +0800
Message-ID: <875xx9pvjr.fsf@yhuang6-desk2.ccr.corp.intel.com>

Donet Tom writes:

> This patchset optimizes the
> cross-socket memory access with the MPOL_PREFERRED_MANY policy.
>
> To test this patchset, we ran the following test on a 3-node system:
>
> Node 0 - 2GB  - Tier 1
> Node 1 - 11GB - Tier 1
> Node 6 - 10GB - Tier 2
>
> The following changes were made to memcached to set the memory policy.
> They select Node 0 and Node 1 as the preferred nodes.
>
> #include <errno.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <numaif.h>   /* set_mempolicy(); link with -lnuma */
>
> /* Fallbacks in case <numaif.h> does not define these;
>  * the values match include/uapi/linux/mempolicy.h. */
> #ifndef MPOL_PREFERRED_MANY
> #define MPOL_PREFERRED_MANY 5
> #endif
> #ifndef MPOL_F_NUMA_BALANCING
> #define MPOL_F_NUMA_BALANCING (1 << 13)
> #endif
>
> unsigned long nodemask;
> int ret;
>
> nodemask = 0x03;   /* bit 0 = Node 0, bit 1 = Node 1 */
> ret = set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
>                     &nodemask, 10);
> /* If MPOL_F_NUMA_BALANCING isn't supported,
>  * fall back to MPOL_PREFERRED_MANY */
> if (ret < 0 && errno == EINVAL) {
>         printf("set mem policy normal\n");
>         ret = set_mempolicy(MPOL_PREFERRED_MANY, &nodemask, 10);
> }
> if (ret < 0) {
>         perror("Failed to call set_mempolicy");
>         exit(EXIT_FAILURE);
> }
>
> Test Procedure:
> ===============
> 1. Make sure memory tiering and demotion are enabled.
> 2. Start memcached.
>
>    # ./memcached -b 100000 -m 204800 -u root -c 1000000 -t 7 \
>        -d -s "/tmp/memcached.sock"
>
> 3. Run memtier_benchmark to store 3200000 keys.
>
>    # ./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary \
>        --threads=1 --pipeline=1 --ratio=1:0 --key-pattern=S:S --key-minimum=1 \
>        --key-maximum=3200000 -n allkeys -c 1 -R -x 1 -d 1024
>
> 4. Start a memory eater on nodes 0 and 1. This demotes all memcached
>    pages to node 6.
> 5. Make sure all the memcached pages were demoted to the lower tier by
>    reading /proc/<pid>/numa_maps.
>
>    # cat /proc/2771/numa_maps
>    ---
>    default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
>    default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
>    ---
>
> 6. Kill the memory eater.
> 7. Read the pgpromote_success counter.
> 8. Start reading the keys by running memtier_benchmark.
>
>    # ./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary \
>        --pipeline=1 --distinct-client-seed --ratio=0:3 --key-pattern=R:R \
>        --key-minimum=1 --key-maximum=3200000 -n allkeys \
>        --threads=64 -c 1 -R -x 6
>
> 9. Read the pgpromote_success counter.
>
> Test Results:
> =============
> Without Patch
> -------------
> 1. pgpromote_success before test
>    Node 0: pgpromote_success 11
>    Node 1: pgpromote_success 140974
>
>    pgpromote_success after test
>    Node 0: pgpromote_success 11
>    Node 1: pgpromote_success 140974
>
> 2. memtier_benchmark results
>    AGGREGATED AVERAGE RESULTS (6 runs)
>    ===================================================================
>    Type      Ops/sec   Hits/sec  Misses/sec  Avg. Latency  p50 Latency
>    -------------------------------------------------------------------
>    Sets         0.00        ---         ---           ---          ---
>    Gets    305792.03  305791.93        0.10       0.18949      0.16700
>    Waits        0.00        ---         ---           ---          ---
>    Totals  305792.03  305791.93        0.10       0.18949      0.16700
>
>    ============================================
>    Type    p99 Latency  p99.9 Latency    KB/sec
>    --------------------------------------------
>    Sets            ---            ---      0.00
>    Gets        0.44700        1.71100  11542.69
>    Waits           ---            ---       ---
>    Totals      0.44700        1.71100  11542.69
>
> With Patch
> ----------
> 1. pgpromote_success before test
>    Node 0: pgpromote_success 5
>    Node 1: pgpromote_success 89386
>
>    pgpromote_success after test
>    Node 0: pgpromote_success 57895
>    Node 1: pgpromote_success 141463
>
> 2. memtier_benchmark results
>    AGGREGATED AVERAGE RESULTS (6 runs)
>    ===================================================================
>    Type      Ops/sec   Hits/sec  Misses/sec  Avg. Latency  p50 Latency
>    -------------------------------------------------------------------
>    Sets         0.00        ---         ---           ---          ---
>    Gets    521942.24  521942.07        0.17       0.11459      0.10300
>    Waits        0.00        ---         ---           ---          ---
>    Totals  521942.24  521942.07        0.17       0.11459      0.10300
>
>    ============================================
>    Type    p99 Latency  p99.9 Latency    KB/sec
>    --------------------------------------------
>    Sets            ---            ---      0.00
>    Gets        0.23100        0.31900  19701.68
>    Waits           ---            ---       ---
>    Totals      0.23100        0.31900  19701.68
>
> Test Result Analysis:
> =====================
> 1. With the patch, pages are promoted during the test: pgpromote_success
>    rises from 5 to 57895 on Node 0 and from 89386 to 141463 on Node 1.
>    Without the patch, neither counter changes.
> 2. The memtier_benchmark results show that, with the patch, throughput
>    (Ops/sec) increased by about 71%.
>
>    Ops/sec without fix - 305792.03
>    Ops/sec with fix    - 521942.24
>
> Changes:
> v4:
> - Added an example to the "PATCH 2/2" commit message, as per the
>   discussion on v3.
> v3:
> - Added the "* @vmf: structure describing the fault" comment for
>   mpol_misplaced() to fix the warning reported at
>   https://lore.kernel.org/oe-kbuild-all/202403202229.WZeAnUuO-lkp@intel.com/
> - https://lore.kernel.org/lkml/cover.1711002865.git.donettom@linux.ibm.com/
> v2:
> - Rebased on the latest upstream (v6.8-rc7).
> - Used numa_node_id() to get the current execution node ID, and added
>   lockdep_assert_held() to make sure that mpol_misplaced() is called
>   with the page table lock (ptl) held.
> - Updated the migration condition: migration now occurs only if the
>   execution node is present in the policy nodemask.
> - https://lore.kernel.org/lkml/cover.1709909210.git.donettom@linux.ibm.com/
>
> v1: https://lore.kernel.org/linux-mm/9c3f7b743477560d1c5b12b8c111a584a2cc92ee.1708097962.git.donettom@linux.ibm.com/#t
>
>
> Donet Tom (2):
>   mm/mempolicy: Use numa_node_id() instead of cpu_to_node()
>   mm/numa_balancing:Allow migrate on protnone reference with
>     MPOL_PREFERRED_MANY policy
>
>  include/linux/mempolicy.h |  5 +++--
>  mm/huge_memory.c          |  2 +-
>  mm/internal.h             |  2 +-
>  mm/memory.c               |  8 +++++---
>  mm/mempolicy.c            | 36 +++++++++++++++++++++++++++---------
>  5 files changed, 37 insertions(+), 16 deletions(-)

LGTM, thanks!  Feel free to add

Reviewed-by: "Huang, Ying" <ying.huang@intel.com>

in the future version.

--
Best Regards,
Huang, Ying