From: "Huang, Ying"
To: Bharata B Rao
Cc: Aneesh Kumar K V, Andrew Morton, Alistair Popple, Dan Williams,
 Dave Hansen, Davidlohr Bueso, Hesham Almatary, Jagdish Gediya,
 Johannes Weiner, Jonathan Cameron, Michal Hocko, Tim Chen, Wei Xu,
 Yang Shi
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
Date: Mon, 31 Oct 2022 09:33:49 +0800
Message-ID: <87leowepz6.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com> (Bharata B.
 Rao's message of "Fri, 28 Oct 2022 19:23:33 +0530")
References: <20221027065925.476955-1-ying.huang@intel.com>
 <578c9b89-10eb-1e23-8868-cdd6685d8d4e@linux.ibm.com>
 <877d0kk5uf.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <59291b98-6907-0acf-df11-6d87681027cc@linux.ibm.com>
 <8735b8jy9k.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <0d938c9f-c810-b10a-e489-c2b312475c52@amd.com>
 <87tu3oibyr.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com>

Bharata B Rao writes:

> On 10/28/2022 2:03 PM, Huang, Ying wrote:
>> Bharata B Rao writes:
>>
>>> On 10/28/2022 11:16 AM, Huang, Ying wrote:
>>>> If my understanding were correct, you think the latency / bandwidth of
>>>> these NUMA nodes will near each other, but may be different.
>>>>
>>>> Even if the latency / bandwidth of these NUMA nodes isn't exactly same,
>>>> we should deal with that in memory types instead of memory tiers.
>>>> There's only one abstract distance for each memory type.
>>>>
>>>> So, I still believe we will not have many memory tiers with my proposal.
>>>>
>>>> I don't care too much about the exact number, but want to discuss some
>>>> general design choice,
>>>>
>>>> a) Avoid to group multiple memory types into one memory tier by default
>>>> at most times.
>>>
>>> Do you expect the abstract distances of two different types to be
>>> close enough in real life (like you showed in your example with
>>> CXL - 5000 and PMEM - 5100) that they will get assigned into same tier
>>> most times?
>>>
>>> Are you foreseeing that abstract distance that get mapped by sources
>>> like HMAT would run into this issue?
>>
>> Only if we set abstract distance chunk size large. So, I think that
>> it's better to set chunk size as small as possible to avoid potential
>> issue. What is the downside to set the chunk size small?
>
> I don't see anything in particular. However
>
> - With just two memory types (default_dram_type and dax_slowmem_type
> with adistance values of 576 and 576*5 respectively) defined currently,
> - With no interface yet to set/change adistance value of a memory type,
> - With no defined way to convert the performance characteristics info
> (bw and latency) from sources like HMAT into a adistance value,
>
> I find it a bit difficult to see how a chunk size of 10 against the
> existing 128 could be more useful.

OK. Maybe we are paying too much attention to the specific number.
My target isn't to push this specific RFC into the kernel. I just want
to discuss the design choices with the community.

My basic idea is NOT to group memory types into memory tiers by
customizing the abstract distance chunk size, because that's hard to
use and to implement. So far, it appears that nobody objects to this.

Given that, it's even better to avoid adjusting the abstract distance
chunk size in the kernel as much as possible. This will make life
easier for user space tools/scripts. One solution is to define more
than enough possible tiers below DRAM (we have an unlimited number of
tiers above DRAM). In the upstream implementation, 4 tiers are possible
below DRAM. That's enough for now, but in the long run it may be better
to define more. 100 possible tiers below DRAM may be too extreme.

How about defining the abstract distance of DRAM to be 1050 and the
chunk size to be 100? Then we will have 10 possible tiers below DRAM.
That may be more than enough even in the long run.

Again, the specific number isn't so important to me. So please suggest
your number if necessary.

Best Regards,
Huang, Ying
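
P.S. For illustration only, here is a minimal sketch of the arithmetic
behind the numbers above. It is not kernel code. It assumes that a tier
index is simply the abstract distance divided by the chunk size, rounded
down, and that "tiers below DRAM" means the tier indexes smaller than
the one DRAM falls into. The function names are hypothetical.

```python
def tier_index(adistance: int, chunk_size: int) -> int:
    """Assumed mapping: tier index = abstract distance // chunk size."""
    return adistance // chunk_size


def tiers_below(dram_adistance: int, chunk_size: int) -> int:
    """Count the tier indexes smaller than the one DRAM falls into."""
    return tier_index(dram_adistance, chunk_size)


# Values quoted in this thread for upstream: DRAM adistance 576, chunk 128.
upstream = tiers_below(576, 128)

# Values suggested in this mail: DRAM adistance 1050, chunk 100.
suggested = tiers_below(1050, 100)

# Example distances from earlier in the thread: CXL 5000, PMEM 5100.
same_tier_chunk_128 = tier_index(5000, 128) == tier_index(5100, 128)
same_tier_chunk_100 = tier_index(5000, 100) == tier_index(5100, 100)

print(upstream, suggested, same_tier_chunk_128, same_tier_chunk_100)
```

Under these assumptions, the two example memory types share a tier with
a chunk size of 128 but not with a chunk size of 100.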