Date: Wed, 8 Jun 2022 15:14:06 -0400
From: Johannes Weiner
To: Tim Chen
Cc: linux-mm@kvack.org, Hao Wang, Abhishek Dhanotia, "Huang, Ying",
    Dave Hansen, Yang Shi, Davidlohr Bueso, Adam Manzanares,
    linux-kernel@vger.kernel.org, kernel-team@fb.com, Hasan Al Maruf
Subject: Re: [PATCH] mm: mempolicy: N:M interleave policy for tiered memory nodes
References: <20220607171949.85796-1-hannes@cmpxchg.org> <6096c96086187e51706898e58610fc0148b4ca23.camel@linux.intel.com>
In-Reply-To: <6096c96086187e51706898e58610fc0148b4ca23.camel@linux.intel.com>

Hi Tim,

On Wed, Jun 08, 2022 at 11:15:27AM -0700, Tim Chen wrote:
> On Tue, 2022-06-07 at 13:19 -0400, Johannes Weiner wrote:
> >
> >  /* Do dynamic interleaving for a process */
> >  static unsigned interleave_nodes(struct mempolicy *policy)
> >  {
> >  	unsigned next;
> >  	struct task_struct *me = current;
> >
> > -	next = next_node_in(me->il_prev, policy->nodes);
> > +	if (numa_tier_interleave[0] > 1 || numa_tier_interleave[1] > 1) {
>
> When we have three memory tiers, do we expect an N:M:K policy?
> Like interleaving between DDR5, DDR4 and PMEM memory.
> Or we expect an N:M policy still by interleaving between two specific tiers?

In the context of the proposed 'explicit tiers' interface, I think it
would make sense to have a per-tier 'interleave_ratio' knob. Because
the ratio is configured based on hardware properties, it can be
configured meaningfully for the entire tier hierarchy, even if
individual tasks or vmas interleave over only a subset of nodes.

> The other question is whether we will need multiple interleave policies depending
> on cgroup?
> One policy could be interleave between tier1, tier2, tier3.
> Another could be interleave between tier1 and tier2.

This is a good question.

One thing that has defined cgroup development in recent years is the
concept of "work conservation".
Moving away from fixed limits and hard partitioning, cgroups are
increasingly configured with weights, priorities, and guarantees
(cpu.weight, io.latency/io.cost.qos, memory.low). These weights and
priorities are enforced when cgroups are directly competing over a
resource; but if there is no contention, any active cgroup, regardless
of priority, has full access to the surplus (which could be the entire
host if the main load is idle).

With that background, yes, we likely want some way of prioritizing
tier access when multiple cgroups are competing. But we ALSO want the
ability to say that if resources are NOT contended, a cgroup should
interleave memory over all tiers according to optimal bandwidth.

That means that regardless of how the competitive cgroup rules for
tier access end up looking, it makes sense to have global interleaving
weights based on hardware properties as proposed here.

The effective cgroup IL ratio for each tier could then be something
like

	cgroup.tier_weight[tier] * tier/interleave_weight
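
To make the arithmetic concrete, here is a rough user-space sketch of
that product. It is not code from the patch: NR_TIERS, the function
names and the array layout are all made up for illustration, and the
"walk the ratio as a repeating pattern" placement rule is just one
plausible reading of an N:M(:K) interleave.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TIERS 3	/* hypothetical: e.g. DDR5, DDR4, PMEM */

/*
 * Hypothetical helper, not kernel API: scale the global, hardware-derived
 * interleave weight of each tier by the cgroup's weight for that tier.
 */
void effective_il_ratio(const unsigned int cgroup_tier_weight[NR_TIERS],
			const unsigned int tier_interleave_weight[NR_TIERS],
			unsigned int effective[NR_TIERS])
{
	assert(cgroup_tier_weight && tier_interleave_weight && effective);

	for (size_t tier = 0; tier < NR_TIERS; tier++)
		effective[tier] = cgroup_tier_weight[tier] *
				  tier_interleave_weight[tier];
}

/*
 * Assumed placement rule: treat the ratio as a repeating pattern, with
 * effective[0] allocations on tier 0, then effective[1] on tier 1, etc.
 * Returns the tier for the nth allocation, or -1 if all weights are zero.
 */
int tier_for_allocation(const unsigned int effective[NR_TIERS],
			unsigned long n)
{
	unsigned long total = 0;

	for (size_t tier = 0; tier < NR_TIERS; tier++)
		total += effective[tier];
	if (!total)
		return -1;

	n %= total;	/* position inside one round of the pattern */
	for (size_t tier = 0; tier < NR_TIERS; tier++) {
		if (n < effective[tier])
			return (int)tier;
		n -= effective[tier];
	}
	return -1;	/* not reached: n < total guarantees a match */
}
```

With hardware weights of 4:2:1 and a cgroup whose tier weights are
1, 1 and 0, the effective ratio is 4:2:0, so in this sketch the cgroup
interleaves over the first two tiers only and never touches the third.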