From: Wei Xu
Date: Thu, 15 Apr 2021 13:25:08 -0700
Subject: Re: [PATCH 02/10] mm/numa: automatically generate node migration order
To: Dave Hansen
Cc: Oscar Salvador, Dave Hansen, Linux MM, Linux Kernel Mailing List,
    Yang Shi, David Rientjes, Huang Ying, Dan Williams, David Hildenbrand

On Thu, Apr 15, 2021 at 8:35 AM Dave Hansen wrote:
> > This can help enable more flexible demotion policies to be
> > configured, such as to allow a cgroup to allocate from all fast tier
> > nodes, but only demote to a local slow tier node.  Such a policy can
> > reduce memory stranding at the fast tier (compared to if memory
> > hardwall is used) and still allow demotion from all fast tier nodes
> > without incurring the expensive random accesses to the demoted pages
> > if they were demoted to remote slow tier nodes.
>
> Could you explain this stranding effect in a bit more detail?  I'm not
> quite following.

By memory stranding, I mean that memory on a machine (or a NUMA node)
cannot be utilized even under extremely high workloads.  Memory
stranding usually happens due to mismatches between job and machine
shapes, as well as resource fragmentation resulting from bin-packing
scheduling.  It is an important problem for cloud resource efficiency.

If NUMA hardwalling is used, we effectively split a single machine into
multiple smaller machines based on NUMA nodes.  This changes the
machine shapes and also makes memory more fragmented, which can lead to
more memory being stranded.

Here is a simple example: Suppose that each machine has 2 NUMA nodes,
each with 4 cores and 5GB RAM, and all the jobs have the shape of
2 CPUs and 3GB memory.  Without NUMA memory hardwalling, we can pack
3 jobs onto each machine, which leaves 1GB of memory and 2 cores
stranded.  However, with NUMA memory hardwalling enabled, we can only
pack 2 jobs onto each machine (one job on each NUMA node), which
increases the stranded resources to 4GB of memory and 4 cores.
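
The arithmetic in this example can be checked with a small sketch. It is
only an illustration under simplifying assumptions that are not part of
the patch series: without hardwalling the machine is treated as one
pool of CPUs and memory, and with hardwalling each job must fit
entirely within a single node. The names Shape, max_jobs and stranded
are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """A resource shape: CPU count and memory in GB (hypothetical helper)."""
    cpus: int
    mem_gb: int


def max_jobs(capacity: Shape, job: Shape) -> int:
    # The scarcer of the two resources limits how many jobs fit.
    return min(capacity.cpus // job.cpus, capacity.mem_gb // job.mem_gb)


def stranded(nodes: list[Shape], job: Shape, hardwall: bool) -> tuple[int, Shape]:
    """Return (jobs packed, resources left unused) for one machine.

    Assumption: with hardwall=True a job must fit inside one NUMA node;
    otherwise the whole machine is treated as a single pool.
    """
    total = Shape(sum(n.cpus for n in nodes), sum(n.mem_gb for n in nodes))
    if hardwall:
        jobs = sum(max_jobs(n, job) for n in nodes)
    else:
        jobs = max_jobs(total, job)
    left = Shape(total.cpus - jobs * job.cpus, total.mem_gb - jobs * job.mem_gb)
    return jobs, left


# The machine and job shapes from the example above.
machine = [Shape(cpus=4, mem_gb=5), Shape(cpus=4, mem_gb=5)]
job = Shape(cpus=2, mem_gb=3)

no_wall = stranded(machine, job, hardwall=False)
with_wall = stranded(machine, job, hardwall=True)
print("no hardwall:", no_wall)
print("hardwall:   ", with_wall)
```

With the shapes above, the pooled case packs 3 jobs and leaves 2 cores
and 1GB unused, while the per-node case packs 2 jobs and leaves 4 cores
and 4GB unused.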