From: Wei Xu
Date: Thu, 7 Apr 2022 21:10:33 -0700
Subject: Re: [PATCH resend] memcg: introduce per-memcg reclaim interface
To: "Huang, Ying"
Cc: Tim Chen, Michal Hocko, Yosry Ahmed, Johannes Weiner, Shakeel Butt,
 Andrew Morton, David Rientjes, Tejun Heo, Zefan Li, Roman Gushchin,
 Cgroups, "open list:DOCUMENTATION", Linux Kernel Mailing List, Linux MM,
 Jonathan Corbet, Yu Zhao, Dave Hansen, Greg Thelen
References: <20220331084151.2600229-1-yosryahmed@google.com>
 <87y20nzyw4.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87o81fujdc.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87bkxfudrk.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <215bd7332aee0ed1092bad4d826a42854ebfd04a.camel@linux.intel.com>
 <87y20gtgpf.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87y20gtgpf.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Apr 7, 2022 at 8:08 PM Huang, Ying wrote:
>
> Wei Xu writes:
>
> > On Thu, Apr 7, 2022 at 4:11 PM Tim Chen wrote:
> >>
> >> On Thu, 2022-04-07 at 15:12 -0700, Wei Xu wrote:
> >>
> >> >
> >> > (resending in plain-text, sorry).
> >> >
> >> > memory.demote can work with any level of memory tiers if a nodemask
> >> > argument (or a tier argument if there is a more-explicitly defined,
> >> > userspace visible tiering representation) is provided. The semantics
> >> > can be to demote X bytes from these nodes to their next tier.
> >> >
> >>
> >> We do need some kind of userspace visible tiering representation.
> >> Will be nice if I can tell the memory type, nodemask of nodes in tier Y with
> >>
> >> cat memory.tier_Y
> >>
> >>
> >> > memory_dram/memory_pmem assumes the hardware for a particular memory
> >> > tier, which is undesirable. For example, it is entirely possible that
> >> > a slow memory tier is implemented by a lower-cost/lower-performance
> >> > DDR device connected via CXL.mem, not by PMEM. It is better for this
> >> > interface to speak in either the NUMA node abstraction or a new tier
> >> > abstraction.
> >>
> >> Just from the perspective of memory.reclaim and memory.demote, I think
> >> they could work with nodemask. For ease of management,
> >> some kind of abstraction of tier information like nodemask, memory type
> >> and expected performance should be readily accessible by user space.
> >>
> >
> > I agree. The tier information should be provided at the system level.
> > One suggestion is to have a new directory "/sys/devices/system/tier/"
> > for tiers, e.g.:
> >
> > /sys/devices/system/tier/tier0/memlist: all memory nodes in tier 0.
> > /sys/devices/system/tier/tier1/memlist: all memory nodes in tier 1.
>
> I think that it may be sufficient to make tier an attribute of "node".
> Some thing like,
>
> /sys/devices/system/node/nodeX/memory_tier
>

This works. If we want additional information about each tier, we can
then add a tier-specific subtree.

In addition, it would be good to also expose the demotion target nodes
(node_demotion[]) via sysfs, e.g.:

/sys/devices/system/node/nodeX/demotion_path

which returns node_demotion[X].

> Best Regards,
> Huang, Ying
>
> > We can discuss this tier representation in a new thread.
> >
> >> Tim
> >>
> >> >
> >> > It is also desirable to make this interface stateless, i.e. not to
> >> > require the setting of memory_dram.reclaim_policy. Any policy can be
> >> > specified as arguments to the request itself and should only affect
> >> > that particular request.
> >> >
> >> > Wei
> >> >
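
To make the per-node proposal concrete, here is a rough userspace sketch of
how a management tool might consume the two attributes discussed above. Both
files are proposals from this thread rather than an existing ABI, and the
thread does not define their contents: the sketch assumes that memory_tier
holds a single integer tier id and that demotion_path holds a single node id,
with -1 (NUMA_NO_NODE) meaning "no demotion target". The helper names
(read_int, scan_nodes, nodes_by_tier, demotion_chain) are made up for
illustration.

```python
import os
import re
from collections import defaultdict

NODE_DIR = re.compile(r"^node(\d+)$")


def read_int(path):
    """Return the integer stored in a sysfs-style file, or None if the
    file is missing or does not contain a plain integer."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def scan_nodes(root="/sys/devices/system/node"):
    """Map node id -> (tier, demotion target).

    ASSUMPTION: nodeX/memory_tier and nodeX/demotion_path are the
    attributes proposed in this thread; neither the files nor their
    formats are an established interface.
    """
    nodes = {}
    for name in os.listdir(root):
        m = NODE_DIR.match(name)
        if not m:
            continue  # skip non-node entries such as "online" or "possible"
        node_dir = os.path.join(root, name)
        tier = read_int(os.path.join(node_dir, "memory_tier"))
        target = read_int(os.path.join(node_dir, "demotion_path"))
        nodes[int(m.group(1))] = (tier, target)
    return nodes


def nodes_by_tier(nodes):
    """Group node ids by tier, i.e. rebuild the per-tier node list that
    the alternative /sys/devices/system/tier/tierN/memlist layout would
    have provided directly."""
    tiers = defaultdict(list)
    for nid, (tier, _target) in sorted(nodes.items()):
        if tier is not None:
            tiers[tier].append(nid)
    return dict(tiers)


def demotion_chain(nodes, start):
    """Follow demotion targets from `start` until there is no further
    target (None or a negative id) or a node repeats."""
    chain = [start]
    seen = {start}
    cur = start
    while True:
        _tier, target = nodes.get(cur, (None, None))
        if target is None or target < 0 or target in seen:
            break
        chain.append(target)
        seen.add(target)
        cur = target
    return chain
```

On a kernel without these attributes, scan_nodes() simply reports
(None, None) for every node, so the sketch degrades to an empty tier map.
Grouping by tier in userspace is what makes the per-node attribute
"sufficient" in the sense above: the per-tier node list can be derived from
it, at the cost of one read per node.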