From: Hasan Al Maruf
To: ying.huang@intel.com
Cc: dave.hansen@linux.intel.com, hannes@cmpxchg.org, hasan3050@gmail.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 mgorman@techsingularity.net, riel@surriel.com,
 yang.shi@linux.alibaba.com
Subject: Re: [PATCH 0/5] Transparent Page Placement for Tiered-Memory
Date: Mon, 29 Nov 2021 19:36:34 -0500
Message-Id: <20211130003634.35468-1-hasanalmaruf@fb.com>
In-Reply-To: <874k812fl8.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <874k812fl8.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi Huang,

We find the patches in the tiering series well thought out and helpful.
For our workloads, we initially started with that series, but we found
the whole series too complex, and some features did not bring the
benefit we expected. Therefore, we have come up with the current basic
patches, which are essential and help achieve most of the intended
behaviors while reducing complexity as much as possible.

As we started with your tiering series (with 72 patches), there are
overlaps between our patches and the tiering series. We adopt the
functionalities from the tiering series, then modify and extend them to
make the page placement mechanism simpler but workable. Here are the key
points for each of the patches in our Transparent Page Placement series.

Patch #1:
We combine all the promotion- and demotion-related statistics in this
patch. Having statistics on promotion, demotion, and failures helps
observe the system's behavior and reason about performance. Besides, the
anon vs file breakdown in both the promotion and demotion paths helps
understand application behavior on a tiered-memory system. As
applications may have different sensitivity toward anon and file
placement, this breakdown in the migration path is often helpful to
assess the effectiveness of the page placement policy.

Patch #2:
This patch largely overlaps with your current series on NUMA Balancing.
https://lore.kernel.org/lkml/20211116013522.140575-1-ying.huang@intel.com/

This patch is a combination of your Patch #2 and Patch #3, except for
the static 10MB of free space kept in the top-tier node as headroom for
new allocations and promotions. Instead, we find that a user-defined
demote watermark is more generic; we include it in our Patch #3.

Patch #3:
This patch has the logic for a separate demote watermark per node. In
the tiering series, that demote watermark is somewhat bound to the
cgroup and triggered on a per-application basis. Besides, it only
supports cgroup-v1. However, we think that, instead of cgroup-based soft
reclamation, a global per-node demote watermark is more meaningful and
should be the basic one to start with. In that case, the user does not
have to think about per-application setup.

Patch #4:
This patch includes the code for kswapd-based reclamation. As I
mentioned earlier, instead of cgroup-based reclamation, here we check
whether a node is balanced during each kswapd invocation. For the
top-tier node, we check whether kswapd has reclaimed until DEMOTE_WMARK
is satisfied; for other nodes, the default mechanism continues. The
difference between the tiering series and this patch is cgroup-based
reclamation vs per-node reclamation.

Patch #5:
In your patches for promotion, you consider re-fault time for promotion
candidate selection. Although the hot threshold is tunable, in our
experiments we find this is not always helpful. For example, if
different subsets of pages have different re-access times, time-based
promotion would not be able to distinguish between them. If you make the
time window long enough, then any infrequently accessed page will also
become a promotion candidate, and later a candidate for demotion. In
this patch, we propose LRU-based promotion, which gives anon and file
pages different promotion paths.
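To make the LRU-based idea above more concrete, here is a toy Python
model of one possible reading of it. It is only a sketch, not the kernel
code in the patch. The names Page, Node, on_hint_fault, and deactivate
are hypothetical, and the rule "the first fault only activates the page,
a later fault while it is still on the active list promotes it" is an
assumption made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class PageKind(Enum):
    ANON = "anon"
    FILE = "file"


@dataclass
class Page:
    pfn: int                  # hypothetical page identifier
    kind: PageKind
    in_top_tier: bool = False


@dataclass
class Node:
    # One active list per page kind, so anon and file are checked separately.
    active: dict = field(
        default_factory=lambda: {PageKind.ANON: set(), PageKind.FILE: set()}
    )

    def on_hint_fault(self, page: Page) -> bool:
        """Return True if this fault promotes the page (assumed policy)."""
        lru = self.active[page.kind]
        if page.pfn in lru:
            # Still on the active list since an earlier access: promote.
            lru.discard(page.pfn)
            page.in_top_tier = True
            return True
        # Not on the active list yet: activate only, do not promote.
        lru.add(page.pfn)
        return False

    def deactivate(self, page: Page) -> None:
        """Model reclaim aging the page off the active list."""
        self.active[page.kind].discard(page.pfn)


node = Node()
hot = Page(pfn=1, kind=PageKind.ANON)
sporadic = Page(pfn=2, kind=PageKind.FILE)

# hot faults twice in a row; sporadic is aged out before its second fault.
results = [node.on_hint_fault(hot), node.on_hint_fault(sporadic)]
results.append(node.on_hint_fault(hot))
node.deactivate(sporadic)
results.append(node.on_hint_fault(sporadic))
```

In this model a page that is touched again while it is still active gets
promoted, while a page that ages off the active list between accesses
does not, regardless of any fixed time window.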
If pages are used sporadically at high frequency, such irregular pages
would eventually be moved off the active LRU list. We find that our
LRU-based approach can reduce promotion traffic by up to 11x while
retaining the same application throughput for multiple workloads.

Besides, with a promotion rate limit, if files largely get promoted to
the top tier, the anon promotion rate often gets hampered, as files take
a large portion of the total rate (which often happens for applications
that generate huge caches). In our LRU-based approach, each type has its
own separate LRU to check. So for workloads with small anon and large
file usage, with the LRU-based approach we see more anon pages being
promoted rather than files.

I don't mind this patchset being merged into your current patchset under
discussion or any later ones. But I think this series contains the very
basic functionalities needed for a workable page placement mechanism for
tiered memory. This can obviously be augmented by the other features in
your future tiering series.

Best,
Hasan