From: "Zach O'Keefe"
Date: Fri, 26 Jan 2024 15:46:21 -0800
Subject: Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to process_madvise()
To: Lance Yang
Cc: akpm@linux-foundation.org, Michal Hocko, Yang Shi, David Hildenbrand,
    songmuchun@bytedance.com, peterx@redhat.com, mknyszek@google.com,
    minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-api@vger.kernel.org
References: <20240118120347.61817-1-ioworker0@gmail.com>

> I’d like to add another real use case.
>
> In our company, we deploy applications using offline-online
> hybrid deployment. This approach leverages the distinctive
> resource utilization patterns of online services, utilizing idle
> resources during various time periods by filling them with
> offline jobs. This helps reduce the growing cost expenditures
> for the enterprise.
>
> Whether for online services or offline jobs, their requirements
> for THP can be roughly categorized into three types:
>
> * The first type aims to use huge pages as much as possible
>   and tolerates unpredictable stalls caused by direct reclaim
>   and/or compaction.
> * The second type attempts to use huge pages but is relatively
>   latency-sensitive and cannot tolerate unpredictable stalls.
> * The third type prefers not to use huge pages at all and is
>   extremely latency-sensitive.
>
> After careful consideration, we decided to prioritize the
> requirements of the first type and modify the THP settings
> as follows:
>
>     echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>     echo defer >/sys/kernel/mm/transparent_hugepage/defrag
>
> With the introduction of MADV_COLLAPSE into the kernel,
> it is no longer dependent on any sysfs setting under
> /sys/kernel/mm/transparent_hugepage. MADV_COLLAPSE
> offers the potential for fine-grained synchronous control over
> the huge page allocation mechanism, marking a significant
> enhancement for THP.
>
> If the kernel supports a more relaxed (opportunistic)
> MADV_COLLAPSE, we will modify the THP settings as follows:
>
>     echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
>     echo madvise >/sys/kernel/mm/transparent_hugepage/defrag

[corrected, via 2 previous mails, to:
    echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
    echo defer+madvise >/sys/kernel/mm/transparent_hugepage/defrag]

> Then, we will use process_madvise(MADV_COLLAPSE, xx_relaxed_flag)
> to address the requirements of the second type.
>
> Why don't we favor madvise(MADV_COLLAPSE) for the first type
> of requirements?
> The main reason is that these requirements are typically for offline
> jobs in the Hadoop ecosystem, such as MapReduce and Spark,
> which run primarily on the JVM.

[..]

Hey Lance,

Thanks for providing this context, it's very helpful.

Though, couldn't you use enabled=always, defrag=defer+madvise, then
just use prctl(PR_SET_THP_DISABLE) on type-3 workloads to get the
behaviour you want? i.e.

type 1: apply MADV_HUGEPAGE -> sync defrag to get THP
type 2: don't apply MADV_HUGEPAGE -> use THP if available, kick
        kswapd+kcompactd otherwise
type 3: use prctl(PR_SET_THP_DISABLE) (or MADV_NOHUGEPAGE) -> no THPs

Or am I missing something? It sounds like a confounding issue is that
these are external workloads, or ones you don't have the ability to
modify? But that would preclude MADV_COLLAPSE (unless you're using
process_madvise()).

Appreciate the help understanding the use case. I'm not opposed to the
idea in general, but IMO it would be great to have a clear need for it
(and right now we don't have alignment with the original motivating
use case (Go) in that regard, w.r.t. their plans).

Thanks,
Zach
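
For reference, the three-way split suggested above could be sketched in
C roughly as follows. This is only an illustration, assuming the
system-wide settings are enabled=always and defrag=defer+madvise; the
enum thp_type labels and the thp_apply() helper are hypothetical names
invented here, not anything from the patch or the kernel.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Fallbacks for older libc headers; values match the Linux UAPI. */
#ifndef MADV_HUGEPAGE
#define MADV_HUGEPAGE 14
#endif
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif

/* Hypothetical labels for the three workload types in the quoted mail. */
enum thp_type {
	THP_TYPE_1 = 1,	/* wants THP, tolerates reclaim/compaction stalls */
	THP_TYPE_2 = 2,	/* wants THP, but is latency-sensitive */
	THP_TYPE_3 = 3,	/* wants no THP at all */
};

/*
 * Apply the per-type policy, assuming enabled=always and
 * defrag=defer+madvise are set system-wide.
 * Returns 0 on success or a negative errno value.
 */
static int thp_apply(enum thp_type type, void *addr, size_t len)
{
	switch (type) {
	case THP_TYPE_1:
		/* Opt this range in to synchronous defrag at fault time. */
		return madvise(addr, len, MADV_HUGEPAGE) ? -errno : 0;
	case THP_TYPE_2:
		/*
		 * Nothing to do: a THP is used if one is available,
		 * otherwise kswapd and kcompactd are woken in the background.
		 */
		return 0;
	case THP_TYPE_3:
		/* Process-wide opt-out; addr and len are ignored. */
		return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) ? -errno : 0;
	}
	return -EINVAL;
}
```

For a binary that cannot be modified, a small launcher could call
thp_apply(THP_TYPE_3, NULL, 0) and then execve() the real program,
since prctl(2) documents the THP-disable flag as inherited across
fork() and preserved across execve().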