From: Stepanov Anatoly
Date: Thu, 31 Oct 2024 09:06:47 +0300
Subject: Re: [RFC PATCH 0/3] Cgroup-based THP control
To: Michal Hocko, Gutierrez Asier

On 10/30/2024 6:15 PM, Michal Hocko wrote:
> On Wed 30-10-24 17:58:04, Gutierrez Asier wrote:
>>
>>
>> On 10/30/2024 4:27 PM, Michal Hocko wrote:
>>> On Wed 30-10-24 15:51:00, Gutierrez Asier wrote:
>>>>
>>>>
>>>> On 10/30/2024 11:38 AM, Michal Hocko wrote:
>>>>> On Wed 30-10-24 16:33:08, gutierrez.asier@huawei-partners.com wrote:
>>>>>> From: Asier Gutierrez
>>>>>>
>>>>>> Currently THP modes are set globally.
>>>>>> It can be an overkill if only some
>>>>>> specific app/set of apps need to get benefits from THP usage. Moreover, various
>>>>>> apps might need different THP settings. Here we propose a cgroup-based THP
>>>>>> control mechanism.
>>>>>>
>>>>>> THP interface is added to memory cgroup subsystem. Existing global THP control
>>>>>> semantics is supported for backward compatibility. When THP modes are set
>>>>>> globally all the changes are propagated to memory cgroups. However, when a
>>>>>> particular cgroup changes its THP policy, the global THP policy in sysfs remains
>>>>>> the same.
>>>>>
>>>>> Do you have any specific examples where this would be benefitial?
>>>>
>>>> Now we're mostly focused on database scenarios (MySQL, Redis).
>>>
>>> That seems to be more process than workload oriented. Why the existing
>>> per-process tuning doesn't work?
>>>
>>> [...]
>>
>> 1st Point
>>
>> We're trying to provide a transparent mechanism, but all the existing per-process
>> methods require to modify an app itself (MADV_HUGE, MADV_COLLAPSE, hugetlbfs)
>
> There is also prctl to define per-process policy. We currently have
> means to disable THP for the process to override the defeault behavior.
> That would be mostly transparent for the application.

(Answering as a co-author of the feature)

As prctl(PR_SET_THP_DISABLE) only applies to the calling thread, it still
requires the app developer's participation.
In theory, a launcher process could be used to rely on the inheritance of
the corresponding prctl THP setting, but that does not seem transparent to
user space. And what if we'd like to enable THP for a specific set of
unrelated (in terms of parent-child) tasks?

IMHO, an alternative approach would be to change the per-process THP mode by
PID, which would also avoid any changes to user apps. But nothing like that
exists yet. In any case, it would require maintaining a set of PIDs for a
specific group of processes, which is extra work for a sysadmin.
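For illustration, a minimal C sketch of the launcher approach mentioned above (not part of the RFC; the function names are hypothetical). It relies only on the behaviour documented in prctl(2): the "THP disable" flag is inherited by children created via fork(2) and preserved across execve(2). Note that it can only opt a process tree out of THP, not opt one in.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks in case the installed headers predate Linux 3.15. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/*
 * Hypothetical helper: set the "THP disable" flag for the caller.
 * prctl(2) documents that the flag survives fork(2) and execve(2).
 */
int set_thp_disable_flag(void)
{
    if (prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL) != 0) {
        perror("prctl(PR_SET_THP_DISABLE)");
        return -1;
    }
    return 0;
}

/*
 * Hypothetical launcher body: disable THP, then replace this process
 * with the target app. Returns only on failure.
 */
int launch_without_thp(char *const argv[])
{
    if (set_thp_disable_flag() != 0)
        return 1;
    execvp(argv[0], argv);
    perror("execvp"); /* reached only if exec failed */
    return 127;
}
```

A launcher's main() would simply return launch_without_thp(&argv[1]). Everything the target forks later inherits the flag, but tasks started from elsewhere are unaffected, which is the limitation for unrelated tasks raised above.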
>
> You have not really answered a more fundamental question though. Why the
> THP behavior should be at the cgroup scope? From a practical POV that
> would represent containers which are a mixed bag of applications to
> support the workload. Why does the same THP policy apply to all of them?

For THP there are three possible levels of control granularity:
- global THP
- THP per group of processes
- THP per process

I agree that within a container, different apps might have different THP
requirements. But it also depends on many factors, such as the container
"size" (tiny/huge container) and the diversity of apps/functions inside it.
I mean, in some cases we might not need to go below the "per-group" level of
THP control.

>
> Doesn't this make the sub-optimal global behavior the same on the cgroup
> level when some parts will benefit while others will not?
>

I think the key point regarding the sub-optimal behavior is "predictability":
we know for sure which apps/services will consume THPs.
We observed significant THP usage on an almost idle Ubuntu server with a
simple test running (some random system services consumed a few hundred MB
of THPs). Of course, other distros might behave differently. But with
fine-grained per-group control it's a lot more predictable.

Did I get your question right?

>> Moreover we're using file-backed THPs too (for .text mostly), which make it for
>> user-space developers even more complicated.
>>
>>>>>> Child cgroups inherit THP settings from parent cgroup upon creation. Particular
>>>>>> cgroup mode changes aren't propagated to child cgroups.
>>>>>
>>>>> So this breaks hierarchical property, doesn't it? In other words if a
>>>>> parent cgroup would like to enforce a certain policy to all descendants
>>>>> then this is not really possible.
>>>>
>>>> The first idea was to have some flexibility when changing THP policies.
>>>>
>>>> I will submit a new patch set which will enforce the cgroup hierarchy and change all
>>>> the children recursively.
>>>
>>> What is the expected semantics then?
>>
>> 2nd point (on semantics)
>> 1. Children inherit the THP policy upon creation
>> 2. Parent's policy changes are propagated to all the children
>> 3. Children can set the policy independently
>
> So if the parent decides that none of the children should be using THP
> they can override that so the tuning at parent has no imperative
> control. This is breaking hierarchical property that is expected from
> cgroup control files.

Actually, I think we can solve this. As we mostly need just a single level of
children, the "flat" case (root->child) is enough: the root memcg's THP mode
is interpreted as the "global THP setting", and sub-children are forbidden to
override the THP mode they inherit.
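A toy C model may help pin down the "flat" semantics proposed above. It is a sketch under one reading of the proposal, not the RFC's memcg code: all names are hypothetical, the enum only mirrors the never/madvise/always values of the global sysfs knob, and it assumes that (a) only the root and first-level children may write a mode, (b) deeper descendants always follow their first-level ancestor, and (c) a first-level child that never set its own mode follows the root, so root changes propagate to it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; mirrors the never/madvise/always THP values. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

struct thp_cg {
    struct thp_cg *parent; /* NULL only for the root memcg */
    bool has_own_mode;     /* true once a first-level child sets a mode */
    enum thp_mode mode;    /* root: the "global" mode; child: its override */
};

static int thp_cg_depth(const struct thp_cg *cg)
{
    int depth = 0;

    while (cg->parent) {
        cg = cg->parent;
        depth++;
    }
    return depth;
}

/* Assumption (a): writes below the first level are rejected. */
static int thp_cg_set_mode(struct thp_cg *cg, enum thp_mode mode)
{
    if (thp_cg_depth(cg) > 1)
        return -EPERM; /* sub-children may not override */
    cg->mode = mode;
    cg->has_own_mode = (cg->parent != NULL);
    return 0;
}

/* Assumptions (b) and (c): first-level override if set, else the root's. */
static enum thp_mode thp_cg_effective_mode(const struct thp_cg *cg)
{
    const struct thp_cg *level1 = NULL;

    while (cg->parent) {
        level1 = cg;
        cg = cg->parent;
    }
    /* cg is now the root; level1 is the first-level ancestor, if any. */
    if (level1 && level1->has_own_mode)
        return level1->mode;
    return cg->mode;
}
```

Rejecting writes below the first level with -EPERM is one way to keep a first-level cgroup's setting imperative for its own descendants; whether a root change should also override first-level settings is deliberately left open in this sketch.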