Date: Thu, 31 Oct 2024 09:33:10 +0100
From: Michal Hocko
To: Stepanov Anatoly
Cc: Gutierrez Asier, akpm@linux-foundation.org, david@redhat.com, ryan.roberts@arm.com, baohua@kernel.org, willy@infradead.org, peterx@redhat.com, hannes@cmpxchg.org, hocko@kernel.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, alexander.kozhevnikov@huawei-partners.com, guohanjun@huawei.com, weiyongjun1@huawei.com, wangkefeng.wang@huawei.com, judy.chenhui@huawei.com, yusongping@huawei.com, artem.kuzin@huawei.com, kang.sun@huawei.com
Subject: Re: [RFC PATCH 0/3] Cgroup-based THP control
References: <20241030083311.965933-1-gutierrez.asier@huawei-partners.com> <770bf300-1dbb-42fc-8958-b9307486178e@huawei-partners.com>

On Thu 31-10-24 09:06:47, Stepanov Anatoly wrote:
[...]
> As prctl(PR_SET_THP_DISABLE) can only be used from the calling thread,
> it needs app. developer participation anyway.
> In theory, kind of a launcher-process can be used, to utilize the inheritance
> of the corresponding prctl THP setting, but this seems not transparent
> for the user-space.

No, this is not just in theory. This is a very common usage pattern that
allows changing the behavior of the target application transparently.

> And what if we'd like to enable THP for a specific set of unrelated (in terms of parent-child)
> tasks?

This is what I've had in mind. Currently we only have a THP disable
option. If we really need an override to enforce THP on an application
then this could be a more viable path.

> IMHO, an alternative approach would be changing per-process THP-mode by PID,
> thus also avoiding any user app. changes.

We already have process_madvise. MADV_HUGEPAGE resp. MADV_COLLAPSE are
not supported but we can discuss that option of course. This interface
requires much more orchestration, though, because it is VMA range based.

> > You have not really answered a more fundamental question though. Why the
> > THP behavior should be at the cgroup scope? From a practical POV that
> > would represent containers which are a mixed bag of applications to
> > support the workload. Why does the same THP policy apply to all of them?
>
> For THP there're 3 possible levels of fine-control:
> - global THP
> - THP per-group of processes
> - THP per-process
>
> I agree, that in a container, different apps might have different
> THP requirements.
> But it also depends on many factors, such as:
> container "size"(tiny/huge container), diversity of apps/functions inside a container.
> I mean, for some cases, we might not need to go below "per-group" level in terms of THP control.

I am sorry but I do not really see any argument why this should be
per-memcg. Quite the contrary, having that per memcg seems more muddy.

> > Doesn't this make the sub-optimal global behavior the same on the cgroup
> > level when some parts will benefit while others will not?
> >
>
> I think the key idea for the sub-optimal behavior is "predictability",
> so we know for sure which apps/services would consume THPs.

OK, that seems fair.

> We observed a significant THP usage on almost idle Ubuntu server, with simple test running,
> (some random system services consumed few hundreds Mb of THPs).

I assume that you are using "always" as the global default configuration,
right? If that is the case then high (in fact as high as feasible) THP
utilization is the actual goal. If you want more targeted THP use then
madvise is what you are looking for. This will not help applications
which are not THP aware of course, but then we are back to the discussion
of whether the interface should be a) per process, b) per cgroup or
c) process_madvise.

> Of course, on other distros me might have different situation.
> But with fine-grained per-group control it's a lot more predictable.
>
> Am i got you question right?

Not really, but at least I do understand (hopefully) that you are trying
to work around THP overuse by changing the global default to be more
restrictive while allowing some workloads to be less restrictive. The
question of why pushing that down to the memcg scope makes the situation
better is not answered AFAICT.

[...]
> > So if the parent decides that none of the children should be using THP
> > they can override that so the tuning at parent has no imperative
> > control. This is breaking hierarchical property that is expected from
> > cgroup control files.
>
> Actually, i think we can solve this.
> As we mostly need just a single children level,
> "flat" case (root->child) is enough, interpreting root-memcg THP mode as "global THP setting",
> where sub-children are forbidden to override an inherited THP-mode.

This reduced case is not really sufficient to justify the non-hierarchical
semantic, I am afraid. There must be a _really_ strong case to break
this property and even then I am rather skeptical to be honest.
We have been burnt by introducing stuff like memcg.swappiness that
seemed like a good idea initially but backfired with unexpected behavior
to many users.

-- 
Michal Hocko
SUSE Labs
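
For illustration, the launcher pattern discussed in this thread (set the
prctl THP flag, then exec the target so that it inherits the setting)
could be sketched in C roughly as follows. This is a minimal sketch and
not code from the thread: the helper names thp_disable_self,
thp_is_disabled and run_without_thp are hypothetical. Only prctl(2) with
PR_SET_THP_DISABLE / PR_GET_THP_DISABLE and execvp(3) are existing
interfaces, and the sketch assumes a kernel and libc headers that
provide those two prctl options.

```c
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Set the "THP disable" flag for the calling process.
 * Returns 0 on success, -1 on error (errno is set by prctl).
 * The unused prctl arguments are passed as zero.
 */
int thp_disable_self(void)
{
    return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/*
 * Query the flag: returns 1 if it is set, 0 if it is not,
 * and -1 on error.
 */
int thp_is_disabled(void)
{
    return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}

/*
 * Hypothetical launcher helper: set the flag, then replace the current
 * process image with the target program. The flag is inherited across
 * fork(2) and preserved across execve(2), so the target needs no code
 * changes. This function only returns if something failed.
 */
int run_without_thp(char *const argv[])
{
    if (thp_disable_self() != 0) {
        perror("prctl(PR_SET_THP_DISABLE)");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");
    return -1;
}
```

A wrapper binary would call run_without_thp(&argv[1]) from its main(),
so that something like "nothp ./app args" (hypothetical name) starts
the application with THP disabled without touching the application
itself.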