Subject: Memory reclaim protection and cgroup nesting (desktop use)
From: Benjamin Berg
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Date: Wed, 04 Mar 2020 10:44:44 +0100

Hi,

TL;DR: I seem to need memory.min/memory.low to be set on each child
cgroup and not just the parents. Is this expected?

I have been experimenting with using cgroups to protect a GNOME
session. The intention is that GNOME Shell itself and other important
services remain responsive, even if the application workload is
thrashing. The long-term goal here is to bridge the time until an OOM
killer like oomd gets the system back into normal conditions using
memory pressure information.

Note that I have done these tests without any swap and with huge
memory.min/memory.low values. I consider this scenario pathological;
however, it seems like a reasonable way to really exercise the cgroup
reclaim protection logic.

The resulting cgroup hierarchy looked something like:

-.slice
├─user.slice
│ └─user-1000.slice
│   ├─user@1000.service
│   │ ├─session.slice
│   │ │ ├─gsd-*.service
│   │ │ │ └─208803 /usr/libexec/gsd-rfkill
│   │ │ ├─gnome-shell-wayland.service
│   │ │ │ ├─208493 /usr/bin/gnome-shell
│   │ │ │ ├─208549 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/1000/.mutter-Xwayla>
│   │ │ │ └─ …
│   │ └─apps.slice
│   │   ├─gnome-launched-tracker-miner-fs.desktop-208880.scope
│   │   │ └─208880 /usr/libexec/tracker-miner-fs
│   │   ├─dbus-:1.2-org.gnome.OnlineAccounts@0.service
│   │   │ └─208668 /usr/libexec/goa-daemon
│   │   ├─flatpak-org.gnome.Fractal-210350.scope
│   │   ├─gnome-terminal-server.service
│   │   │ ├─209261 /usr/libexec/gnome-terminal-server
│   │   │ ├─209434 bash
│   │   │ └─ … including the test load i.e. "make -j32" of a C++ code

I also enabled the CPU and IO controllers in my tests, but I don't
think that is as relevant. The main thing is that I set

  memory.min: 2GiB
  memory.low: 4GiB

using systemd on all of

 * user.slice,
 * user-1000.slice,
 * user@1000.service,
 * session.slice and
 * everything inside session.slice
   (i.e. gnome-shell-wayland.service, gsd-*.service, …)

excluding apps.slice from protection.

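For illustration, the configuration described above could be sketched roughly as follows. This is only a sketch under assumptions: the message set the values through systemd, whereas this example writes the cgroup v2 interface files memory.min and memory.low directly (systemd may later overwrite values written this way), and the function names and the /sys/fs/cgroup mount point are hypothetical choices for the example. By default it only prints the planned writes.

```python
from __future__ import annotations

from pathlib import Path

GIB = 1024 ** 3

# Values taken from the message.
MEMORY_MIN = 2 * GIB
MEMORY_LOW = 4 * GIB

# Chain of cgroups from the top-level slice down to session.slice,
# mirroring the hierarchy shown in the message.
PROTECTED_CHAIN = [
    "user.slice",
    "user-1000.slice",
    "user@1000.service",
    "session.slice",
]


def protected_cgroups(root: Path) -> list[Path]:
    """Return every cgroup directory that should receive protection."""
    targets = []
    current = root
    for name in PROTECTED_CHAIN:
        current = current / name
        targets.append(current)
    # Everything directly inside session.slice is protected as well;
    # apps.slice is a sibling of session.slice and is never visited.
    if current.is_dir():
        targets.extend(sorted(p for p in current.iterdir() if p.is_dir()))
    return targets


def apply_protection(root: Path, dry_run: bool = True) -> list[tuple[Path, int]]:
    """Plan (and optionally perform) the memory.min/memory.low writes."""
    plan = []
    for cgroup in protected_cgroups(root):
        plan.append((cgroup / "memory.min", MEMORY_MIN))
        plan.append((cgroup / "memory.low", MEMORY_LOW))
    if not dry_run:
        for path, value in plan:
            path.write_text(f"{value}\n")
    return plan


if __name__ == "__main__":
    # Dry run: print what would be written, without touching cgroupfs.
    for path, value in apply_protection(Path("/sys/fs/cgroup")):
        print(f"{path} <- {value}")
```

Calling apply_protection(root, dry_run=False) performs the writes and needs sufficient privileges on a real cgroup v2 mount.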
(In a realistic scenario I expect to have swap and would then reserve
maybe a few hundred MiB; DAMON might help with finding good values.)

At that point, the protection started working pretty much flawlessly,
i.e. my gnome-shell would continue to run without major page faulting
even though everything in apps.slice was thrashing heavily. The
mouse/keyboard remained completely responsive, and interacting with
applications ended up working much better thanks to knowing where
input was going, even if the applications themselves took seconds to
react.

So far, so good. What surprises me is that I needed to set the
protection on the child cgroups (i.e. gnome-shell-wayland.service).
Without this, it would not work (reliably) and my gnome-shell would
still have a lot of re-faults to load libraries and other mmap'ed data
back into memory (I used "perf --no-syscalls -F" to trace this and
observed that the faults occurred repeatedly for the same pages, e.g.
pages holding functions being loaded for execution).

Due to accounting effects, I would expect a re-fault to happen at most
once per page in this scenario. At that point the page in question
will be accounted against the shell's cgroup and reclaim protection
could kick in. Unfortunately, that did not seem to happen unless the
shell's cgroup itself had protections and not just all of its parents.

Is it expected that I need to set the protection on each child?

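To make the parents-only versus parents-plus-child distinction concrete, here is a small diagnostic sketch that lists memory.min and memory.low for every level on the path to a leaf cgroup and reports the levels where both are 0. It is an assumption-laden illustration: the helper names are hypothetical, it assumes cgroup v2 interface files under a given root, and it only reads the configured values. It makes no claim about how the kernel derives the effective protection from them.

```python
from __future__ import annotations

from pathlib import Path


def read_knob(cgroup: Path, knob: str) -> str:
    """Return the raw content of an interface file, or "n/a" if absent."""
    path = cgroup / knob
    if not path.is_file():
        return "n/a"
    return path.read_text().strip()  # a byte count, or "max"


def protection_report(root: Path, leaf: Path) -> list[tuple[str, str, str]]:
    """Return (cgroup, memory.min, memory.low) for each level down to leaf."""
    rows = []
    current = root
    for part in leaf.relative_to(root).parts:
        current = current / part
        rows.append(
            (
                str(current.relative_to(root)),
                read_knob(current, "memory.min"),
                read_knob(current, "memory.low"),
            )
        )
    return rows


def unprotected_levels(rows: list[tuple[str, str, str]]) -> list[str]:
    """Return the cgroups whose memory.min and memory.low are both 0."""
    return [name for name, mem_min, mem_low in rows if mem_min == "0" and mem_low == "0"]


if __name__ == "__main__":
    # Assumed mount point and leaf path, following the hierarchy in the message.
    root = Path("/sys/fs/cgroup")
    leaf = root / "user.slice/user-1000.slice/user@1000.service/session.slice/gnome-shell-wayland.service"
    rows = protection_report(root, leaf)
    for name, mem_min, mem_low in rows:
        print(f"{name}: memory.min={mem_min} memory.low={mem_low}")
    print("unprotected:", unprotected_levels(rows))
```

In the failing setup described above, the leaf (gnome-shell-wayland.service) would be expected to show up in the unprotected list while all of its parents carry non-zero values.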
Benjamin