Date: Wed, 4 Mar 2020 11:30:44 -0500
From: Tejun Heo
To: Benjamin Berg
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, Johannes Weiner
Subject: Re: Memory reclaim protection and cgroup nesting (desktop use)
Message-ID: <20200304163044.GF189690@mtj.thefacebook.com>

Hello,

(cc'ing Johannes and quoting whole msg)

On Wed, Mar 04, 2020 at 10:44:44AM +0100, Benjamin Berg wrote:
> Hi,
>
> TL;DR: I seem to need memory.min/memory.max to be set on each child
> cgroup and not just the parents. Is this expected?

Yes, currently.  However, v5.7+ will have a cgroup2 mount option to
propagate protection automatically.

  https://lore.kernel.org/linux-mm/20191219200718.15696-4-hannes@cmpxchg.org/

> I have been experimenting with using cgroups to protect a GNOME
> session. The intention is that the GNOME Shell itself and important
> other services remain responsive, even if the application workload is
> thrashing. The long term goal here is to bridge the time until an OOM
> killer like oomd would get the system back into normal conditions using
> memory pressure information.
>
> Note that I have done these tests without any swap and with huge
> memory.min/memory.low values. I consider this scenario pathological,
> however, it seems like a reasonable way to really exercise the cgroup
> reclaim protection logic.

It's incomplete and more brittle in that the kernel has to treat a
large portion of memory usage as essentially memlocked.

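On the mount option mentioned above: as far as I know it ended up being
named memory_recursiveprot. Below is a minimal sketch of checking whether
a cgroup2 mount already carries it, assuming the usual /proc/mounts field
layout. The helper name and the sample line are made up for illustration.

```python
def has_recursive_protection(mounts_text):
    """Return True if any cgroup2 mount lists memory_recursiveprot."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mountpoint, fstype, options, dump, pass
        if len(fields) >= 4 and fields[2] == "cgroup2":
            if "memory_recursiveprot" in fields[3].split(","):
                return True
    return False


# Hypothetical sample line; on a live system, read /proc/mounts instead.
sample = (
    "cgroup2 /sys/fs/cgroup cgroup2 "
    "rw,nosuid,nodev,noexec,relatime,memory_recursiveprot 0 0"
)
print(has_recursive_protection(sample))
```

On a real machine the same function could be fed the contents of
/proc/mounts. Without the option, protection has to be set per cgroup.
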
> The resulting cgroup hierarchy looked something like:
>
> -.slice
> ├─user.slice
> │ └─user-1000.slice
> │   ├─user@1000.service
> │   │ ├─session.slice
> │   │ │ ├─gsd-*.service
> │   │ │ │ └─208803 /usr/libexec/gsd-rfkill
> │   │ │ ├─gnome-shell-wayland.service
> │   │ │ │ ├─208493 /usr/bin/gnome-shell
> │   │ │ │ ├─208549 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/1000/.mutter-Xwayla>
> │   │ │ │ └─ …
> │   │ └─apps.slice
> │   │   ├─gnome-launched-tracker-miner-fs.desktop-208880.scope
> │   │   │ └─208880 /usr/libexec/tracker-miner-fs
> │   │   ├─dbus-:1.2-org.gnome.OnlineAccounts@0.service
> │   │   │ └─208668 /usr/libexec/goa-daemon
> │   │   ├─flatpak-org.gnome.Fractal-210350.scope
> │   │   ├─gnome-terminal-server.service
> │   │   │ ├─209261 /usr/libexec/gnome-terminal-server
> │   │   │ ├─209434 bash
> │   │   │ └─ … including the test load i.e. "make -j32" of a C++ code
>
>
> I also enabled the CPU and IO controllers in my tests, but I don't
> think that is as relevant. The main thing is that I set

CPU control isn't but IO is.  Without working IO isolation, it's
relatively easy to drive the system into the ground given enough
stress outside the protected area.

> memory.min: 2GiB
> memory.low: 4GiB
>
> using systemd on all of
>
> * user.slice,
> * user-1000.slice,
> * user@1000.slice,
> * session.slice and
> * everything inside session.slice
>   (i.e. gnome-shell-wayland.service, gsd-*.service, …)
>
> excluding apps.slice from protection.
>
> (In a realistic scenario I expect to have swap and then reserving maybe
> a few hundred MiB; DAMON might help with finding good values.)

What's DAMON?

> At that point, the protection started working pretty much flawlessly.
> i.e. my gnome-shell would continue to run without major page faulting
> even though everything in apps.slice was thrashing heavily. The
> mouse/keyboard remained completely responsive, and interacting with
> applications ended up working much better thanks to knowing where input
> was going. Even if the applications themselves took seconds to react.
>
> So far, so good. What surprises me is that I needed to set the
> protection on the child cgroups (i.e. gnome-shell-wayland.service).
> Without this, it would not work (reliably) and my gnome-shell would
> still have a lot of re-faults to load libraries and other mmap'ed data
> back into memory (I used "perf --no-syscalls -F" to trace this and
> observed these to be repeatedly for the same pages loading e.g.
> functions for execution).
>
> Due to accounting effects, I would expect re-faults to happen up to one
> time in this scenario. At that point the page in question will be
> accounted against the shell's cgroup and reclaim protection could kick
> in. Unfortunately, that did not seem to happen unless the shell's
> cgroup itself had protections and not just all of its parents.
>
> Is it expected that I need to set limits on each child?

Yes, right now, memory.low needs to be configured all the way down to
the leaf to be effective, which can be rather cumbersome.  As written
above, future kernels will be easier to work with in this respect.

Thanks.

-- 
tejun
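
To illustrate the "all the way down to the leaf" point, here is a minimal
sketch that lists every memory.low file between the cgroup2 root and a
leaf cgroup. It assumes the hierarchy is mounted at /sys/fs/cgroup. The
function names are hypothetical, and the leaf path and 4GiB value are
taken from the quoted setup.

```python
from pathlib import PurePosixPath


def protection_writes(root, leaf, value):
    """List (file, value) pairs for every cgroup below root down to leaf."""
    root = PurePosixPath(root)
    leaf = PurePosixPath(leaf)
    # relative_to() raises ValueError if leaf is not below root.
    parts = leaf.relative_to(root).parts
    writes = []
    current = root
    for part in parts:
        # The root cgroup itself has no memory.low, so start at its child.
        current = current / part
        writes.append((str(current / "memory.low"), str(value)))
    return writes


def apply_writes(writes):
    """Perform the writes; needs a real cgroup2 mount and privileges."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)


plan = protection_writes(
    "/sys/fs/cgroup",
    "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service"
    "/session.slice/gnome-shell-wayland.service",
    4 * 1024**3,
)
for path, value in plan:
    print(path, value)
```

Only the plan is printed here; apply_writes() is deliberately not called.
On a systemd-managed system the same effect would normally be configured
per unit (MemoryLow=) rather than by writing the files directly.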