Date: Wed, 23 Oct 2019 10:17:29 +0200
From: Michal Hocko
To: Hillf Danton
Cc: linux-mm, Andrew Morton, linux-kernel, Johannes Weiner, Shakeel Butt,
 Minchan Kim, Mel Gorman, Vladimir Davydov, Jan Kara
Subject: Re: [RFC v1] mm: add page preemption
Message-ID: <20191023081729.GI754@dhcp22.suse.cz>
References: <20191020134304.11700-1-hdanton@sina.com>
 <20191022121439.7164-1-hdanton@sina.com>
 <20191022142802.14304-1-hdanton@sina.com>
In-Reply-To: <20191022142802.14304-1-hdanton@sina.com>

On Tue 22-10-19 22:28:02, Hillf Danton wrote:
> 
> On Tue, 22 Oct 2019 14:42:41 +0200 Michal Hocko wrote:
> > 
> > On Tue 22-10-19 20:14:39, Hillf Danton wrote:
> > > 
> > > On Mon, 21 Oct 2019 14:27:28 +0200 Michal Hocko wrote:
> > [...]
> > > > Why do we care and which workloads would benefit and how much.
> > > 
> > > Page preemption, disabled by default, should be turned on by those
> > > who wish that the performance of their workloads can survive memory
> > > pressure to a certain extent.
> > 
> > I am sorry but this doesn't say anything to me. How come not all
> > workloads would fit that description?
> 
> That means pp plays a role when kswapd becomes active, and it may
> prevent too much jitter in active lru pages.

This is still too vague to be useful in any way.

> > > The number of pp users is expected to be close to the number of
> > > people who change the nice value of their apps either to -1 or
> > > higher at least once a week, fewer than vi users among UK's
> > > undergraduates.
> > > 
> > > > And last but not least why the existing infrastructure doesn't help
> > > > (e.g. if you have clearly defined workloads with different
> > > > memory consumption requirements then why don't you use memory cgroups
> > > > to reflect the priority).
> > > 
> > > Good question:)
> > > 
> > > Though pp is implemented by preventing any task from reclaiming as many
> > > pages as possible from other tasks that are higher on priority, it is
> > > trying to introduce prio into page reclaiming, to add a feature.
> > > 
> > > Page and memcg are different objects after all; pp is being added at
> > > the page granularity. It should be an option available in environments
> > > without memcg enabled.
> > 
> > So do you actually want to establish LRUs per priority?
> 
> No, no change other than the prio added for every lru page. LRU per prio
> is too much to implement.

Well, considering that per-page priority is a no-go, as already pointed
out by Willy, you do not have any other choice, right?

> > Why is using memcgs not an option?
> 
> I have a plan to add prio to memcg. As you see, I sent an RFC before v0
> with nice added in memcg, and realised a couple of days ago that its
> dependence on soft limit reclaim is not acceptable.
> 
> But we can't do that without determining how to define memcg's prio.
> What I have in mind now is the highest (or lowest) prio of the tasks in
> a memcg, with a knob offered to userspace.
> 
> If you like, I want to talk about it sometime later.

This doesn't really answer my question. Why can you not use memcgs as
they are now? Why exactly do you need a fixed priority?

> > This is the main facility to partition reclaimable
> > memory in the first place. You should really focus on explaining why
> > a much more fine-grained control is needed, much more thoroughly.
> > 
> > > What is way different from the protections offered by memory cgroup
> > > is that pages protected by memcg min/low can't be reclaimed regardless
> > > of memory pressure. Such a guarantee is not available under pp, as it
> > > only suggests an extra factor to consider when deactivating lru pages.
> > 
> > Well, the low limit can be breached if there is no eligible memcg to be
> > reclaimed. That means that you can shape some sort of priority by
> > setting the low limit already.
> > 
> > [...]
> > > What was added on the reclaimer side is
> > > 
> > > 1, kswapd sets pgdat->kswapd_prio, the switch between page reclaimer
> > >    and allocator in terms of prio, to the lowest value before taking
> > >    a nap.
> > > 
> > > 2, any allocator is able to wake up the reclaimer because of the
> > >    lowest prio, and it starts reclaiming pages using the waker's prio.
> > > 
> > > 3, if an allocator comes while kswapd is active, its prio is checked;
> > >    it is a no-op if kswapd is higher on prio, otherwise the switch is
> > >    updated with the higher prio.
> > > 
> > > 4, every time kswapd raises sc.priority, which starts at DEF_PRIORITY,
> > >    it is checked whether there is a pending update of the switch;
> > >    kswapd's prio steps up if there is a pending one, thus its prio
> > >    never steps down. No prio inversion either.
> > > 
> > > 5, goto 1 when kswapd finishes its work.
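
For illustration, the handshake in steps 1-5 above can be read as a small
state machine. The stand-alone sketch below is not code from the patch;
the names (struct node_state, pp_wakeup_kswapd, kswapd_start,
kswapd_effective_prio) and the convention that a larger value means a
higher prio are invented for the example.

/*
 * Stand-alone model of the kswapd_prio handshake sketched in steps 1-5.
 * All names are invented for illustration; larger value = higher prio here.
 */
#include <stdio.h>
#include <limits.h>

#define PRIO_LOWEST     INT_MIN /* kswapd napping: any waker wins */
#define DEF_PRIORITY    12      /* start of the scan-priority loop */

struct node_state {
        int kswapd_prio;        /* the reclaimer/allocator "switch" */
        int pending;            /* a waker raised the switch while kswapd ran */
};

/* step 1 (and step 5): drop the switch to the lowest value before a nap */
static void kswapd_try_to_sleep(struct node_state *ns)
{
        ns->kswapd_prio = PRIO_LOWEST;
        ns->pending = 0;
}

/* steps 2 and 3: an allocator wakes kswapd or, if it is already active,
 * bumps the switch only when the waker is higher on prio (no-op otherwise) */
static void pp_wakeup_kswapd(struct node_state *ns, int waker_prio)
{
        if (waker_prio > ns->kswapd_prio) {
                ns->kswapd_prio = waker_prio;
                ns->pending = 1;
        }
}

/* step 2, second half: kswapd starts reclaiming with the waker's prio */
static int kswapd_start(struct node_state *ns)
{
        ns->pending = 0;
        return ns->kswapd_prio;
}

/* step 4: each time the scan priority is raised, fold in a pending update;
 * the effective prio can only step up within one reclaim cycle */
static int kswapd_effective_prio(struct node_state *ns, int prio)
{
        if (ns->pending) {
                ns->pending = 0;
                if (ns->kswapd_prio > prio)
                        prio = ns->kswapd_prio;
        }
        return prio;
}

int main(void)
{
        struct node_state ns;
        int prio, sc_priority;

        kswapd_try_to_sleep(&ns);       /* step 1 */
        pp_wakeup_kswapd(&ns, 5);       /* step 2: allocator with prio 5 */
        prio = kswapd_start(&ns);

        pp_wakeup_kswapd(&ns, 3);       /* step 3: lower prio, no-op */
        pp_wakeup_kswapd(&ns, 8);       /* step 3: higher prio, switch updated */

        for (sc_priority = DEF_PRIORITY; sc_priority > 0; sc_priority--)
                prio = kswapd_effective_prio(&ns, prio);        /* step 4 */

        printf("kswapd reclaimed at prio %d\n", prio);  /* prints 8 */
        kswapd_try_to_sleep(&ns);       /* step 5 */
        return 0;
}

Under that model a waker can only ever raise the prio kswapd reclaims at
within one cycle, which is what step 4 means by the prio never stepping
down.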

> > 
> > What about the direct reclaim?
> 
> Their prio will not change before reclaiming finishes, so leave it be.

This doesn't answer my question.

> > What if pages of a lower priority are
> > hard to reclaim? Do you want a process of a higher priority to stall
> > more just because it has to wait for those lower priority pages?
> 
> The problems above are not introduced by pp; let Mr. Kswapd take care
> of them.

No, this is not an answer.
-- 
Michal Hocko
SUSE Labs