In-Reply-To:
 <20200417193539.GC43469@mtj.thefacebook.com>
From: Shakeel Butt
Date: Fri, 17 Apr 2020 14:51:09 -0700
Subject: Re: [PATCH 0/3] memcg: Slow down swap allocation as the available space gets depleted
To: Tejun Heo
Cc: Jakub Kicinski, Andrew Morton, Linux MM, Kernel Team, Johannes Weiner, Chris Down, Cgroups

On Fri, Apr 17, 2020 at 12:35 PM Tejun Heo wrote:
>
> Hello,
>
> On Fri, Apr 17, 2020 at 10:51:10AM -0700, Shakeel Butt wrote:
> > > Can you please elaborate concrete scenarios? I'm having a hard time seeing
> > > differences from page cache.
> >
> > Oh I was talking about the global reclaim here. In global reclaim, any
> > task can be throttled (throttle_direct_reclaim()). Memory freed by
> > using the CPU of high priority low latency jobs can be stolen by low
> > priority batch jobs.
>
> I'm still having a hard time following this thread of discussion, most
> likely because my knowledge of mm is fleeting at best. Can you please ELI5
> why the above is specifically relevant to this discussion?
>

No, it is not relevant to this discussion "now". The mention of
performance isolation in my first email was mostly due to my lack of
understanding about what problem this patch series is trying to solve.
So, let's skip this topic.

> I'm gonna list two things that come to my mind just in case that'd help
> reducing the back and forth.
>
> * With protection based configurations, protected cgroups wouldn't usually
>   go into direct reclaim themselves all that much.
>
> * We do have holes in accounting CPU cycles used by reclaim to the origins,
>   which, for example, prevents making memory.high reclaim async and lets
>   memory pressure contaminate cpu isolation possibly to a significant degree
>   on lower core count machines in some scenarios, but that's a separate
>   issue we need to address in the future.
>

I have an opinion on the above, but I will refrain from sharing it as it
is not relevant to the patch series.

> > > cgroup A has memory.low protection and no other restrictions. cgroup B has
> > > no protection and has access to swap. When B's memory starts bloating and
> > > gets the system under memory contention, it'll start consuming swap until it
> > > can't. When swap becomes depleted for B, there's nothing holding it back and
> > > B will start eating into A's protection.
> > >
> >
> > In this example does 'B' have memory.high and memory.max set and by A
>
> B doesn't have anything set.
>
> > having no other restrictions, I am assuming you meant unlimited high
> > and max for A? Can 'A' use memory.min?
>
> Sure, it can but 1. the purpose of the example is illustrating the
> incompleteness of the existing mechanism

I understand, but is this a real-world configuration that people use,
and do we want to support the scenario where, without setting high/max,
the kernel still guarantees the isolation?

> 2. there's a big difference between
> letting the machine hit the wall and waiting for the kernel OOM to trigger
> and being able to monitor the situation as it gradually develops and respond
> to it, which is the whole point of the low/high mechanisms.
>

I am not really against the proposed solution. What I am trying to see
is whether this problem is more general than an anon/swap-full problem
and whether a more general solution is possible. To me it seems that
whenever a large portion of reclaimable memory (anon, file or kmem)
abruptly becomes non-reclaimable, the memory isolation can be broken.
You gave the anon/swap-full example; let me see if I can come up with
file and kmem examples (with similar A & B).

1) B has a lot of page cache, but it temporarily gets pinned for RDMA or
something and the system gets low on memory. B can attack A's
low-protected memory as B's page cache is temporarily not reclaimable.

2) B has a lot of dentries/inodes, but someone has taken a write lock
on shrinker_rwsem and got stuck in allocation/reclaim or got preempted.
B can attack A's low-protected memory as B's slabs are temporarily not
reclaimable.

I think the aim is to slow down B enough to give the PSI monitor a
chance to act before either B targets A's protected memory or the
kernel triggers oom-kill.

My question is: do we really want to solve the issue without limiting B
through high/max? Also, isn't fine-grained PSI monitoring along with
limiting B through memory.[high|max] general enough to solve all three
example scenarios?

thanks,
Shakeel