From: Shakeel Butt
Date: Thu, 10 Feb 2022 15:53:00 -0800
Subject: Re: [PATCH 4/4] memcg: synchronously enforce memory.high
To: Roman Gushchin
Cc: Johannes Weiner, Michal Hocko, Chris Down, Andrew Morton, Cgroups, Linux MM, LKML
References: <20220210081437.1884008-1-shakeelb@google.com> <20220210081437.1884008-5-shakeelb@google.com>

On Thu, Feb 10, 2022 at 3:29 PM Roman
Gushchin wrote:
>
> On Thu, Feb 10, 2022 at 02:22:36PM -0800, Shakeel Butt wrote:
> > On Thu, Feb 10, 2022 at 12:15 PM Roman Gushchin wrote:
> > >
> > [...]
> > >
> > > Has this approach been extensively tested in the production?
> > >
> > > Injecting sleeps at return-to-userspace moment is safe in terms of priority
> > > inversions: a slowed down task will unlikely affect the rest of the system.
> > >
> > > It way less predictable for a random allocation in the kernel mode, what if
> > > the task is already holding a system-wide resource?
> > >
> > > Someone might argue that it's not better than a system-wide memory shortage
> > > and the same allocation might go into a direct reclaim anyway, but with
> > > the way how memory.high is used it will happen way more often.
> > >
> >
> > Thanks for the review.
> >
> > This patchset is tested in the test environment for now and I do plan
> > to test this in production but that is a slow process and will take
> > some time.
> >
> > Let me answer the main concern you have raised i.e. the safety of
> > throttling a task synchronously in the charge code path. Please note
> > that synchronous memory reclaim and oom-killing can already cause the
> > priority inversion issues you have mentioned. The way we usually
> > tackle such issues are through userspace controllers. For example oomd
> > is the userspace solution for catering such issues related to
> > oom-killing. Here we have a similar userspace daemon monitoring the
> > workload and deciding if it should let the workload grow or kill it.
> >
> > Now should we keep the current high limit enforcement implementation
> > and let it be ineffective for some real workloads or should we make
> > the enforcement more robust and let the userspace tackle some corner
> > case priority inversion issues. I think we should follow the second
> > option as we already have precedence of doing the same for reclaim and
> > oom-killing.
>
> Well, in a theory it sounds good and I have no intention to oppose the
> idea. However in practice we might easily get quite serious problems.
> So I think we should be extra careful here. In the end we don't want to
> pull and then revert this patch.
>
> The difference between the system-wide direct reclaim and this case is that
> usually kswapd is doing a good job of refilling the empty buffer, so we don't
> usually work in the circumstances of the global memory shortage. And when we do,
> often it's not working out quite well, this is why oomd and other similar
> solutions are required.
>
> Another option is to use your approach only for special cases (e.g. huge
> allocations) and keep the existing approach for most other allocations.
>

The problematic cases are not necessarily huge allocations; they can
also be a large number of small allocations. However, I think we can
make this idea work by checking current->memcg_nr_pages_over_high. If
order(current->memcg_nr_pages_over_high) is, let's say, larger than
PAGE_ALLOC_COSTLY_ORDER, then throttle synchronously, and otherwise
keep the existing return-to-userspace throttling. WDYT?
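
For illustration only, the check proposed above could look roughly like
the userspace mock below. It is a sketch under stated assumptions, not
code from the patch: pages_to_order() and
should_throttle_synchronously() are hypothetical names, "order(...)" is
interpreted as the smallest order whose page count covers the overage,
and PAGE_ALLOC_COSTLY_ORDER is defined locally with the value the
kernel uses (3).

```c
#include <stdbool.h>

/* Same value as the kernel's definition in include/linux/mmzone.h. */
#define PAGE_ALLOC_COSTLY_ORDER 3

/*
 * Hypothetical stand-in for "order(...)" in the mail: the smallest
 * order such that (1 << order) >= nr_pages. Capped at 31 so the shift
 * stays well defined for a 32-bit unsigned int.
 */
static unsigned int pages_to_order(unsigned int nr_pages)
{
	unsigned int order = 0;

	while (order < 31 && (1u << order) < nr_pages)
		order++;
	return order;
}

/*
 * Hypothetical helper: decide whether the accumulated overage (what
 * the mail reads from current->memcg_nr_pages_over_high) is large
 * enough to throttle in the charge path rather than deferring to
 * return-to-userspace.
 */
static bool should_throttle_synchronously(unsigned int nr_pages_over_high)
{
	return pages_to_order(nr_pages_over_high) > PAGE_ALLOC_COSTLY_ORDER;
}
```

With this interpretation, an overage of up to 8 pages keeps the
existing deferred behavior, and anything from 9 pages up (order 4 or
higher) would be throttled synchronously. Because the counter
accumulates across charges, many small allocations can cross the
threshold just as one large allocation can.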