Date: Tue, 14 Apr 2020 16:32:29 +0200
From: Michal Hocko
To: Yafang Shao
Cc: Andrew Morton, Linux MM
Subject: Re: [RFC PATCH] mm, oom: oom ratelimit auto tuning
Message-ID: <20200414143229.GN4629@dhcp22.suse.cz>
References: <1586597774-6831-1-git-send-email-laoar.shao@gmail.com> <20200414073911.GC4629@dhcp22.suse.cz>

On Tue 14-04-20 20:32:54, Yafang Shao wrote:
> On Tue, Apr 14, 2020 at 3:39 PM Michal Hocko wrote:
[...]
> > Besides that, I strongly suspect that you would be much better off
> > disabling /proc/sys/vm/oom_dump_tasks, which would reduce the amount
> > of output a lot. Or do you really require this information when
> > debugging oom reports?
>
> Yes, disabling /proc/sys/vm/oom_dump_tasks can save lots of time.
> But I'm not sure whether we can disable it entirely, because disabling
> it would also prevent the task list from being written into
> /var/log/messages.

Yes, eligible tasks would really be missing. The real question is
whether you are actually going to miss that information. From my
experience of looking into oom reports for years, I can tell that the
list might be useful, but in the vast majority of cases I simply do not
need it, because the state of memory and the chosen victims are much
more important. The list of tasks is usually interesting only when you
want to double check whether the victim selection was reasonable, or in
cases where the task list itself can tell you that something went wild
in userspace.

> > > The OOM ratelimit starts with a slow rate, and it will increase
> > > slowly if the console is fast and decrease rapidly if the console
> > > is slow. oom_rs.burst will be in [1, 10] and oom_rs.interval will
> > > always be greater than 5 * HZ.
> >
> > I am not against increasing the ratelimit timeout. But this patch
> > seems to be trying to be too clever. Why can't we simply increase
> > the parameters of the ratelimit?
>
> I was just worried that users may complain if too many
> oom_kill_process callbacks are suppressed.

This can be a real concern indeed.

> But considering that OOM bursts at the same time are always caused by
> the same reason,

This is not really the case. Please note that many parallel OOM killers
might be running in memory cgroup setups.

> so I think one snapshot of the OOM may be enough.
> Simply setting oom_rs to {20 * HZ, 1} can resolve this issue.

Does it really though?
The ratelimit doesn't stop output that is already taking a long time.
It simply cannot, because the work has already been done. That being
said, making the ratelimiting more aggressive sounds more like a
workaround than an actual fix, so I would go that route only if there
is no other option.

I believe the real problem here is printk being too synchronous. This
is a general problem and something the printk maintainers are already
working on. For now I would recommend working around the problem by
reducing the log level or disabling dump_tasks.

--
Michal Hocko
SUSE Labs