Date: Tue, 15 Sep 2020 10:18:32 +0200
From: Michal Hocko
To: Vijay Balakrishna
Cc: Andrew Morton, "Kirill A. Shutemov", Oleg Nesterov, Song Liu, Andrea Arcangeli, Pavel Tatashin, Allen Pais, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [[PATCH]] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged
Message-ID: <20200915081832.GA4649@dhcp22.suse.cz>
References: <1599770859-14826-1-git-send-email-vijayb@linux.microsoft.com> <20200914143312.GU16999@dhcp22.suse.cz>

On Mon 14-09-20 09:57:02, Vijay Balakrishna wrote:
>
>
> On 9/14/2020 7:33 AM, Michal Hocko wrote:
> > On Thu 10-09-20 13:47:39, Vijay Balakrishna wrote:
> > > When memory is hotplug added or removed the min_free_kbytes must be
> > > recalculated based on what is expected by khugepaged. Currently
> > > after hotplug, min_free_kbytes will be set to a lower default and higher
> > > default set when THP enabled is lost. This leaves the system with small
> > > min_free_kbytes which isn't suitable for systems especially with network
> > > intensive loads. Typical failure symptoms include HW WATCHDOG reset,
> > > soft lockup hang notices, NETDEVICE WATCHDOG timeouts, and OOM process
> > > kills.
> >
> > Care to explain some more please?
> > The whole point of increasing
> > min_free_kbytes for THP is to get a larger free memory with a hope that
> > huge pages will be more likely to appear. While this might help for
> > other users that need a high order pages it is definitely not the
> > primary reason behind it. Could you provide an example with some more
> > data?
>
> Thanks Michal. I haven't looked into THP as part of my investigation, so I
> cannot comment.
>
> In our use case we are hotplug removing ~2GB of 8GB total (on our SoC)
> during normal reboot/shutdown. This memory is hotplug hot-added as movable
> type via systemd late service during start-of-day.
>
> In our stress test first we ran into HW WATCHDOG recovery, on enabling
> kernel watchdog we started seeing soft lockup hung task notices, failure
> symptoms varied, where stack trace of hung tasks sometimes trying to
> allocate GFP_ATOMIC memory, looping in do_notify_resume, NETDEVICE WATCHDOG
> timeouts, OOM process kills etc., During investigation we reran stress test
> without hotplug use case. Surprisingly this run didn't encounter the said
> problems. This led to comparing what is different between the two runs,
> while looking at various globals, studying hotplug code I uncovered the
> issue of failing to restore min_free_kbytes. In particular on our 8GB SoC
> min_free_kbytes went down to 8703 from 22528 after hotplug add.

Did you try to increase min_free_kbytes manually after hot remove? Btw. I
would consider oom killer invocation due to min_free_kbytes really weird
behavior. If anything the higher value would cause more memory reclaim
and potentially oom rather than smaller one.
-- 
Michal Hocko
SUSE Labs
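
For readers following the numbers in this thread, here is a rough sketch of the two calculations being compared: the generic default that the hotplug path falls back to, and the larger value khugepaged recommends when THP is enabled. It is a simplified Python model written from my reading of the kernel sources around this time, not the kernel code itself. The function names, the parameter names, and the example inputs (one populated zone, 512 pages per pageblock, 4 KiB pages) are assumptions for illustration, and the clamping and user-override handling that the kernel applies are left out.

```python
import math

# Number of migrate types kept on per-cpu lists. Assumed to be 3, as in the
# kernel sources this sketch is modelled on.
MIGRATE_PCPTYPES = 3


def default_min_free_kbytes(lowmem_kbytes):
    """Generic default: integer square root of (lowmem in KiB * 16).

    The kernel also clamps the result and honours a user-set value;
    both are omitted from this sketch.
    """
    return math.isqrt(lowmem_kbytes * 16)


def khugepaged_recommended_min_kbytes(nr_zones, pageblock_nr_pages,
                                      lowmem_pages, page_size_kb=4):
    """Value khugepaged recommends when THP is enabled (simplified model)."""
    # Keep two pageblocks free per zone to help fragmentation avoidance.
    pages = pageblock_nr_pages * nr_zones * 2
    # Plus room for fallbacks between the per-cpu migrate types.
    pages += pageblock_nr_pages * nr_zones * MIGRATE_PCPTYPES * MIGRATE_PCPTYPES
    # Never reserve more than 5% of lowmem.
    pages = min(pages, lowmem_pages // 20)
    return pages * page_size_kb


# Hypothetical inputs: 1 populated zone, 512-page pageblocks, 4 KiB pages,
# and enough lowmem that the 5% cap does not apply.
print(khugepaged_recommended_min_kbytes(1, 512, 2_000_000))  # 22528
```

With those assumed inputs the khugepaged formula gives 22528, the same figure reported above for the system before hotplug. The zone count on that SoC is not stated in the thread, so the match is suggestive rather than confirmed. The square-root default grows much more slowly with memory size, which is consistent with the lower value seen after the hotplug path recalculated it.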