From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.1 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2452C43461 for ; Tue, 15 Sep 2020 15:48:13 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 16E8D20756 for ; Tue, 15 Sep 2020 15:48:13 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=linux.microsoft.com header.i=@linux.microsoft.com header.b="q0OQamjl" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 16E8D20756 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.microsoft.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 962EE900022; Tue, 15 Sep 2020 11:48:12 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9137D90000E; Tue, 15 Sep 2020 11:48:12 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 804BA900022; Tue, 15 Sep 2020 11:48:12 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0231.hostedemail.com [216.40.44.231]) by kanga.kvack.org (Postfix) with ESMTP id 68E5F90000E for ; Tue, 15 Sep 2020 11:48:12 -0400 (EDT) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2F41B8249980 for ; Tue, 15 Sep 2020 15:48:12 +0000 (UTC) X-FDA: 77265727224.18.arm66_220bc4c27112 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin18.hostedemail.com (Postfix) with ESMTP id DFDE5100ED9E0 for ; Tue, 15 Sep 2020 15:48:11 +0000 (UTC) X-HE-Tag: arm66_220bc4c27112 X-Filterd-Recvd-Size: 6656 Received: from linux.microsoft.com (linux.microsoft.com [13.77.154.182]) by imf19.hostedemail.com (Postfix) with ESMTP for ; Tue, 15 Sep 2020 15:48:11 +0000 (UTC) Received: from [192.168.0.121] (unknown [209.134.121.133]) by linux.microsoft.com (Postfix) with ESMTPSA id C0ECE20A1B0A; Tue, 15 Sep 2020 08:48:09 -0700 (PDT) DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com C0ECE20A1B0A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com; s=default; t=1600184890; bh=GbNvjXtIotnTBVgQShUjpy3PRjg3FpkBOXs/mo4yho0=; h=Subject:To:Cc:References:From:Date:In-Reply-To:From; b=q0OQamjlujfpL3CaZpBs0H+t42o65JG6/7LkHnKyEKK2VcoEEz5UVoH9HqOacK+YJ c2hcty9QYfWwvY/lZcIxp6+Jtt6EDz5dvz3e8SmBxPFzCMLDrDExgbljN4Yyf9m7K4 SFCniytE2+51nt9Uarh5u8vho2jvxJggvJ8pbV8o= Subject: Re: [[PATCH]] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged To: Michal Hocko Cc: Andrew Morton , "Kirill A. Shutemov" , Oleg Nesterov , Song Liu , Andrea Arcangeli , Pavel Tatashin , Allen Pais , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <1599770859-14826-1-git-send-email-vijayb@linux.microsoft.com> <20200914143312.GU16999@dhcp22.suse.cz> <20200915081832.GA4649@dhcp22.suse.cz> From: Vijay Balakrishna Message-ID: <53dd1e2c-f07e-ee5b-51a1-0ef8adb53926@linux.microsoft.com> Date: Tue, 15 Sep 2020 08:48:08 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: <20200915081832.GA4649@dhcp22.suse.cz> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB X-Rspamd-Queue-Id: DFDE5100ED9E0 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam05 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 9/15/2020 1:18 AM, Michal Hocko wrote: > On Mon 14-09-20 09:57:02, Vijay Balakrishna wrote: >> >> >> On 9/14/2020 7:33 AM, Michal Hocko wrote: >>> On Thu 10-09-20 13:47:39, Vijay Balakrishna wrote: >>>> When memory is hotplug added or removed the min_free_kbytes must be >>>> recalculated based on what is expected by khugepaged. Currently >>>> after hotplug, min_free_kbytes will be set to a lower default and hi= gher >>>> default set when THP enabled is lost. This leaves the system with sm= all >>>> min_free_kbytes which isn't suitable for systems especially with net= work >>>> intensive loads. Typical failure symptoms include HW WATCHDOG reset= , >>>> soft lockup hang notices, NETDEVICE WATCHDOG timeouts, and OOM proce= ss >>>> kills. >>> >>> Care to explain some more please? The whole point of increasing >>> min_free_kbytes for THP is to get a larger free memory with a hope th= at >>> huge pages will be more likely to appear. While this might help for >>> other users that need a high order pages it is definitely not the >>> primary reason behind it. Could you provide an example with some more >>> data? >> >> Thanks Michal. I haven't looked into THP as part of my investigation,= so I >> cannot comment. >> >> In our use case we are hotplug removing ~2GB of 8GB total (on our SoC) >> during normal reboot/shutdown. This memory is hotplug hot-added as mo= vable >> type via systemd late service during start-of-day. >> >> In our stress test first we ran into HW WATCHDOG recovery, on enabling >> kernel watchdog we started seeing soft lockup hung task notices, failu= re >> symptons varied, where stack trace of hung tasks sometimes trying to >> allocate GFP_ATOMIC memory, looping in do_notify_resume, NETDEVICE WAT= CHDOG >> timeouts, OOM process kills etc., During investigation we reran stres= s test >> without hotplug use case. Surprisingly this run didn't encounter the = said >> problems. This led to comparing what is different between the two run= s, >> while looking at various globals, studying hotplug code I uncovered th= e >> issue of failing to restore min_free_kbytes. In particular on our 8GB= SoC >> min_free_kbytes went down to 8703 from 22528 after hotplug add. >=20 > Did you try to increase min_free_kbytes manually after hot remove? Btw. No, in our use case memory hot remove done during shutdown. > I would consider oom killer invocation due to min_free_kbytes really > weird behavior. If anything the higher value would cause more memory > reclaim and potentially oom rather than smaller one. Yes, we wondered about it too. One panic stack trace (after many OOM kil= ls) [330321.174240] Out of memory and no killable processes... [330321.179658] Kernel panic - not syncing: System is deadlocked on memor= y [330321.186489] CPU: 4 PID: 1 Comm: systemd Kdump: loaded Tainted: G=20 O 5.4.51-xxx #1 [330321.196900] Hardware name: Overlake (DT) [330321.201038] Call trace: [330321.203660] dump_backtrace+0x0/0x1d0 [330321.207533] show_stack+0x20/0x2c [330321.211048] dump_stack+0xe8/0x150 [330321.214656] panic+0x18c/0x3b4 [330321.217901] out_of_memory+0x4c0/0x6e4 [330321.221863] __alloc_pages_nodemask+0xbdc/0x1c90 [330321.226722] alloc_pages_current+0x21c/0x2b0 [330321.231220] alloc_slab_page+0x1e0/0x7d8 [330321.235361] new_slab+0x2e8/0x2f8 [330321.238874] ___slab_alloc+0x45c/0x59c [330321.242835] kmem_cache_alloc+0x2d4/0x360 [330321.247065] getname_flags+0x6c/0x2a8 [330321.250938] user_path_at_empty+0x3c/0x68 [330321.255168] do_readlinkat+0x7c/0x17c [330321.259039] __arm64_sys_readlinkat+0x5c/0x70 [330321.263627] el0_svc_handler+0x1b8/0x32c [330321.267767] el0_svc+0x10/0x14 [330321.271026] SMP: stopping secondary CPUs [330321.275382] Starting crashdump kernel... [330321.279526] Bye! Then while searching I came across documented warning below. In above=20 instance panic after OOM kills happened after 3+ days of stress run (a=20 mixure of ttcp, cpuloadgen and fio). https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/= html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_t= uning_guide-configuration_tools-configuring_system_memory_capacity Warning Extreme values can damage your system. Setting min_free_kbytes to an=20 extremely low value prevents the system from reclaiming memory, which=20 can result in system hangs and OOM-killing processes. However, setting=20 min_free_kbytes too high (for example, to 5=E2=80=9310% of total system m= emory)=20 causes the system to enter an out-of-memory state immediately, resulting=20 in the system spending too much time reclaiming memory.