Date: Mon, 27 Jan 2020 10:06:46 +0000
From: Johannes Weiner
To: Michal Hocko
Cc: Chris Edwards, "linux-mm@kvack.org"
Subject: Re: Paging out when free memory is low but not exhausted (and available memory remains high)
Message-ID: <20200127100646.GA203985@cmpxchg.org>
References: <20200123123127.GK29276@dhcp22.suse.cz> <1579844599463.32567@otago.ac.nz> <20200124100423.GP29276@dhcp22.suse.cz>
In-Reply-To: <20200124100423.GP29276@dhcp22.suse.cz>

On Fri, Jan 24, 2020 at 11:04:23AM +0100, Michal Hocko wrote:
> [Cc Johannes. The collected vmstat data is in
> http://lkml.kernel.org/r/1579844599463.32567@otago.ac.nz]
>
> On Fri 24-01-20 05:43:19, Chris Edwards wrote:
> > > Could you collect /proc/vmstat every second or so while you observe this
> > behavior? This should give us more information than vmstat(8) output.
> >
> > Hi Michal,
> >
> > Thanks for the suggestion - I've re-run the test on a 5.5.0-rc6 kernel
> > built from source using the default config, which exhibits the same
> > behaviour. Please see attachment; I hope the format is OK.
>
> I personally would have liked one snapshot per file slightly easier to
> parse but no problem (I have simply broken out counters per file). In
> future the following would be easier to process ;)
> while true
> do
> 	TS="$(date +%s)"
> 	cp /proc/vmstat vmstat.$TS
> 	sleep 1s
> done
>
> > Here's the timeline of events:
> > 18:25:00 start
> > 18:25:10 run `stress` to limit available memory (grabs 0.9 x MemAvailable)
>
> I assume this will allocate anonymous memory.
> time 18:25:10
> nr_free_pages 2934822
> nr_inactive_anon 57550
> nr_active_anon 5733
> nr_inactive_file 1428
> nr_active_file 21857
> nr_unevictable 6102
> pswpin 8
> pswpout 390136
>
> So there is 11GB of free memory. And 1.5GB of memory swapped out in the
> past (probably a result of previous tests), we are going to use this
> number as a base for future comparing because pswpout counter is
> incremental.
>
> Anonymous LRUs have 240MB of memory and there is 90MB of file backed.
>
> > 18:25:20 run `dd` to exercise the buffer cache
>
> time 18:25:20
> nr_free_pages 367818
> nr_inactive_anon 57693
> nr_active_anon 2560480
> nr_inactive_file 7110
> nr_active_file 23332
> nr_unevictable 6195
> pswpin 8
> pswpout 390136
>
> The free memory dropped to 1.4GB as a result of your `stress` load. All
> that memory landed in the anonymous LRU lists (9GB of memory comparing
> to 240MB before the test). File backed memory's grown to 118MB. No
> swapout/in during that time period.
>
> Nothing really unexpected so far. There is still quite some room to fit
> the IO workload in. Let's see how the pswpout evolves over time.
>
> $ awk '{diff=$1-prev; if (prev&&diff) printf "%d %d %d\n", NR, $1, diff; prev=$1}' pswpout
> 30 392136 2000
> 31 395513 3377
> 32 399132 3619
> 33 403101 3969
> 34 407211 4110
> 35 410812 3601
> 36 414120 3308
> 37 418119 3999
> 38 422116 3997
> 39 424154 2038
> 40 428110 3956
>
> So the swapout started around 18:25:30
> $ sed '1,28d;' nr_free_pages | head
> 118413
> 100516
> 98751
> 95914
> 97059
> 101303
> 101588
> 97801
> 99415
> 99842
>
> The free memory dropped down to ~400MB which is likely the
> min_free_kbytes defined watermark
>
> $ sed '1,28d;' nr_inactive_anon | head -n3
> 57633
> 57828
> 58932
> $ sed '1,28d;' nr_active_anon | head -n3
> 2560522
> 2560148
> 2559087
>
> Anonymous list around 10GB
>
> $ sed '1,28d;' nr_inactive_file | head -n3
> 255957
> 276400
> 278865
> $ sed '1,28d;' nr_active_file | head -n3
> 23334
> 23439
> 22743
>
> File lists 1.1GB. Inactive file LRU is quite large and
> $ sed '1,28d;' nr_dirty | head -n3
> 0
> 0
> 0
> $ sed '1,28d;' nr_writeback | head -n3
> 0
> 0
> 141
>
> The data shouldn't be dirty so we should preferably reclaim those pages
> rather than swap out. That is little bit surprising to me. Johannes what
> do you think about this?

There are a couple of workingset_activate events - not many, but it
could be enough. I wonder if Kuo-Hsin's patch is a bit too aggressive:
2c012a4ad1a2cd3fb5a0f9307b9d219f84eda1fa

We may want to change the logic such that we scan active file when
there are inactive refaults, but only go for anon if there are active
refaults. We'd need to take snapshots of WORKINGSET_RESTORE as well.

(There are no restore events in the logs, meaning the active file list
is turning over a bit, but the cache isn't thrashing per se).

Just to confirm, Chris, would you be able to test whether the
following patch fixes the problem you are seeing?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 74e8edce83ca..1f1403681960 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2744,7 +2744,7 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 	 * anonymous pages.
 	 */
 	file = lruvec_page_state(target_lruvec, NR_INACTIVE_FILE);
-	if (file >> sc->priority && !(sc->may_deactivate & DEACTIVATE_FILE))
+	if (file >> sc->priority && !inactive_is_low(target_lruvec, LRU_INACTIVE_FILE))
 		sc->cache_trim_mode = 1;
 	else
 		sc->cache_trim_mode = 0;
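
As a rough illustration of the first half of that condition only, the
sketch below evaluates "file >> sc->priority" for the nr_inactive_file
values quoted above. It assumes reclaim starts at priority 12 (the
kernel's DEF_PRIORITY) and uses the node-wide counter as a stand-in for
the per-lruvec count; trim_candidate is a made-up helper name, not a
kernel interface. It says nothing about what inactive_is_low() would
return, which cannot be derived from these counters alone.

```shell
#!/bin/sh
# Sketch only: mirrors the "file >> sc->priority" half of the check.
# PRIORITY=12 is an assumption (the starting reclaim priority).
PRIORITY=12

# trim_candidate is a hypothetical helper for this example.
# It prints the shifted value; a non-zero result means the inactive
# file list is large enough to pass this half of the condition.
trim_candidate() {
    echo $(( $1 >> PRIORITY ))
}

# nr_inactive_file at 18:25:10, 18:25:20 and once swapout is under way.
for pages in 1428 7110 255957; do
    printf '%s inactive file pages -> %s\n' "$pages" "$(trim_candidate "$pages")"
done
```

With these inputs the shifted value is zero only for the first sample;
at lower priorities (reclaim under more pressure) the shift is smaller
and smaller lists pass as well.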