From: Yang Shi
Date: Tue, 19 Oct 2021 10:12:08 -0700
Subject: Re: [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too many pages are isolated
To: Mel Gorman
Cc: Andrew Morton, NeilBrown, "Theodore Ts'o", Andreas Dilger, "Darrick J. Wong", Matthew Wilcox, Michal Hocko, Dave Chinner, Rik van Riel, Vlastimil Babka, Johannes Weiner, Jonathan Corbet, Linux-MM, Linux-fsdevel, LKML
In-Reply-To: <20211019090108.25501-3-mgorman@techsingularity.net>
References: <20211019090108.25501-1-mgorman@techsingularity.net> <20211019090108.25501-3-mgorman@techsingularity.net>

On Tue, Oct 19, 2021 at 2:01 AM Mel Gorman wrote:
>
> Page reclaim throttles on congestion if too many parallel reclaim instances
> have isolated too many pages. This makes no sense: excessive parallelisation
> has nothing to do with writeback or congestion.
>
> This patch creates an additional wait queue to sleep on when too many
> pages are isolated. The throttled tasks are woken when the number
> of isolated pages is reduced or a timeout occurs. There may be
> some false positive wakeups for GFP_NOIO/GFP_NOFS callers but
> the tasks will throttle again if necessary.
>
> [shy828301@gmail.com: Wake up from compaction context]

Reviewed-by: Yang Shi

> [vbabka@suse.cz: Account number of throttled tasks only for writeback]
> Signed-off-by: Mel Gorman
> Acked-by: Vlastimil Babka
> ---
>  include/linux/mmzone.h        |  6 ++++--
>  include/trace/events/vmscan.h |  4 +++-
>  mm/compaction.c               | 10 ++++++++--
>  mm/internal.h                 | 13 ++++++++++++-
>  mm/page_alloc.c               |  6 +++++-
>  mm/vmscan.c                   | 28 +++++++++++++++++++---------
>  6 files changed, 51 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ef0a63ebd21d..58a25d42c31c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -275,6 +275,8 @@ enum lru_list {
>
>  enum vmscan_throttle_state {
>  	VMSCAN_THROTTLE_WRITEBACK,
> +	VMSCAN_THROTTLE_ISOLATED,
> +	NR_VMSCAN_THROTTLE,
>  };
>
>  #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
> @@ -846,8 +848,8 @@ typedef struct pglist_data {
>  	int node_id;
>  	wait_queue_head_t kswapd_wait;
>  	wait_queue_head_t pfmemalloc_wait;
> -	wait_queue_head_t reclaim_wait; /* wq for throttling reclaim */
> -	atomic_t nr_reclaim_throttled;	/* nr of throtted tasks */
> +	wait_queue_head_t reclaim_wait[NR_VMSCAN_THROTTLE];
> +	atomic_t nr_writeback_throttled;/* nr of writeback-throttled tasks */
>  	unsigned long nr_reclaim_start;	/* nr pages written while throttled
>  					 * when throttling started. */
>  	struct task_struct *kswapd;	/* Protected by
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index c317f9fe0d17..d4905bd9e9c4 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -28,10 +28,12 @@
>  	) : "RECLAIM_WB_NONE"
>
>  #define _VMSCAN_THROTTLE_WRITEBACK	(1 << VMSCAN_THROTTLE_WRITEBACK)
> +#define _VMSCAN_THROTTLE_ISOLATED	(1 << VMSCAN_THROTTLE_ISOLATED)
>
>  #define show_throttle_flags(flags) \
>  	(flags) ? __print_flags(flags, "|",				\
> -		{_VMSCAN_THROTTLE_WRITEBACK,	"VMSCAN_THROTTLE_WRITEBACK"} \
> +		{_VMSCAN_THROTTLE_WRITEBACK,	"VMSCAN_THROTTLE_WRITEBACK"}, \
> +		{_VMSCAN_THROTTLE_ISOLATED,	"VMSCAN_THROTTLE_ISOLATED"} \
>  		) : "VMSCAN_THROTTLE_NONE"
>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index bfc93da1c2c7..7359093d8ac0 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -761,6 +761,8 @@ isolate_freepages_range(struct compact_control *cc,
>  /* Similar to reclaim, but different enough that they don't share logic */
>  static bool too_many_isolated(pg_data_t *pgdat)
>  {
> +	bool too_many;
> +
>  	unsigned long active, inactive, isolated;
>
>  	inactive = node_page_state(pgdat, NR_INACTIVE_FILE) +
> @@ -770,7 +772,11 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
>  			node_page_state(pgdat, NR_ISOLATED_ANON);
>
> -	return isolated > (inactive + active) / 2;
> +	too_many = isolated > (inactive + active) / 2;
> +	if (!too_many)
> +		wake_throttle_isolated(pgdat);
> +
> +	return too_many;
>  }
>
>  /**
> @@ -822,7 +828,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  		if (cc->mode == MIGRATE_ASYNC)
>  			return -EAGAIN;
>
> -		congestion_wait(BLK_RW_ASYNC, HZ/10);
> +		reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10);
>
>  		if (fatal_signal_pending(current))
>  			return -EINTR;
> diff --git a/mm/internal.h b/mm/internal.h
> index 90764d646e02..3461a1055975 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -39,12 +39,21 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>  static inline void acct_reclaim_writeback(struct page *page)
>  {
>  	pg_data_t *pgdat = page_pgdat(page);
> -	int nr_throttled = atomic_read(&pgdat->nr_reclaim_throttled);
> +	int nr_throttled = atomic_read(&pgdat->nr_writeback_throttled);
>
>  	if (nr_throttled)
>  		__acct_reclaim_writeback(pgdat, page, nr_throttled);
>  }
>
> +static inline void wake_throttle_isolated(pg_data_t *pgdat)
> +{
> +	wait_queue_head_t *wqh;
> +
> +	wqh = &pgdat->reclaim_wait[VMSCAN_THROTTLE_ISOLATED];
> +	if (waitqueue_active(wqh))
> +		wake_up_all(wqh);
> +}
> +
>  vm_fault_t do_swap_page(struct vm_fault *vmf);
>
>  void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
> @@ -120,6 +129,8 @@ extern unsigned long highest_memmap_pfn;
>   */
>  extern int isolate_lru_page(struct page *page);
>  extern void putback_lru_page(struct page *page);
> +extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
> +							long timeout);
>
>  /*
>   * in mm/rmap.c:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d849ddfc1e51..78e538067651 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7389,6 +7389,8 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
>
>  static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
>  {
> +	int i;
> +
>  	pgdat_resize_init(pgdat);
>
>  	pgdat_init_split_queue(pgdat);
> @@ -7396,7 +7398,9 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
>
>  	init_waitqueue_head(&pgdat->kswapd_wait);
>  	init_waitqueue_head(&pgdat->pfmemalloc_wait);
> -	init_waitqueue_head(&pgdat->reclaim_wait);
> +
> +	for (i = 0; i < NR_VMSCAN_THROTTLE; i++)
> +		init_waitqueue_head(&pgdat->reclaim_wait[i]);
>
>  	pgdat_page_ext_init(pgdat);
>  	lruvec_init(&pgdat->__lruvec);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 735b1f2b5d9e..29434d4fc1c7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1006,12 +1006,12 @@ static void handle_write_error(struct address_space *mapping,
>  	unlock_page(page);
>  }
>
> -static void
> -reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
> +void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>  	long timeout)
>  {
> -	wait_queue_head_t *wqh = &pgdat->reclaim_wait;
> +	wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
>  	long ret;
> +	bool acct_writeback = (reason == VMSCAN_THROTTLE_WRITEBACK);
>  	DEFINE_WAIT(wait);
>
>  	/*
> @@ -1023,7 +1023,8 @@ reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>  	    current->flags & (PF_IO_WORKER|PF_KTHREAD))
>  		return;
>
> -	if (atomic_inc_return(&pgdat->nr_reclaim_throttled) == 1) {
> +	if (acct_writeback &&
> +	    atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) {
>  		WRITE_ONCE(pgdat->nr_reclaim_start,
>  			node_page_state(pgdat, NR_THROTTLED_WRITTEN));
>  	}
> @@ -1031,7 +1032,9 @@ reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>  	prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
>  	ret = schedule_timeout(timeout);
>  	finish_wait(wqh, &wait);
> -	atomic_dec(&pgdat->nr_reclaim_throttled);
> +
> +	if (acct_writeback)
> +		atomic_dec(&pgdat->nr_writeback_throttled);
>
>  	trace_mm_vmscan_throttled(pgdat->node_id, jiffies_to_usecs(timeout),
>  				jiffies_to_usecs(timeout - ret),
> @@ -1061,7 +1064,7 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>  			READ_ONCE(pgdat->nr_reclaim_start);
>
>  	if (nr_written > SWAP_CLUSTER_MAX * nr_throttled)
> -		wake_up_all(&pgdat->reclaim_wait);
> +		wake_up_all(&pgdat->reclaim_wait[VMSCAN_THROTTLE_WRITEBACK]);
>  }
>
>  /* possible outcome of pageout() */
> @@ -2176,6 +2179,7 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
>  		struct scan_control *sc)
>  {
>  	unsigned long inactive, isolated;
> +	bool too_many;
>
>  	if (current_is_kswapd())
>  		return 0;
> @@ -2199,7 +2203,13 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
>  	if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
>  		inactive >>= 3;
>
> -	return isolated > inactive;
> +	too_many = isolated > inactive;
> +
> +	/* Wake up tasks throttled due to too_many_isolated. */
> +	if (!too_many)
> +		wake_throttle_isolated(pgdat);
> +
> +	return too_many;
>  }
>
>  /*
> @@ -2308,8 +2318,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  			return 0;
>
>  		/* wait a bit for the reclaimer. */
> -		msleep(100);
>  		stalled = true;
> +		reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10);
>
>  		/* We are about to die and free our memory. Return now. */
>  		if (fatal_signal_pending(current))
> @@ -4343,7 +4353,7 @@ static int kswapd(void *p)
>
>  	WRITE_ONCE(pgdat->kswapd_order, 0);
>  	WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES);
> -	atomic_set(&pgdat->nr_reclaim_throttled, 0);
> +	atomic_set(&pgdat->nr_writeback_throttled, 0);
>  	for ( ; ; ) {
>  		bool ret;
>
> --
> 2.31.1