From: Hillf Danton <hdanton@sina.com>
To: mm
Cc: fsdev, Andrew Morton, linux, Roman Gushchin, Tejun Heo, Jan Kara,
 Johannes Weiner, Shakeel Butt, Minchan Kim, Mel Gorman, Hillf Danton
Subject: [RFC] writeback: add elastic bdi in cgwb bdp
Date: Sat, 12 Oct 2019 21:27:40 +0800
Message-Id: <20191012132740.12968-1-hdanton@sina.com>

The behaviors of the elastic bdi (ebdi) observed in the current cgwb
bandwidth measurement include:

1, like spinning disks on the market, an ebdi can sustain ~128MB/s of
IO for consecutive minutes in a few scenarios, or higher like an SSD,
or lower like a USB key.

2, with an ebdi, a bdi_writeback, wb-A, is able to do 80MB/s of
writeouts in the current 200ms time window, while it did only 16MB/s
in the previous one.

3, it will be either 100MB/s in the next time window if wb-B joins
wb-A in writing pages out, or 18MB/s if wb-C also decides to chime in.

With the bandwidth gauged as above, what is left in balancing dirty
pages (bdp) is to make wb-A's laundering speed catch up with its
dirtying speed in every 200ms interval, without knowing what wb-B is
doing. No heuristic is added in this work because the ebdi does bdp
without one.
Cc: Roman Gushchin
Cc: Tejun Heo
Cc: Jan Kara
Cc: Johannes Weiner
Cc: Shakeel Butt
Cc: Minchan Kim
Cc: Mel Gorman
Signed-off-by: Hillf Danton
---
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -157,6 +157,9 @@ struct bdi_writeback {
 	struct list_head memcg_node;	/* anchored at memcg->cgwb_list */
 	struct list_head blkcg_node;	/* anchored at blkcg->cgwb_list */
 
+#ifdef CONFIG_CGWB_BDP_WITH_EBDI
+	struct wait_queue_head bdp_waitq;
+#endif
 	union {
 		struct work_struct release_work;
 		struct rcu_head rcu;
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -324,6 +324,10 @@ static int wb_init(struct bdi_writeback
 		goto out_destroy_stat;
 	}
 
+	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		init_waitqueue_head(&wb->bdp_waitq);
+
 	return 0;
 
 out_destroy_stat:
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1551,6 +1551,45 @@ static inline void wb_dirty_limits(struc
 	}
 }
 
+#if defined(CONFIG_CGROUP_WRITEBACK) && defined(CONFIG_CGWB_BDP_WITH_EBDI)
+static bool cgwb_bdp_should_throttle(struct bdi_writeback *wb)
+{
+	struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
+
+	if (fatal_signal_pending(current))
+		return false;
+
+	gdtc.avail = global_dirtyable_memory();
+
+	domain_dirty_limits(&gdtc);
+
+	gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
+		     global_node_page_state(NR_UNSTABLE_NFS) +
+		     global_node_page_state(NR_WRITEBACK);
+
+	if (gdtc.dirty < gdtc.bg_thresh)
+		return false;
+
+	if (!writeback_in_progress(wb))
+		wb_start_background_writeback(wb);
+
+	/*
+	 * throttle if laundry speed remarkably falls behind dirty speed
+	 * in the current time window of 200ms
+	 */
+	return gdtc.dirty > gdtc.thresh &&
+	       wb_stat(wb, WB_DIRTIED) >
+	       wb_stat(wb, WB_WRITTEN) +
+	       wb_stat_error();
+}
+
+static inline void cgwb_bdp(struct bdi_writeback *wb)
+{
+	wait_event_interruptible_timeout(wb->bdp_waitq,
+			!cgwb_bdp_should_throttle(wb), HZ);
+}
+#endif
+
 /*
  * balance_dirty_pages() must be called by processes which are generating dirty
  * data.  It looks at the number of dirty pages in the machine and will force
@@ -1910,7 +1949,11 @@ void balance_dirty_pages_ratelimited(str
 	preempt_enable();
 
 	if (unlikely(current->nr_dirtied >= ratelimit))
-		balance_dirty_pages(wb, current->nr_dirtied);
+		if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+		    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+			cgwb_bdp(wb);
+		else
+			balance_dirty_pages(wb, current->nr_dirtied);
 
 	wb_put(wb);
 }
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -632,6 +632,11 @@ void wbc_detach_inode(struct writeback_c
 	if (!wb)
 		return;
 
+	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		if (waitqueue_active(&wb->bdp_waitq))
+			wake_up_all(&wb->bdp_waitq);
+
 	history = inode->i_wb_frn_history;
 	avg_time = inode->i_wb_frn_avg_time;
 
@@ -811,6 +816,9 @@ static long wb_split_bdi_pages(struct bd
 	if (nr_pages == LONG_MAX)
 		return LONG_MAX;
 
+	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+		return nr_pages;
 	/*
 	 * This may be called on clean wb's and proportional distribution
 	 * may not make sense, just use the original @nr_pages in those
@@ -1599,6 +1607,10 @@ static long writeback_chunk_size(struct
 	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
 		pages = LONG_MAX;
 	else {
+		if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
+		    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
+			return work->nr_pages;
+
 		pages = min(wb->avg_write_bandwidth / 2,
			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
 		pages = min(pages, work->nr_pages);
--