Date: Tue, 15 Oct 2019 12:22:10 +0200
From: Jan Kara
To: Hillf Danton
Cc: mm, fsdev, Andrew Morton, linux, Roman Gushchin, Tejun Heo, Jan Kara,
	Johannes Weiner, Shakeel Butt, Minchan Kim, Mel Gorman
Subject: Re: [RFC] writeback: add elastic bdi in cgwb bdp
Message-ID: <20191015102210.GA29554@quack2.suse.cz>
References: <20191012132740.12968-1-hdanton@sina.com>
In-Reply-To: <20191012132740.12968-1-hdanton@sina.com>

Hello,

On Sat 12-10-19 21:27:40, Hillf Danton wrote:
> The behaviors of the elastic bdi (ebdi) observed in the current cgwb
> bandwidth measurement include:
>
> 1, like spinning disks on the market, ebdi can do ~128MB/s of IO for
> consecutive minutes in a few scenarios, or higher like an SSD, or
> lower like a USB key.
>
> 2, with ebdi a bdi_writeback, wb-A, is able to do 80MB/s of writeout
> in the current 200ms time window, while it was 16MB/s in the previous
> one.
>
> 3, it will be either 100MB/s in the next time window if wb-B joins
> wb-A in writing pages out, or 18MB/s if wb-C also decides to chime in.
>
> With the help of the bandwidth gauged above, what is left in balancing
> dirty pages, bdp, is to try to make wb-A's laundry speed catch up with
> its dirty speed in every 200ms interval, without knowing what wb-B is
> doing.
>
> No heuristic is added in this work because ebdi does bdp without it.

Thanks for the patch but honestly, I have a hard time understanding from
the changelog what the purpose of this patch is. Some kind of writeback
throttling? And why is it needed? Also, some high-level description of
what your solution is would be good...

								Honza

> Cc: Roman Gushchin
> Cc: Tejun Heo
> Cc: Jan Kara
> Cc: Johannes Weiner
> Cc: Shakeel Butt
> Cc: Minchan Kim
> Cc: Mel Gorman
> Signed-off-by: Hillf Danton
> ---
>
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -157,6 +157,9 @@ struct bdi_writeback {
> 	struct list_head memcg_node;	/* anchored at memcg->cgwb_list */
> 	struct list_head blkcg_node;	/* anchored at blkcg->cgwb_list */
>
> +#ifdef CONFIG_CGWB_BDP_WITH_EBDI
> +	struct wait_queue_head bdp_waitq;
> +#endif
> 	union {
> 		struct work_struct release_work;
> 		struct rcu_head rcu;
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -324,6 +324,10 @@ static int wb_init(struct bdi_writeback
> 		goto out_destroy_stat;
> 	}
>
> +	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
> +	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
> +		init_waitqueue_head(&wb->bdp_waitq);
> +
> 	return 0;
>
> out_destroy_stat:
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -1551,6 +1551,45 @@ static inline void wb_dirty_limits(struc
> 	}
> }
>
> +#if defined(CONFIG_CGROUP_WRITEBACK) && defined(CONFIG_CGWB_BDP_WITH_EBDI)
> +static bool cgwb_bdp_should_throttle(struct bdi_writeback *wb)
> +{
> +	struct dirty_throttle_control gdtc = { GDTC_INIT_NO_WB };
> +
> +	if (fatal_signal_pending(current))
> +		return false;
> +
> +	gdtc.avail = global_dirtyable_memory();
> +
> +	domain_dirty_limits(&gdtc);
> +
> +	gdtc.dirty = global_node_page_state(NR_FILE_DIRTY) +
> +		     global_node_page_state(NR_UNSTABLE_NFS) +
> +		     global_node_page_state(NR_WRITEBACK);
> +
> +	if (gdtc.dirty < gdtc.bg_thresh)
> +		return false;
> +
> +	if (!writeback_in_progress(wb))
> +		wb_start_background_writeback(wb);
> +
> +	/*
> +	 * throttle if laundry speed remarkably falls behind dirty speed
> +	 * in the current time window of 200ms
> +	 */
> +	return gdtc.dirty > gdtc.thresh &&
> +	       wb_stat(wb, WB_DIRTIED) >
> +	       wb_stat(wb, WB_WRITTEN) +
> +	       wb_stat_error();
> +}
> +
> +static inline void cgwb_bdp(struct bdi_writeback *wb)
> +{
> +	wait_event_interruptible_timeout(wb->bdp_waitq,
> +			!cgwb_bdp_should_throttle(wb), HZ);
> +}
> +#endif
> +
> /*
>  * balance_dirty_pages() must be called by processes which are generating dirty
>  * data.  It looks at the number of dirty pages in the machine and will force
> @@ -1910,7 +1949,11 @@ void balance_dirty_pages_ratelimited(str
> 	preempt_enable();
>
> 	if (unlikely(current->nr_dirtied >= ratelimit))
> -		balance_dirty_pages(wb, current->nr_dirtied);
> +		if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
> +		    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
> +			cgwb_bdp(wb);
> +		else
> +			balance_dirty_pages(wb, current->nr_dirtied);
>
> 	wb_put(wb);
> }
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -632,6 +632,11 @@ void wbc_detach_inode(struct writeback_c
> 	if (!wb)
> 		return;
>
> +	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
> +	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
> +		if (waitqueue_active(&wb->bdp_waitq))
> +			wake_up_all(&wb->bdp_waitq);
> +
> 	history = inode->i_wb_frn_history;
> 	avg_time = inode->i_wb_frn_avg_time;
>
> @@ -811,6 +816,9 @@ static long wb_split_bdi_pages(struct bd
> 	if (nr_pages == LONG_MAX)
> 		return LONG_MAX;
>
> +	if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
> +	    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
> +		return nr_pages;
> 	/*
> 	 * This may be called on clean wb's and proportional distribution
> 	 * may not make sense, just use the original @nr_pages in those
> @@ -1599,6 +1607,10 @@ static long writeback_chunk_size(struct
> 	if (work->sync_mode == WB_SYNC_ALL || work->tagged_writepages)
> 		pages = LONG_MAX;
> 	else {
> +		if (IS_ENABLED(CONFIG_CGROUP_WRITEBACK) &&
> +		    IS_ENABLED(CONFIG_CGWB_BDP_WITH_EBDI))
> +			return work->nr_pages;
> +
> 		pages = min(wb->avg_write_bandwidth / 2,
> 			    global_wb_domain.dirty_limit / DIRTY_SCOPE);
> 		pages = min(pages, work->nr_pages);
> --
>

-- 
Jan Kara
SUSE Labs, CR
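
For readers following the thread: the throttling decision the patch
proposes reduces to one predicate over the global dirty counters and the
per-wb dirtied/written counters. Below is a minimal standalone C sketch of
just that predicate; all numbers are hypothetical stand-ins for
wb_stat(wb, WB_DIRTIED), wb_stat(wb, WB_WRITTEN) and the global dirty
state, and the side effects in cgwb_bdp_should_throttle() (kicking
background writeback, sleeping on bdp_waitq) are omitted.

/*
 * Standalone sketch of the throttle predicate from the proposed
 * cgwb_bdp_should_throttle(). Hypothetical values; not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

struct wb_counters {
	unsigned long dirtied;	/* stand-in for wb_stat(wb, WB_DIRTIED) */
	unsigned long written;	/* stand-in for wb_stat(wb, WB_WRITTEN) */
};

/*
 * Mirrors the patch: never throttle below the background threshold;
 * above the dirty threshold, throttle only a wb that has dirtied more
 * pages than it has written back, beyond the counter error margin.
 */
static bool should_throttle(unsigned long dirty, unsigned long thresh,
			    unsigned long bg_thresh,
			    const struct wb_counters *wb,
			    unsigned long stat_error)
{
	if (dirty < bg_thresh)
		return false;
	return dirty > thresh && wb->dirtied > wb->written + stat_error;
}

int main(void)
{
	/* hypothetical: 25000 pages dirtied vs 20000 written back */
	struct wb_counters wb = { 25000, 20000 };

	/* hypothetical: 120k dirty pages, thresh 100k, bg_thresh 50k,
	 * per-cpu counter error margin of 1024 pages */
	printf("throttle: %s\n",
	       should_throttle(120000, 100000, 50000, &wb, 1024) ?
	       "yes" : "no");
	return 0;
}

With these sample numbers the writer is throttled: the system is over the
dirty threshold and this wb has dirtied ~5000 more pages than it has
written back, well beyond the assumed error margin.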