From: Zhaoyang Huang
Date: Fri, 26 Jan 2024 17:49:34 +0800
Subject: Re: [PATCHv3 1/1] block: introduce content activity based ioprio
To: Matthew Wilcox
Cc: "zhaoyang.huang", Alexander Viro, Christian Brauner,
 linux-fsdevel@vger.kernel.org, Andrew Morton, Jens Axboe, Yu Zhao,
 Damien Le Moal, Niklas Cassel, "Martin K. Petersen", Hannes Reinecke,
 Linus Walleij, linux-mm@kvack.org, linux-block@vger.kernel.org,
 linux-kernel@vger.kernel.org, steve.kang@unisoc.com
References: <20240125071901.3223188-1-zhaoyang.huang@unisoc.com>

On Fri, Jan 26, 2024 at 5:36 PM Matthew Wilcox wrote:
>
> On Fri, Jan 26, 2024 at 05:28:58PM +0800, Zhaoyang Huang wrote:
> > On Fri, Jan 26, 2024 at 4:55 PM Matthew Wilcox wrote:
> > >
> > > On Fri, Jan 26, 2024 at 03:59:48PM +0800, Zhaoyang Huang wrote:
> > > > loop more mm and fs guys for more comments
> > >
> > > I agree with everything Damien said.  But also ...
> > ok, I will find a way to solve this problem.
> > >
> > > > > +bool BIO_ADD_FOLIO(struct bio *bio, struct folio *folio, size_t len,
> > > > > +		size_t off)
> > >
> > > You don't add any users of these functions.  It's hard to assess whether
> > > this is the right API when there are no example users.
> > Actually, the code has been tested on ext4 and f2fs by patchv2 on a
> > v6.6 6GB android system where I get the test result posted on the
> > commit message. These APIs is to keep block layer clean and wrap
> > things up for fs.
>
> well, where's patch v2?  i don't see it in my inbox.  i'm not going
> to go hunting around the email lists for it.  this is not good enough.
>
> > > why are BIO_ADD_PAGE and BIO_ADD_FOLIO so very different from each
> > > other?
> > These two API just repeat the same thing that bio_add_page and
> > bio_add_folio do.
>
> what?
>
> here's the patch you sent.  these two functions do wildly different
> things:
>
> +bool BIO_ADD_FOLIO(struct bio *bio, struct folio *folio, size_t len,
> +		size_t off)
> +{
> +	int class, level, hint, activity;
> +
> +	if (len > UINT_MAX || off > UINT_MAX)
> +		return false;
> +
> +	class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> +	level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> +	hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> +	activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> +
> +	activity += (bio->bi_vcnt + 1 <= IOPRIO_NR_ACTIVITY &&
> +			PageWorkingset(&folio->page)) ? 1 : 0;
> +	if (activity >= bio->bi_vcnt / 2)
> +		class = IOPRIO_CLASS_RT;
> +	else if (activity >= bio->bi_vcnt / 4)
> +		class = max(IOPRIO_PRIO_CLASS(get_current_ioprio()), IOPRIO_CLASS_BE);
> +
> +	bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
> +
> +	return bio_add_page(bio, &folio->page, len, off) > 0;
> +}
> +
> +int BIO_ADD_PAGE(struct bio *bio, struct page *page,
> +		unsigned int len, unsigned int offset)
> +{
> +	int class, level, hint, activity;
> +
> +	if (bio_add_page(bio, page, len, offset) > 0) {
> +		class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> +		level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> +		hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> +		activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> +		activity += (bio->bi_vcnt <= IOPRIO_NR_ACTIVITY && PageWorkingset(page)) ? 1 : 0;
> +		bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
> +	}
> +
> +	return len;
> +}
>
> did you change one and forget to change the other?

Sorry for leaving you off the list. Please find patch v2 below, where all
of the activity calculation is done within __bio_add_page() to avoid
iterating over the bio's bvecs before submit_bio(). It was rejected by Jens
because it introduces page operations into the block layer.

 block/Kconfig               |  8 ++++++++
 block/bio.c                 | 10 ++++++++++
 block/blk-mq.c              | 21 +++++++++++++++++++++
 fs/buffer.c                 |  6 ++++++
 include/linux/buffer_head.h |  1 +
 include/uapi/linux/ioprio.h | 20 +++++++++++++++-----
 6 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index f1364d1c0d93..8d6075575eae 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -228,6 +228,14 @@ config BLOCK_HOLDER_DEPRECATED
 config BLK_MQ_STACKING
 	bool

+config CONTENT_ACT_BASED_IOPRIO
+	bool "Enable content activity based ioprio"
+	depends on LRU_GEN
+	default y
+	help
+	  This item enable the feature of adjust bio's priority by
+	  calculating its content's activity.
+
 source "block/Kconfig.iosched"

 endif # BLOCK
diff --git a/block/bio.c b/block/bio.c
index 816d412c06e9..1228e2a4940f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -24,6 +24,7 @@
 #include "blk.h"
 #include "blk-rq-qos.h"
 #include "blk-cgroup.h"
+#include "blk-ioprio.h"

 #define ALLOC_CACHE_THRESHOLD	16
 #define ALLOC_CACHE_MAX		256
@@ -1069,12 +1070,21 @@ EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off)
 {
+	int class, level, hint, activity;
+
+	class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
+	level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
+	hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
+	activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
+
 	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
 	WARN_ON_ONCE(bio_full(bio, len));

 	bvec_set_page(&bio->bi_io_vec[bio->bi_vcnt], page, len, off);
 	bio->bi_iter.bi_size += len;
 	bio->bi_vcnt++;
+	activity += bio_page_if_active(bio, page, IOPRIO_NR_ACTIVITY);
+	bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
 }
 EXPORT_SYMBOL_GPL(__bio_add_page);

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 1fafd54dce3c..05cdd3adde94 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2939,6 +2939,26 @@ static inline struct request *blk_mq_get_cached_request(struct request_queue *q,
 	return rq;
 }

+#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
+static void bio_set_ioprio(struct bio *bio)
+{
+	int class, level, hint, activity;
+
+	class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
+	level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
+	hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
+	activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
+
+	if (activity >= bio->bi_vcnt / 2)
+		class = IOPRIO_CLASS_RT;
+	else if (activity >= bio->bi_vcnt / 4)
+		class = max(IOPRIO_PRIO_CLASS(get_current_ioprio()), IOPRIO_CLASS_BE);
+
+	bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
+
+	blkcg_set_ioprio(bio);
+}
+#else
 static void bio_set_ioprio(struct bio *bio)
 {
 	/* Nobody set ioprio so far? Initialize it based on task's nice value */
@@ -2946,6 +2966,7 @@ static void bio_set_ioprio(struct bio *bio)
 		bio->bi_ioprio = get_current_ioprio();
 	blkcg_set_ioprio(bio);
 }
+#endif

 /**
  * blk_mq_submit_bio - Create and send a request to block device.
diff --git a/fs/buffer.c b/fs/buffer.c
index 12e9a71c693d..b15bff481706 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2832,6 +2832,12 @@ void submit_bh(blk_opf_t opf, struct buffer_head *bh)
 }
 EXPORT_SYMBOL(submit_bh);

+int bio_page_if_active(struct bio *bio, struct page *page, unsigned short limit)
+{
+	return (bio->bi_vcnt <= limit && PageWorkingset(page)) ? 1 : 0;
+}
+EXPORT_SYMBOL(bio_page_if_active);
+
 void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags)
 {
 	lock_buffer(bh);
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 44e9de51eedf..9a374f5965ec 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -248,6 +248,7 @@ int bh_uptodate_or_lock(struct buffer_head *bh);
 int __bh_read(struct buffer_head *bh, blk_opf_t op_flags, bool wait);
 void __bh_read_batch(int nr, struct buffer_head *bhs[], blk_opf_t op_flags,
 		     bool force_lock);
+int bio_page_if_active(struct bio *bio, struct page *page, unsigned short limit);

 /*
  * Generic address_space_operations implementations for buffer_head-backed
diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index bee2bdb0eedb..d1c6081e796b 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -71,12 +71,18 @@ enum {
  * class and level.
  */
 #define IOPRIO_HINT_SHIFT		IOPRIO_LEVEL_NR_BITS
-#define IOPRIO_HINT_NR_BITS		10
+#define IOPRIO_HINT_NR_BITS		3
 #define IOPRIO_NR_HINTS			(1 << IOPRIO_HINT_NR_BITS)
 #define IOPRIO_HINT_MASK		(IOPRIO_NR_HINTS - 1)
 #define IOPRIO_PRIO_HINT(ioprio)	\
 	(((ioprio) >> IOPRIO_HINT_SHIFT) & IOPRIO_HINT_MASK)
+#define IOPRIO_ACTIVITY_SHIFT		(IOPRIO_HINT_NR_BITS + IOPRIO_LEVEL_NR_BITS)
+#define IOPRIO_ACTIVITY_NR_BITS		7
+#define IOPRIO_NR_ACTIVITY		(1 << IOPRIO_ACTIVITY_NR_BITS)
+#define IOPRIO_ACTIVITY_MASK		(IOPRIO_NR_ACTIVITY - 1)
+#define IOPRIO_PRIO_ACTIVITY(ioprio)	\
+	(((ioprio) >> IOPRIO_ACTIVITY_SHIFT) & IOPRIO_ACTIVITY_MASK)

 /*
  * I/O hints.
  */
@@ -108,20 +114,24 @@ enum {
  * Return an I/O priority value based on a class, a level and a hint.
  */
 static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
-					  int priohint)
+					  int priohint, int activity)
 {
 	if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
 	    IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
-	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
+	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS) ||
+	    IOPRIO_BAD_VALUE(activity, IOPRIO_NR_ACTIVITY))
 		return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;

 	return (prioclass << IOPRIO_CLASS_SHIFT) |
+		(activity << IOPRIO_ACTIVITY_SHIFT) |
 		(priohint << IOPRIO_HINT_SHIFT) | priolevel;
 }

 #define IOPRIO_PRIO_VALUE(prioclass, priolevel)			\
-	ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE)
+	ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE, 0)
 #define IOPRIO_PRIO_VALUE_HINT(prioclass, priolevel, priohint)	\
-	ioprio_value(prioclass, priolevel, priohint)
+	ioprio_value(prioclass, priolevel, priohint, 0)
+#define IOPRIO_PRIO_VALUE_ACTIVITY(prioclass, priolevel, priohint, activity)	\
+	ioprio_value(prioclass, priolevel, priohint, activity)

 #endif /* _UAPI_LINUX_IOPRIO_H */

>
> > These white spaces are trimmed by vim, I will change them back in next version.
>
> vim doesn't do that by default.
>
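
To make the bit layout in the ioprio.h hunk easier to follow, here is a minimal userspace sketch of how the proposed fields would pack into the 16-bit ioprio value, and of the promotion rule used in the patch v2 bio_set_ioprio(). The demo_* helper names are hypothetical and not part of the patch. The hint and activity constants are copied from the diff; IOPRIO_CLASS_SHIFT (13), IOPRIO_LEVEL_NR_BITS (3) and the class enum order are assumed from the existing uapi header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the existing uapi header (include/uapi/linux/ioprio.h). */
#define IOPRIO_CLASS_SHIFT	13
#define IOPRIO_LEVEL_NR_BITS	3
enum { IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE };

/* Copied from the patch v2 diff. */
#define IOPRIO_HINT_SHIFT	IOPRIO_LEVEL_NR_BITS
#define IOPRIO_HINT_NR_BITS	3
#define IOPRIO_ACTIVITY_SHIFT	(IOPRIO_HINT_NR_BITS + IOPRIO_LEVEL_NR_BITS)
#define IOPRIO_ACTIVITY_NR_BITS	7
#define IOPRIO_NR_ACTIVITY	(1 << IOPRIO_ACTIVITY_NR_BITS)
#define IOPRIO_ACTIVITY_MASK	(IOPRIO_NR_ACTIVITY - 1)

/* level (3) + hint (3) + activity (7) bits end exactly where class begins. */
_Static_assert(IOPRIO_ACTIVITY_SHIFT + IOPRIO_ACTIVITY_NR_BITS == IOPRIO_CLASS_SHIFT,
	       "activity field must end where the class field begins");

/* Hypothetical helper: same packing as the patched ioprio_value(), no range checks. */
static inline uint16_t demo_ioprio_pack(int cls, int level, int hint, int activity)
{
	return (uint16_t)((cls << IOPRIO_CLASS_SHIFT) |
			  (activity << IOPRIO_ACTIVITY_SHIFT) |
			  (hint << IOPRIO_HINT_SHIFT) | level);
}

/* Hypothetical helper: mirrors IOPRIO_PRIO_ACTIVITY() from the diff. */
static inline int demo_ioprio_activity(uint16_t ioprio)
{
	return (ioprio >> IOPRIO_ACTIVITY_SHIFT) & IOPRIO_ACTIVITY_MASK;
}

/* Hypothetical helper: extracts the class bits above IOPRIO_CLASS_SHIFT. */
static inline int demo_ioprio_class(uint16_t ioprio)
{
	return ioprio >> IOPRIO_CLASS_SHIFT;
}

/*
 * Hypothetical helper: the class selection from the patch v2 bio_set_ioprio().
 * bio_class is the class already stored in the bio, task_class stands in for
 * IOPRIO_PRIO_CLASS(get_current_ioprio()), vcnt stands in for bio->bi_vcnt.
 */
static inline int demo_promote_class(int bio_class, int task_class,
				     int activity, int vcnt)
{
	if (activity >= vcnt / 2)
		return IOPRIO_CLASS_RT;
	if (activity >= vcnt / 4)	/* numeric max(), as written in the patch */
		return task_class > IOPRIO_CLASS_BE ? task_class : IOPRIO_CLASS_BE;
	return bio_class;
}
```

For instance, demo_ioprio_pack(IOPRIO_CLASS_BE, 4, 0, 5) produces a value from which demo_ioprio_activity() recovers 5. Because the thresholds use integer division, a bio with bi_vcnt of 0 or 1 has vcnt / 2 == 0, so the first branch is taken for any activity value, including 0.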