From: Zhaoyang Huang
Date: Tue, 30 Jan 2024 21:28:52 +0800
Subject: Re: [PATCHv5 1/1] block: introduce content activity based ioprio
To: Damien Le Moal
Cc: "zhaoyang.huang", Andrew Morton, Jens Axboe, Matthew Wilcox, Yu Zhao, Niklas Cassel, "Martin K. Petersen", Hannes Reinecke, Linus Walleij, linux-mm@kvack.org, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, steve.kang@unisoc.com
References: <20240130084207.3760518-1-zhaoyang.huang@unisoc.com>
Sender:
owner-linux-mm@kvack.org

On Tue, Jan 30, 2024 at 9:17 PM Damien Le Moal wrote:
>
> On 1/30/24 21:43, Zhaoyang Huang wrote:
> > On Tue, Jan 30, 2024 at 5:17 PM Damien Le Moal wrote:
> >>
> >> On 1/30/24 17:42, zhaoyang.huang wrote:
> >>> From: Zhaoyang Huang
> >>>
> >>> Currently, a request's ioprio is set from the task's scheduling priority
> >>> (when no blkcg is configured), which gives high-priority tasks privilege
> >>> in both CPU and IO scheduling.
> >>> This commit works as a hint on top of the original policy by promoting the
> >>> request's ioprio based on the page/folio's activity. The original idea comes
> >>> from LRU_GEN, which provides more precise folio activity than before. This
> >>> commit tries to adjust the request's ioprio when a certain share of its
> >>> folios are hot, which indicates that the request carries important content
> >>> and needs to be scheduled earlier.
> >>>
> >>> This commit is verified on a v6.6 6GB RAM android14 system via 4 test cases
> >>> by changing the bio_add_page/folio API in erofs, ext4 and f2fs in
> >>> another commit.
> >>>
> >>> Case 1:
> >>> script[a], which gets significantly improved fault time as expected[b],
> >>> where dd's cost also shrinks from 55s to 40s.
> >>> (1). fault_latency.bin is an eBPF-based test tool which measures every task's
> >>> iowait latency during page fault when scheduled out/in.
> >>> (2). costmem generates page faults by mmapping a file and accessing the VA.
> >>> (3). dd generates concurrent vfs io.
> >>>
> >>> [a]
> >>> ./fault_latency.bin 1 5 > /data/dd_costmem &
> >>> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> >>> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> >>> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> >>> costmem -c0 -a2048000 -b128000 -o0 1>/dev/null &
> >>> dd if=/dev/block/sda of=/data/ddtest bs=1024 count=2048000 &
> >>> dd if=/dev/block/sda of=/data/ddtest1 bs=1024 count=2048000 &
> >>> dd if=/dev/block/sda of=/data/ddtest2 bs=1024 count=2048000 &
> >>> dd if=/dev/block/sda of=/data/ddtest3 bs=1024 count=2048000
> >>> [b]
> >>>                 mainline        commit
> >>> io wait         836us           156us
> >>>
> >>> Case 2:
> >>> fio -filename=/dev/block/by-name/userdata -rw=randread -direct=0 -bs=4k -size=2000M -numjobs=8 -group_reporting -name=mytest
> >>> mainline: 513MiB/s
> >>> READ: bw=531MiB/s (557MB/s), 531MiB/s-531MiB/s (557MB/s-557MB/s), io=15.6GiB (16.8GB), run=30137-30137msec
> >>> READ: bw=543MiB/s (569MB/s), 543MiB/s-543MiB/s (569MB/s-569MB/s), io=15.6GiB (16.8GB), run=29469-29469msec
> >>> READ: bw=474MiB/s (497MB/s), 474MiB/s-474MiB/s (497MB/s-497MB/s), io=15.6GiB (16.8GB), run=33724-33724msec
> >>> READ: bw=535MiB/s (561MB/s), 535MiB/s-535MiB/s (561MB/s-561MB/s), io=15.6GiB (16.8GB), run=29928-29928msec
> >>> READ: bw=523MiB/s (548MB/s), 523MiB/s-523MiB/s (548MB/s-548MB/s), io=15.6GiB (16.8GB), run=30617-30617msec
> >>> READ: bw=492MiB/s (516MB/s), 492MiB/s-492MiB/s (516MB/s-516MB/s), io=15.6GiB (16.8GB), run=32518-32518msec
> >>> READ: bw=533MiB/s (559MB/s), 533MiB/s-533MiB/s (559MB/s-559MB/s), io=15.6GiB (16.8GB), run=29993-29993msec
> >>> READ: bw=524MiB/s (550MB/s), 524MiB/s-524MiB/s (550MB/s-550MB/s), io=15.6GiB (16.8GB), run=30526-30526msec
> >>> READ: bw=529MiB/s (554MB/s), 529MiB/s-529MiB/s (554MB/s-554MB/s), io=15.6GiB (16.8GB), run=30269-30269msec
> >>> READ: bw=449MiB/s (471MB/s), 449MiB/s-449MiB/s (471MB/s-471MB/s), io=15.6GiB (16.8GB), run=35629-35629msec
> >>>
> >>> commit: 633MiB/s
> >>> READ: bw=668MiB/s (700MB/s), 668MiB/s-668MiB/s (700MB/s-700MB/s), io=15.6GiB (16.8GB), run=23952-23952msec
> >>> READ: bw=589MiB/s (618MB/s), 589MiB/s-589MiB/s (618MB/s-618MB/s), io=15.6GiB (16.8GB), run=27164-27164msec
> >>> READ: bw=638MiB/s (669MB/s), 638MiB/s-638MiB/s (669MB/s-669MB/s), io=15.6GiB (16.8GB), run=25071-25071msec
> >>> READ: bw=714MiB/s (749MB/s), 714MiB/s-714MiB/s (749MB/s-749MB/s), io=15.6GiB (16.8GB), run=22409-22409msec
> >>> READ: bw=600MiB/s (629MB/s), 600MiB/s-600MiB/s (629MB/s-629MB/s), io=15.6GiB (16.8GB), run=26669-26669msec
> >>> READ: bw=592MiB/s (621MB/s), 592MiB/s-592MiB/s (621MB/s-621MB/s), io=15.6GiB (16.8GB), run=27036-27036msec
> >>> READ: bw=691MiB/s (725MB/s), 691MiB/s-691MiB/s (725MB/s-725MB/s), io=15.6GiB (16.8GB), run=23150-23150msec
> >>> READ: bw=569MiB/s (596MB/s), 569MiB/s-569MiB/s (596MB/s-596MB/s), io=15.6GiB (16.8GB), run=28142-28142msec
> >>> READ: bw=563MiB/s (590MB/s), 563MiB/s-563MiB/s (590MB/s-590MB/s), io=15.6GiB (16.8GB), run=28429-28429msec
> >>> READ: bw=712MiB/s (746MB/s), 712MiB/s-712MiB/s (746MB/s-746MB/s), io=15.6GiB (16.8GB), run=22478-22478msec
> >>>
> >>> Case 3:
> >>> This commit is also verified by launching the camera app, which is
> >>> usually considered a heavy workload on both memory and IO, and which
> >>> shows a 12%-24% improvement.
> >>>
> >>>                 ttl = 0         ttl = 50        ttl = 100
> >>> mainline        2267ms          2420ms          2316ms
> >>> commit          1992ms          1806ms          1998ms
> >>>
> >>> Case 4:
> >>> androbench shows neither improvement nor regression, presumably because
> >>> its test time is too short for MGLRU to take effect.
> >>>
> >>> Signed-off-by: Zhaoyang Huang
> >>> ---
> >>> change of v2: calculate page's activity via helper function
> >>> change of v3: solve layer violation by move API into mm
> >>> change of v4: keep block clean by removing the page related API
> >>> change of v5: introduce the macros of bio_add_folio/page for read dir.
> >>> ---
> >>> ---
> >>>  include/linux/act_ioprio.h  | 60 +++++++++++++++++++++++++++++++++++++
> >>>  include/uapi/linux/ioprio.h | 38 +++++++++++++++++++++++
> >>>  mm/Kconfig                  |  8 +++++
> >>>  3 files changed, 106 insertions(+)
> >>>  create mode 100644 include/linux/act_ioprio.h
> >>>
> >>> diff --git a/include/linux/act_ioprio.h b/include/linux/act_ioprio.h
> >>> new file mode 100644
> >>> index 000000000000..ca7309b85758
> >>> --- /dev/null
> >>> +++ b/include/linux/act_ioprio.h
> >>> @@ -0,0 +1,60 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> >>> +#ifndef _ACT_IOPRIO_H
> >>> +#define _ACT_IOPRIO_H
> >>> +
> >>> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> >>> +#include
> >>> +
> >>> +static __maybe_unused
> >>> +bool act_bio_add_folio(struct bio *bio, struct folio *folio, size_t len,
> >>> +               size_t off)
> >>> +{
> >>> +       int class, level, hint, activity;
> >>> +       bool ret;
> >>> +
> >>> +       ret = bio_add_folio(bio, folio, len, off);
> >>> +       if (bio_op(bio) == REQ_OP_READ && ret) {
> >>> +               class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> >>> +               level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> >>> +               hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> >>> +               activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> >>> +               activity += (activity < IOPRIO_NR_ACTIVITY &&
> >>> +                               folio_test_workingset(folio)) ? 1 : 0;
> >>> +               if (activity >= bio->bi_vcnt / 2)
> >>> +                       class = IOPRIO_CLASS_RT;
> >>> +               else if (activity >= bio->bi_vcnt / 4)
> >>> +                       class = max(IOPRIO_PRIO_CLASS(get_current_ioprio()), IOPRIO_CLASS_BE);
> >>> +               activity = min(IOPRIO_NR_ACTIVITY - 1, activity);
> >>> +               bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
> >>> +       }
> >>> +       return ret;
> >>> +}
> >>
> >> Big non-inline functions in a header file... That is unusual, to say the least.
> >> So every FS that includes this will get its own copy of the binary for these
> >> functions. That is not exactly optimal.
> > Thanks for the quick reply :D
> > This is a trade-off to keep both the block layer and the filesystems
> > clean and unmodified. There are actually few callers of bio_add_xxx
> > within fs.
> >>
> >>> +
> >>> +static __maybe_unused
> >>> +int act_bio_add_page(struct bio *bio, struct page *page,
> >>> +               unsigned int len, unsigned int offset)
> >>> +{
> >>> +       int class, level, hint, activity;
> >>> +       int ret = 0;
> >>> +
> >>> +       ret = bio_add_page(bio, page, len, offset);
> >>> +       if (bio_op(bio) == REQ_OP_READ && ret > 0) {
> >>> +               class = IOPRIO_PRIO_CLASS(bio->bi_ioprio);
> >>> +               level = IOPRIO_PRIO_LEVEL(bio->bi_ioprio);
> >>> +               hint = IOPRIO_PRIO_HINT(bio->bi_ioprio);
> >>> +               activity = IOPRIO_PRIO_ACTIVITY(bio->bi_ioprio);
> >>> +               activity += (activity < IOPRIO_NR_ACTIVITY &&
> >>> +                               PageWorkingset(page)) ? 1 : 0;
> >>> +               if (activity >= bio->bi_vcnt / 2)
> >>> +                       class = IOPRIO_CLASS_RT;
> >>> +               else if (activity >= bio->bi_vcnt / 4)
> >>> +                       class = max(IOPRIO_PRIO_CLASS(get_current_ioprio()), IOPRIO_CLASS_BE);
> >>> +               activity = min(IOPRIO_NR_ACTIVITY - 1, activity);
> >>> +               bio->bi_ioprio = IOPRIO_PRIO_VALUE_ACTIVITY(class, level, hint, activity);
> >>> +       }
> >>> +       return ret;
> >>> +}
> >>> +#define bio_add_folio(bio, folio, len, off)     act_bio_add_folio(bio, folio, len, off)
> >>> +#define bio_add_page(bio, page, len, offset)    act_bio_add_page(bio, page, len, offset)
> >>
> >> These functions are *NOT* part of the block layer. So please do not pretend they
> >> are. Why don't you simply write a function equivalent to what you have inside
> >> the "if" above and have the FS call that after bio_add_page() ?
> > Iterating over the bio is costly (up to 256 pages) and requires
> > modifying fs code. I will implement a version as you suggested.
> >>
> >> And I seriously doubt that all compilers will be happy with these macro names
> >> clashing with real function names...
> >>
> >>> +#endif
> >>> +#endif
> >>> diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
> >>> index bee2bdb0eedb..64cf5ff0ac5f 100644
> >>> --- a/include/uapi/linux/ioprio.h
> >>> +++ b/include/uapi/linux/ioprio.h
> >>> @@ -71,12 +71,24 @@ enum {
> >>>   * class and level.
> >>>   */
> >>>  #define IOPRIO_HINT_SHIFT              IOPRIO_LEVEL_NR_BITS
> >>> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> >>> +#define IOPRIO_HINT_NR_BITS            3
> >>> +#else
> >>>  #define IOPRIO_HINT_NR_BITS            10
> >>> +#endif
> >>>  #define IOPRIO_NR_HINTS                        (1 << IOPRIO_HINT_NR_BITS)
> >>>  #define IOPRIO_HINT_MASK               (IOPRIO_NR_HINTS - 1)
> >>>  #define IOPRIO_PRIO_HINT(ioprio)       \
> >>>         (((ioprio) >> IOPRIO_HINT_SHIFT) & IOPRIO_HINT_MASK)
> >>>
> >>> +#ifdef CONFIG_CONTENT_ACT_BASED_IOPRIO
> >>> +#define IOPRIO_ACTIVITY_SHIFT          (IOPRIO_HINT_NR_BITS + IOPRIO_LEVEL_NR_BITS)
> >>> +#define IOPRIO_ACTIVITY_NR_BITS                7
> >>
> >> I already told you that taking all the free hint bits for yourself, leaving no
> >> room for future IO hints, is not nice. Do you really need 7 bits for your thing ?
> >> Why does the activity even need to be part of the IO priority ? From the rather
> >> short explanation in the commit message, it seems that activity should simply
> >> raise the priority (either class or level or both). I do not see why that
> >> activity number needs to be in the ioprio. Who in the kernel will look at it ?
> >> IO scheduler ? the storage device ?
> > As I explained above, 7 bits (128 of 256) within ioprio is the minimum
> > for counting the active pages carried by this bio, and the count ends
> > at the IO scheduler. Without ioprio, the bio would have to grow a new
> > member to record this.
>
> That information does not belong to the ioprio. And which scheduler acts on a
> number of pages anyway ? The scheduler sees requests and BIOs. It can determine
> the number of pages they have if that is an information it needs to make
> scheduling decisions. Using ioprio to pass that information down is a dirty hack.

No. The IO scheduler acts on IOPRIO_CLASS, which the current method
derives from the pages' activity. I will implement another version that
iterates over the pages before submit_bio and report back to the list.

>
> >>
> >>> +#define IOPRIO_NR_ACTIVITY             (1 << IOPRIO_ACTIVITY_NR_BITS)
> >>> +#define IOPRIO_ACTIVITY_MASK           (IOPRIO_NR_ACTIVITY - 1)
> >>> +#define IOPRIO_PRIO_ACTIVITY(ioprio)   \
> >>> +       (((ioprio) >> IOPRIO_ACTIVITY_SHIFT) & IOPRIO_ACTIVITY_MASK)
> >>> +#endif
> >>>  /*
> >>>   * I/O hints.
> >>>   */
> >>> @@ -104,6 +116,7 @@ enum {
> >>>
> >>>  #define IOPRIO_BAD_VALUE(val, max) ((val) < 0 || (val) >= (max))
> >>>
> >>> +#ifndef CONFIG_CONTENT_ACT_BASED_IOPRIO
> >>>  /*
> >>>   * Return an I/O priority value based on a class, a level and a hint.
> >>>   */
> >>> @@ -123,5 +136,30 @@ static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
> >>>         ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE)
> >>>  #define IOPRIO_PRIO_VALUE_HINT(prioclass, priolevel, priohint)  \
> >>>         ioprio_value(prioclass, priolevel, priohint)
> >>> +#else
> >>> +/*
> >>> + * Return an I/O priority value based on a class, a level, a hint and
> >>> + * content's activities
> >>> + */
> >>> +static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
> >>> +               int priohint, int activity)
> >>> +{
> >>> +       if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
> >>> +                       IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> >>> +                       IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS) ||
> >>> +                       IOPRIO_BAD_VALUE(activity, IOPRIO_NR_ACTIVITY))
> >>> +               return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
> >>>
> >>> +       return (prioclass << IOPRIO_CLASS_SHIFT) |
> >>> +               (activity << IOPRIO_ACTIVITY_SHIFT) |
> >>> +               (priohint << IOPRIO_HINT_SHIFT) | priolevel;
> >>> +}
> >>> +
> >>> +#define IOPRIO_PRIO_VALUE(prioclass, priolevel)                 \
> >>> +       ioprio_value(prioclass, priolevel, IOPRIO_HINT_NONE, 0)
> >>> +#define IOPRIO_PRIO_VALUE_HINT(prioclass, priolevel, priohint)  \
> >>> +       ioprio_value(prioclass, priolevel, priohint, 0)
> >>> +#define IOPRIO_PRIO_VALUE_ACTIVITY(prioclass, priolevel, priohint, activity)    \
> >>> +       ioprio_value(prioclass, priolevel, priohint, activity)
> >>> +#endif
> >>>  #endif /* _UAPI_LINUX_IOPRIO_H */
> >>> diff --git a/mm/Kconfig b/mm/Kconfig
> >>> index 264a2df5ecf5..e0e5a5a44ded 100644
> >>> --- a/mm/Kconfig
> >>> +++ b/mm/Kconfig
> >>> @@ -1240,6 +1240,14 @@ config LRU_GEN_STATS
> >>>           from evicted generations for debugging purpose.
> >>>
> >>>           This option has a per-memcg and per-node memory overhead.
> >>> +
> >>> +config CONTENT_ACT_BASED_IOPRIO
> >>> +       bool "Enable content activity based ioprio"
> >>> +       depends on LRU_GEN
> >>> +       default n
> >>> +       help
> >>> +         This item enable the feature of adjust bio's priority by
> >>> +         calculating its content's activity.
> >>>  # }
> >>>
> >>>  config ARCH_SUPPORTS_PER_VMA_LOCK
> >>
> >> --
> >> Damien Le Moal
> >> Western Digital Research
> >>
>
> --
> Damien Le Moal
> Western Digital Research
>
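
For reference, the alternative discussed in this thread (count the hot pages once before submission and derive only the class, without storing a counter in the ioprio bits) could look roughly like the userspace sketch below. It is not kernel code and not the promised next version of the patch: struct sketch_page, sketch_count_hot() and sketch_pick_class() are hypothetical names invented for illustration, and the "half hot means RT, a quarter hot means at least BE" thresholds are taken from the quoted patch. Only the class values are the real ones from include/uapi/linux/ioprio.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Class values match include/uapi/linux/ioprio.h. */
enum ioprio_class {
	IOPRIO_CLASS_NONE = 0,
	IOPRIO_CLASS_RT   = 1,
	IOPRIO_CLASS_BE   = 2,
	IOPRIO_CLASS_IDLE = 3,
};

/* Hypothetical stand-in for a page: only the workingset flag matters here. */
struct sketch_page {
	bool workingset;
};

/* Count the pages flagged as workingset in a single pass. */
static size_t sketch_count_hot(const struct sketch_page *pages, size_t nr)
{
	size_t hot = 0;

	for (size_t i = 0; i < nr; i++)
		if (pages[i].workingset)
			hot++;
	return hot;
}

/*
 * Derive an I/O class from the share of hot pages.
 * The comparisons use multiplication so that small page counts are not
 * rounded down to zero by integer division.
 * Among RT/BE/IDLE a smaller value means a higher priority, so the
 * "at least BE" case is written out explicitly.
 */
static enum ioprio_class sketch_pick_class(enum ioprio_class cur,
					   size_t hot, size_t nr)
{
	if (nr == 0 || hot == 0)
		return cur;		/* nothing hot: keep the task's class */
	if (2 * hot >= nr)
		return IOPRIO_CLASS_RT;	/* at least half of the pages are hot */
	if (4 * hot >= nr) {
		/* at least a quarter hot: lift NONE/IDLE to BE, keep RT/BE */
		if (cur == IOPRIO_CLASS_NONE || cur == IOPRIO_CLASS_IDLE)
			return IOPRIO_CLASS_BE;
	}
	return cur;
}
```

A caller would run sketch_count_hot() over the pages of one request and pass the result, together with the submitting task's class, to sketch_pick_class(). The existing level and hint fields stay untouched in this scheme, so no hint bits have to be given up for a counter.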