From: Jesper Dangaard Brouer <hawk@kernel.org>
To: "Yunsheng Lin" <linyunsheng@huawei.com>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: zhangkun09@huawei.com, fanghaiqing@huawei.com,
liuyonglong@huawei.com, Robin Murphy <robin.murphy@arm.com>,
Alexander Duyck <alexander.duyck@gmail.com>,
IOMMU <iommu@lists.linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Eric Dumazet <edumazet@google.com>,
Ilias Apalodimas <ilias.apalodimas@linaro.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, kernel-team <kernel-team@cloudflare.com>,
Viktor Malik <vmalik@redhat.com>
Subject: Re: [PATCH net-next v3 3/3] page_pool: fix IOMMU crash when driver has already unbound
Date: Wed, 6 Nov 2024 16:57:25 +0100
Message-ID: <b8b7818a-e44b-45f5-91c2-d5eceaa5dd5b@kernel.org>
In-Reply-To: <18ba4489-ad30-423e-9c54-d4025f74c193@kernel.org>
On 06/11/2024 14.25, Jesper Dangaard Brouer wrote:
>
> On 26/10/2024 09.33, Yunsheng Lin wrote:
>> On 2024/10/25 22:07, Jesper Dangaard Brouer wrote:
>>
>> ...
>>
>>>
>>>>> You and Jesper seem to be suggesting that there might be
>>>>> 'hundreds of gigs of memory' needed for inflight pages. It would
>>>>> be nice to provide more info or reasoning about why 'hundreds of
>>>>> gigs of memory' is needed here, so that we don't over-design to
>>>>> support recording unlimited in-flight pages if stalling the driver
>>>>> unbound turns out to be impossible and the in-flight pages do need
>>>>> to be recorded.
>>>>
>>>> I don't have a concrete example of a use case that will blow the
>>>> limit you are setting (but maybe Jesper does); I am simply objecting
>>>> to imposing any arbitrary limit at all. It smells a lot like "640k
>>>> ought to be enough for anyone".
>>>>
>>>
>>> As I wrote before: in *production* I'm seeing TCP memory reach 24 GiB
>>> (on machines with 384 GiB memory). I have attached a grafana
>>> screenshot to prove what I'm saying.
>>>
>>> As my co-worker Mike Freemon has explained to me (more details in the
>>> blog post[1]), it is no coincidence that the graph has a strange
>>> "ceiling" close to 24 GiB on machines with 384 GiB total memory. This
>>> is because the TCP network stack goes into a memory "under pressure"
>>> state when 6.25% of total memory is used by the TCP stack, and 6.25%
>>> of 384 GiB is exactly 24 GiB. (Detail: the system stays in that mode
>>> until allocated TCP memory falls below 4.68% of total memory, i.e.
>>> below ~18 GiB here.)
>>>
>>> [1]
>>> https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/
>>
>> Thanks for the info.
>
> Some more info from production servers.
>
> (I'm amazed at what we can do with a simple bpftrace script; Cc Viktor)
>
> In the bpftrace oneliner below, I'm extracting the inflight count for
> all page_pool's in the system and storing it in a histogram.
>
> sudo bpftrace -e '
> rawtracepoint:page_pool_state_release {
>   @cnt[probe] = count();
>   @cnt_total[probe] = count();
>   $pool = (struct page_pool *)arg0;
>   $release_cnt = (uint32)arg2;
>   $hold_cnt = $pool->pages_state_hold_cnt;
>   $inflight_cnt = (int32)($hold_cnt - $release_cnt);
>   @inflight = hist($inflight_cnt);
> }
> interval:s:1 {
>   time("\n%H:%M:%S\n");
>   print(@cnt); clear(@cnt);
>   print(@inflight);
>   print(@cnt_total);
> }'
>
> The page_pool behavior depends on how the NIC driver uses it, so I've
> run this on two prod servers with the bnxt and mlx5 drivers, both on a
> 6.6.51 kernel.
>
> Driver: bnxt_en
> - Kernel: 6.6.51
>
> @cnt[rawtracepoint:page_pool_state_release]: 8447
> @inflight:
> [0] 507 | |
> [1] 275 | |
> [2, 4) 261 | |
> [4, 8) 215 | |
> [8, 16) 259 | |
> [16, 32) 361 | |
> [32, 64) 933 | |
> [64, 128) 1966 | |
> [128, 256) 937052 |@@@@@@@@@ |
> [256, 512) 5178744 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [512, 1K) 73908 | |
> [1K, 2K) 1220128 |@@@@@@@@@@@@ |
> [2K, 4K) 1532724 |@@@@@@@@@@@@@@@ |
> [4K, 8K) 1849062 |@@@@@@@@@@@@@@@@@@ |
> [8K, 16K) 1466424 |@@@@@@@@@@@@@@ |
> [16K, 32K) 858585 |@@@@@@@@ |
> [32K, 64K) 693893 |@@@@@@ |
> [64K, 128K) 170625 |@ |
>
> Driver: mlx5_core
> - Kernel: 6.6.51
>
> @cnt[rawtracepoint:page_pool_state_release]: 1975
> @inflight:
> [128, 256) 28293 |@@@@ |
> [256, 512) 184312 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [512, 1K) 0 | |
> [1K, 2K) 4671 | |
> [2K, 4K) 342571 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [4K, 8K) 180520 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [8K, 16K) 96483 |@@@@@@@@@@@@@@ |
> [16K, 32K) 25133 |@@@ |
> [32K, 64K) 8274 |@ |
>
>
> The key thing to notice is that we have up to 128,000 pages in flight
> on these random production servers. The NICs have 64 RX queues
> configured, and thus also 64 page_pool objects.
>
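(Side note on the $inflight_cnt arithmetic in the scripts: both
pages_state_hold_cnt and the release count are free-running u32
counters, so casting their difference to a signed 32-bit value stays
correct even when the counters wrap. A minimal C sketch of the idea --
my paraphrase, not a verbatim copy of the kernel's page_pool_inflight()
helper:

  #include <stdint.h>

  /* In-flight = pages handed out minus pages returned.  Both counters
   * are free-running and may wrap around; the cast keeps the distance
   * correct as long as the real in-flight count fits in 31 bits.
   *
   * E.g. hold_cnt wrapped around to 2 while release_cnt is 0xFFFFFFFD:
   * (int32_t)(2 - 0xFFFFFFFD) == 5, which is the true distance.
   */
  static int32_t pp_inflight(uint32_t hold_cnt, uint32_t release_cnt)
  {
          return (int32_t)(hold_cnt - release_cnt);
  }
)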
I realized that we primarily want to know the maximum number of
in-flight pages. So, I modified the bpftrace oneliner to track the max
for each page_pool in the system:
sudo bpftrace -e '
rawtracepoint:page_pool_state_release {
  @cnt[probe] = count();
  @cnt_total[probe] = count();
  $pool = (struct page_pool *)arg0;
  $release_cnt = (uint32)arg2;
  $hold_cnt = $pool->pages_state_hold_cnt;
  $inflight_cnt = (int32)($hold_cnt - $release_cnt);
  $cur = @inflight_max[$pool];
  if ($inflight_cnt > $cur) {
    @inflight_max[$pool] = $inflight_cnt;
  }
}
interval:s:1 {
  time("\n%H:%M:%S\n");
  print(@cnt); clear(@cnt);
  print(@inflight_max);
  print(@cnt_total);
}'
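(Note: the (struct page_pool *)arg0 cast relies on bpftrace being able
to resolve the struct layout, e.g. via the kernel's BTF type info,
which these 6.6 production kernels provide; otherwise the struct
definition would have to be supplied via an #include.)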
I've attached the full output from the script.

For some unknown reason, this system had 199 page_pool objects.

The top 20 users:
$ cat out02.inflight-max | grep inflight_max | tail -n 20
@inflight_max[0xffff88829133d800]: 26473
@inflight_max[0xffff888293c3e000]: 27042
@inflight_max[0xffff888293c3b000]: 27709
@inflight_max[0xffff8881076f2800]: 29400
@inflight_max[0xffff88818386e000]: 29690
@inflight_max[0xffff8882190b1800]: 29813
@inflight_max[0xffff88819ee83800]: 30067
@inflight_max[0xffff8881076f4800]: 30086
@inflight_max[0xffff88818386b000]: 31116
@inflight_max[0xffff88816598f800]: 36970
@inflight_max[0xffff8882190b7800]: 37336
@inflight_max[0xffff888293c38800]: 39265
@inflight_max[0xffff888293c3c800]: 39632
@inflight_max[0xffff888293c3b800]: 43461
@inflight_max[0xffff888293c3f000]: 43787
@inflight_max[0xffff88816598f000]: 44557
@inflight_max[0xffff888132ce9000]: 45037
@inflight_max[0xffff888293c3f800]: 51843
@inflight_max[0xffff888183869800]: 62612
@inflight_max[0xffff888113d08000]: 73203
Adding all the values together:

$ grep inflight_max out02.inflight-max | \
    awk '{tot += $2} END {print "total:" tot}'
total:1707129
Worst case, we would need a data structure holding 1,707,129 pages.
Fortunately, we don't need a single data structure, as this will be
split between the 199 page_pool's.
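To put that worst case into perspective (a back-of-envelope estimate,
assuming one 8-byte pointer per tracked page): 1,707,129 * 8 bytes is
roughly 13 MiB system-wide, or on average ~8,580 entries (~67 KiB) per
page_pool.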
--Jesper
[-- Attachment #2: out02.inflight-max --]
[-- Type: text/plain, Size: 8017 bytes --]
15:07:05
@cnt[rawtracepoint:page_pool_state_release]: 6344
@inflight_max[0xffff88a1512d9800]: 318
@inflight_max[0xffff88a071703800]: 318
@inflight_max[0xffff88a151367800]: 318
@inflight_max[0xffff88a1512fc000]: 318
@inflight_max[0xffff88a151365000]: 318
@inflight_max[0xffff88a1512d8000]: 318
@inflight_max[0xffff88a1512fe800]: 318
@inflight_max[0xffff88a1512dd800]: 318
@inflight_max[0xffff88a1512de800]: 318
@inflight_max[0xffff88a071707800]: 318
@inflight_max[0xffff88a1512ff000]: 318
@inflight_max[0xffff88a1512dd000]: 318
@inflight_max[0xffff88a1512dc800]: 318
@inflight_max[0xffff88a151366800]: 318
@inflight_max[0xffff88a071701000]: 318
@inflight_max[0xffff88a1512f9800]: 318
@inflight_max[0xffff88a071706000]: 318
@inflight_max[0xffff88a1512fb000]: 318
@inflight_max[0xffff88a071700000]: 318
@inflight_max[0xffff88a151360800]: 318
@inflight_max[0xffff88a1512fa800]: 318
@inflight_max[0xffff88a151360000]: 318
@inflight_max[0xffff88a151361800]: 318
@inflight_max[0xffff88a1512da000]: 318
@inflight_max[0xffff88a1512de000]: 318
@inflight_max[0xffff88a1512db800]: 318
@inflight_max[0xffff88a1512fb800]: 318
@inflight_max[0xffff88a1512fe000]: 318
@inflight_max[0xffff88a151364800]: 318
@inflight_max[0xffff88a071706800]: 318
@inflight_max[0xffff88a151364000]: 318
@inflight_max[0xffff88a1512fd000]: 318
@inflight_max[0xffff88a151366000]: 318
@inflight_max[0xffff88a071701800]: 318
@inflight_max[0xffff88a1512da800]: 318
@inflight_max[0xffff88a071700800]: 318
@inflight_max[0xffff88a1512fd800]: 318
@inflight_max[0xffff88a071702000]: 318
@inflight_max[0xffff88a151365800]: 318
@inflight_max[0xffff88a151361000]: 318
@inflight_max[0xffff88a1512f8000]: 318
@inflight_max[0xffff88a071705000]: 318
@inflight_max[0xffff88a151363800]: 318
@inflight_max[0xffff88a151362000]: 318
@inflight_max[0xffff88a1512d8800]: 318
@inflight_max[0xffff88a071704800]: 318
@inflight_max[0xffff88a1512db000]: 318
@inflight_max[0xffff88a1512fc800]: 318
@inflight_max[0xffff88a1512df000]: 318
@inflight_max[0xffff88a1512f8800]: 318
@inflight_max[0xffff88a1512df800]: 318
@inflight_max[0xffff88a071707000]: 318
@inflight_max[0xffff88a1512dc000]: 318
@inflight_max[0xffff88a071704000]: 318
@inflight_max[0xffff88a071702800]: 318
@inflight_max[0xffff88a071703000]: 318
@inflight_max[0xffff88a1512d9000]: 318
@inflight_max[0xffff88a151362800]: 318
@inflight_max[0xffff88a151367000]: 318
@inflight_max[0xffff88a1512f9000]: 318
@inflight_max[0xffff88a151363000]: 318
@inflight_max[0xffff88a1512fa000]: 318
@inflight_max[0xffff88a071705800]: 318
@inflight_max[0xffff88991d969000]: 331
@inflight_max[0xffff8899ca7b6800]: 336
@inflight_max[0xffff88991d96c000]: 339
@inflight_max[0xffff8899ca7b6000]: 340
@inflight_max[0xffff8899ca7b3000]: 342
@inflight_max[0xffff8899ca7a1800]: 342
@inflight_max[0xffff8899ca7b2800]: 342
@inflight_max[0xffff8899ca7a7800]: 343
@inflight_max[0xffff88991d96e000]: 343
@inflight_max[0xffff88991d96a000]: 344
@inflight_max[0xffff88991d96b000]: 344
@inflight_max[0xffff8898b3c92800]: 345
@inflight_max[0xffff8899ca7a4800]: 345
@inflight_max[0xffff8899ca7b4000]: 346
@inflight_max[0xffff8898b3c93800]: 347
@inflight_max[0xffff88991d968000]: 347
@inflight_max[0xffff88991d96f800]: 348
@inflight_max[0xffff8898b3c92000]: 348
@inflight_max[0xffff88991d96b800]: 350
@inflight_max[0xffff8898b3c90800]: 350
@inflight_max[0xffff8898b3c96800]: 351
@inflight_max[0xffff8899ca7b5000]: 351
@inflight_max[0xffff8898b3c97800]: 351
@inflight_max[0xffff8899ca7b4800]: 352
@inflight_max[0xffff8898b3c93000]: 353
@inflight_max[0xffff88991d969800]: 353
@inflight_max[0xffff8899ca7b5800]: 354
@inflight_max[0xffff8899ca7a2000]: 357
@inflight_max[0xffff8898b3c91000]: 357
@inflight_max[0xffff8898b3c97000]: 357
@inflight_max[0xffff8899ca7b1000]: 358
@inflight_max[0xffff8898b3c94000]: 359
@inflight_max[0xffff88991d96d800]: 362
@inflight_max[0xffff8899ca7b0800]: 363
@inflight_max[0xffff8899ca7a1000]: 364
@inflight_max[0xffff8899ca7a0000]: 364
@inflight_max[0xffff8898b3c95800]: 364
@inflight_max[0xffff8899ca7b7800]: 364
@inflight_max[0xffff8898b3c94800]: 365
@inflight_max[0xffff88991d96d000]: 365
@inflight_max[0xffff88991d968800]: 365
@inflight_max[0xffff8898b3c90000]: 365
@inflight_max[0xffff8899ca7a5800]: 365
@inflight_max[0xffff8899ca7a5000]: 366
@inflight_max[0xffff8899ca7a2800]: 366
@inflight_max[0xffff8899ca7b0000]: 366
@inflight_max[0xffff8899ca7a7000]: 367
@inflight_max[0xffff88991d96a800]: 368
@inflight_max[0xffff88991d96c800]: 368
@inflight_max[0xffff8899ca7a0800]: 368
@inflight_max[0xffff8899ca7a3800]: 370
@inflight_max[0xffff88991d96f000]: 371
@inflight_max[0xffff88991d96e800]: 372
@inflight_max[0xffff8899ca7b3800]: 373
@inflight_max[0xffff8899ca7b2000]: 373
@inflight_max[0xffff8899ca7b7000]: 373
@inflight_max[0xffff8899ca7a6800]: 373
@inflight_max[0xffff8899ca7a4000]: 374
@inflight_max[0xffff8899ca7a6000]: 377
@inflight_max[0xffff8899ca7a3000]: 378
@inflight_max[0xffff8898b3c95000]: 379
@inflight_max[0xffff8898b3c91800]: 379
@inflight_max[0xffff8898b3c96000]: 389
@inflight_max[0xffff888111079800]: 4201
@inflight_max[0xffff888111205000]: 4203
@inflight_max[0xffff888111079000]: 4393
@inflight_max[0xffff8881134fb800]: 4519
@inflight_max[0xffff88811107d800]: 4520
@inflight_max[0xffff8881134fc800]: 4586
@inflight_max[0xffff88811107a000]: 4650
@inflight_max[0xffff8881111b1800]: 5674
@inflight_max[0xffff88811107d000]: 6314
@inflight_max[0xffff888293c3d800]: 11714
@inflight_max[0xffff88818386c000]: 12302
@inflight_max[0xffff888293c3c000]: 12393
@inflight_max[0xffff888132cea800]: 12500
@inflight_max[0xffff888165966000]: 12940
@inflight_max[0xffff888113d0d800]: 13370
@inflight_max[0xffff88818386c800]: 13510
@inflight_max[0xffff888113d0c800]: 14027
@inflight_max[0xffff8882190b0800]: 15149
@inflight_max[0xffff888132ceb800]: 15405
@inflight_max[0xffff888132ceb000]: 15633
@inflight_max[0xffff888183869000]: 15684
@inflight_max[0xffff88818386b800]: 16142
@inflight_max[0xffff888132ced000]: 16450
@inflight_max[0xffff888165964800]: 17007
@inflight_max[0xffff8882190b5800]: 17879
@inflight_max[0xffff8882190b6000]: 17915
@inflight_max[0xffff88819ee80800]: 17977
@inflight_max[0xffff88819ee84000]: 18132
@inflight_max[0xffff888118680000]: 18204
@inflight_max[0xffff888118680800]: 18514
@inflight_max[0xffff88819ee83000]: 18546
@inflight_max[0xffff888183868000]: 18552
@inflight_max[0xffff8881076f1800]: 18706
@inflight_max[0xffff88819ee87000]: 18801
@inflight_max[0xffff888165965800]: 19556
@inflight_max[0xffff888293c3d000]: 20675
@inflight_max[0xffff88818386d000]: 20749
@inflight_max[0xffff88818386a800]: 21226
@inflight_max[0xffff888183868800]: 21559
@inflight_max[0xffff8881076f3000]: 21933
@inflight_max[0xffff888293c3a000]: 22086
@inflight_max[0xffff88819ee82800]: 22975
@inflight_max[0xffff88818386a000]: 23600
@inflight_max[0xffff88816598c000]: 24092
@inflight_max[0xffff888293c39800]: 24093
@inflight_max[0xffff88818386f000]: 24438
@inflight_max[0xffff888113d0e800]: 24882
@inflight_max[0xffff888293c39000]: 25218
@inflight_max[0xffff88818386d800]: 25276
@inflight_max[0xffff888293c3a800]: 25292
@inflight_max[0xffff888293c3e800]: 25429
@inflight_max[0xffff888293c38000]: 25794
@inflight_max[0xffff8881076f6800]: 26030
@inflight_max[0xffff88829133d800]: 26473
@inflight_max[0xffff888293c3e000]: 27042
@inflight_max[0xffff888293c3b000]: 27709
@inflight_max[0xffff8881076f2800]: 29400
@inflight_max[0xffff88818386e000]: 29690
@inflight_max[0xffff8882190b1800]: 29813
@inflight_max[0xffff88819ee83800]: 30067
@inflight_max[0xffff8881076f4800]: 30086
@inflight_max[0xffff88818386b000]: 31116
@inflight_max[0xffff88816598f800]: 36970
@inflight_max[0xffff8882190b7800]: 37336
@inflight_max[0xffff888293c38800]: 39265
@inflight_max[0xffff888293c3c800]: 39632
@inflight_max[0xffff888293c3b800]: 43461
@inflight_max[0xffff888293c3f000]: 43787
@inflight_max[0xffff88816598f000]: 44557
@inflight_max[0xffff888132ce9000]: 45037
@inflight_max[0xffff888293c3f800]: 51843
@inflight_max[0xffff888183869800]: 62612
@inflight_max[0xffff888113d08000]: 73203
@cnt_total[rawtracepoint:page_pool_state_release]: 67263129