On 25/10/2024 13.16, Toke Høiland-Jørgensen wrote:
> Yunsheng Lin writes:
>
>> On 2024/10/24 22:40, Toke Høiland-Jørgensen wrote:
>>
>> ...
>>
>>>>>
>>>>> I really really dislike this approach!
>>>>>
>>>>> Nacked-by: Jesper Dangaard Brouer
>>>>>
>>>>> Having to keep an array to record all the pages including the ones
>>>>> which are handed over to network stack, goes against the very principle
>>>>> behind page_pool. We added members to struct page, such that pages could
>>>>> be "outstanding".
>>>>
>>>> Before and after this patch both support "outstanding", the difference is
>>>> how many "outstanding" pages do they support.
>>>>
>>>> The question seems to be do we really need unlimited inflight page for
>>>> page_pool to work as mentioned in [1]?
>>>>
>>>> 1. https://lore.kernel.org/all/5d9ea7bd-67bb-4a9d-a120-c8f290c31a47@huawei.com/
>>>
>>> Well, yes? Imposing an arbitrary limit on the number of in-flight
>>> packets (especially such a low one as in this series) is a complete
>>> non-starter. Servers have hundreds of gigs of memory these days, and if
>>> someone wants to use that for storing in-flight packets, the kernel
>>> definitely shouldn't impose some (hard-coded!) limit on that.
>>
>> I agree this limit is a non-starter.
>>
>> You and Jesper seems to be mentioning a possible fact that there might
>> be 'hundreds of gigs of memory' needed for inflight pages, it would be nice
>> to provide more info or reasoning above why 'hundreds of gigs of memory' is
>> needed here so that we don't do a over-designed thing to support recording
>> unlimited in-flight pages if the driver unbound stalling turns out impossible
>> and the inflight pages do need to be recorded.
>
> I don't have a concrete example of a use that will blow the limit you
> are setting (but maybe Jesper does), I am simply objecting to the
> arbitrary imposing of any limit at all. It smells a lot of "640k ought
> to be enough for anyone".

As I wrote before.
In *production* I'm seeing TCP memory reach 24 GiB (on machines with
384 GiB of memory). I have attached a grafana screenshot to prove what
I'm saying.

As my co-worker Mike Freemon has explained to me (and in more detail in
blogposts [1]), it is no coincidence that the graph has a strange
"ceiling" close to 24 GiB (on machines with 384 GiB total memory). This
is because the TCP network stack enters a memory "under pressure" state
when 6.25% of total memory is used by the TCP stack. (Detail: the
system stays in that mode until allocated TCP memory falls below 4.68%
of total memory.)

[1] https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/

>> I guess it is common sense to start with easy one until someone complains
>> with some testcase and detailed reasoning if we need to go the hard way as
>> you and Jesper are also prefering waiting over having to record the inflight
>> pages.
>
> AFAIU Jakub's comment on his RFC patch for waiting, he was suggesting
> exactly this: Add the wait, and see if the cases where it can stall turn
> out to be problems in practice.

+1 I like Jakub's approach.

--Jesper
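P.S. For anyone wanting to check the percentages above: they fall out of
the kernel's default tcp_mem autotuning (tcp_init_mem() in
net/ipv4/tcp.c), where the pressure watermark is 1/16 (6.25%) of
available pages and the low watermark is 3/4 of that (~4.69%). A rough
sketch of that arithmetic, simplifying by treating the full 384 GiB as
available buffer pages:

```python
def tcp_mem_defaults(total_gib):
    """Mirror the kernel's default tcp_mem derivation (tcp_init_mem()).

    Returns (low, pressure, high) watermarks in GiB. Simplification:
    assumes all of total_gib is available as buffer pages.
    """
    pressure = total_gib / 16   # 6.25%  -- enter "under pressure" state
    low = pressure * 3 / 4      # ~4.69% -- leave "under pressure" state
    high = pressure * 3 / 2     # ~9.37% -- hard limit
    return low, pressure, high

low, pressure, high = tcp_mem_defaults(384)
print(pressure)  # 24.0 GiB -- matches the ceiling in the graph
print(low)       # 18.0 GiB -- where the pressure state is exited
```

So on a 384 GiB machine the stack hits pressure at exactly 24 GiB,
which is the ceiling visible in the grafana graph.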