On 25/10/2024 13.16, Toke Høiland-Jørgensen wrote:
> Yunsheng Lin writes:
>
>> On 2024/10/24 22:40, Toke Høiland-Jørgensen wrote:
>>
>> ...
>>
>>>>>
>>>>> I really really dislike this approach!
>>>>>
>>>>> Nacked-by: Jesper Dangaard Brouer
>>>>>
>>>>> Having to keep an array to record all the pages including the ones
>>>>> which are handed over to network stack, goes against the very principle
>>>>> behind page_pool. We added members to struct page, such that pages could
>>>>> be "outstanding".
>>>>
>>>> Before and after this patch both support "outstanding", the difference is
>>>> how many "outstanding" pages do they support.
>>>>
>>>> The question seems to be do we really need unlimited inflight page for
>>>> page_pool to work as mentioned in [1]?
>>>>
>>>> 1. https://lore.kernel.org/all/5d9ea7bd-67bb-4a9d-a120-c8f290c31a47@huawei.com/
>>>
>>> Well, yes? Imposing an arbitrary limit on the number of in-flight
>>> packets (especially such a low one as in this series) is a complete
>>> non-starter. Servers have hundreds of gigs of memory these days, and if
>>> someone wants to use that for storing in-flight packets, the kernel
>>> definitely shouldn't impose some (hard-coded!) limit on that.
>>
>> I agree this limit is a non-starter.
>>
>> You and Jesper seems to be mentioning a possible fact that there might
>> be 'hundreds of gigs of memory' needed for inflight pages, it would be nice
>> to provide more info or reasoning above why 'hundreds of gigs of memory' is
>> needed here so that we don't do a over-designed thing to support recording
>> unlimited in-flight pages if the driver unbound stalling turns out impossible
>> and the inflight pages do need to be recorded.
>
> I don't have a concrete example of a use that will blow the limit you
> are setting (but maybe Jesper does), I am simply objecting to the
> arbitrary imposing of any limit at all. It smells a lot of "640k ought
> to be enough for anyone".

As I wrote before.
In *production* I'm seeing TCP memory reach 24 GiB (on machines with
384 GiB of memory). I have attached a grafana screenshot to prove what
I'm saying.

As my co-worker Mike Freemon has explained to me (and in more detail in
blogposts [1]), it is no coincidence that the graph has a strange
"ceiling" close to 24 GiB (on machines with 384 GiB total memory). This
is because the TCP network stack enters a memory "under pressure" state
when 6.25% of total memory is used by the TCP stack. (Detail: the
system stays in that mode until allocated TCP memory falls below 4.68%
of total memory.)

[1] https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/

>> I guess it is common sense to start with easy one until someone complains
>> with some testcase and detailed reasoning if we need to go the hard way as
>> you and Jesper are also prefering waiting over having to record the inflight
>> pages.
>
> AFAIU Jakub's comment on his RFC patch for waiting, he was suggesting
> exactly this: Add the wait, and see if the cases where it can stall turn
> out to be problems in practice.

+1 I like Jakub's approach.

--Jesper
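P.S. For anyone wanting to check the percentages above: they fall out of
the kernel's default tcp_mem autotuning (tcp_init_mem() in
net/ipv4/tcp.c), where the pressure watermark is 1/16 (6.25%) of
available pages and the low watermark is 3/4 of that (~4.69%). A rough
sketch of that arithmetic, simplifying by treating the full 384 GiB as
available buffer pages:

```python
def tcp_mem_defaults(total_gib):
    """Mirror the kernel's default tcp_mem derivation (tcp_init_mem()).

    Returns (low, pressure, high) watermarks in GiB. Simplification:
    assumes all of total_gib is available as buffer pages.
    """
    pressure = total_gib / 16   # 6.25%  -- enter "under pressure" state
    low = pressure * 3 / 4      # ~4.69% -- leave "under pressure" state
    high = pressure * 3 / 2     # ~9.37% -- hard limit
    return low, pressure, high

low, pressure, high = tcp_mem_defaults(384)
print(pressure)  # 24.0 GiB -- matches the ceiling in the graph
print(low)       # 18.0 GiB -- where the pressure state is exited
```

So on a 384 GiB machine the stack hits pressure at exactly 24 GiB,
which is the ceiling visible in the grafana graph.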