On Tue, Mar 15, 2022 at 11:05 PM Song Liu <song@kernel.org> wrote:
> On Wed, Mar 9, 2022 at 12:27 PM Yosry Ahmed <yosryahmed@google.com> wrote:
> >
> [...]
> >
> > The map usage by BPF programs and integration with rstat can be as follows:
> > - Internally, each map entry has per-cpu arrays, a total array, and a
> > pending array. BPF programs and user space only see one array.
> > - The update interface is disabled. BPF programs use helpers to modify
> > elements. Internally, the modifications are made to per-cpu arrays,
> > and invoke a call to cgroup_bpf_updated() or an equivalent.
> > - Lookups (from BPF programs or user space) invoke an rstat flush and
> > read from the total array.

> Lookups invoke an rstat flush, so we still walk every node of a subtree for
> each lookup, no? So the actual cost should be similar to walking the
> subtree with some BPF program? Did I miss something?


Hi Song,

Thanks for taking the time to read my proposal.

The rstat framework maintains a tree that contains only updated cgroups. An rstat flush traverses only this tree, not the full cgroup subtree/hierarchy.

This also ensures that consecutive readers do not have to do any traversals unless new updates happened, because the first reader will have already flushed the stats.
 
Thanks,
Yosry

> > - In cgroup_rstat_flush_locked() flush BPF stats as well.
> >
> [...]