Subject: Re: [External] Re: [PATCH] mm: proc: add Sock to /proc/meminfo
From: Eric Dumazet
Date: Mon, 12 Oct 2020 11:24:05 +0200
To: Muchun Song, Eric Dumazet
Cc: Cong Wang, Greg KH, rafael@kernel.org, "Michael S. Tsirkin", Jason Wang,
 David Miller, Jakub Kicinski, Alexey Dobriyan, Andrew Morton,
 Alexey Kuznetsov, Hideaki YOSHIFUJI, Steffen Klassert, Herbert Xu,
 Shakeel Butt, Will Deacon, Michal Hocko, Roman Gushchin, Neil Brown,
 rppt@kernel.org, Sami Tolvanen, "Kirill A. Shutemov", Feng Tang,
 Paolo Abeni, Willem de Bruijn, Randy Dunlap, Florian Westphal,
 gustavoars@kernel.org, Pablo Neira Ayuso, Dexuan Cui, Jakub Sitnicki,
 Peter Zijlstra, Christian Brauner, "Eric W. Biederman", Thomas Gleixner,
 dave@stgolabs.net, Michel Lespinasse, Jann Horn, chenqiwu@xiaomi.com,
 christophe.leroy@c-s.fr, Minchan Kim, Martin KaFai Lau, Alexei Starovoitov,
 Daniel Borkmann, Miaohe Lin, Kees Cook, LKML,
 virtualization@lists.linux-foundation.org,
 Linux Kernel Network Developers, linux-fsdevel, linux-mm
Message-ID: <9262ea44-fc3a-0b30-54dd-526e16df85d1@gmail.com>
References: <20201010103854.66746-1-songmuchun@bytedance.com>

On 10/12/20 10:39 AM, Muchun Song wrote:
> On Mon, Oct 12, 2020 at 3:42 PM Eric Dumazet wrote:
>>
>> On Mon, Oct 12, 2020 at 6:22 AM Muchun Song wrote:
>>>
>>> On Mon, Oct 12, 2020 at 2:39 AM Cong Wang wrote:
>>>>
>>>> On Sat, Oct 10, 2020 at 3:39 AM Muchun Song wrote:
>>>>>
>>>>> The amount of memory allocated to sockets buffer can become significant.
>>>>> However, we do not display the amount of memory consumed by sockets
>>>>> buffer. In this case, knowing where the memory is consumed by the kernel
>>>>
>>>> We do it via `ss -m`. Is it not sufficient? And if not, why not adding it there
>>>> rather than /proc/meminfo?
>>>
>>> If the system has little free memory, we can know where the memory is via
>>> /proc/meminfo. If a lot of memory is consumed by socket buffer, we cannot
>>> know it when the Sock is not shown in the /proc/meminfo. If the unaware user
>>> can't think of the socket buffer, naturally they will not `ss -m`. The
>>> end result
>>> is that we still don’t know where the memory is consumed.
>>> And we add the
>>> Sock to the /proc/meminfo just like the memcg does('sock' item in the cgroup
>>> v2 memory.stat). So I think that adding to /proc/meminfo is sufficient.
>>>
>>>>
>>>>> static inline void __skb_frag_unref(skb_frag_t *frag)
>>>>> {
>>>>> -	put_page(skb_frag_page(frag));
>>>>> +	struct page *page = skb_frag_page(frag);
>>>>> +
>>>>> +	if (put_page_testzero(page)) {
>>>>> +		dec_sock_node_page_state(page);
>>>>> +		__put_page(page);
>>>>> +	}
>>>>>  }
>>>>
>>>> You mix socket page frag with skb frag at least, not sure this is exactly
>>>> what you want, because clearly skb page frags are frequently used
>>>> by network drivers rather than sockets.
>>>>
>>>> Also, which one matches this dec_sock_node_page_state()? Clearly
>>>> not skb_fill_page_desc() or __skb_frag_ref().
>>>
>>> Yeah, we call inc_sock_node_page_state() in the skb_page_frag_refill().
>>> So if someone gets the page returned by skb_page_frag_refill(), it must
>>> put the page via __skb_frag_unref()/skb_frag_unref(). We use PG_private
>>> to indicate that we need to dec the node page state when the refcount of
>>> page reaches zero.
>>>
>>
>> Pages can be transferred from pipe to socket, socket to pipe (splice()
>> and zerocopy friends...)
>>
>> If you want to track TCP memory allocations, you always can look at
>> /proc/net/sockstat,
>> without adding yet another expensive memory accounting.
>
> The 'mem' item in the /proc/net/sockstat does not represent real
> memory usage. This is just the total amount of charged memory.
>
> For example, if a task sends a 10-byte message, it only charges one
> page to memcg. But the system may allocate 8 pages. Therefore, it
> does not truly reflect the memory allocated by the above memory
> allocation path. We can see the difference via the following message.
>
> cat /proc/net/sockstat
> sockets: used 698
> TCP: inuse 70 orphan 0 tw 617 alloc 134 mem 13
> UDP: inuse 90 mem 4
> UDPLITE: inuse 0
> RAW: inuse 1
> FRAG: inuse 0 memory 0
>
> cat /proc/meminfo | grep Sock
> Sock: 13664 kB
>
> The /proc/net/sockstat only shows us that there are 17*4 kB TCP
> memory allocations. But apply this patch, we can see that we truly
> allocate 13664 kB(May be greater than this value because of per-cpu
> stat cache). Of course the load of the example here is not high. In
> some high load cases, I believe the difference here will be even
> greater.
>

This is great, but you have not addressed my feedback.

TCP memory allocations are bounded by /proc/sys/net/ipv4/tcp_mem.

Whether the memory is forward-allocated or not is a detail.

If you think we must pre-allocate memory, instead of forward allocations,
your patch does not address this.

Adding one line per consumer in /proc/meminfo looks wrong to me.

If you do not want 9.37 % of physical memory to possibly be used by TCP,
just change /proc/sys/net/ipv4/tcp_mem accordingly?