Date: Tue, 26 May 2020 13:31:53 -0400
From: Tejun Heo
To: Roman Gushchin
Cc: Andrew Morton, Dennis Zhou, Christoph Lameter, Johannes Weiner,
 Michal Hocko, Shakeel Butt, linux-mm@kvack.org, kernel-team@fb.com,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 2/5] mm: memcg/percpu: account percpu memory to memory cgroups
Message-ID: <20200526173153.GE83516@mtj.thefacebook.com>
References: <20200519201806.2308480-1-guro@fb.com> <20200519201806.2308480-3-guro@fb.com>
In-Reply-To: <20200519201806.2308480-3-guro@fb.com>

On Tue, May 19, 2020 at 01:18:03PM -0700, Roman Gushchin wrote:
> Percpu memory is becoming more and more widely used by various
> subsystems, and the total amount of memory controlled by the percpu
> allocator can make a good part of the total memory.
>
> As an example, bpf maps can consume a lot of percpu memory,
> and they are created by a user. Also, some cgroup internals
> (e.g. memory controller statistics) can be quite large.
> On a machine with many CPUs and big number of cgroups they
> can consume hundreds of megabytes.
>
> So the lack of memcg accounting is creating a breach in the memory
> isolation. Similar to the slab memory, percpu memory should be
> accounted by default.
>
> To implement the percpu accounting it's possible to take the slab
> memory accounting as a model to follow. Let's introduce two types of
> percpu chunks: root and memcg. What makes memcg chunks different is
> an additional space allocated to store memcg membership information.
> If __GFP_ACCOUNT is passed on allocation, a memcg chunk should be
> used. If it's possible to charge the corresponding size to the target
> memory cgroup, allocation is performed, and the memcg ownership data
> is recorded. System-wide allocations are performed using root chunks,
> so there is no additional memory overhead.
>
> To implement a fast reparenting of percpu memory on memcg removal,
> we don't store mem_cgroup pointers directly: instead we use obj_cgroup
> API, introduced for slab accounting.

The overall approach makes sense to me, but it'd help to have a
high-level comment explaining what's going on and why.

Thanks.

--
tejun
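
For readers following the thread, here is a rough user-space model of the flow the quoted description lays out: pick a root or memcg chunk based on the accounting flag, charge before allocating, record ownership, and reparent through an indirection object. It is only a sketch and not the code from the patch. Every name in it (SKETCH_GFP_ACCOUNT, toy_memcg, toy_objcg, toy_alloc, toy_pcpu_alloc, toy_reparent) is made up for illustration, and the counters are deliberately simplified so that they are not hierarchical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __GFP_ACCOUNT flag. */
#define SKETCH_GFP_ACCOUNT 0x1u

/* The two chunk types named in the patch description. */
enum chunk_type { CHUNK_ROOT, CHUNK_MEMCG };

/* Toy memory cgroup: a flat usage counter with a limit (assumption). */
struct toy_memcg {
	size_t usage;
	size_t limit;
	struct toy_memcg *parent;
};

/*
 * Toy counterpart of an obj_cgroup: allocations point here instead of
 * at the memcg itself, so ownership can be moved by updating one pointer.
 */
struct toy_objcg {
	struct toy_memcg *memcg;
};

/* What the model records per allocation. */
struct toy_alloc {
	enum chunk_type type;
	struct toy_objcg *owner;	/* NULL for root-chunk allocations */
	size_t size;
};

/* Try to charge @size to @memcg; fail if it would exceed the limit. */
static bool toy_charge(struct toy_memcg *memcg, size_t size)
{
	if (memcg->usage + size > memcg->limit)
		return false;
	memcg->usage += size;
	return true;
}

/*
 * Unaccounted requests go to a root chunk with no ownership data.
 * Accounted requests are charged first, and only then recorded as a
 * memcg-chunk allocation owned by @objcg.
 */
static bool toy_pcpu_alloc(size_t size, unsigned int gfp,
			   struct toy_objcg *objcg, struct toy_alloc *out)
{
	if (!(gfp & SKETCH_GFP_ACCOUNT) || !objcg) {
		out->type = CHUNK_ROOT;
		out->owner = NULL;
		out->size = size;
		return true;
	}

	if (!toy_charge(objcg->memcg, size))
		return false;	/* charge failed, no allocation */

	out->type = CHUNK_MEMCG;
	out->owner = objcg;
	out->size = size;
	return true;
}

/*
 * On memcg removal, repoint the objcg at the parent. Existing
 * allocations are not touched. Because the toy counters are flat, the
 * outstanding charge is moved to the parent by hand.
 */
static void toy_reparent(struct toy_objcg *objcg)
{
	struct toy_memcg *child = objcg->memcg;

	assert(child->parent != NULL);
	child->parent->usage += child->usage;
	child->usage = 0;
	objcg->memcg = child->parent;
}
```

The point of the indirection is visible in toy_reparent(): the cost of reparenting does not depend on how many allocations the dying cgroup still owns, since each of them reaches its current owner through the same toy_objcg.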