Date: Wed, 2 Feb 2022 18:06:51 -0500
From: Rafael Aquini
To: Waiman Long
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
    Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
    Rasmus Villemoes, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-mm@kvack.org, Ira Weiny, Mike Rapoport, David Rientjes,
    Roman Gushchin
Subject: Re: [PATCH v4 0/4] mm/page_owner: Extend page_owner to show memcg information
In-Reply-To: <20220202203036.744010-1-longman@redhat.com>
References: <20220131192308.608837-5-longman@redhat.com>
    <20220202203036.744010-1-longman@redhat.com>

On Wed, Feb 02, 2022 at 03:30:32PM -0500, Waiman Long wrote:
> v4:
>  - Take rcu_read_lock() when memcg is being accessed as suggested by
>    Michal.
>  - Make print_page_owner_memcg() return the new offset into the buffer
>    and put the CONFIG_MEMCG block inside it as suggested by Mike.
>  - Directly use TASK_COMM_LEN as the length of the name buffer as
>    suggested by Roman.
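The v4 change to have print_page_owner_memcg() return the new offset, together with the v2 switch to scnprintf(), depends on one detail: scnprintf() returns the number of characters actually stored in the buffer, while snprintf() returns the length the full output would have needed. Because scnprintf() is kernel-only, the sketch below uses a hypothetical userspace stand-in, demo_scnprintf(), built on vsnprintf(), plus a hypothetical demo_print_memcg() helper. Neither is the code from this series. The early return for a size of 0 only mirrors the idea the cover letter describes for vscnprintf(), not its actual implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's scnprintf(): returns
 * the number of characters actually stored in buf (excluding the NUL),
 * not the length the full output would have needed.
 */
int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int len;

	/* Nothing can be stored, so skip the formatting work entirely. */
	if (size == 0)
		return 0;

	va_start(args, fmt);
	len = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (len < 0)
		return 0;		/* encoding error: report nothing stored */
	if ((size_t)len >= size)
		return (int)(size - 1);	/* truncated to fit before the NUL */
	return len;
}

/*
 * Hypothetical helper in the spirit of the v4 print_page_owner_memcg()
 * change: append one line at offset ret and return the new offset.
 */
int demo_print_memcg(char *kbuf, size_t count, int ret,
		     const char *name, int online)
{
	return ret + demo_scnprintf(kbuf + ret, count - ret,
				    "Charged to %smemcg %s\n",
				    online ? "" : "offline ", name);
}
```

Because the returned offset never exceeds count - 1, a caller can chain calls such as off = demo_print_memcg(buf, sizeof(buf), off, name, online) without checking for overrun after each one. Once the buffer is full, each further call simply adds 0.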
>
> v3:
>  - Add unlikely() to patch 1 and clarify that -1 will not be returned.
>  - Use a helper function to print out memcg information in patch 3.
>  - Add a new patch 4 to store the task command name in the page_owner
>    structure.
>
> v2:
>  - Remove the SNPRINTF() macro as suggested by Ira and use scnprintf()
>    instead to remove some buffer overrun checks.
>  - Add a patch to optimize vscnprintf() with a size parameter of 0.
>
> While debugging the constant increase in percpu memory consumption on
> a system that spawned a large number of containers, it was found that a
> lot of offline mem_cgroup structures remained in place without being
> freed. Further investigation indicated that those mem_cgroup structures
> were pinned by some pages.
>
> In order to find out what those pages are, the existing page_owner
> debugging tool is extended to show memory cgroup information and whether
> those memcgs are offline or not. With the enhanced page_owner tool,
> the following is a typical page that pinned the mem_cgroup structure
> in my test case:
>
> Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns
> PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
>  prep_new_page+0xac/0xe0
>  get_page_from_freelist+0x1327/0x14d0
>  __alloc_pages+0x191/0x340
>  alloc_pages_vma+0x84/0x250
>  shmem_alloc_page+0x3f/0x90
>  shmem_alloc_and_acct_page+0x76/0x1c0
>  shmem_getpage_gfp+0x281/0x940
>  shmem_write_begin+0x36/0xe0
>  generic_perform_write+0xed/0x1d0
>  __generic_file_write_iter+0xdc/0x1b0
>  generic_file_write_iter+0x5d/0xb0
>  new_sync_write+0x11f/0x1b0
>  vfs_write+0x1ba/0x2a0
>  ksys_write+0x59/0xd0
>  do_syscall_64+0x37/0x80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8.
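Given output in the format shown above, one rough way to gauge how much memory is still charged to offline memcgs is to scan a page_owner dump for the "Charged to offline memcg" lines. The sketch below assumes that each record starts with a "Page allocated via order N" line and that the memcg line appears later in the same record, as in the sample. count_offline_memcg_pages() is a hypothetical helper and is not part of this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum the pages (1 << order per record) of all
 * page_owner records that report "Charged to offline memcg".
 * Assumes the record layout of the sample output above.
 */
long count_offline_memcg_pages(FILE *fp)
{
	static const char tag[] = "Charged to offline memcg ";
	char line[4096];
	int order = 0;
	long pages = 0;

	while (fgets(line, sizeof(line), fp)) {
		/* Tolerate leading indentation, e.g. in a reflowed dump. */
		const char *p = line + strspn(line, " \t");

		/* A new record starts with its allocation summary line. */
		if (sscanf(p, "Page allocated via order %d", &order) == 1) {
			if (order < 0 || order > 30)
				order = 0; /* implausible value: count one page */
			continue;
		}
		if (strncmp(p, tag, sizeof(tag) - 1) == 0)
			pages += 1L << order;
	}
	return pages;
}
```

On a kernel built with CONFIG_PAGE_OWNER and booted with page_owner=on, the stream would typically come from /sys/kernel/debug/page_owner, with debugfs mounted. Weighting each record by 1 << order counts pages rather than allocations.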
>
> So the page was not freed because it was part of a shmem segment. That
> is useful information that can help users diagnose similar problems.
>
> With cgroup v1, /proc/cgroups can be read to find out the total number
> of memory cgroups (online + offline). With cgroup v2, the cgroup.stat of
> the root cgroup can be read to find the number of dying cgroups (most
> likely pinned by dying memcgs).
>
> The page_owner feature is not supposed to be enabled on production
> systems due to its memory overhead. However, if it is suspected that
> dying memcgs are increasing over time, a test environment with page_owner
> enabled can be set up with an appropriate workload for further analysis
> of what may be causing the increasing number of dying memcgs.
>
> Waiman Long (4):
>   lib/vsprintf: Avoid redundant work with 0 size
>   mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
>   mm/page_owner: Print memcg information
>   mm/page_owner: Record task command name
>
>  lib/vsprintf.c  |  8 +++---
>  mm/page_owner.c | 70 ++++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 60 insertions(+), 18 deletions(-)
>
> --
> 2.27.0
>

Thank you, Waiman.

Acked-by: Rafael Aquini
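The cover letter names two counters that can be polled to see whether dying memcgs keep accumulating before a page_owner test environment is set up. The following is a minimal sketch that assumes the usual file formats: cgroup.stat holds "key value" pairs that include nr_dying_descendants, and /proc/cgroups has a "#subsys_name hierarchy num_cgroups enabled" header followed by one line per controller. Both function names are hypothetical. nr_dying_descendants counts dying cgroups of any kind, so, as the cover letter notes, it is only a proxy for dying memcgs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* cgroup v2: return nr_dying_descendants from a cgroup.stat stream, or -1. */
long read_nr_dying(FILE *fp)
{
	char key[64];
	long val;

	/* cgroup.stat is a sequence of "key value" pairs, one per line. */
	while (fscanf(fp, "%63s %ld", key, &val) == 2) {
		if (strcmp(key, "nr_dying_descendants") == 0)
			return val;
	}
	return -1;
}

/* cgroup v1: return num_cgroups of the memory controller from /proc/cgroups, or -1. */
long read_memory_cgroups(FILE *fp)
{
	char line[256];
	char name[64];
	int hierarchy, enabled;
	long num;

	while (fgets(line, sizeof(line), fp)) {
		if (line[0] == '#')
			continue;	/* skip the column header line */
		if (sscanf(line, "%63s %d %ld %d",
			   name, &hierarchy, &num, &enabled) == 4 &&
		    strcmp(name, "memory") == 0)
			return num;
	}
	return -1;
}
```

A caller would open /sys/fs/cgroup/cgroup.stat (the usual cgroup v2 mount point, which is an assumption about the system layout) or /proc/cgroups and sample the value periodically. A count that keeps rising under a steady workload is the signal that would justify a page_owner-enabled test run.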