Date: Thu, 3 Sep 2020 16:25:37 +0300
From: "Kirill A. Shutemov"
To: Sumit Semwal
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Alexey Dobriyan, Jonathan Corbet, Mauro Carvalho Chehab, Kees Cook,
 Michal Hocko, Colin Cross, Alexey Gladkov, Matthew Wilcox,
 Jason Gunthorpe, "Kirill A . Shutemov", Michel Lespinasse,
 Michal Koutný, Song Liu, Huang Ying, Vlastimil Babka, Yang Shi,
 chenqiwu, Mathieu Desnoyers, John Hubbard, Mike Christie,
 Bart Van Assche, Amit Pundir, Thomas Gleixner, Christian Brauner,
 Daniel Jordan, Adrian Reber, Nicolas Viennot, Al Viro,
 linux-fsdevel@vger.kernel.org, John Stultz, Pekka Enberg, Dave Hansen,
 Peter Zijlstra, Ingo Molnar, Oleg Nesterov, "Eric W. Biederman",
 Jan Glauber, Rob Landley, Cyrill Gorcunov, "Serge E.
 Hallyn", David Rientjes, Hugh Dickins, Rik van Riel, Mel Gorman,
 Tang Chen, Robin Holt, Shaohua Li, Sasha Levin, Johannes Weiner,
 Minchan Kim
Subject: Re: [PATCH v7 3/3] mm: add a field to store names for private anonymous memory
Message-ID: <20200903132537.mp5e6o6ptgbkghxe@box>
References: <20200901161459.11772-1-sumit.semwal@linaro.org>
 <20200901161459.11772-4-sumit.semwal@linaro.org>
In-Reply-To: <20200901161459.11772-4-sumit.semwal@linaro.org>

On Tue, Sep 01, 2020 at 09:44:59PM +0530, Sumit Semwal wrote:
> From: Colin Cross
>
> In many userspace applications, and especially in VM based applications
> like Android uses heavily, there are multiple different allocators in use.
> At a minimum there is libc malloc and the stack, and in many cases there
> are libc malloc, the stack, direct syscalls to mmap anonymous memory, and
> multiple VM heaps (one for small objects, one for big objects, etc.).
> Each of these layers usually has its own tools to inspect its usage;
> malloc by compiling a debug version, the VM through heap inspection tools,
> and for direct syscalls there is usually no way to track them.
>
> On Android we heavily use a set of tools that use an extended version of
> the logic covered in Documentation/vm/pagemap.txt to walk all pages mapped
> in userspace and slice their usage by process, shared (COW) vs. unique
> mappings, backing, etc. This can account for real physical memory usage
> even in cases like fork without exec (which Android uses heavily to share
> as many private COW pages as possible between processes), Kernel SamePage
> Merging, and clean zero pages.
> It produces a measurement of the pages
> that only exist in that process (USS, for unique), and a measurement of
> the physical memory usage of that process with the cost of shared pages
> being evenly split between processes that share them (PSS).
>
> If all anonymous memory is indistinguishable then figuring out the real
> physical memory usage (PSS) of each heap requires either a pagemap walking
> tool that can understand the heap debugging of every layer, or for every
> layer's heap debugging tools to implement the pagemap walking logic, in
> which case it is hard to get a consistent view of memory across the whole
> system.
>
> Tracking the information in userspace leads to all sorts of problems.
> It either needs to be stored inside the process, which means every
> process has to have an API to export its current heap information upon
> request, or it has to be stored externally in a filesystem that
> somebody needs to clean up on crashes. It needs to be readable while
> the process is still running, so it has to have some sort of
> synchronization with every layer of userspace. Efficiently tracking
> the ranges requires reimplementing something like the kernel vma
> trees, and linking to it from every layer of userspace. It requires
> more memory, more syscalls, more runtime cost, and more complexity to
> separately track regions that the kernel is already tracking.
>
> This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
> userspace-provided name for anonymous vmas. The names of named anonymous
> vmas are shown in /proc/pid/maps and /proc/pid/smaps as [anon:<name>].

Hm. I guess that there might be tools that expect the field to be empty
for anonymous memory, no?

> Userspace can set the name for a region of memory by calling
> prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
> Setting the name to NULL clears it.
>
> The name is stored in a user pointer in the shared union in vm_area_struct
> that points to a null terminated string inside the user process. vmas
> that point to the same address and are otherwise mergeable will be merged,
> but vmas that point to equivalent strings at different addresses will not
> be merged.
>
> The idea to store a userspace pointer to reduce the complexity within mm
> (at the expense of the complexity of reading /proc/pid/mem) came from Dave
> Hansen. This results in no runtime overhead in the mm subsystem other
> than comparing the anon_name pointers when considering vma merging. The
> pointer is stored in a union with fields that are only used on file-backed
> mappings, so it does not increase memory usage.
>
> (Upstream changed to remove the union, so this patch adds it back as well)

IIUC, it gives userspace direct control of content of /proc/$PID/maps and
/proc/$PID/smaps. There's no verification of the given string whatsoever.
I'm sure security experts would find clever usage of the feature :P

-- 
 Kirill A. Shutemov
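
To illustrate the kind of verification that is missing, here is a rough
sketch of a name check, written as plain C rather than kernel code. It
is not what the patch does (as posted, it does no checking at all). The
policy is an assumption made up for the example: printable ASCII only,
no brackets (they delimit the [anon:...] field), and a length cap. The
names anon_name_char_ok(), anon_name_valid() and ANON_NAME_MAX_LEN are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit; not taken from the patch. */
#define ANON_NAME_MAX_LEN 80

/* Accept printable ASCII except '[' and ']'. This rejects newlines and
 * other control characters that could fake extra lines in the output. */
static bool anon_name_char_ok(char c)
{
	if (c < 0x20 || c > 0x7e)
		return false;
	return c != '[' && c != ']';
}

/* Return true if name is non-NULL, non-empty, at most ANON_NAME_MAX_LEN
 * characters long, and made only of accepted characters. */
static bool anon_name_valid(const char *name)
{
	size_t i;

	if (name == NULL)
		return false;
	for (i = 0; name[i] != '\0'; i++) {
		if (i >= ANON_NAME_MAX_LEN || !anon_name_char_ok(name[i]))
			return false;
	}
	return i > 0;
}
```

With such a check a name like "dalvik-main space" passes, while a name
with an embedded newline followed by a fake mapping line is rejected.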