From: Nhat Pham
Date: Wed, 21 Dec 2022 16:37:23 -0800
Subject: Re: [PATCH v4 3/4] cachestat: implement cachestat syscall
To: Andrew Morton
Cc: hannes@cmpxchg.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, bfoster@redhat.com, willy@infradead.org, kernel-team@meta.com
References: <20221216192149.3902877-1-nphamcs@gmail.com> <20221216192149.3902877-4-nphamcs@gmail.com> <20221216134814.61c8d5119ceb4179c68e1cd7@linux-foundation.org>
In-Reply-To: <20221216134814.61c8d5119ceb4179c68e1cd7@linux-foundation.org>
On Fri, Dec 16, 2022 at 1:48 PM Andrew Morton wrote:
>
> On Fri, 16 Dec 2022 11:21:48 -0800 Nhat Pham wrote:
>
> > Implement a new syscall that queries cache state of a file and
> > summarizes the number of cached pages, number of dirty pages, number of
> > pages marked for writeback, number of (recently) evicted pages, etc. in
> > a given range.
> >
> > NAME
> >     cachestat - query the page cache status of a file.
> >
> > SYNOPSIS
> >     #include
> >
> >     struct cachestat {
> >         __u64 nr_cache;
> >         __u64 nr_dirty;
> >         __u64 nr_writeback;
> >         __u64 nr_evicted;
> >         __u64 nr_recently_evicted;
> >     };
> >
> >     int cachestat(unsigned int fd, off_t off, size_t len,
> >                   size_t cstat_size, struct cachestat *cstat,
> >                   unsigned int flags);
> >
> > DESCRIPTION
> >     cachestat() queries the number of cached pages, number of dirty
> >     pages, number of pages marked for writeback, number of (recently)
> >     evicted pages, in the byte range given by `off` and `len`.
>
> I suggest this be spelled out better: "number of evicted and number of
> recently evicted pages".
>
> I suggest this clearly tell readers what an "evicted" page is - they
> aren't kernel programmers!

Valid points. I'll try to explain this more clearly in future versions of
this patch series, especially in the draft man page.

>
> What is the benefit of the "recently evicted" pages? "recently" seems
> very vague - what use is this to anyone?

These eviction recency semantics come from the LRU's refault computation.
Users of cachestat might be interested in two very different questions:

1. How many pages are not resident in the page cache?
2. How many pages were evicted recently (recently enough that their
   refault would be construed as memory pressure)?

The first question is answered by nr_evicted, whereas the second is
answered by nr_recently_evicted.

I will figure out a way to explain this better in the next version.
Users definitely do not need to know the nitty-gritty details of the LRU
logic, but they should at least know the general idea behind each field.

>
> > These values are returned in a cachestat struct, whose address is
> > given by the `cstat` argument.
> >
> > The `off` and `len` arguments must be non-negative integers. If
> > `len` > 0, the queried range is [`off`, `off` + `len`]. If `len` ==
> > 0, we will query in the range from `off` to the end of the file.
> >
> > `cstat_size` allows users to obtain partial results.
> > The syscall will copy the first `cstat_size` bytes to the specified
> > userspace memory. `cstat_size` must be a non-negative value that is no
> > larger than the current size of the cachestat struct.
> >
> > The `flags` argument is unused for now, but is included for future
> > extensibility. Users should pass 0 (i.e. no flags specified).
>
> Why is `flags' here? We could add an unused flags arg to any syscall,
> but we don't. What's the plan?

I included this argument to ensure that cachestat can be extended safely,
especially since different users might want different things out of it.
For instance, in the future there might be new fields/computations that
are too heavy for certain use cases; a flag could be used to disable or
skip such fields/computations.

Another thing it might be used for is huge page counting. We have not
implemented this in this version yet, but it might introduce murky
semantics to new/existing fields in struct cachestat. Or maybe not, but
in the worst case we can leave this decision to users through flags.

I'm sure there are more potential pitfalls that flags could save us from,
but these are the two off the top of my head.

>
> Are there security implications? If I know that some process has a
> file open, I can use cachestat() to infer which parts of that file
> they're looking at (like mincore(), I guess). And I can infer which
> parts they're writing to, unlike mincore().

I'm not 100% sure about this one, but it is a valid concern. Let me think
about it and discuss it with more security-oriented minds before
responding.

>
> I suggest the [patch 1/4] fixup be separated from this series.

Sounds good! I'll loop Johannes in about this breakup as well.
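For illustration, the `cstat_size` and `flags` rules from the quoted draft could be modeled in userspace roughly as below. This is a sketch, not the kernel implementation: `model_cachestat_copy_out()` is a hypothetical name, uint64_t stands in for __u64, and rejecting any non-zero `flags` with -EINVAL is an assumption (the draft only says callers should pass 0).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the v4 struct quoted above; uint64_t stands in for __u64. */
struct cachestat {
	uint64_t nr_cache;
	uint64_t nr_dirty;
	uint64_t nr_writeback;
	uint64_t nr_evicted;
	uint64_t nr_recently_evicted;
};

/*
 * Hypothetical model of the copy-out step (not kernel code).
 * 'result' holds the full set of counters; 'dst' stands in for the user
 * buffer passed as cstat. Only the first cstat_size bytes are written, so
 * a caller built against a shorter struct still gets the leading fields.
 */
static int model_cachestat_copy_out(const struct cachestat *result, void *dst,
				    size_t cstat_size, unsigned int flags)
{
	assert(result != NULL && dst != NULL);

	/* Assumption: no flags are defined yet, so any set bit is rejected. */
	if (flags != 0)
		return -EINVAL;

	/* cstat_size is unsigned, so only the upper bound needs checking. */
	if (cstat_size > sizeof(*result))
		return -EINVAL;

	/* Copy only the leading cstat_size bytes of the result. */
	memcpy(dst, result, cstat_size);
	return 0;
}
```

A caller that only knows about nr_cache and nr_dirty would pass cstat_size = 2 * sizeof(uint64_t) and receive just those two counters. Rejecting unknown flags up front, rather than ignoring them, is what would let a later kernel give a flag meaning: a program passing a newer flag to an older kernel gets an error instead of silently getting the default behavior.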
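To make the nr_evicted / nr_recently_evicted distinction discussed above a bit more concrete, here is a toy model in C. It is only loosely based on the refault-distance idea described in mm/workingset.c; the slot kinds, the single global eviction clock, and the `workingset_size` threshold are simplifying assumptions, and none of the names are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: what a page-cache index slot holds in this sketch. */
enum toy_slot_kind {
	TOY_EMPTY,    /* never cached, or no eviction record left */
	TOY_RESIDENT, /* page is in the page cache */
	TOY_SHADOW,   /* page was evicted; a record of the eviction remains */
};

struct toy_slot {
	enum toy_slot_kind kind;
	uint64_t eviction_clock; /* only meaningful for TOY_SHADOW */
};

/*
 * Simplified refault-distance check: 'clock_now - eviction_clock' is how
 * many evictions happened after this page left the cache. If that is no
 * more than 'workingset_size', a refault of this page would be read as a
 * sign of memory pressure, so the page counts as recently evicted.
 */
static bool toy_is_recently_evicted(const struct toy_slot *slot,
				    uint64_t clock_now,
				    uint64_t workingset_size)
{
	assert(slot->kind == TOY_SHADOW);
	assert(clock_now >= slot->eviction_clock);

	uint64_t refault_distance = clock_now - slot->eviction_clock;

	return refault_distance <= workingset_size;
}

/* Count evicted and recently evicted pages over slots[0..n). */
static void toy_count_evicted(const struct toy_slot *slots, size_t n,
			      uint64_t clock_now, uint64_t workingset_size,
			      uint64_t *nr_evicted,
			      uint64_t *nr_recently_evicted)
{
	*nr_evicted = 0;
	*nr_recently_evicted = 0;

	for (size_t i = 0; i < n; i++) {
		/* Resident and empty slots count toward neither total. */
		if (slots[i].kind != TOY_SHADOW)
			continue;
		(*nr_evicted)++;
		if (toy_is_recently_evicted(&slots[i], clock_now,
					    workingset_size))
			(*nr_recently_evicted)++;
	}
}
```

For example, with clock_now = 100 and workingset_size = 50, a page evicted at clock 90 counts toward both totals, while one evicted at clock 10 counts only toward nr_evicted.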