From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4D6BAEB64D7 for ; Tue, 13 Jun 2023 22:36:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B9FE36B0074; Tue, 13 Jun 2023 18:36:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B29186B0075; Tue, 13 Jun 2023 18:36:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9A20C8E0002; Tue, 13 Jun 2023 18:36:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 853C26B0074 for ; Tue, 13 Jun 2023 18:36:39 -0400 (EDT) Received: from smtpin27.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 5A607C04D9 for ; Tue, 13 Jun 2023 22:36:39 +0000 (UTC) X-FDA: 80899185318.27.3CADCDE Received: from mail-ed1-f44.google.com (mail-ed1-f44.google.com [209.85.208.44]) by imf27.hostedemail.com (Postfix) with ESMTP id 67D8840004 for ; Tue, 13 Jun 2023 22:36:37 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=3Glh+mv9; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf27.hostedemail.com: domain of emmir@google.com designates 209.85.208.44 as permitted sender) smtp.mailfrom=emmir@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1686695797; a=rsa-sha256; cv=none; b=V1u49gvy9gCMFQmDIaJm6aZLAPeife5RNB3DnqErgo1mFpg9jOzc9N/3uYGS291qlrVgF/ /fCdJtBnEPKauelgej+9U/OoD3RcqV6x7u/yJPEoFCrqMXEpk+Y5TtQXb+POzap4tketdm oGEVB7ZQ4qlrWShhfsYsPkvAx8d0VqI= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=3Glh+mv9; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf27.hostedemail.com: domain of emmir@google.com designates 209.85.208.44 as permitted sender) smtp.mailfrom=emmir@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1686695797; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=PvWX2bD0pqjf2LH9LS1XjF7500hItj0+6/IMn+0hUyQ=; b=Wr+HZqBD6kAqCYhq3wbeMhNsmeWSPLQSLD8AeoJ/cafB0+BTtNp4xXyF6aMnoKy+7pp1UJ 5tTYakfYncQgtQKJpRXpp2UzOKbw9/o36m/rnfNpIjNOb3wNVpKEl5VBW5todZwub3ONAJ LsuzO1PMsYG/6V/Dm3sg2FPIQ9XoyvA= Received: by mail-ed1-f44.google.com with SMTP id 4fb4d7f45d1cf-514ad92d1e3so1361a12.1 for ; Tue, 13 Jun 2023 15:36:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686695796; x=1689287796; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=PvWX2bD0pqjf2LH9LS1XjF7500hItj0+6/IMn+0hUyQ=; b=3Glh+mv9w/hlsr3HtW9bHUVgiWBnoxfzOAwjquvJ1AtpWmAxPHzA/8KB9rQGLI6bPW f7qGlGThGtdA47ymx+HgiBOoyHoX8jWor32TAvaO+WmUaWX3Xn1BpcQf2f1XZitWH15J MDv06bVTji5Ik3ErHeafLVYLUfoKp++ieuFYgvwKgCKflAvoHh5Ewq6xg9zaJ+jxAdsg lyTmXYkYfQvktXvGcfRQcvhxjg6Ujut2b0YJFUIhzfQVJ5PzkiNsrvcO0ZYmSN5sFFLX 0WJW3uVykV1WNvlLlgmXEDrdroLofKmh8799uzbRJ6beNn6YsvBet8ulVt1JaK5KjEaQ NW8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686695796; x=1689287796; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=PvWX2bD0pqjf2LH9LS1XjF7500hItj0+6/IMn+0hUyQ=; b=gTGi8+na3q4kkzVtsmsY+Q5U95cHqJAuJ6g/lIWpmf8nhjVR1ttqG3X+cVFC+VFKO5 xSiwY2C0YZcJi0qzdF4A/OjJyJo9HqtNQsEF9LXsnkI5yBFAJPZMwQ6pKtgOvcD1GZQX XvrvthSAIaqiShw8SlEPLC7KpVoob3DWwn6lupqJt10537tgraXjn0cAU2LmMDnOFD6k 8g4uJvbCWpfJkdZ1JdDWCG82fXSVJAV0s5LDUdtuu8IuksN9ZojZRUIw4dKeSoj2PMau u/Ddixy3t5/raswIkHslkuQLMbsEiO8a42+WwrkIHTSkSKbynKuihYN6T/mgnB/k0qP6 wXxA== X-Gm-Message-State: AC+VfDxSkIuZH3yi0zhYNWkGfw6m7D7fAH4E8im3+162D4Z2jfRN9s8c nve/JrjVHRG8+W3DdWoMzNT3VfCNbbqWM1208Vzq5g== X-Google-Smtp-Source: ACHHUZ4b9ZmvHIHuVqiKCGkay8zIJtZ17R+nhtVYSufNYgAsmJA4+02HPCuDfftyIn3PNzWLVqxf8ToCsmK0FXHe7fI= X-Received: by 2002:a50:cc9a:0:b0:506:90c4:b63b with SMTP id q26-20020a50cc9a000000b0050690c4b63bmr36356edi.4.1686695795711; Tue, 13 Jun 2023 15:36:35 -0700 (PDT) MIME-Version: 1.0 References: <20230613102905.2808371-1-usama.anjum@collabora.com> <20230613102905.2808371-3-usama.anjum@collabora.com> In-Reply-To: <20230613102905.2808371-3-usama.anjum@collabora.com> From: =?UTF-8?B?TWljaGHFgiBNaXJvc8WCYXc=?= Date: Wed, 14 Jun 2023 00:36:24 +0200 Message-ID: Subject: Re: [PATCH v18 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs To: Muhammad Usama Anjum Cc: Peter Xu , David Hildenbrand , Andrew Morton , Andrei Vagin , Danylo Mocherniuk , Paul Gofman , Cyrill Gorcunov , Mike Rapoport , Nadav Amit , Alexander Viro , Shuah Khan , Christian Brauner , Yang Shi , Vlastimil Babka , "Liam R . Howlett" , Yun Zhou , Suren Baghdasaryan , Alex Sierra , Matthew Wilcox , Pasha Tatashin , Axel Rasmussen , "Gustavo A . R . Silva" , Dan Williams , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, Greg KH , kernel@collabora.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 67D8840004 X-Stat-Signature: 9mr7tezpz17kkw5wqo53yfj85sjqsmfx X-HE-Tag: 1686695797-778087 X-HE-Meta: U2FsdGVkX19dlk12QUy2hHLR7VrBLPnJe1VrTW0Ki+a8A4upJQKzTrXWgTuWOMJkjkteSto4tkqe8gwQ1bnNYkmN87zsM3trd2vPVjuh7YNX68g0wtQTdIYbSlIvL2Ig9z2WGRzS06C1kBV/lhScGffdMDvMad4ermxknGkONRWOEQNAxY7MLyMCmm1ESefiFZoLAZ5qdGHFgrjCBcoSRngqrkOezGQdLuJXH51FfS6jd3ptPMya6cjVGg6bRtKmgjMBzgpP2R4s4JrRwQvAgfZN7Wp0sUyACPAWaYiRYqRARPTu5d4UagQDfVwP5KFyKK/M5si71pwliCW8CeDsJ1HoulS2LREHnV4IV9TZhGwpJmwsOqJtc06AbFP5xsJj49oODKJ16KhhKZjBy4xsSb7XbFtnhqLoSqhnmeRNUkcPDrbhrHxPeUVV9Kk2cSGs872Nx0W/DbCttIgYxedJUeVpOcYoAlKk7kwLyKMWeOc/2FrMSKZhbUqBCEfG0Zegv5BDr17QafcHaEZ7hykl20nGVefnK6iANbbrs9rVSkWD1zWbM/OaBQIEHfX0uhFAKtonCxX7eZV9gaLeGYoiDOQ7Td9EKa8dyC3XrouhJ8n/XSkqt8StPTW41Yln48DGwsMkHHfcvklJ+cBikESv40jOl/NOCzZyRjoHFkNI40MhnPvJvjuBWzXBxH66cLxTfcdUswyRff5gxPeA3lvP6twxL+bwaLCjt59udS/wijvf8f+KnzOp6oNRT1k5slt3WhED4whgY6a0FUUSBgZ6h8QoXoFvxM27NMyHQTOtYL5pEnMKkdy++tn+DKXsxj6kI1GwVt0grN0e57IEYa/YxMWZfdKwBJ4XVtMxOx9FlT/BALK9QP6NNIArAgdrMroF3sixN23JfM3R/mRi7FBwwyxi49MCGDjc5TTyvUeBgaOEvq3HiBdVVOcaCwzMGgdt71G/wMLB/HNr8DQt3VY GUBPMsPt bwzklSuG7StTY2JAL+mLybK+FLO9oQJV2bnNinsWs4X0pl/y3jvAPG74f2gvYCta/Zxq+XEIQNow13ShVDq9Qve8DctSMyrU+Ni0yGjJNHuhylbibesZ0owFTVpGKX4iiYOzAkF4PfBeltTBbbszTPFRLC5l2rqmFJ8QgaxJMVqmaXMbDn/7YOrVP7Q5BOtU5MAEWFQM0X15iCI8WhdWCvjMdyRQ7F/spNeI4B2rmJfqwU8ro5eNa0OHdcNRBy/zb9asDmPzs5F97NSf0hHfu9eRUQkPnBW2iKJpgIhgvzXiQ+vmZhj/zPWVOc+qRhJp3BRBD9i2aYVW6MkXcqWCJ/CDUyYvfHWmhwQIVOVqUwT+9ELQ= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, 13 Jun 2023 at 12:29, Muhammad Usama Anjum wrote: > > This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear > the info about page table entries. The following operations are supported > in this ioctl: > - Get the information if the pages have been written-to (PAGE_IS_WRITTEN)= , > file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped > (PAGE_IS_SWAPPED). > - Find pages which have been written-to and/or write protect the pages > (atomic PM_SCAN_OP_GET + PM_SCAN_OP_WP) > > This IOCTL can be extended to get information about more PTE bits. [...] > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c [...] > +static int pagemap_scan_output(bool wt, bool file, bool pres, bool swap, > + struct pagemap_scan_private *p, > + unsigned long addr, unsigned int n_pages) > +{ > + unsigned long bitmap =3D PM_SCAN_BITMAP(wt, file, pres, swap); > + struct page_region *cur_buf =3D &p->cur_buf; Maybe we can go a step further and say here `cur_buf =3D &p->vec_buf[p->vec_buf_index];` and remove `p->cur_buf` entirely? > + > + if (!n_pages) > + return -EINVAL; > + > + if ((p->required_mask & bitmap) !=3D p->required_mask) > + return 0; > + if (p->anyof_mask && !(p->anyof_mask & bitmap)) > + return 0; > + if (p->excluded_mask & bitmap) > + return 0; > + > + bitmap &=3D p->return_mask; > + if (!bitmap) > + return 0; > + > + if (cur_buf->bitmap =3D=3D bitmap && > + cur_buf->start + cur_buf->len * PAGE_SIZE =3D=3D addr) { > + cur_buf->len +=3D n_pages; > + p->found_pages +=3D n_pages; > + } else { > + if (cur_buf->len && p->vec_buf_index >=3D p->vec_buf_len) > + return -ENOMEM; Shouldn't this be -ENOSPC? -ENOMEM usually signifies that the kernel ran out of memory when allocating, not that there is no space in a user-provided buffer. BTW, the check could be inside the if() below for easier reading and less redundancy. > + if (cur_buf->len) { > + memcpy(&p->vec_buf[p->vec_buf_index], cur_buf, > + sizeof(*p->vec_buf)); > + p->vec_buf_index++; > + } > + > + cur_buf->start =3D addr; > + cur_buf->len =3D n_pages; > + cur_buf->bitmap =3D bitmap; > + p->found_pages +=3D n_pages; > + } > + > + if (p->max_pages && (p->found_pages =3D=3D p->max_pages)) Since `p->found_pages > 0` holds here, the first check is redundant. Nit: the parentheses around =3D=3D are not needed. > + return PM_SCAN_FOUND_MAX_PAGES; > + > + return 0; > +} [...] > +static inline unsigned long pagemap_scan_check_page_written(struct pagem= ap_scan_private *p) > +{ > + return ((p->required_mask | p->anyof_mask | p->excluded_mask) & > + PAGE_IS_WRITTEN) ? PM_SCAN_OP_WRITE : 0; > +} Please inline at the single callsite. For flags name: PM_REQUIRE_WRITE_ACCESS? Or Is it intended to be checked only if doing WP (as the current name suggests) and so it would be redundant as WP currently requires `p->required_mask =3D PAGE_IS_WRITTEN`? > +static int pagemap_scan_args_valid(struct pm_scan_arg *arg, unsigned lon= g start, > + struct page_region __user *vec) > +{ > + /* Detect illegal size, flags, len and masks */ > + if (arg->size !=3D sizeof(struct pm_scan_arg)) > + return -EINVAL; > + if (arg->flags & ~PM_SCAN_OPS) > + return -EINVAL; > + if (!arg->len) > + return -EINVAL; > + if ((arg->required_mask | arg->anyof_mask | arg->excluded_mask | > + arg->return_mask) & ~PM_SCAN_BITS_ALL) > + return -EINVAL; > + if (!arg->required_mask && !arg->anyof_mask && > + !arg->excluded_mask) > + return -EINVAL; > + if (!arg->return_mask) > + return -EINVAL; I just noticed that `!arg->return_mask =3D=3D !IS_PM_SCAN_GET()`. So the OP_GET is redundant and can be removed from the `flags` if checking `return_mask` instead. That way there won't be a "flags-encoded ops" thing as it would be a 'scan' with optional 'write-protect'. > + > + /* Validate memory range */ > + if (!IS_ALIGNED(start, PAGE_SIZE)) > + return -EINVAL; > + if (!access_ok((void __user *)start, arg->len)) > + return -EFAULT; > + > + if (IS_PM_SCAN_GET(arg->flags)) { > + if (!arg->vec) > + return -EINVAL; access_ok() -> -EFAULT (an early fail-check) (the vec_len should be checked first then, failing with -EINVAL if 0) > + if (arg->vec_len =3D=3D 0) > + return -EINVAL; > + } > + > + if (IS_PM_SCAN_WP(arg->flags)) { > + if (!IS_PM_SCAN_GET(arg->flags) && arg->max_pages) > + return -EINVAL; > + > + if ((arg->required_mask | arg->anyof_mask | arg->excluded= _mask) & > + ~PAGE_IS_WRITTEN) Is `excluded_mask =3D PAGE_IS_WRITTEN` intended to be allowed? It would make WP do nothing, unless the required/anyof/excluded masks are not supposed to limit WP? > + return -EINVAL; > + } If `flags =3D=3D 0` (and `return_mask =3D=3D 0` in case my earlier comment = is applied) it should fail with -EINVAL. [...] > --- a/include/uapi/linux/fs.h > +++ b/include/uapi/linux/fs.h > +/* > + * struct page_region - Page region with bitmap flags > + * @start: Start of the region > + * @len: Length of the region in pages > + * bitmap: Bits sets for the region '@' is missing for the third field. BTW, maybe we can call it something like `flags` or `category` (something that hints at the meaning of the value instead of its data representation). > +/* > + * struct pm_scan_arg - Pagemap ioctl argument > + * @size: Size of the structure > + * @flags: Flags for the IOCTL > + * @start: Starting address of the region > + * @len: Length of the region (All the pages in this lengt= h are included) Maybe `scan_start`, `scan_len` - so that there is a better distinction from the structure's `size` field? > + * @vec: Address of page_region struct array for output > + * @vec_len: Length of the page_region struct array > + * @max_pages: Optional max return pages > + * @required_mask: Required mask - All of these bits have to be set = in the PTE > + * @anyof_mask: Any mask - Any of these bits are set in t= he PTE > + * @excluded_mask: Exclude mask - None of these bits are set in the = PTE > + * @return_mask: Bits that are to be reported in page_region > + */ I skipped most of the page walk implementation as maybe the comments above could make it simpler. Reading this patch and the documentation I still feel confused about how the filtering/limiting parameters should affect GET, WP and WP+GET. Should they limit the pages walked (and WP-ed)? Or only the GET's output? How about GET+WP case? Best Regards Micha=C5=82 Miros=C5=82aw