From: Muhammad Usama Anjum <usama.anjum@collabora.com>
To: "Michał Mirosław" <emmir@google.com>
Cc: "Muhammad Usama Anjum" <usama.anjum@collabora.com>,
"Michał Mirosław" <mirq-linux@rere.qmqm.pl>,
"Andrei Vagin" <avagin@gmail.com>,
"Danylo Mocherniuk" <mdanylo@google.com>,
"Alex Sierra" <alex.sierra@amd.com>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Axel Rasmussen" <axelrasmussen@google.com>,
"Christian Brauner" <brauner@kernel.org>,
"Cyrill Gorcunov" <gorcunov@gmail.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"David Hildenbrand" <david@redhat.com>,
"Greg KH" <gregkh@linuxfoundation.org>,
"Gustavo A . R . Silva" <gustavoars@kernel.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
"Matthew Wilcox" <willy@infradead.org>,
"Mike Rapoport" <rppt@kernel.org>,
"Nadav Amit" <namit@vmware.com>,
"Pasha Tatashin" <pasha.tatashin@soleen.com>,
"Paul Gofman" <pgofman@codeweavers.com>,
"Peter Xu" <peterx@redhat.com>, "Shuah Khan" <shuah@kernel.org>,
"Suren Baghdasaryan" <surenb@google.com>,
"Vlastimil Babka" <vbabka@suse.cz>,
"Yang Shi" <shy828301@gmail.com>,
"Yun Zhou" <yun.zhou@windriver.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-kselftest@vger.kernel.org,
kernel@collabora.com
Subject: Re: [v2] fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
Date: Mon, 24 Jul 2023 20:21:58 +0500 [thread overview]
Message-ID: <44eddc7d-fd68-1595-7e4f-e196abe37311@collabora.com> (raw)
In-Reply-To: <CABb0KFF6M2_94Ect72zMtaRLBpOoHjHYJA-Ube3oQAh4cXSg5w@mail.gmail.com>
On 7/24/23 7:38 PM, Michał Mirosław wrote:
> On Mon, 24 Jul 2023 at 16:04, Muhammad Usama Anjum
> <usama.anjum@collabora.com> wrote:
>>
>> Fixed found bugs. Testing it further.
>>
>> - Split and backoff in case buffer full case as well
>> - Fix the wrong breaking of loop if page isn't interesting, skip intead
>> - Untag the address and save them into struct
>> - Round off the end address to next page
>>
>> Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>> ---
>> fs/proc/task_mmu.c | 54 ++++++++++++++++++++++++++--------------------
>> 1 file changed, 31 insertions(+), 23 deletions(-)
>>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index add21fdf3c9a..64b326d0ec6d 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -1985,18 +1989,19 @@ static int pagemap_scan_output(unsigned long
>> categories,
>> unsigned long n_pages, total_pages;
>> int ret = 0;
>>
>> + if (!p->vec_buf)
>> + return 0;
>> +
>> if (!pagemap_scan_is_interesting_page(categories, p)) {
>> *end = addr;
>> return 0;
>> }
>>
>> - if (!p->vec_buf)
>> - return 0;
>> -
>> categories &= p->arg.return_mask;
>
> This is wrong - is_interesting() check must happen before output as
> the `*end = addr` means the range should be skipped, but return 0
> requests continuing of the walk.
Will revert.
>
>> @@ -2044,7 +2050,7 @@ static int pagemap_scan_thp_entry(pmd_t *pmd,
>> unsigned long start,
>> * Break huge page into small pages if the WP operation
>> * need to be performed is on a portion of the huge page.
>> */
>> - if (end != start + HPAGE_SIZE) {
>> + if (end != start + HPAGE_SIZE || ret == -ENOSPC) {
>
> Why is it needed? If `end == start + HPAGE_SIZE` then we're handling a
> full hugepage anyway.
If we weren't able to add the complete thp in the output buffer and we need
to perform WP on the entire page, we should split and rollback. Otherwise
we'll WP the entire thp and we'll lose the state on the remaining THP which
wasn't added to output.
Lets say max=100
only 100 pages would be added to output
we need to split and rollback otherwise other 412 pages would get WP
>
>> @@ -2066,8 +2072,8 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd,
>> unsigned long start,
>> {
>> struct pagemap_scan_private *p = walk->private;
>> struct vm_area_struct *vma = walk->vma;
>> + unsigned long addr, categories, next;
>> pte_t *pte, *start_pte;
>> - unsigned long addr;
>> bool flush = false;
>> spinlock_t *ptl;
>> int ret;
>> @@ -2088,12 +2094,14 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd,
>> unsigned long start,
>> }
>>
>> for (addr = start; addr != end; pte++, addr += PAGE_SIZE) {
>> - unsigned long categories = p->cur_vma_category |
>> - pagemap_page_category(vma, addr, ptep_get(pte));
>> - unsigned long next = addr + PAGE_SIZE;
>> + categories = p->cur_vma_category |
>> + pagemap_page_category(vma, addr, ptep_get(pte));
>> + next = addr + PAGE_SIZE;
>
> Why moving the variable declarations out of the loop?
Saving spaces inside loop. What are pros of declation of variable in loop?
>
>>
>> ret = pagemap_scan_output(categories, p, addr, &next);
>> - if (next == addr)
>> + if (ret == 0 && next == addr)
>> + continue;
>> + else if (next == addr)
>> break;
>
> Ah, this indeed was a bug. Nit:
>
> if (next == addr) { if (!ret) continue; break; }
>
>> @@ -2204,8 +2212,6 @@ static const struct mm_walk_ops pagemap_scan_ops = {
>> static int pagemap_scan_get_args(struct pm_scan_arg *arg,
>> unsigned long uarg)
>> {
>> - unsigned long start, end, vec;
>> -
>> if (copy_from_user(arg, (void __user *)uarg, sizeof(*arg)))
>> return -EFAULT;
>>
>> @@ -2219,22 +2225,24 @@ static int pagemap_scan_get_args(struct pm_scan_arg
>> *arg,
>> arg->category_anyof_mask | arg->return_mask) & ~PM_SCAN_CATEGORIES)
>> return -EINVAL;
>>
>> - start = untagged_addr((unsigned long)arg->start);
>> - end = untagged_addr((unsigned long)arg->end);
>> - vec = untagged_addr((unsigned long)arg->vec);
>> + arg->start = untagged_addr((unsigned long)arg->start);
>> + arg->end = untagged_addr((unsigned long)arg->end);
>> + arg->vec = untagged_addr((unsigned long)arg->vec);
>
> BTW, We should we keep the tag in args writeback().
Sorry what?
After this function, the start, end and vec would be used. We need to make
sure that the address are untagged before that.
>
>> /* Validate memory pointers */
>> - if (!IS_ALIGNED(start, PAGE_SIZE))
>> + if (!IS_ALIGNED(arg->start, PAGE_SIZE))
>> return -EINVAL;
>> - if (!access_ok((void __user *)start, end - start))
>> + if (!access_ok((void __user *)arg->start, arg->end - arg->start))
>> return -EFAULT;
>> - if (!vec && arg->vec_len)
>> + if (!arg->vec && arg->vec_len)
>> return -EFAULT;
>> - if (vec && !access_ok((void __user *)vec,
>> + if (arg->vec && !access_ok((void __user *)arg->vec,
>> arg->vec_len * sizeof(struct page_region)))
>> return -EFAULT;
>>
>> /* Fixup default values */
>> + arg->end = (arg->end & ~PAGE_MASK) ?
>> + ((arg->end & PAGE_MASK) + PAGE_SIZE) : (arg->end);
>
> arg->end = ALIGN(arg->end, PAGE_SIZE);
>
>> if (!arg->max_pages)
>> arg->max_pages = ULONG_MAX;
>>
>
> Best Regards
> Michał Mirosław
--
BR,
Muhammad Usama Anjum
next prev parent reply other threads:[~2023-07-24 15:22 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-13 10:14 [PATCH v25 0/5] Implement IOCTL to get and optionally clear info about PTEs Muhammad Usama Anjum
2023-07-13 10:14 ` [PATCH v25 1/5] userfaultfd: UFFD_FEATURE_WP_ASYNC Muhammad Usama Anjum
2023-07-13 10:14 ` [PATCH v25 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs Muhammad Usama Anjum
2023-07-17 17:26 ` Andrei Vagin
2023-07-18 8:18 ` Muhammad Usama Anjum
2023-07-18 16:08 ` Andrei Vagin
2023-07-18 16:27 ` Muhammad Usama Anjum
2023-07-13 10:14 ` [PATCH v25 3/5] tools headers UAPI: Update linux/fs.h with the kernel sources Muhammad Usama Anjum
2023-07-13 10:14 ` [PATCH v25 4/5] mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL Muhammad Usama Anjum
2023-07-13 10:14 ` [PATCH v25 5/5] selftests: mm: add pagemap ioctl tests Muhammad Usama Anjum
2023-07-20 19:28 ` fs/proc/task_mmu: Implement IOCTL for efficient page table scanning Michał Mirosław
2023-07-20 19:50 ` Michał Mirosław
2023-07-20 21:12 ` kernel test robot
2023-07-21 2:56 ` kernel test robot
2023-07-21 4:27 ` Muhammad Usama Anjum
2023-07-21 14:49 ` Andrei Vagin
2023-07-21 5:43 ` kernel test robot
2023-07-21 7:18 ` kernel test robot
2023-07-21 10:48 ` Muhammad Usama Anjum
2023-07-21 11:23 ` Michał Mirosław
2023-07-21 17:50 ` Muhammad Usama Anjum
2023-07-22 0:22 ` Michał Mirosław
2023-07-22 0:24 ` [v2] " Michał Mirosław
2023-07-22 13:55 ` kernel test robot
2023-07-22 14:05 ` kernel test robot
2023-07-24 14:04 ` Muhammad Usama Anjum
2023-07-24 14:38 ` Michał Mirosław
2023-07-24 15:21 ` Muhammad Usama Anjum [this message]
2023-07-24 16:10 ` Michał Mirosław
2023-07-25 7:23 ` Muhammad Usama Anjum
2023-07-25 9:09 ` Muhammad Usama Anjum
2023-07-25 9:11 ` [v3] " Muhammad Usama Anjum
2023-07-25 18:05 ` Michał Mirosław
2023-07-26 8:34 ` Muhammad Usama Anjum
2023-07-26 21:10 ` Michał Mirosław
2023-07-26 23:06 ` Paul Gofman
2023-07-27 11:18 ` Michał Mirosław
2023-07-27 11:21 ` Michał Mirosław
2023-07-27 17:15 ` Paul Gofman
2023-07-27 8:03 ` Muhammad Usama Anjum
2023-07-27 11:26 ` Michał Mirosław
2023-07-27 11:31 ` Muhammad Usama Anjum
2023-07-21 11:29 ` Michał Mirosław
2023-07-21 17:51 ` Muhammad Usama Anjum
2023-08-26 13:07 ` kernel test robot
2023-07-18 16:05 ` [PATCH v25 0/5] Implement IOCTL to get and optionally clear info about PTEs Rogerio Alves
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=44eddc7d-fd68-1595-7e4f-e196abe37311@collabora.com \
--to=usama.anjum@collabora.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=alex.sierra@amd.com \
--cc=avagin@gmail.com \
--cc=axelrasmussen@google.com \
--cc=brauner@kernel.org \
--cc=dan.j.williams@intel.com \
--cc=david@redhat.com \
--cc=emmir@google.com \
--cc=gorcunov@gmail.com \
--cc=gregkh@linuxfoundation.org \
--cc=gustavoars@kernel.org \
--cc=kernel@collabora.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mdanylo@google.com \
--cc=mirq-linux@rere.qmqm.pl \
--cc=namit@vmware.com \
--cc=pasha.tatashin@soleen.com \
--cc=peterx@redhat.com \
--cc=pgofman@codeweavers.com \
--cc=rppt@kernel.org \
--cc=shuah@kernel.org \
--cc=shy828301@gmail.com \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=yun.zhou@windriver.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox