[PATCH v3] perf: map pages in advance

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3] perf: map pages in advance
@ 2024-12-05  8:29 Lorenzo Stoakes
  2024-12-19 21:17 ` Chen, Zide
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-12-05  8:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, Kan Liang, linux-perf-users, linux-kernel,
	linux-mm, Matthew Wilcox, David Hildenbrand, Yi Lai

We are adjusting struct page to make it smaller, removing unneeded fields
which correctly belong to struct folio.

Two of those fields are page->index and page->mapping. Perf is currently
making use of both of these. This is unnecessary. This patch eliminates
this.

Perf establishes its own internally controlled memory-mapped pages using
vm_ops hooks. The first page in the mapping is the read/write user control
page, and the rest of the mapping consists of read-only pages.

The VMA is backed by kernel memory either from the buddy allocator or
vmalloc depending on configuration. It is intended to be mapped read/write,
but because it has a page_mkwrite() hook, vma_wants_writenotify() indicates
that it should be mapped read-only.

When a write fault occurs, the provided page_mkwrite() hook,
perf_mmap_fault() (doing double duty handing faults as well) uses the
vmf->pgoff field to determine if this is the first page, allowing for the
desired read/write first page, read-only rest mapping.

For this to work the implementation has to carefully work around faulting
logic. When a page is write-faulted, the fault() hook is called first, then
its page_mkwrite() hook is called (to allow for dirty tracking in file
systems).

On fault we set the folio's mapping in perf_mmap_fault(), this is because
when do_page_mkwrite() is subsequently invoked, it treats a missing mapping
as an indicator that the fault should be retried.

We also set the folio's index so, given the folio is being treated as faux
user memory, it correctly references its offset within the VMA.

This explains why the mapping and index fields are used - but it's not
necessary.

We preallocate pages when perf_mmap() is called for the first time via
rb_alloc(), and further allocate auxiliary pages via rb_aux_alloc() as
needed if the mapping requires it.

This allocation is done in the f_ops->mmap() hook provided in perf_mmap(),
and so we can instead simply map all the memory right away here - there's
no point in handling (read) page faults when we don't demand page nor need
to be notified about them (perf does not).

This patch therefore changes this logic to map everything when the mmap()
hook is called, establishing a PFN map. It implements vm_ops->pfn_mkwrite()
to provide the required read/write vs. read-only behaviour, which does not
require the previously implemented workarounds.

While it is not ideal to use a VM_PFNMAP here, doing anything else will
result in the page_mkwrite() hook need to be provided, which requires the
same page->mapping hack this patch seeks to undo.

It will also result in the pages being treated as folios and placed on the
rmap, which really does not make sense for these mappings.

Semantically it makes sense to establish this as some kind of special
mapping, as the pages are managed by perf and are not strictly user pages,
but currently the only means by which we can do so functionally while
maintaining the required R/W and R/O behaviour is a PFN map.

There should be no change to actual functionality as a result of this
change.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
v3:
* Fix issue raised by syzbot where it's possible that ret == 0 and rb ==
  NULL, leading to a null pointer deref in perf_mmap_(). Thanks to Yi Lai
  for reporting.
* Fix typos in commit message and correct prose.

v2:
* nommu fixup.
* Add comment explaining why we are using a VM_PFNMAP as suggested by
  David H.
https://lore.kernel.org/all/20241129153134.82755-1-lorenzo.stoakes@oracle.com/

v1:
https://lore.kernel.org/all/20241128113714.492474-1-lorenzo.stoakes@oracle.com/

 kernel/events/core.c        | 118 +++++++++++++++++++++++++-----------
 kernel/events/ring_buffer.c |  19 +-----
 2 files changed, 82 insertions(+), 55 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5d4a54f50826..000d2fe0b3cf 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6284,41 +6284,6 @@ void perf_event_update_userpage(struct perf_event *event)
 }
 EXPORT_SYMBOL_GPL(perf_event_update_userpage);

-static vm_fault_t perf_mmap_fault(struct vm_fault *vmf)
-{
-	struct perf_event *event = vmf->vma->vm_file->private_data;
-	struct perf_buffer *rb;
-	vm_fault_t ret = VM_FAULT_SIGBUS;
-
-	if (vmf->flags & FAULT_FLAG_MKWRITE) {
-		if (vmf->pgoff == 0)
-			ret = 0;
-		return ret;
-	}
-
-	rcu_read_lock();
-	rb = rcu_dereference(event->rb);
-	if (!rb)
-		goto unlock;
-
-	if (vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))
-		goto unlock;
-
-	vmf->page = perf_mmap_to_page(rb, vmf->pgoff);
-	if (!vmf->page)
-		goto unlock;
-
-	get_page(vmf->page);
-	vmf->page->mapping = vmf->vma->vm_file->f_mapping;
-	vmf->page->index   = vmf->pgoff;
-
-	ret = 0;
-unlock:
-	rcu_read_unlock();
-
-	return ret;
-}
-
 static void ring_buffer_attach(struct perf_event *event,
 			       struct perf_buffer *rb)
 {
@@ -6558,13 +6523,87 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	ring_buffer_put(rb); /* could be last */
 }

+static vm_fault_t perf_mmap_pfn_mkwrite(struct vm_fault *vmf)
+{
+	/* The first page is the user control page, others are read-only. */
+	return vmf->pgoff == 0 ? 0 : VM_FAULT_SIGBUS;
+}
+
 static const struct vm_operations_struct perf_mmap_vmops = {
 	.open		= perf_mmap_open,
 	.close		= perf_mmap_close, /* non mergeable */
-	.fault		= perf_mmap_fault,
-	.page_mkwrite	= perf_mmap_fault,
+	.pfn_mkwrite	= perf_mmap_pfn_mkwrite,
 };

+static int map_range(struct perf_buffer *rb, struct vm_area_struct *vma)
+{
+	unsigned long nr_pages = vma_pages(vma);
+	int err = 0;
+	unsigned long pgoff;
+
+	/*
+	 * We map this as a VM_PFNMAP VMA.
+	 *
+	 * This is not ideal as this is designed broadly for mappings of PFNs
+	 * referencing memory-mapped I/O ranges or non-system RAM i.e. for which
+	 * !pfn_valid(pfn).
+	 *
+	 * We are mapping kernel-allocated memory (memory we manage ourselves)
+	 * which would more ideally be mapped using vm_insert_page() or a
+	 * similar mechanism, that is as a VM_MIXEDMAP mapping.
+	 *
+	 * However this won't work here, because:
+	 *
+	 * 1. It uses vma->vm_page_prot, but this field has not been completely
+	 *    setup at the point of the f_op->mmp() hook, so we are unable to
+	 *    indicate that this should be mapped CoW in order that the
+	 *    mkwrite() hook can be invoked to make the first page R/W and the
+	 *    rest R/O as desired.
+	 *
+	 * 2. Anything other than a VM_PFNMAP of valid PFNs will result in
+	 *    vm_normal_page() returning a struct page * pointer, which means
+	 *    vm_ops->page_mkwrite() will be invoked rather than
+	 *    vm_ops->pfn_mkwrite(), and this means we have to set page->mapping
+	 *    to work around retry logic in the fault handler, however this
+	 *    field is no longer allowed to be used within struct page.
+	 *
+	 * 3. Having a struct page * made available in the fault logic also
+	 *    means that the page gets put on the rmap and becomes
+	 *    inappropriately accessible and subject to map and ref counting.
+	 *
+	 * Ideally we would have a mechanism that could explicitly express our
+	 * desires, but this is not currently the case, so we instead use
+	 * VM_PFNMAP.
+	 *
+	 * We manage the lifetime of these mappings with internal refcounts (see
+	 * perf_mmap_open() and perf_mmap_close()) so we ensure the lifetime of
+	 * this mapping is maintained correctly.
+	 */
+	for (pgoff = 0; pgoff < nr_pages; pgoff++) {
+		unsigned long va = vma->vm_start + PAGE_SIZE * pgoff;
+		struct page *page = perf_mmap_to_page(rb, pgoff);
+
+		if (page == NULL) {
+			err = -EINVAL;
+			break;
+		}
+
+		/* Map readonly, perf_mmap_pfn_mkwrite() called on write fault. */
+		err = remap_pfn_range(vma, va, page_to_pfn(page), PAGE_SIZE,
+				      vm_get_page_prot(vma->vm_flags & ~VM_SHARED));
+		if (err)
+			break;
+	}
+
+#ifdef CONFIG_MMU
+	/* Clear any partial mappings on error. */
+	if (err)
+		zap_page_range_single(vma, vma->vm_start, nr_pages * PAGE_SIZE, NULL);
+#endif
+
+	return err;
+}
+
 static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct perf_event *event = file->private_data;
@@ -6689,6 +6728,8 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 			goto again;
 		}

+		/* We need the rb to map pages. */
+		rb = event->rb;
 		goto unlock;
 	}

@@ -6783,6 +6824,9 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
 	vma->vm_ops = &perf_mmap_vmops;

+	if (!ret)
+		ret = map_range(rb, vma);
+
 	if (event->pmu->event_mapped)
 		event->pmu->event_mapped(event, vma->vm_mm);

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 4f46f688d0d4..180509132d4b 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -643,7 +643,6 @@ static void rb_free_aux_page(struct perf_buffer *rb, int idx)
 	struct page *page = virt_to_page(rb->aux_pages[idx]);

 	ClearPagePrivate(page);
-	page->mapping = NULL;
 	__free_page(page);
 }

@@ -819,7 +818,6 @@ static void perf_mmap_free_page(void *addr)
 {
 	struct page *page = virt_to_page(addr);

-	page->mapping = NULL;
 	__free_page(page);
 }

@@ -890,28 +888,13 @@ __perf_mmap_to_page(struct perf_buffer *rb, unsigned long pgoff)
 	return vmalloc_to_page((void *)rb->user_page + pgoff * PAGE_SIZE);
 }

-static void perf_mmap_unmark_page(void *addr)
-{
-	struct page *page = vmalloc_to_page(addr);
-
-	page->mapping = NULL;
-}
-
 static void rb_free_work(struct work_struct *work)
 {
 	struct perf_buffer *rb;
-	void *base;
-	int i, nr;

 	rb = container_of(work, struct perf_buffer, work);
-	nr = data_page_nr(rb);
-
-	base = rb->user_page;
-	/* The '<=' counts in the user page. */
-	for (i = 0; i <= nr; i++)
-		perf_mmap_unmark_page(base + (i * PAGE_SIZE));

-	vfree(base);
+	vfree(rb->user_page);
 	kfree(rb);
 }

--
2.47.1

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-05  8:29 [PATCH v3] perf: map pages in advance Lorenzo Stoakes
@ 2024-12-19 21:17 ` Chen, Zide
  2024-12-20  9:31   ` Lorenzo Stoakes
  0 siblings, 1 reply; 11+ messages in thread
From: Chen, Zide @ 2024-12-19 21:17 UTC (permalink / raw)
  To: Lorenzo Stoakes, Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, Kan Liang, linux-perf-users, linux-kernel,
	linux-mm, Matthew Wilcox, David Hildenbrand, Yi Lai



On 12/5/2024 12:29 AM, Lorenzo Stoakes wrote:
> We are adjusting struct page to make it smaller, removing unneeded fields
> which correctly belong to struct folio.
> 
> Two of those fields are page->index and page->mapping. Perf is currently
> making use of both of these. This is unnecessary. This patch eliminates
> this.
> 
> Perf establishes its own internally controlled memory-mapped pages using
> vm_ops hooks. The first page in the mapping is the read/write user control
> page, and the rest of the mapping consists of read-only pages.
> 
> The VMA is backed by kernel memory either from the buddy allocator or
> vmalloc depending on configuration. It is intended to be mapped read/write,
> but because it has a page_mkwrite() hook, vma_wants_writenotify() indicates
> that it should be mapped read-only.
> 
> When a write fault occurs, the provided page_mkwrite() hook,
> perf_mmap_fault() (doing double duty handing faults as well) uses the
> vmf->pgoff field to determine if this is the first page, allowing for the
> desired read/write first page, read-only rest mapping.
> 
> For this to work the implementation has to carefully work around faulting
> logic. When a page is write-faulted, the fault() hook is called first, then
> its page_mkwrite() hook is called (to allow for dirty tracking in file
> systems).
> 
> On fault we set the folio's mapping in perf_mmap_fault(), this is because
> when do_page_mkwrite() is subsequently invoked, it treats a missing mapping
> as an indicator that the fault should be retried.
> 
> We also set the folio's index so, given the folio is being treated as faux
> user memory, it correctly references its offset within the VMA.
> 
> This explains why the mapping and index fields are used - but it's not
> necessary.
> 
> We preallocate pages when perf_mmap() is called for the first time via
> rb_alloc(), and further allocate auxiliary pages via rb_aux_alloc() as
> needed if the mapping requires it.
> 
> This allocation is done in the f_ops->mmap() hook provided in perf_mmap(),
> and so we can instead simply map all the memory right away here - there's
> no point in handling (read) page faults when we don't demand page nor need
> to be notified about them (perf does not).
> 
> This patch therefore changes this logic to map everything when the mmap()
> hook is called, establishing a PFN map. It implements vm_ops->pfn_mkwrite()
> to provide the required read/write vs. read-only behaviour, which does not
> require the previously implemented workarounds.
> 
> While it is not ideal to use a VM_PFNMAP here, doing anything else will
> result in the page_mkwrite() hook need to be provided, which requires the
> same page->mapping hack this patch seeks to undo.
> 
> It will also result in the pages being treated as folios and placed on the
> rmap, which really does not make sense for these mappings.
> 
> Semantically it makes sense to establish this as some kind of special
> mapping, as the pages are managed by perf and are not strictly user pages,
> but currently the only means by which we can do so functionally while
> maintaining the required R/W and R/O behaviour is a PFN map.
> 
> There should be no change to actual functionality as a result of this
> change.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

With this patch, it seems perf tool has some problems in capturing the
kernel data with Intel PT.

Running the following commands, the size of perf.data is very small, and
perf script can't find any valid records.

perf record -e intel_pt//u -- /bin/ls
perf script --insn-trace




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-19 21:17 ` Chen, Zide
@ 2024-12-20  9:31   ` Lorenzo Stoakes
  2024-12-20  9:56     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-12-20  9:31 UTC (permalink / raw)
  To: Chen, Zide
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, linux-mm, Matthew Wilcox, David Hildenbrand,
	Yi Lai

On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:

> With this patch, it seems perf tool has some problems in capturing the
> kernel data with Intel PT.
>
> Running the following commands, the size of perf.data is very small, and
> perf script can't find any valid records.
>
> perf record -e intel_pt//u -- /bin/ls
> perf script --insn-trace
>
>

Hi,

I'm on leave (and should really go back to relaxing :>), returning on 2nd
Jan so can't really dig into this.

But I tried it on my intel box and it 'works on my machine' with and
without patch with commands provided, so I'm not sure this is actually a
product of this change (which shouldn't impact this).

I'm afraid I can't really reply any further until the new year, however (I
really shouldn't even be replying here ;).

Happy holidays, Lorenzo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-20  9:31   ` Lorenzo Stoakes
@ 2024-12-20  9:56     ` David Hildenbrand
  2024-12-20 19:36       ` Chen, Zide
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-12-20  9:56 UTC (permalink / raw)
  To: Lorenzo Stoakes, Chen, Zide
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, linux-mm, Matthew Wilcox, Yi Lai

On 20.12.24 10:31, Lorenzo Stoakes wrote:
> On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:
> 
>> With this patch, it seems perf tool has some problems in capturing the
>> kernel data with Intel PT.
>>
>> Running the following commands, the size of perf.data is very small, and
>> perf script can't find any valid records.
>>
>> perf record -e intel_pt//u -- /bin/ls
>> perf script --insn-trace
>>
>>
> 
> Hi,
> 
> I'm on leave (and should really go back to relaxing :>), returning on 2nd
> Jan so can't really dig into this.
> 
> But I tried it on my intel box and it 'works on my machine' with and
> without patch with commands provided, so I'm not sure this is actually a
> product of this change (which shouldn't impact this).

Zide Chen, can you try with and without this patch to see if it 
introduces the issue?

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-20  9:56     ` David Hildenbrand
@ 2024-12-20 19:36       ` Chen, Zide
  2024-12-20 21:29         ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Chen, Zide @ 2024-12-20 19:36 UTC (permalink / raw)
  To: David Hildenbrand, Lorenzo Stoakes
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, linux-mm, Matthew Wilcox, Yi Lai



On 12/20/2024 1:56 AM, David Hildenbrand wrote:
> On 20.12.24 10:31, Lorenzo Stoakes wrote:
>> On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:
>>
>>> With this patch, it seems perf tool has some problems in capturing the
>>> kernel data with Intel PT.
>>>
>>> Running the following commands, the size of perf.data is very small, and
>>> perf script can't find any valid records.
>>>
>>> perf record -e intel_pt//u -- /bin/ls
>>> perf script --insn-trace
>>>
>>>
>>
>> Hi,
>>
>> I'm on leave (and should really go back to relaxing :>), returning on 2nd
>> Jan so can't really dig into this.
>>
>> But I tried it on my intel box and it 'works on my machine' with and
>> without patch with commands provided, so I'm not sure this is actually a
>> product of this change (which shouldn't impact this).
> 
> Zide Chen, can you try with and without this patch to see if it
> introduces the issue?

Yes, I re-did the test on a SPR server, and the result is same. Without
the patch, it went well; But with it, "perf script --insn-trace" doesn't
show valid records.

This time I tested it on the clean 6.13-rc1 tag, base commit
40384c840ea1944d7c5a392e8975ed088ecf0b37

Also, with this patch, running tools/perf/tests/shell/test_intel_pt.sh:

Error:
The - data has no samples!


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-20 19:36       ` Chen, Zide
@ 2024-12-20 21:29         ` David Hildenbrand
  2024-12-20 21:53           ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-12-20 21:29 UTC (permalink / raw)
  To: Chen, Zide, Lorenzo Stoakes
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, linux-mm, Matthew Wilcox, Yi Lai

On 20.12.24 20:36, Chen, Zide wrote:
> 
> 
> On 12/20/2024 1:56 AM, David Hildenbrand wrote:
>> On 20.12.24 10:31, Lorenzo Stoakes wrote:
>>> On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:
>>>
>>>> With this patch, it seems perf tool has some problems in capturing the
>>>> kernel data with Intel PT.
>>>>
>>>> Running the following commands, the size of perf.data is very small, and
>>>> perf script can't find any valid records.
>>>>
>>>> perf record -e intel_pt//u -- /bin/ls
>>>> perf script --insn-trace
>>>>
>>>>
>>>
>>> Hi,
>>>
>>> I'm on leave (and should really go back to relaxing :>), returning on 2nd
>>> Jan so can't really dig into this.
>>>
>>> But I tried it on my intel box and it 'works on my machine' with and
>>> without patch with commands provided, so I'm not sure this is actually a
>>> product of this change (which shouldn't impact this).
>>
>> Zide Chen, can you try with and without this patch to see if it
>> introduces the issue?
> 
> Yes, I re-did the test on a SPR server, and the result is same. Without
> the patch, it went well; But with it, "perf script --insn-trace" doesn't
> show valid records.
> 
> This time I tested it on the clean 6.13-rc1 tag, base commit
> 40384c840ea1944d7c5a392e8975ed088ecf0b37
> 
> Also, with this patch, running tools/perf/tests/shell/test_intel_pt.sh:
> 
> Error:
> The - data has no samples!

I just tested it on 6.13-rc1  vs. 6.13-rc1  with this patch.

Indeed, there is quite difference. Below are the main parts that changed, only.

We seem to be recording data, but maybe what we record gets corrupted somehow?


Looks like


Before:

--- Test tracing self-modifying code that uses jitdump ---
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/tmp-perf.data ]
/root/linux/tools/perf/tests/shell
OK
--- Test with MTC and TSC disabled ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.050 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
OK
--- Test with branches disabled ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.012 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
OK
--- Test with/without CYC ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.095 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.061 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
OK
--- Test recording with sample mode ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.077 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
Linux
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.006 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
OK
--- Test with kernel trace ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.060 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
OK
--- Test virtual LBR ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.053 MB /tmp/perf-test-intel-pt-sh.pSYRjy9UUy/test-perf.data ]
OK
--- Test power events ---
SKIP: power_event_trace is not supported
--- Test with TNT packets disabled  ---
SKIP: tnt_disable is not supported
--- Test with event_trace ---
SKIP: event_trace is not supported
--- Test with pipe mode ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.075 MB - ]
     14.20%  uname    libc.so.6             [.] _int_malloc
     10.39%  uname    ld-linux-x86-64.so.2  [.] _dl_relocate_object
      9.68%  uname    libc.so.6             [.] __strcmp_evex
      8.04%  uname    libc.so.6             [.] __stpcpy_evex
      7.89%  uname    libc.so.6             [.] __memmove_chk_evex_unaligned_erms
      5.71%  uname    libc.so.6             [.] __memmove_evex_unaligned_erms
      5.68%  uname    ld-linux-x86-64.so.2  [.] __GI___tunables_init
      5.46%  uname    ld-linux-x86-64.so.2  [.] _dl_lookup_symbol_x
      4.95%  uname    libc.so.6             [.] getenv
      4.37%  uname    ld-linux-x86-64.so.2  [.] do_lookup_x
      4.21%  uname    libc.so.6             [.] __strchr_evex
      3.56%  uname    libc.so.6             [.] strlen@plt
      2.71%  uname    libc.so.6             [.] _nl_make_l10nflist.localalias
      2.34%  uname    ld-linux-x86-64.so.2  [.] mprotect
      2.22%  uname    libc.so.6             [.] memchr
      1.82%  uname    libc.so.6             [.] find_module_idx
      1.61%  uname    libc.so.6             [.] _nl_intern_locale_data
      1.55%  uname    ld-linux-x86-64.so.2  [.] __minimal_malloc
      0.98%  uname    ld-linux-x86-64.so.2  [.] _dl_start_user
      0.96%  uname    ld-linux-x86-64.so.2  [.] intel_check_word.constprop.0
      0.61%  uname    ld-linux-x86-64.so.2  [.] check_match
      0.57%  uname    ld-linux-x86-64.so.2  [.] update_active.constprop.0
      0.50%  uname    ld-linux-x86-64.so.2  [.] get_common_cache_info.constprop.0


Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.074 MB - ]
OK
--- Cleaning up ---
--- Done ---



After:

--- Test tracing self-modifying code that uses jitdump ---
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/tmp-perf.data ]
/root/linux/tools/perf/tests/shell
Decode failed, only 0 branches
--- Test with MTC and TSC disabled ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.051 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
Failed to filter with tsc=0
--- Test with branches disabled ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
Failed to disable branches
--- Test with/without CYC ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.097 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.061 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
Still get CYC packet without cyc
--- Test recording with sample mode ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.076 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
Linux
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.006 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
OK
--- Test with kernel trace ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.061 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
OK
--- Test virtual LBR ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.054 MB /tmp/perf-test-intel-pt-sh.SvCUtQzSAe/test-perf.data ]
OK
--- Test power events ---
SKIP: power_event_trace is not supported
--- Test with TNT packets disabled  ---
SKIP: tnt_disable is not supported
--- Test with event_trace ---
SKIP: event_trace is not supported
--- Test with pipe mode ---
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB - ]
Error:
The - data has no samples!
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.077 MB - ]
OK
--- Cleaning up ---
--- Done ---


-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-20 21:29         ` David Hildenbrand
@ 2024-12-20 21:53           ` David Hildenbrand
  2024-12-23 11:10             ` Lorenzo Stoakes
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-12-20 21:53 UTC (permalink / raw)
  To: Chen, Zide, Lorenzo Stoakes
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, linux-mm, Matthew Wilcox, Yi Lai

On 20.12.24 22:29, David Hildenbrand wrote:
> On 20.12.24 20:36, Chen, Zide wrote:
>>
>>
>> On 12/20/2024 1:56 AM, David Hildenbrand wrote:
>>> On 20.12.24 10:31, Lorenzo Stoakes wrote:
>>>> On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:
>>>>
>>>>> With this patch, it seems perf tool has some problems in capturing the
>>>>> kernel data with Intel PT.
>>>>>
>>>>> Running the following commands, the size of perf.data is very small, and
>>>>> perf script can't find any valid records.
>>>>>
>>>>> perf record -e intel_pt//u -- /bin/ls
>>>>> perf script --insn-trace
>>>>>
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> I'm on leave (and should really go back to relaxing :>), returning on 2nd
>>>> Jan so can't really dig into this.
>>>>
>>>> But I tried it on my intel box and it 'works on my machine' with and
>>>> without patch with commands provided, so I'm not sure this is actually a
>>>> product of this change (which shouldn't impact this).
>>>
>>> Zide Chen, can you try with and without this patch to see if it
>>> introduces the issue?
>>
>> Yes, I re-did the test on a SPR server, and the result is same. Without
>> the patch, it went well; But with it, "perf script --insn-trace" doesn't
>> show valid records.
>>
>> This time I tested it on the clean 6.13-rc1 tag, base commit
>> 40384c840ea1944d7c5a392e8975ed088ecf0b37
>>
>> Also, with this patch, running tools/perf/tests/shell/test_intel_pt.sh:
>>
>> Error:
>> The - data has no samples!
> 
> I just tested it on 6.13-rc1  vs. 6.13-rc1  with this patch.
> 
> Indeed, there is quite difference. Below are the main parts that changed, only.
> 
> We seem to be recording data, but maybe what we record gets corrupted somehow?

Huge parts of the new file are full of 0s. Either we are mapping the 
wrong pages, or reading from the pages (via PFNMAP) does not work as 
expected.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-20 21:53           ` David Hildenbrand
@ 2024-12-23 11:10             ` Lorenzo Stoakes
  2024-12-23 21:12               ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Stoakes @ 2024-12-23 11:10 UTC (permalink / raw)
  To: Peter Zijlstra, David Hildenbrand
  Cc: Chen, Zide, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, Kan Liang, linux-perf-users, linux-kernel,
	linux-mm, Matthew Wilcox, Yi Lai

Peter - could you drop this patch for now until I have a chance to take a
look at this issue on my return on 2nd Jan?

On Fri, Dec 20, 2024 at 10:53:14PM +0100, David Hildenbrand wrote:
> On 20.12.24 22:29, David Hildenbrand wrote:
> > On 20.12.24 20:36, Chen, Zide wrote:
> > >
> > >
> > > On 12/20/2024 1:56 AM, David Hildenbrand wrote:
> > > > On 20.12.24 10:31, Lorenzo Stoakes wrote:
> > > > > On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:
> > > > >
> > > > > > With this patch, it seems perf tool has some problems in capturing the
> > > > > > kernel data with Intel PT.
> > > > > >
> > > > > > Running the following commands, the size of perf.data is very small, and
> > > > > > perf script can't find any valid records.
> > > > > >
> > > > > > perf record -e intel_pt//u -- /bin/ls
> > > > > > perf script --insn-trace
> > > > > >
> > > > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'm on leave (and should really go back to relaxing :>), returning on 2nd
> > > > > Jan so can't really dig into this.
> > > > >
> > > > > But I tried it on my intel box and it 'works on my machine' with and
> > > > > without patch with commands provided, so I'm not sure this is actually a
> > > > > product of this change (which shouldn't impact this).
> > > >
> > > > Zide Chen, can you try with and without this patch to see if it
> > > > introduces the issue?
> > >
> > > Yes, I re-did the test on a SPR server, and the result is same. Without
> > > the patch, it went well; But with it, "perf script --insn-trace" doesn't
> > > show valid records.
> > >
> > > This time I tested it on the clean 6.13-rc1 tag, base commit
> > > 40384c840ea1944d7c5a392e8975ed088ecf0b37
> > >
> > > Also, with this patch, running tools/perf/tests/shell/test_intel_pt.sh:
> > >
> > > Error:
> > > The - data has no samples!
> >
> > I just tested it on 6.13-rc1  vs. 6.13-rc1  with this patch.
> >
> > Indeed, there is quite difference. Below are the main parts that changed, only.
> >
> > We seem to be recording data, but maybe what we record gets corrupted somehow?
>
> Huge parts of the new file are full of 0s. Either we are mapping the wrong
> pages, or reading from the pages (via PFNMAP) does not work as expected.
>

Thanks David, and apologies Zide, appears there is an issue here clearly.

Could you try this with sudo operations? I was doing this locally and I
wonder if there is now a permissioning error?

I'd be surprised if pfn map would cause an issue here as it should just
directly map the kernel memory, however if the PT code assumes there will
be faults there could be an issue. I did take a brief look at this last
week and it seems the PT stuff relies on the aux functionality, so that
could also be a source of problems here.

I am on leave at the moment returning on 2nd Jan, I will look at this as a
priority when I return, as you can see above I've asked Peter to drop this
for now.

Thank you very much for reporting!

And happy holidays to you both :)

> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-23 11:10             ` Lorenzo Stoakes
@ 2024-12-23 21:12               ` David Hildenbrand
  2025-01-03 14:45                 ` Lorenzo Stoakes
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-12-23 21:12 UTC (permalink / raw)
  To: Lorenzo Stoakes, Peter Zijlstra
  Cc: Chen, Zide, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
	Adrian Hunter, Kan Liang, linux-perf-users, linux-kernel,
	linux-mm, Matthew Wilcox, Yi Lai

On 23.12.24 12:10, Lorenzo Stoakes wrote:
> Peter - could you drop this patch for now until I have a chance to take a
> look at this issue on my return on 2nd Jan?
> 
> On Fri, Dec 20, 2024 at 10:53:14PM +0100, David Hildenbrand wrote:
>> On 20.12.24 22:29, David Hildenbrand wrote:
>>> On 20.12.24 20:36, Chen, Zide wrote:
>>>>
>>>>
>>>> On 12/20/2024 1:56 AM, David Hildenbrand wrote:
>>>>> On 20.12.24 10:31, Lorenzo Stoakes wrote:
>>>>>> On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:
>>>>>>
>>>>>>> With this patch, it seems perf tool has some problems in capturing the
>>>>>>> kernel data with Intel PT.
>>>>>>>
>>>>>>> Running the following commands, the size of perf.data is very small, and
>>>>>>> perf script can't find any valid records.
>>>>>>>
>>>>>>> perf record -e intel_pt//u -- /bin/ls
>>>>>>> perf script --insn-trace
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm on leave (and should really go back to relaxing :>), returning on 2nd
>>>>>> Jan so can't really dig into this.
>>>>>>
>>>>>> But I tried it on my intel box and it 'works on my machine' with and
>>>>>> without patch with commands provided, so I'm not sure this is actually a
>>>>>> product of this change (which shouldn't impact this).
>>>>>
>>>>> Zide Chen, can you try with and without this patch to see if it
>>>>> introduces the issue?
>>>>
>>>> Yes, I re-did the test on a SPR server, and the result is same. Without
>>>> the patch, it went well; But with it, "perf script --insn-trace" doesn't
>>>> show valid records.
>>>>
>>>> This time I tested it on the clean 6.13-rc1 tag, base commit
>>>> 40384c840ea1944d7c5a392e8975ed088ecf0b37
>>>>
>>>> Also, with this patch, running tools/perf/tests/shell/test_intel_pt.sh:
>>>>
>>>> Error:
>>>> The - data has no samples!
>>>
>>> I just tested it on 6.13-rc1  vs. 6.13-rc1  with this patch.
>>>
>>> Indeed, there is quite difference. Below are the main parts that changed, only.
>>>
>>> We seem to be recording data, but maybe what we record gets corrupted somehow?
>>
>> Huge parts of the new file are full of 0s. Either we are mapping the wrong
>> pages, or reading from the pages (via PFNMAP) does not work as expected.
>>
> 
> Thanks David, and apologies Zide, appears there is an issue here clearly.
> 
> Could you try this with sudo operations? I was doing this locally and I
> wonder if there is now a permissioning error?

I ran it under root.

> 
> I'd be surprised if pfn map would cause an issue here as it should just
> directly map the kernel memory, however if the PT code assumes there will
> be faults there could be an issue. I did take a brief look at this last
> week and it seems the PT stuff relies on the aux functionality, so that
> could also be a source of problems here.

I started a bit at that code, no clue yet what's happening.

I was wondering if we end up mapping the wrong pages, meaning: the pages 
at mmap time end up being different to the pages later at fault time. 
The code is a bit confusing, but I thought we cannot change the 
effective event/pages while we have an active mmap. Maybe there is some 
corner case ...

Nothing else really jumped at me ... moving the mapping og pages after 
the event_mapped() callback also didn't change anything.

> 
> I am on leave at the moment returning on 2nd Jan, I will look at this as a
> priority when I return, as you can see above I've asked Peter to drop this
> for now.

Enjoy your time off an Happy Holidays!

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2024-12-23 21:12               ` David Hildenbrand
@ 2025-01-03 14:45                 ` Lorenzo Stoakes
  2025-01-03 20:36                   ` Chen, Zide
  0 siblings, 1 reply; 11+ messages in thread
From: Lorenzo Stoakes @ 2025-01-03 14:45 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Peter Zijlstra, Chen, Zide, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Kan Liang, linux-perf-users, linux-kernel, linux-mm,
	Matthew Wilcox, Yi Lai

On Mon, Dec 23, 2024 at 10:12:42PM +0100, David Hildenbrand wrote:
> On 23.12.24 12:10, Lorenzo Stoakes wrote:
> > Peter - could you drop this patch for now until I have a chance to take a
> > look at this issue on my return on 2nd Jan?
> >
> > On Fri, Dec 20, 2024 at 10:53:14PM +0100, David Hildenbrand wrote:
> > > On 20.12.24 22:29, David Hildenbrand wrote:
> > > > On 20.12.24 20:36, Chen, Zide wrote:
> > > > >
> > > > >
> > > > > On 12/20/2024 1:56 AM, David Hildenbrand wrote:
> > > > > > On 20.12.24 10:31, Lorenzo Stoakes wrote:
> > > > > > > On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:
> > > > > > >
> > > > > > > > With this patch, it seems perf tool has some problems in capturing the
> > > > > > > > kernel data with Intel PT.
> > > > > > > >
> > > > > > > > Running the following commands, the size of perf.data is very small, and
> > > > > > > > perf script can't find any valid records.
> > > > > > > >
> > > > > > > > perf record -e intel_pt//u -- /bin/ls
> > > > > > > > perf script --insn-trace
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm on leave (and should really go back to relaxing :>), returning on 2nd
> > > > > > > Jan so can't really dig into this.
> > > > > > >
> > > > > > > But I tried it on my intel box and it 'works on my machine' with and
> > > > > > > without patch with commands provided, so I'm not sure this is actually a
> > > > > > > product of this change (which shouldn't impact this).
> > > > > >
> > > > > > Zide Chen, can you try with and without this patch to see if it
> > > > > > introduces the issue?
> > > > >
> > > > > Yes, I re-did the test on a SPR server, and the result is same. Without
> > > > > the patch, it went well; But with it, "perf script --insn-trace" doesn't
> > > > > show valid records.
> > > > >
> > > > > This time I tested it on the clean 6.13-rc1 tag, base commit
> > > > > 40384c840ea1944d7c5a392e8975ed088ecf0b37
> > > > >
> > > > > Also, with this patch, running tools/perf/tests/shell/test_intel_pt.sh:
> > > > >
> > > > > Error:
> > > > > The - data has no samples!
> > > >
> > > > I just tested it on 6.13-rc1  vs. 6.13-rc1  with this patch.
> > > >
> > > > Indeed, there is quite difference. Below are the main parts that changed, only.
> > > >
> > > > We seem to be recording data, but maybe what we record gets corrupted somehow?
> > >
> > > Huge parts of the new file are full of 0s. Either we are mapping the wrong
> > > pages, or reading from the pages (via PFNMAP) does not work as expected.
> > >
> >
> > Thanks David, and apologies Zide, appears there is an issue here clearly.
> >
> > Could you try this with sudo operations? I was doing this locally and I
> > wonder if there is now a permissioning error?
>
> I ran it under root.
>
> >
> > I'd be surprised if pfn map would cause an issue here as it should just
> > directly map the kernel memory, however if the PT code assumes there will
> > be faults there could be an issue. I did take a brief look at this last
> > week and it seems the PT stuff relies on the aux functionality, so that
> > could also be a source of problems here.
>
> I started a bit at that code, no clue yet what's happening.
>
> I was wondering if we end up mapping the wrong pages, meaning: the pages at
> mmap time end up being different to the pages later at fault time. The code
> is a bit confusing, but I thought we cannot change the effective event/pages
> while we have an active mmap. Maybe there is some corner case ...

OK I figured it out... it's a very silly mistake on my part (oh how this is so
often the case :).

When we map the pages, we do not offset by vma->vm_pgoff when looking up the
page, so if you map with an offset (as presumably the intel pt stuff is), it is
then retrieving the wrong pages).

This also resolves the apparent need for sudo...

Very silly mistake. Apologies :>)

I will send a v4 in a second.

Zide - could you give v4 a test when I send it out just to confirm it resolves
your issue? I will cc- you on this.

Thanks again for your report, and apologies for the noise!

>
> Nothing else really jumped at me ... moving the mapping og pages after the
> event_mapped() callback also didn't change anything.
>
> >
> > I am on leave at the moment returning on 2nd Jan, I will look at this as a
> > priority when I return, as you can see above I've asked Peter to drop this
> > for now.
>
> Enjoy your time off an Happy Holidays!
>
> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3] perf: map pages in advance
  2025-01-03 14:45                 ` Lorenzo Stoakes
@ 2025-01-03 20:36                   ` Chen, Zide
  0 siblings, 0 replies; 11+ messages in thread
From: Chen, Zide @ 2025-01-03 20:36 UTC (permalink / raw)
  To: Lorenzo Stoakes, David Hildenbrand
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, linux-perf-users,
	linux-kernel, linux-mm, Matthew Wilcox, Yi Lai



On 1/3/2025 6:45 AM, Lorenzo Stoakes wrote:
> On Mon, Dec 23, 2024 at 10:12:42PM +0100, David Hildenbrand wrote:
>> On 23.12.24 12:10, Lorenzo Stoakes wrote:
>>> Peter - could you drop this patch for now until I have a chance to take a
>>> look at this issue on my return on 2nd Jan?
>>>
>>> On Fri, Dec 20, 2024 at 10:53:14PM +0100, David Hildenbrand wrote:
>>>> On 20.12.24 22:29, David Hildenbrand wrote:
>>>>> On 20.12.24 20:36, Chen, Zide wrote:
>>>>>>
>>>>>>
>>>>>> On 12/20/2024 1:56 AM, David Hildenbrand wrote:
>>>>>>> On 20.12.24 10:31, Lorenzo Stoakes wrote:
>>>>>>>> On Thu, Dec 19, 2024 at 01:17:44PM -0800, Chen, Zide wrote:
>>>>>>>>
>>>>>>>>> With this patch, it seems perf tool has some problems in capturing the
>>>>>>>>> kernel data with Intel PT.
>>>>>>>>>
>>>>>>>>> Running the following commands, the size of perf.data is very small, and
>>>>>>>>> perf script can't find any valid records.
>>>>>>>>>
>>>>>>>>> perf record -e intel_pt//u -- /bin/ls
>>>>>>>>> perf script --insn-trace
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm on leave (and should really go back to relaxing :>), returning on 2nd
>>>>>>>> Jan so can't really dig into this.
>>>>>>>>
>>>>>>>> But I tried it on my intel box and it 'works on my machine' with and
>>>>>>>> without patch with commands provided, so I'm not sure this is actually a
>>>>>>>> product of this change (which shouldn't impact this).
>>>>>>>
>>>>>>> Zide Chen, can you try with and without this patch to see if it
>>>>>>> introduces the issue?
>>>>>>
>>>>>> Yes, I re-did the test on a SPR server, and the result is same. Without
>>>>>> the patch, it went well; But with it, "perf script --insn-trace" doesn't
>>>>>> show valid records.
>>>>>>
>>>>>> This time I tested it on the clean 6.13-rc1 tag, base commit
>>>>>> 40384c840ea1944d7c5a392e8975ed088ecf0b37
>>>>>>
>>>>>> Also, with this patch, running tools/perf/tests/shell/test_intel_pt.sh:
>>>>>>
>>>>>> Error:
>>>>>> The - data has no samples!
>>>>>
>>>>> I just tested it on 6.13-rc1  vs. 6.13-rc1  with this patch.
>>>>>
>>>>> Indeed, there is quite difference. Below are the main parts that changed, only.
>>>>>
>>>>> We seem to be recording data, but maybe what we record gets corrupted somehow?
>>>>
>>>> Huge parts of the new file are full of 0s. Either we are mapping the wrong
>>>> pages, or reading from the pages (via PFNMAP) does not work as expected.
>>>>
>>>
>>> Thanks David, and apologies Zide, appears there is an issue here clearly.
>>>
>>> Could you try this with sudo operations? I was doing this locally and I
>>> wonder if there is now a permissioning error?
>>
>> I ran it under root.
>>
>>>
>>> I'd be surprised if pfn map would cause an issue here as it should just
>>> directly map the kernel memory, however if the PT code assumes there will
>>> be faults there could be an issue. I did take a brief look at this last
>>> week and it seems the PT stuff relies on the aux functionality, so that
>>> could also be a source of problems here.
>>
>> I started a bit at that code, no clue yet what's happening.
>>
>> I was wondering if we end up mapping the wrong pages, meaning: the pages at
>> mmap time end up being different to the pages later at fault time. The code
>> is a bit confusing, but I thought we cannot change the effective event/pages
>> while we have an active mmap. Maybe there is some corner case ...
> 
> OK I figured it out... it's a very silly mistake on my part (oh how this is so
> often the case :).
> 
> When we map the pages, we do not offset by vma->vm_pgoff when looking up the
> page, so if you map with an offset (as presumably the intel pt stuff is), it is
> then retrieving the wrong pages).
> 
> This also resolves the apparent need for sudo...
> 
> Very silly mistake. Apologies :>)
> 
> I will send a v4 in a second.
> 
> Zide - could you give v4 a test when I send it out just to confirm it resolves
> your issue? I will cc- you on this.
> 
> Thanks again for your report, and apologies for the noise!

I can confirm that v4 works for my Intel PT tests!

-Zide


>>
>> Nothing else really jumped at me ... moving the mapping og pages after the
>> event_mapped() callback also didn't change anything.
>>
>>>
>>> I am on leave at the moment returning on 2nd Jan, I will look at this as a
>>> priority when I return, as you can see above I've asked Peter to drop this
>>> for now.
>>
>> Enjoy your time off an Happy Holidays!
>>
>> --
>> Cheers,
>>
>> David / dhildenb
>>
> 
> Cheers, Lorenzo



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-01-03 20:36 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-12-05  8:29 [PATCH v3] perf: map pages in advance Lorenzo Stoakes
2024-12-19 21:17 ` Chen, Zide
2024-12-20  9:31   ` Lorenzo Stoakes
2024-12-20  9:56     ` David Hildenbrand
2024-12-20 19:36       ` Chen, Zide
2024-12-20 21:29         ` David Hildenbrand
2024-12-20 21:53           ` David Hildenbrand
2024-12-23 11:10             ` Lorenzo Stoakes
2024-12-23 21:12               ` David Hildenbrand
2025-01-03 14:45                 ` Lorenzo Stoakes
2025-01-03 20:36                   ` Chen, Zide

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox