* CPU consumption is going as high as 95% on ARM Cortex A8
@ 2009-12-17 5:38 Hiremath, Vaibhav
2009-12-17 6:24 ` Shilimkar, Santosh
2009-12-17 9:56 ` Russell King - ARM Linux
0 siblings, 2 replies; 8+ messages in thread
From: Hiremath, Vaibhav @ 2009-12-17 5:38 UTC (permalink / raw)
To: linux; +Cc: linux-arm-kernel, linux-mm, linux-omap
Hi,
I am seeing some strange behavior when accessing buffers from user space (mapped using the mmap call).
Background :-
------------
Platform - TI AM3517
CPU - ARM Cortex A8
root@am3517-evm:~#
root@am3517-evm:~# cat /proc/cpuinfo
Processor : ARMv7 Processor rev 7 (v7l)
BogoMIPS : 499.92
Features : swp half thumb fastmult vfp edsp neon vfpv3
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x1
CPU part : 0xc08
CPU revision : 7
Hardware : OMAP3517/AM3517 EVM
Revision : 0020
Serial : 0000000000000000
root@omap3517-evm:~#
Issue/Usage :-
-------------
The V4L2 capture driver captures data from the video decoder into a buffer, and the application does some processing on this buffer. The mmap implementation can be found in drivers/media/video/videobuf-dma-contig.c, function __videobuf_mmap_mapper().
Observation -
CPU consumption goes as high as 95% when reading from these buffers. Writing to them also gives 60-70% CPU consumption. (memcpy/memset are used for the read and write operations.)
Some more inputs :-
------------------
- If I specify PAGE_READONLY or PAGE_SHARED (the actual flag is L_PTE_USER) while mapping the buffer to user space in the mmap system call, the CPU consumption goes down to the expected value (20-27%).
I then traced this down to the function cpu_v7_set_pte_ext, where the level 2 translation table entries are configured using these flags.
- Below are the values of the r0, r1 and r2 registers (ptep, pteval, ext) in both cases -
Without PAGE_READONLY/PAGE_SHARED
ptep - cfb5de10, pte - 8d200383, ext - 800
ptep - cfb5de14, pte - 8d201383, ext - 800
Important bits are [0-9] - 0x383
With PAGE_READONLY/PAGE_SHARED set
ptep - cfb30e10, pte - 8d10038f, ext - 800
ptep - cfb30e14, pte - 8d10138f, ext - 800
Important bits are [0-9] - 0x38F
The function "cpu_v7_set_pte_ext" uses the flag as shown below -
tst r1, #L_PTE_USER
orrne r3, r3, #PTE_EXT_AP1
tstne r3, #PTE_EXT_APX
bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
Without PAGE_READONLY/PAGE_SHARED    With flags set
Access perm = reserved               Access perm = Read Only
- I tried the same thing with another platform (ARM9) and it works fine there.
Can somebody help me understand the PAGE_SHARED/PAGE_READONLY flags and the access permissions? Am I debugging this along the right path? Has anybody seen a similar issue before?
Thanks in advance.
Thanks,
Vaibhav Hiremath
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* RE: CPU consumption is going as high as 95% on ARM Cortex A8
2009-12-17 5:38 CPU consumption is going as high as 95% on ARM Cortex A8 Hiremath, Vaibhav
@ 2009-12-17 6:24 ` Shilimkar, Santosh
2009-12-17 9:56 ` Russell King - ARM Linux
1 sibling, 0 replies; 8+ messages in thread
From: Shilimkar, Santosh @ 2009-12-17 6:24 UTC (permalink / raw)
To: Hiremath, Vaibhav, linux; +Cc: linux-mm, linux-omap, linux-arm-kernel
> -----Original Message-----
> From: linux-arm-kernel-bounces@lists.infradead.org [mailto:linux-arm-kernel-
> bounces@lists.infradead.org] On Behalf Of Hiremath, Vaibhav
> Sent: Thursday, December 17, 2009 11:09 AM
> To: linux@arm.linux.org.uk
> Cc: linux-mm@kvack.org; linux-omap@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: CPU consumption is going as high as 95% on ARM Cortex A8
>
> Hi,
>
> I am seeing some strange behavior while accessing buffers through User Space (mapped using mmap call)
>
> Background :-
> ------------
> Platform - TI AM3517
> CPU - ARM Cortex A8
>
> root@am3517-evm:~#
> root@am3517-evm:~# cat /proc/cpuinfo
> Processor : ARMv7 Processor rev 7 (v7l)
> BogoMIPS : 499.92
> Features : swp half thumb fastmult vfp edsp neon vfpv3
> CPU implementer : 0x41
> CPU architecture: 7
> CPU variant : 0x1
> CPU part : 0xc08
> CPU revision : 7
> Hardware : OMAP3517/AM3517 EVM
> Revision : 0020
> Serial : 0000000000000000
> root@omap3517-evm:~#
>
>
> Issue/Usage :-
> -------------
> The V4l2-Capture driver captures the data from video decoder into buffer and the application does
> some processing on this buffer. The mmap implementation can be found at drivers/media/video/videobuf-
> dma-contig.c, function__videobuf_mmap_mapper().
>
> Observation -
> The CPU consumption goes as high as 95% on read buffer operation, please note that write operation on
> these buffers also gives 60-70% CPU consumption. (Using memcpy/memset API's for read and write
> operation).
>
> Some more inputs :-
> ------------------
> - If I specify PAGE_READONLY or PAGE_SHARED (actual flag is L_PTE_USER) while mapping the buffer to
> UserSpace in mmap system call, the CPU consumption goes down to expected value (20-27%).
> Then I reached till the function cpu_v7_set_pte_ext, where we are configuring level 2 translation
> table entries, which makes use of these flags.
>
> - Below is the value of r0, r1 and r2 register (ptep, pteval, ext) in both the cases -
>
>
> Without PAGE_READONLY/PAGE_SHARED
>
> ptep - cfb5de10, pte - 8d200383, ext - 800
> ptep - cfb5de14, pte - 8d201383, ext - 800
Which kernel version is this? Can you please also give the values of the PRRR, NMRR and SCTLR registers?
* Re: CPU consumption is going as high as 95% on ARM Cortex A8
2009-12-17 5:38 CPU consumption is going as high as 95% on ARM Cortex A8 Hiremath, Vaibhav
2009-12-17 6:24 ` Shilimkar, Santosh
@ 2009-12-17 9:56 ` Russell King - ARM Linux
2009-12-21 6:26 ` Hiremath, Vaibhav
1 sibling, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2009-12-17 9:56 UTC (permalink / raw)
To: Hiremath, Vaibhav; +Cc: linux-arm-kernel, linux-mm, linux-omap
On Thu, Dec 17, 2009 at 11:08:31AM +0530, Hiremath, Vaibhav wrote:
> Issue/Usage :-
> -------------
> The V4l2-Capture driver captures the data from video decoder into buffer
> and the application does some processing on this buffer. The mmap
> implementation can be found at drivers/media/video/videobuf-dma-contig.c,
> function__videobuf_mmap_mapper().
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
will result in the memory being mapped as 'Strongly Ordered', resulting
in there being multiple mappings with differing types. In later
kernels, we have pgprot_dmacoherent() and I'd suggest changing the above
macro for that.
> Without PAGE_READONLY/PAGE_SHARED
>
> Important bits are [0-9] - 0x383
>
> With PAGE_READONLY/PAGE_SHARED set
>
> Important bits are [0-9] - 0x38F
So the difference is the C and B bits, which is more or less expected
with the change you've made.
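To illustrate the C and B observation, here is a minimal sketch in C. It assumes the Linux PTE bit layout of ARM kernels from this era (bufferable at bit 2, cacheable at bit 3), which should be checked against arch/arm/include/asm/pgtable.h in the tree in use. The helper name pte_flag_diff is hypothetical.

```c
#include <stdint.h>

/* Assumed Linux PTE flag bits (ARM, 2.6.32 era); verify against your tree. */
#define L_PTE_BUFFERABLE (1u << 2)   /* "B" bit */
#define L_PTE_CACHEABLE  (1u << 3)   /* "C" bit */
#define PTE_FLAG_MASK    0x3ffu      /* the "important bits" [9:0] */

/* Return the flag bits [9:0] that differ between two Linux PTE values. */
static uint32_t pte_flag_diff(uint32_t a, uint32_t b)
{
    return (a ^ b) & PTE_FLAG_MASK;
}
```

Applied to the two quoted values, pte_flag_diff(0x8d200383, 0x8d10038f) leaves only bits 2 and 3 set, i.e. the mappings differ only in their memory type bits.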
>
> The lines inside function "cpu_v7_set_pte_ext", is using the flag as shown below -
>
> tst r1, #L_PTE_USER
> orrne r3, r3, #PTE_EXT_AP1
> tstne r3, #PTE_EXT_APX
> bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
>
> Without PAGE_READONLY/PAGE_SHARED With flags set
>
> Access perm = reserved Access Perm = Read Only
The bits you quote above are L_PTE_* bits, so you need to be careful
decoding them. 0x383 gives
L_PTE_EXEC|L_PTE_USER|L_PTE_WRITE|L_PTE_YOUNG|L_PTE_PRESENT
which is as expected, and will be translated into: APX=0 AP1=1 AP0=0
which is user r/o, system r/w. The same will be true of 0x38f.
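The translation described above can be sketched in C as follows. This is an illustration, not the kernel code: the bit positions are assumed from ARM kernels of this era, the function name v7_ap_bits is hypothetical, and the initial AP0 setting plus the write/dirty test are an assumption about the part of cpu_v7_set_pte_ext that precedes the four quoted instructions.

```c
#include <stdint.h>

/* Assumed Linux PTE flag bits (ARM, 2.6.32 era). */
#define L_PTE_DIRTY (1u << 6)
#define L_PTE_WRITE (1u << 7)
#define L_PTE_USER  (1u << 8)

/* Assumed hardware PTE access permission bits. */
#define PTE_EXT_AP0 (1u << 4)
#define PTE_EXT_AP1 (1u << 5)
#define PTE_EXT_APX (1u << 9)

/* Compute the hardware APX/AP1/AP0 bits for a Linux PTE value. */
static uint32_t v7_ap_bits(uint32_t pte)
{
    uint32_t hw = PTE_EXT_AP0;          /* assumption: start with AP0 set */

    /* Assumption: not writable, or not yet dirty, means read-only. */
    if (!(pte & L_PTE_WRITE) || !(pte & L_PTE_DIRTY))
        hw |= PTE_EXT_APX;

    /* The four quoted instructions: tst / orrne / tstne / bicne. */
    if (pte & L_PTE_USER) {
        hw |= PTE_EXT_AP1;
        if (hw & PTE_EXT_APX)
            hw &= ~(PTE_EXT_APX | PTE_EXT_AP0);
    }
    return hw;
}
```

Under these assumptions both 0x383 and 0x38f yield AP1 only (APX=0 AP1=1 AP0=0), since neither value has the dirty bit set.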
> - I tried the same thing with another platform (ARM9) and it works fine there.
>
> Can somebody help me to understand the flag PAGE_SHARED/PAGE_READONLY
> and access permissions? Am I debugging this into right path? Does
> anybody have seen/observed similar issue before?
I think you're just seeing the effects of 'strongly ordered' memory
rather than anything actually wrong.
* RE: CPU consumption is going as high as 95% on ARM Cortex A8
2009-12-17 9:56 ` Russell King - ARM Linux
@ 2009-12-21 6:26 ` Hiremath, Vaibhav
2009-12-21 9:07 ` Russell King - ARM Linux
0 siblings, 1 reply; 8+ messages in thread
From: Hiremath, Vaibhav @ 2009-12-21 6:26 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mm, linux-omap
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> Sent: Thursday, December 17, 2009 3:27 PM
> To: Hiremath, Vaibhav
> Cc: linux-arm-kernel@lists.infradead.org; linux-mm@kvack.org; linux-
> omap@vger.kernel.org
> Subject: Re: CPU consumption is going as high as 95% on ARM Cortex
> A8
>
> On Thu, Dec 17, 2009 at 11:08:31AM +0530, Hiremath, Vaibhav wrote:
> > Issue/Usage :-
> > -------------
> > The V4l2-Capture driver captures the data from video decoder into
> buffer
> > and the application does some processing on this buffer. The mmap
> > implementation can be found at drivers/media/video/videobuf-dma-
> contig.c,
> > function__videobuf_mmap_mapper().
>
> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
>
> will result in the memory being mapped as 'Strongly Ordered',
> resulting
> in there being multiple mappings with differing types. In later
> kernels, we have pgprot_dmacoherent() and I'd suggest changing the
> above
> macro for that.
>
[Hiremath, Vaibhav] Russell,
I tried your suggestion above, but unfortunately it didn't work for me. I am seeing the same behavior with pgprot_dmacoherent(). I pulled your patch (which applied cleanly on 2.6.32-rc5) -
-----------------------------------------
commit 26a26d329688ab018e068b412b03d43d7c299f0a
Author: Russell King <rmk+kernel@arm.linux.org.uk>
Date: Fri Nov 20 21:06:43 2009 +0000
Subject: ARM: dma-mapping: switch ARMv7 DMA mappings to retain 'memory' attribute
-----------------------------------------
Any other pointers/suggestions?
Thanks,
Vaibhav
> > Without PAGE_READONLY/PAGE_SHARED
> >
> > Important bits are [0-9] - 0x383
> >
> > With PAGE_READONLY/PAGE_SHARED set
> >
> > Important bits are [0-9] - 0x38F
>
> So the difference is the C and B bits, which is more or less
> expected
> with the change you've made.
>
> >
> > The lines inside function "cpu_v7_set_pte_ext", is using the flag
> as shown below -
> >
> > tst r1, #L_PTE_USER
> > orrne r3, r3, #PTE_EXT_AP1
> > tstne r3, #PTE_EXT_APX
> > bicne r3, r3, #PTE_EXT_APX | PTE_EXT_AP0
> >
> > Without PAGE_READONLY/PAGE_SHARED With flags set
> >
> > Access perm = reserved Access Perm = Read
> Only
>
> The bits you quote above are L_PTE_* bits, so you need to be careful
> decoding them. 0x383 gives
>
> L_PTE_EXEC|L_PTE_USER|L_PTE_WRITE|L_PTE_YOUNG|L_PTE_PRESENT
>
> which is as expected, and will be translated into: APX=0 AP1=1 AP0=0
> which is user r/o, system r/w. The same will be true of 0x38f.
>
> > - I tried the same thing with another platform (ARM9) and it works
> fine there.
> >
> > Can somebody help me to understand the flag
> PAGE_SHARED/PAGE_READONLY
> > and access permissions? Am I debugging this into right path? Does
> > anybody have seen/observed similar issue before?
>
> I think you're just seeing the effects of 'strongly ordered' memory
> rather than anything actually wrong.
* Re: CPU consumption is going as high as 95% on ARM Cortex A8
2009-12-21 6:26 ` Hiremath, Vaibhav
@ 2009-12-21 9:07 ` Russell King - ARM Linux
2009-12-21 9:21 ` Hiremath, Vaibhav
0 siblings, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2009-12-21 9:07 UTC (permalink / raw)
To: Hiremath, Vaibhav; +Cc: linux-arm-kernel, linux-mm, linux-omap
On Mon, Dec 21, 2009 at 11:56:23AM +0530, Hiremath, Vaibhav wrote:
> > vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> >
> > will result in the memory being mapped as 'Strongly Ordered',
> > resulting
> > in there being multiple mappings with differing types. In later
> > kernels, we have pgprot_dmacoherent() and I'd suggest changing the
> > above
> > macro for that.
> >
>
> I tried with your suggestion above but unfortunately it didn't work for
> me. I am seeing the same behavior with the pgprot_dmacoherent(). I
> pulled your patch (which got applied cleanly on 2.6.32-rc5) -
What happens if you comment out the pgprot_dmacoherent() / pgprot_noncached()
line completely?
I suspect that will "solve" the problem - but you'll then no longer have
DMA coherency with userspace, so it's not really a solution.
* RE: CPU consumption is going as high as 95% on ARM Cortex A8
2009-12-21 9:07 ` Russell King - ARM Linux
@ 2009-12-21 9:21 ` Hiremath, Vaibhav
2009-12-21 10:50 ` Russell King - ARM Linux
0 siblings, 1 reply; 8+ messages in thread
From: Hiremath, Vaibhav @ 2009-12-21 9:21 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mm, linux-omap
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> Sent: Monday, December 21, 2009 2:38 PM
> To: Hiremath, Vaibhav
> Cc: linux-arm-kernel@lists.infradead.org; linux-mm@kvack.org; linux-
> omap@vger.kernel.org
> Subject: Re: CPU consumption is going as high as 95% on ARM Cortex
> A8
>
> On Mon, Dec 21, 2009 at 11:56:23AM +0530, Hiremath, Vaibhav wrote:
> > > vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > >
> > > will result in the memory being mapped as 'Strongly Ordered',
> > > resulting
> > > in there being multiple mappings with differing types. In later
> > > kernels, we have pgprot_dmacoherent() and I'd suggest changing
> the
> > > above
> > > macro for that.
> > >
> >
> > I tried with your suggestion above but unfortunately it didn't
> work for
> > me. I am seeing the same behavior with the pgprot_dmacoherent(). I
> > pulled your patch (which got applied cleanly on 2.6.32-rc5) -
>
> What happens if you comment out the pgprot_dmacoherent() /
> pgprot_noncached()
> line completely?
>
[Hiremath, Vaibhav] If I comment out the line completely, I see CPU consumption similar to when I was setting the PAGE_READONLY/PAGE_SHARED flag, which is 25-32%.
Thanks,
Vaibhav
> I suspect that will "solve" the problem - but you'll then no longer
> have
> DMA coherency with userspace, so its not really a solution.
* Re: CPU consumption is going as high as 95% on ARM Cortex A8
2009-12-21 9:21 ` Hiremath, Vaibhav
@ 2009-12-21 10:50 ` Russell King - ARM Linux
2009-12-21 11:26 ` Hiremath, Vaibhav
0 siblings, 1 reply; 8+ messages in thread
From: Russell King - ARM Linux @ 2009-12-21 10:50 UTC (permalink / raw)
To: Hiremath, Vaibhav; +Cc: linux-arm-kernel, linux-mm, linux-omap
On Mon, Dec 21, 2009 at 02:51:13PM +0530, Hiremath, Vaibhav wrote:
> > On Mon, Dec 21, 2009 at 11:56:23AM +0530, Hiremath, Vaibhav wrote:
> > > > vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > > >
> > > > will result in the memory being mapped as 'Strongly Ordered',
> > > > resulting
> > > > in there being multiple mappings with differing types. In later
> > > > kernels, we have pgprot_dmacoherent() and I'd suggest changing
> > the
> > > > above
> > > > macro for that.
> > > >
> > >
> > > I tried with your suggestion above but unfortunately it didn't
> > work for
> > > me. I am seeing the same behavior with the pgprot_dmacoherent(). I
> > > pulled your patch (which got applied cleanly on 2.6.32-rc5) -
> >
> > What happens if you comment out the pgprot_dmacoherent() /
> > pgprot_noncached()
> > line completely?
>
> If I comment the line completely then I am seeing
> CPU consumption similar to when I was setting PAGE_READONLY/PAGE_SHARED
> flag, which is 25-32%.
>
> > I suspect that will "solve" the problem - but you'll then no longer
> > have
> > DMA coherency with userspace, so its not really a solution.
So it _is_ down to purely the amount of time it takes to read from a
non-cacheable buffer. I think you need to investigate the userspace
program and see whether it's doing anything silly - I don't think the
lack of performance is a kernel problem as such.
How large is this buffer? What userspace program is reading from it?
Could the userspace program be unnecessarily re-reading from the
buffer multiple times for the same frame?
* RE: CPU consumption is going as high as 95% on ARM Cortex A8
2009-12-21 10:50 ` Russell King - ARM Linux
@ 2009-12-21 11:26 ` Hiremath, Vaibhav
0 siblings, 0 replies; 8+ messages in thread
From: Hiremath, Vaibhav @ 2009-12-21 11:26 UTC (permalink / raw)
To: Russell King - ARM Linux; +Cc: linux-arm-kernel, linux-mm, linux-omap
> -----Original Message-----
> From: Russell King - ARM Linux [mailto:linux@arm.linux.org.uk]
> Sent: Monday, December 21, 2009 4:20 PM
> To: Hiremath, Vaibhav
> Cc: linux-arm-kernel@lists.infradead.org; linux-mm@kvack.org; linux-
> omap@vger.kernel.org
> Subject: Re: CPU consumption is going as high as 95% on ARM Cortex
> A8
>
> On Mon, Dec 21, 2009 at 02:51:13PM +0530, Hiremath, Vaibhav wrote:
> > > On Mon, Dec 21, 2009 at 11:56:23AM +0530, Hiremath, Vaibhav
> wrote:
<snip>...
> >
> > If I comment the line completely then I am seeing
> > CPU consumption similar to when I was setting
> PAGE_READONLY/PAGE_SHARED
> > flag, which is 25-32%.
> >
> > > I suspect that will "solve" the problem - but you'll then no
> longer
> > > have
> > > DMA coherency with userspace, so its not really a solution.
>
> So it _is_ down to purely the amount of time it takes to read from a
> non-cacheable buffer. I think you need to investigate the userspace
> program and see whether it's doing anything silly - I don't think
> the
> lack of performance is a kernel problem as such.
>
[Hiremath, Vaibhav] The user space application is pretty simple and does nothing special.
It is a loopback application where the captured frame is copied to the display buffer -
    /* Display buffer mmap */
    display_buff_info[i].start = mmap(NULL, buf.length,
            PROT_READ | PROT_WRITE, MAP_SHARED, *display_fd,
            buf.m.offset);

    /* Capture buffer mmap */
    capture_buff_info[i].start = mmap(NULL, buf.length,
            PROT_READ | PROT_WRITE, MAP_SHARED, *capture_fd,
            buf.m.offset);

    while (1) {
        /* DEQUEUE BUFFER (blocking call) */

        /* Copy one line at a time; "* 2" is the number of bytes per pixel */
        for (h = 0; h < display_fmt.fmt.pix.height; h++) {
            memcpy(dis_ptr, cap_ptr, display_fmt.fmt.pix.width * 2);
            cap_ptr += capture_fmt.fmt.pix.width * 2;
            dis_ptr += display_fmt.fmt.pix.width * 2;
        }

        /* QUEUE BUFFER */
    }
I will review the application once more and see whether I can find anything.
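For reference, the per-line copy can be exercised outside V4L2 with a small self-contained sketch in C. It assumes ordinary memory stands in for the mmap'd capture and display buffers, and the names copy_frame, frame_bytes and BYTES_PER_PIXEL are hypothetical. It only mirrors the loop structure; it does not reproduce the uncached-mapping cost.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BYTES_PER_PIXEL 2   /* the "* 2" factor used in the loop */

/* Size in bytes of one frame of the given dimensions. */
static size_t frame_bytes(size_t width, size_t height)
{
    return width * height * BYTES_PER_PIXEL;
}

/*
 * Copy `height` lines of `disp_width` pixels from cap to dis.
 * Each pointer advances by its own line stride, so the capture
 * buffer is read exactly once per line.
 */
static void copy_frame(uint8_t *dis, const uint8_t *cap,
                       size_t disp_width, size_t cap_width, size_t height)
{
    size_t row_bytes = disp_width * BYTES_PER_PIXEL;

    for (size_t h = 0; h < height; h++) {
        memcpy(dis, cap, row_bytes);
        cap += cap_width * BYTES_PER_PIXEL;
        dis += disp_width * BYTES_PER_PIXEL;
    }
}
```

With a 720x480 frame at 2 bytes per pixel, frame_bytes(720, 480) gives 691200 bytes read from the capture mapping per frame. When both widths are equal, the loop could be replaced by a single memcpy of that size.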
> How large is this buffer?
[Hiremath, Vaibhav] The buffer size is 720x480x2, and we have 3 such buffers used in queue/dequeue operation.
> What userspace program is reading from
> it?
[Hiremath, Vaibhav] Simple loopback application doing memcpy.
> Could the userspace program be unnecessarily re-reading from the
> multiple times for the same frame?
[Hiremath, Vaibhav] Let me revisit the code for both the application and the driver with respect to this suggestion, but I don't think the application is reading twice.
Thanks,
Vaibhav
Thread overview: 8+ messages
2009-12-17  5:38 CPU consumption is going as high as 95% on ARM Cortex A8 Hiremath, Vaibhav
2009-12-17  6:24 ` Shilimkar, Santosh
2009-12-17  9:56 ` Russell King - ARM Linux
2009-12-21  6:26   ` Hiremath, Vaibhav
2009-12-21  9:07     ` Russell King - ARM Linux
2009-12-21  9:21       ` Hiremath, Vaibhav
2009-12-21 10:50         ` Russell King - ARM Linux
2009-12-21 11:26           ` Hiremath, Vaibhav