* [PATCH] shrink per_cpu_pages to fit 32byte cacheline
@ 2004-09-13 23:38 Marcelo Tosatti
2004-09-14 6:10 ` Arjan van de Ven
0 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2004-09-13 23:38 UTC (permalink / raw)
To: akpm, Martin J. Bligh; +Cc: linux-mm
Subject says it all: the following patch shrinks the per_cpu_pages
struct from 24 to 16 bytes, which makes the per-CPU array containing
hot and cold "per_cpu_pages[2]" fit in a 32-byte cacheline. This structure
is used often, so I bet this is a useful optimization.
The counters never get anywhere near 2^15, the limit of a signed short (the maximum "batch" can get is 64).
Please apply.
--- linux-2.6.9-rc1-mm4/include/linux/mmzone.h.orig 2004-09-09 18:42:32.000000000 -0300
+++ linux-2.6.9-rc1-mm4/include/linux/mmzone.h 2004-09-13 13:41:55.589436224 -0300
@@ -43,10 +43,10 @@
 #endif
 struct per_cpu_pages {
-	int count;		/* number of pages in the list */
-	int low;		/* low watermark, refill needed */
-	int high;		/* high watermark, emptying needed */
-	int batch;		/* chunk size for buddy add/remove */
+	short int count;	/* number of pages in the list */
+	short int low;		/* low watermark, refill needed */
+	short int high;		/* high watermark, emptying needed */
+	short int batch;	/* chunk size for buddy add/remove */
 	struct list_head list;	/* the list of pages */
 };
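To make the size arithmetic above concrete, here is a minimal sketch that models both layouts with fixed-width types. It assumes a 32-bit target, so each list_head pointer is modeled as a uint32_t; the names list_head32, pcp_int, pcp_short and the helper functions are hypothetical stand-ins, not kernel definitions. On a 64-bit target the two pointers alone take 16 bytes, so the numbers would differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of struct list_head on a 32-bit target:
 * two pointers, each modeled as a 32-bit integer. */
struct list_head32 {
	uint32_t next;
	uint32_t prev;
};

/* Layout before the patch: four 32-bit counters plus the list head. */
struct pcp_int {
	int32_t count, low, high, batch;
	struct list_head32 list;
};

/* Layout after the patch: four 16-bit counters plus the list head. */
struct pcp_short {
	int16_t count, low, high, batch;
	struct list_head32 list;
};

/* Size in bytes of the hot/cold pair, i.e. per_cpu_pages[2]. */
static inline size_t pair_bytes_int(void)
{
	return 2 * sizeof(struct pcp_int);
}

static inline size_t pair_bytes_short(void)
{
	return 2 * sizeof(struct pcp_short);
}

/* Does a counter value fit in a signed 16-bit field? */
static inline int fits_in_short(long value)
{
	return value >= INT16_MIN && value <= INT16_MAX;
}
```

Under these assumptions the int layout needs 48 bytes for the pair, while the short layout needs 32, which is exactly one 32-byte cacheline.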
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
* Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
2004-09-13 23:38 [PATCH] shrink per_cpu_pages to fit 32byte cacheline Marcelo Tosatti
@ 2004-09-14 6:10 ` Arjan van de Ven
2004-09-14 9:34 ` Marcelo Tosatti
0 siblings, 1 reply; 9+ messages in thread
From: Arjan van de Ven @ 2004-09-14 6:10 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: akpm, Martin J. Bligh, linux-mm
On Tue, 2004-09-14 at 01:38, Marcelo Tosatti wrote:
> Subject says it all, the following patch shrinks per_cpu_pages
> struct from 24 to 16bytes, that makes the per CPU array containing
> hot and cold "per_cpu_pages[2]" fit on 32byte cacheline. This structure
> is often used so I bet this is a useful optimization.
I'm not sure it's worth it. Cachelines are 64 or 128 bytes nowadays, and
a short access costs you at least 1 extra cycle per access on several
x86 CPUs (byte and dword are cheap, short is not).
* Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
2004-09-14 6:10 ` Arjan van de Ven
@ 2004-09-14 9:34 ` Marcelo Tosatti
2004-09-14 11:13 ` Arjan van de Ven
0 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2004-09-14 9:34 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: akpm, Martin J. Bligh, linux-mm
On Tue, Sep 14, 2004 at 08:10:04AM +0200, Arjan van de Ven wrote:
> On Tue, 2004-09-14 at 01:38, Marcelo Tosatti wrote:
> > Subject says it all, the following patch shrinks per_cpu_pages
> > struct from 24 to 16bytes, that makes the per CPU array containing
> > hot and cold "per_cpu_pages[2]" fit on 32byte cacheline. This structure
> > is often used so I bet this is a useful optimization.
>
> I'm not sure it's worth it. cachelines are 64 or 128 bytes nowadays and
> a short access costs you at least 1 extra cycle per access on several
> x86 cpus (byte and dword are cheap, short is not)
I changed the counters to short thinking about 32 byte cacheline machines.
There are a lot of non-x86 boxes which have 32 byte cachelines (embedded) and which
will continue to have such AFAIK.
How come short access can cost 1 extra cycle? Because you need two "read bytes" ?
It doesn't make much sense to me. I should go look into gcc asm output.
If that's true we should also undo the pagevec shrinking which went into -mm5.
* Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
2004-09-14 9:34 ` Marcelo Tosatti
@ 2004-09-14 11:13 ` Arjan van de Ven
2004-09-14 10:01 ` Marcelo Tosatti
0 siblings, 1 reply; 9+ messages in thread
From: Arjan van de Ven @ 2004-09-14 11:13 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: akpm, Martin J. Bligh, linux-mm
On Tue, Sep 14, 2004 at 06:34:07AM -0300, Marcelo Tosatti wrote:
> How come short access can cost 1 extra cycle? Because you need two "read bytes" ?
on an x86, a word (2byte) access will cause a prefix byte to the
instruction, that particular prefix byte will take an extra cycle during execution
of the instruction and potentially reduces the parallel decodability of
instructions....
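As a rough illustration of the prefix being discussed, the sketch below compares hand-written encodings of a 32-bit and a 16-bit memory increment as they would appear in 32-bit protected mode. The byte sequences and the helpers has_opsize_prefix and extra_bytes_for_word are illustrative assumptions added here. They show only that the 16-bit form carries one extra byte (0x66); they say nothing about its cycle cost on any particular CPU.

```c
#include <stddef.h>

/* incl (%eax): opcode FF /0 with ModRM byte 0x00 */
static const unsigned char inc_dword[] = { 0xFF, 0x00 };

/* incw (%eax): the same opcode preceded by the 0x66
 * operand-size override prefix */
static const unsigned char inc_word[] = { 0x66, 0xFF, 0x00 };

/* Returns 1 if the encoding starts with the operand-size prefix. */
static inline int has_opsize_prefix(const unsigned char *code, size_t len)
{
	return len > 0 && code[0] == 0x66;
}

/* How many more bytes the 16-bit encoding takes than the 32-bit one. */
static inline size_t extra_bytes_for_word(void)
{
	return sizeof(inc_word) - sizeof(inc_dword);
}
```

To see what a compiler actually emits for short versus int fields, the generated assembly (for example from gcc -S) is the thing to inspect.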
* Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
2004-09-14 11:13 ` Arjan van de Ven
@ 2004-09-14 10:01 ` Marcelo Tosatti
2004-09-14 11:44 ` Arjan van de Ven
0 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2004-09-14 10:01 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: akpm, Martin J. Bligh, linux-mm
On Tue, Sep 14, 2004 at 01:13:29PM +0200, Arjan van de Ven wrote:
> On Tue, Sep 14, 2004 at 06:34:07AM -0300, Marcelo Tosatti wrote:
> > How come short access can cost 1 extra cycle? Because you need two "read bytes" ?
>
> on an x86, a word (2byte) access will cause a prefix byte to the
> instruction, that particular prefix byte will take an extra cycle during execution
> of the instruction and potentially reduces the parallel decodability of
> instructions....
OK thanks Arjan, where did you read this? The "Intel IA32 Optimization Guide" ?
Thanks for the info.
Andrew, we might want to revert the part of the pagevec shrinkage that changes a couple
of variables to short. We want to keep the "15" though.
* Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
2004-09-14 10:01 ` Marcelo Tosatti
@ 2004-09-14 11:44 ` Arjan van de Ven
2004-09-14 22:45 ` Marcelo Tosatti
0 siblings, 1 reply; 9+ messages in thread
From: Arjan van de Ven @ 2004-09-14 11:44 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: akpm, Martin J. Bligh, linux-mm
On Tue, Sep 14, 2004 at 07:01:52AM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 14, 2004 at 01:13:29PM +0200, Arjan van de Ven wrote:
> > On Tue, Sep 14, 2004 at 06:34:07AM -0300, Marcelo Tosatti wrote:
> > > How come short access can cost 1 extra cycle? Because you need two "read bytes" ?
> >
> > on an x86, a word (2byte) access will cause a prefix byte to the
> > instruction, that particular prefix byte will take an extra cycle during execution
> > of the instruction and potentially reduces the parallel decodability of
> > instructions....
>
> OK thanks Arjan, where did you read this? The "Intel IA32 Optimization Guide" ?
Some version of that; I can't find it in my current one though. Hrmpf.
Maybe there's someone from Intel or AMD on this list who can confirm the
performance impact of the 0x66 operand size override prefix.
* Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
2004-09-14 11:44 ` Arjan van de Ven
@ 2004-09-14 22:45 ` Marcelo Tosatti
2004-09-15 0:57 ` Nick Piggin
0 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2004-09-14 22:45 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: akpm, Martin J. Bligh, linux-mm
On Tue, Sep 14, 2004 at 01:44:12PM +0200, Arjan van de Ven wrote:
> On Tue, Sep 14, 2004 at 07:01:52AM -0300, Marcelo Tosatti wrote:
> > On Tue, Sep 14, 2004 at 01:13:29PM +0200, Arjan van de Ven wrote:
> > > On Tue, Sep 14, 2004 at 06:34:07AM -0300, Marcelo Tosatti wrote:
> > > > How come short access can cost 1 extra cycle? Because you need two "read bytes" ?
> > >
> > > on an x86, a word (2byte) access will cause a prefix byte to the
> > > instruction, that particular prefix byte will take an extra cycle during execution
> > > of the instruction and potentially reduces the parallel decodability of
> > > instructions....
> >
> > OK thanks Arjan, where did you read this? The "Intel IA32 Optimization Guide" ?
>
> some version of that; I can't find it in my current one though. Hrmpf
> Maybe there's someone from intel or amd on this list who can confirm the
> performance impact of the 0x66 operand size override prefix
Prefix "data16", I see... Well, it doesn't seem that anyone really familiar with this
is on the list. Who do you think would know for sure?
Jun Nakajima, maybe?
We need to be sure because we've just done the same for pagevecs.
* Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
2004-09-14 22:45 ` Marcelo Tosatti
@ 2004-09-15 0:57 ` Nick Piggin
2004-09-15 0:48 ` Marcelo Tosatti
0 siblings, 1 reply; 9+ messages in thread
From: Nick Piggin @ 2004-09-15 0:57 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Arjan van de Ven, akpm, Martin J. Bligh, linux-mm
Marcelo Tosatti wrote:
>>some version of that; I can't find it in my current one though. Hrmpf
>>Maybe there's someone from intel or amd on this list who can confirm the
>>performance impact of the 0x66 operand size override prefix
>
>
> Prefix "data16" I see... Well it doesn't seem anyone really familiar with this
> is part of the list - who you think would be sure about this?
>
> Jun Nakajima maybe?
>
> We need to be sure because we've just done for pagevec's.
You could leave them as ints, and just make the size of the pagevec
14 on 32-bit archs and 15 on 64-bit ones.
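One way to read this suggestion: with int-sized bookkeeping fields, picking the slot count per word size can make the whole pagevec a power-of-two size. The sketch below assumes a pagevec consists of two 4-byte fields followed by an array of page pointers. That layout and the names pagevec_bytes and is_power_of_two are assumptions for illustration, not taken from this thread.

```c
#include <stddef.h>

/* Assumed layout: two 4-byte fields, then nr_slots pointers
 * of ptr_size bytes each. Returns the total size in bytes. */
static inline size_t pagevec_bytes(size_t ptr_size, size_t nr_slots)
{
	return 2 * 4 + nr_slots * ptr_size;
}

/* Returns 1 if n is a nonzero power of two. */
static inline int is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}
```

Under that assumed layout, 14 slots with 4-byte pointers give 64 bytes and 15 slots with 8-byte pointers give 128 bytes, whereas 15 slots on a 32-bit target would give 68 bytes.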
* Re: [PATCH] shrink per_cpu_pages to fit 32byte cacheline
2004-09-15 0:57 ` Nick Piggin
@ 2004-09-15 0:48 ` Marcelo Tosatti
0 siblings, 0 replies; 9+ messages in thread
From: Marcelo Tosatti @ 2004-09-15 0:48 UTC (permalink / raw)
To: Nick Piggin; +Cc: Arjan van de Ven, akpm, Martin J. Bligh, linux-mm
On Wed, Sep 15, 2004 at 10:57:15AM +1000, Nick Piggin wrote:
> Marcelo Tosatti wrote:
>
> >>some version of that; I can't find it in my current one though. Hrmpf
> >>Maybe there's someone from intel or amd on this list who can confirm the
> >>performance impact of the 0x66 operand size override prefix
> >
> >
> >Prefix "data16" I see... Well it doesn't seem anyone really familiar with
> >this is part of the list - who you think would be sure about this?
> >
> >Jun Nakajima maybe?
> >
> >We need to be sure because we've just done for pagevec's.
>
> You could leave them as ints, and just make the size of the pagevec
> 14 on 32-bit archs and 15 on 64-bit ones.
Indeed!
Thread overview: 9+ messages
2004-09-13 23:38 [PATCH] shrink per_cpu_pages to fit 32byte cacheline Marcelo Tosatti
2004-09-14 6:10 ` Arjan van de Ven
2004-09-14 9:34 ` Marcelo Tosatti
2004-09-14 11:13 ` Arjan van de Ven
2004-09-14 10:01 ` Marcelo Tosatti
2004-09-14 11:44 ` Arjan van de Ven
2004-09-14 22:45 ` Marcelo Tosatti
2004-09-15 0:57 ` Nick Piggin
2004-09-15 0:48 ` Marcelo Tosatti