linux-mm.kvack.org archive mirror
* RE: 2.5.59-mm2
@ 2003-01-19 21:44 Nakajima, Jun
  2003-01-19 22:05 ` 2.5.59-mm2 Andrew Morton
  0 siblings, 1 reply; 6+ messages in thread
From: Nakajima, Jun @ 2003-01-19 21:44 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, linux-mm, Kamble, Nitin A, Mallick, Asit K, Saxena, Sunil

> costly is a relative thing. a dozen cycles perhaps; do it once per
> 10 seconds and it's invisible. I agree that if you want to do it thousands
> of times per second it might become a problem. But so far I don't see the
> real need for that.

Well, "complex" is a relative thing as well. At that time, we did not have a sophisticated algorithm to adjust the time period depending on the interrupt load. So today we may not see the difference. 

Anyway, I agree that complex things or policies should be moved to user mode as much as possible, and the kernel should provide the mechanism. And we'll take a look at your code. My point was that doing it in user mode does not justify wasting CPU cycles for no good reason.

Thanks,
Jun


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: 2.5.59-mm2
  2003-01-19 21:44 2.5.59-mm2 Nakajima, Jun
@ 2003-01-19 22:05 ` Andrew Morton
  0 siblings, 0 replies; 6+ messages in thread
From: Andrew Morton @ 2003-01-19 22:05 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: arjanv, linux-mm, nitin.a.kamble, asit.k.mallick, sunil.saxena

"Nakajima, Jun" <jun.nakajima@intel.com> wrote:
>
> My point was that doing it in user mode does not justify wasting CPU cycles
> for no good reason.

Performing this function in userspace means that we can implement more
effective algorithms, more configurability and perhaps better monitoring - so
on machines which need it, the overhead could well be more than reclaimed.

And we can work on the overhead.  Perhaps add a lightweight alternative to
/proc/interrupts, and change the IRQ affinity setting code so that it merely
places some settings into memory, and those are actually acted upon when the
next interrupt occurs (should be able to do this locklessly).

Given that your new algorithm requires granularity on the order of a second
(this is good!), Arjan's approach is very attractive indeed.

And it should be pretty easy to get the implementation working on other
architectures.
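
The deferred affinity change described above can be modelled roughly as follows. This is a toy, single-threaded Python sketch, not kernel code: the class and method names (IrqLine, set_affinity, handle_interrupt) are hypothetical, and a real implementation would need an atomic exchange on the pending value to be safe against a concurrent writer.

```python
class IrqLine:
    """Toy model of one IRQ line with a deferred affinity update."""

    def __init__(self, mask):
        self.active_mask = mask    # CPU mask the routing currently uses
        self.pending_mask = None   # requested mask, not yet applied

    def set_affinity(self, mask):
        # Writer side (e.g. a write to smp_affinity): only record the
        # request in memory; the routing itself is not touched here.
        self.pending_mask = mask

    def handle_interrupt(self):
        # Interrupt side: pick up any pending request before servicing.
        # In a kernel this read-and-clear would have to be one atomic step.
        pending = self.pending_mask
        if pending is not None:
            self.pending_mask = None
            self.active_mask = pending
        return self.active_mask
```

With this shape, several writes between two interrupts collapse into one routing change, since only the most recent request is applied.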

* Re: 2.5.59-mm2
  2003-01-19 19:45 2.5.59-mm2 Nakajima, Jun
@ 2003-01-19 20:18 ` Arjan van de Ven
  0 siblings, 0 replies; 6+ messages in thread
From: Arjan van de Ven @ 2003-01-19 20:18 UTC (permalink / raw)
  To: Nakajima, Jun
  Cc: arjanv, Andrew Morton, linux-mm, Kamble, Nitin A, Mallick, Asit K, Saxena, Sunil

On Sun, Jan 19, 2003 at 11:45:35AM -0800, Nakajima, Jun wrote:
> We initially implemented it at user level, reading /proc/interrupts. We had two issues/concerns at that point, and we saw better results in kernel mode.

> - the data structures required, such as kstat, are already in the kernel
>   and converting the text info from /proc/interrupts was costly in
>   user mode.

costly is a relative thing. a dozen cycles perhaps; do it once per
10 seconds and it's invisible. I agree that if you want to do it thousands
of times per second it might become a problem. But so far I don't see the
real need for that.

> - we suspect that frequent writes (asynchronous to interrupts)
>   to /proc/irq/N/smp_affinity might expose a race condition in interrupt
>   machinery. For example, we saw a hang caused by such a write.

if there's a bug there it needs fixing anyway; even inside the kernel
you'll have a similar race I suspect

> So to implement it at user level efficiently, we need an API that
> - provides binary data that can be easily processed by such a daemon,

there is rightfully a veto on such an ABI and it's also not needed.
/proc/interrupts is less than 4KB normally; it'll be in cache so parsing
it will be cheap. Sure the code I posted isn't optimal (far from it) but
that can be optimized a lot.

Greetings,
  Arjan van de Ven
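
For a sense of how little work the text parsing involves, here is a minimal Python sketch. It assumes the usual /proc/interrupts layout (a header row of CPUn columns, then one "label: count count ... description" row per interrupt). The function names and the sample text are hypothetical and are not taken from irqbalance.

```python
def parse_interrupts(text):
    """Return {label: [count_per_cpu, ...]} from /proc/interrupts-style text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "label: ..." row
        per_cpu = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device description
            per_cpu.append(int(field))
        # Rows with fewer counters than CPUs (e.g. ERR on SMP) are skipped.
        if len(per_cpu) == ncpus:
            counts[label.strip()] = per_cpu
    return counts


def read_interrupts(path="/proc/interrupts"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_interrupts(f.read())


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  0:     123456        789    IO-APIC-edge  timer
 14:       4321       8765    IO-APIC-edge  ide0
NMI:          0          0
ERR:          0
"""
```

A daemon would call read_interrupts() once per polling interval and subtract the previous snapshot to get per-IRQ, per-CPU rates.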

* RE: 2.5.59-mm2
@ 2003-01-19 19:45 Nakajima, Jun
  2003-01-19 20:18 ` 2.5.59-mm2 Arjan van de Ven
  0 siblings, 1 reply; 6+ messages in thread
From: Nakajima, Jun @ 2003-01-19 19:45 UTC (permalink / raw)
  To: arjanv, Andrew Morton
  Cc: linux-mm, Kamble, Nitin A, Mallick, Asit K, Saxena, Sunil

We initially implemented it at user level, reading /proc/interrupts. We had two issues/concerns at that point, and we saw better results in kernel mode.
- The data structures required, such as kstat, are already in the kernel, and converting the text from /proc/interrupts was costly in user mode.
- We suspect that frequent writes (asynchronous to interrupts) to /proc/irq/N/smp_affinity might expose a race condition in the interrupt machinery. For example, we saw a hang caused by such a write.

So to implement it at user level efficiently, we need:
- an API that provides binary data that can be easily processed by such a daemon,
- a safer API to change routing. Otherwise we need to take a closer look at /proc/irq/N/smp_affinity.

Thanks,
Jun



* Re: 2.5.59-mm2
  2003-01-18  8:20 2.5.59-mm2 Andrew Morton
@ 2003-01-18 20:12 ` Arjan van de Ven
  0 siblings, 0 replies; 6+ messages in thread
From: Arjan van de Ven @ 2003-01-18 20:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, nitin.a.kamble, jun.nakajima, asit.k.mallick, sunil.saxena



> +kirq-up-fix.patch
> 
>  Fix the kirq build for non-SMP

Hi,

Is there any reason to put this complexity in the kernel instead of
doing it from a userspace daemon?

A userspace daemon can do higher level evaluations, read config files
about the system (like NUMA configuration etc.), and all 2.4/2.5
kernels already have a userspace API for setting IRQ affinity.

An example of a simple version of such a daemon is:
http://people.redhat.com/arjanv/irqbalance/irqbalance-0.03.tar.gz

Any chance of testing this in an Intel lab?

Greetings,
     Arjan van de Ven
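
The userspace affinity interface mentioned above is the /proc/irq/N/smp_affinity file, which takes a hexadecimal CPU bitmask. A minimal Python sketch of driving it follows. The helper names and the proc_root parameter are hypothetical, and the single hex word assumes no more than 32 CPUs. Writing the file needs root and a kernel that exposes it.

```python
def cpus_to_mask(cpus):
    """Turn an iterable of CPU numbers into a hex bitmask string."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu  # bit n set means CPU n may take the interrupt
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc"):
    """Write the mask for one IRQ; proc_root is overridable for testing."""
    path = f"{proc_root}/irq/{irq}/smp_affinity"
    with open(path, "w") as f:
        f.write(cpus_to_mask(cpus) + "\n")
```

For example, set_irq_affinity(14, [1]) would write "2" and route IRQ 14 to CPU1 only.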


* 2.5.59-mm2
@ 2003-01-18  8:20 Andrew Morton
  2003-01-18 20:12 ` 2.5.59-mm2 Arjan van de Ven
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2003-01-18  8:20 UTC (permalink / raw)
  To: linux-kernel, linux-mm

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm2/

- Added Andi's lockless current_kernel_time() patch again.  It'll break
  non-ia32 builds.  But it appears that we should push ahead and get this
  implemented across the other architectures.

- Updated oprofile patches from John.

- Adam's devfs rework is back in.  We only had two testers last time (out
  of maybe 150 downloads), which is fairly disappointing.

  So if you use devfs, _please_ test this change and send either success or
  failure reports (not to me though).

  Adam said:

   If you want devfsd functionality (well, at least the "REGISTER" and
   "LOOKUP" events), you can get my user level program devfs_helper, which is
   a reduced functionality replacement program for devfsd from the following
   URL.

	ftp://ftp.yggdrasil.com/pub/dist/device_control/devfs/devfs_helper-0.2.tar.gz




Changes since 2.5.59-mm1:


+lockless-current_kernel_time.patch

 Reinstated.

-mixer-bounds-check.patch

 This didn't look right.

+kirq-up-fix.patch

 Fix the kirq build for non-SMP

-op4-fix.patch

 Not needed with the updated oprofile patch

+oprofile_cpu-as-string.patch

 oprofile work from John.

+remove-will_become_orphaned_pgrp.patch

 Cleanup

+MAX_IO_APICS-ifdef.patch

 NUMA fix

+dac960-error-retry.patch

 DAC960 robustness enhancements

+put_user-warning-fix.patch

 ARM build fix

+vmlinux-fix.patch

 Should fix the modprobe oopses with RH8.0 toolchains

+smalldevfs.patch

 devfs rework

+sound-firmware-load-fix.patch

 Build fix for OSS sound card firmware loading.

+exit_mmap-fix2.patch

 Fix exec of 32-bit apps from 64-bit apps on PPC64/ia64/sparc64, perhaps.



All 43 patches:

kgdb.patch

devfs-fix.patch

deadline-np-42.patch
  (undescribed patch)

deadline-np-43.patch
  (undescribed patch)

setuid-exec-no-lock_kernel.patch
  remove lock_kernel() from exec of setuid apps

buffer-debug.patch
  buffer.c debugging

warn-null-wakeup.patch

reiserfs-readpages.patch
  reiserfs v3 readpages support

fadvise.patch
  implement posix_fadvise64()

ext3-scheduling-storm.patch
  ext3: fix scheduling storm and lockups

auto-unplug.patch
  self-unplugging request queues

less-unplugging.patch
  Remove most of the blk_run_queues() calls

lockless-current_kernel_time.patch
  Lockless current_kernel_time()

scheduler-tunables.patch
  scheduler tunables

htlb-2.patch
  hugetlb: fix MAP_FIXED handling

kirq.patch

kirq-up-fix.patch
  Subject: Re: 2.5.59-mm1

ext3-truncate-ordered-pages.patch
  ext3: explicitly free truncated pages

prune-icache-stats.patch
  add stats for page reclaim via inode freeing

vma-file-merge.patch

mmap-whitespace.patch

read_cache_pages-cleanup.patch
  cleanup in read_cache_pages()

remove-GFP_HIGHIO.patch
  remove __GFP_HIGHIO

quota-lockfix.patch
  quota locking fix

quota-offsem.patch
  quota semaphore fix

oprofile-p4.patch

oprofile_cpu-as-string.patch
  oprofile cpu-as-string

wli-11_pgd_ctor.patch
  (undescribed patch)

wli-11_pgd_ctor-update.patch
  pgd_ctor update

stack-overflow-fix.patch
  stack overflow checking fix

Richard_Henderson_for_President.patch
  Subject: [PATCH] Richard Henderson for President!

parenthesise-pgd_index.patch
  Subject: i386 pgd_index() doesn't parenthesize its arg

macro-double-eval-fix.patch
  Subject: Re: i386 pgd_index() doesn't parenthesize its arg

mmzone-parens.patch
  asm-i386/mmzone.h macro paren/eval fixes

blkdev-fixes.patch
  blkdev.h fixes

remove-will_become_orphaned_pgrp.patch
  remove will_become_orphaned_pgrp()

MAX_IO_APICS-ifdef.patch
  MAX_IO_APICS #ifdef'd wrongly

dac960-error-retry.patch
  Subject: [PATCH] linux2.5.56 patch to DAC960 driver for error retry

put_user-warning-fix.patch
  Subject: Re: Linux 2.5.59

vmlinux-fix.patch
  vmlinux fix

smalldevfs.patch
  smalldevfs

sound-firmware-load-fix.patch
  soundcore.c referenced non-existent errno variable

exit_mmap-fix2.patch
  exit_mmap fix for 64bit->32bit execs



Thread overview: 6+ messages
2003-01-19 21:44 2.5.59-mm2 Nakajima, Jun
2003-01-19 22:05 ` 2.5.59-mm2 Andrew Morton
  -- strict thread matches above, loose matches on Subject: below --
2003-01-19 19:45 2.5.59-mm2 Nakajima, Jun
2003-01-19 20:18 ` 2.5.59-mm2 Arjan van de Ven
2003-01-18  8:20 2.5.59-mm2 Andrew Morton
2003-01-18 20:12 ` 2.5.59-mm2 Arjan van de Ven
