Re: Performance of Readv and the Cost of Reversemaps Under Heavy DB Workloads

linux-mm.kvack.org archive mirror

From: "Peter Wong" <wpeter@us.ibm.com>
To: akpm@digeo.com
Cc: linux-mm@kvack.org, riel@nl.linux.org, akpm@zip.com.au,
	mjbligh@us.ibm.com, wli@holomorphy.com,
	dmccr@us.ibm.com, gh@us.ibm.com, Bill Hartner <bhartner@us.ibm.com>,
	Troy C Wilson <wilsont@us.ibm.com>
Subject: Re: Performance of Readv and the Cost of Reversemaps Under Heavy DB Workloads
Date: Tue, 10 Sep 2002 13:25:09 -0500
Message-ID: <OF0C04A218.48BF8D2A-ON85256C2F.007168ED@pok.ibm.com>

Andrew Morton wrote:
>Peter Wong wrote:
>>
>> All,
>>
>>      I have measured a decision support workload on 2.4.17-based,
>> 2.5.31-based, and 2.5.32-based kernels, all of which use the readv
>> patch made available by Janet Morgan. Janet's patch is also included
>> in Andrew Morton's mm patch, which can be found at
>> http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.32/2.5.32-mm2/.
>> I got the following results.
>>
>> ---------------------------------------------------------------
>> Database Size: 100 GB
>>
>> 2417RV:    2.4.17 (kernel.org)
>>            + lse04-rc1.diffs
>>              - bounce patch by Jens Axboe
>>              - io_request_lock patch by Jonathan Lahr
>>              - rawvary patch by Badari Pulavarty
>>              - readv patches by Janet Morgan
>>            + TASK_UNMAPPED_BASE = 0x10000000
>>            + PAGE_OFFSET        = 0xD0000000
>>
>> 2531RV:    2.5.31 (kernel.org)
>>            + readv patch from Janet Morgan
>>            + TASK_UNMAPPED_BASE = 0x10000000
>>            + PAGE_OFFSET        = 0xC0000000
>>
>> 2532RV:    2.5.32 (kernel.org)
>>            + mm-2 patch from Andrew Morton which
>>              includes Janet's readv patch
>>            + TASK_UNMAPPED_BASE = 0x10000000
>>            + PAGE_OFFSET        = 0xC0000000
>>
>>      Based upon the throughput rate,
>>           2531RV is 99.8% of 2417RV;
>>           2532RV is  100% of 2417RV.
>
>Well that's a bit sad.  I assume the test was IO-bound?  Did
>you measure the CPU utilisation for the run as well?
>

CPU utilization is similar across the three kernels:

              Kernel    User (%)   System (%)   Idle (%)

              2417RV       66          9           25
              2531RV       67          9           24
              2532RV       67          7           26

>What is your overall take on the performance of 2.5 with respect
>to 2.4 and, indeed, other operating systems?

Based on the readv measurements I have taken so far on this decision
support workload, 2.5 performs about the same as 2.4. I reported
earlier that, when the workload uses "read" instead, 2.5 outperforms
2.4 by 8%.

>
>>       There are 110 prefetchers for the runs, and ~2 GB of shared
>> memory space used by the database, i.e., ~500,000 pages. With Andrew's
>> mm patch, the maximum number of reversemaps reaches 43.7 million. That
>> is, each page is used by ~87 processes. With 8 bytes per reversemap,
>> it costs ~350MB of kernel memory, which is quite significant. Note
>> that the database system used forks processes and does not use
>> pthreads.
>
>Look in /proc/slabinfo to know the exact amount of memory which the
>reversemaps are using.
>

The peak pte_chain usage observed in /proc/slabinfo was as follows
(the marked columns are the total number of slabs and the pages per slab):

pte_chain         1633008 6464730     32 45175 57210    1 :  252  126
                                               ^^^^^    ^

     Memory consumed = 57210 * 4 KB = ~223 MB

David McCracken pointed out that you have optimized the pte_chain
structure, so it is no longer the case that every reversemap costs
8 bytes. Each pte_chain is 32 bytes: 4 bytes for the next pointer
and 28 bytes for seven 4-byte PTE pointers. Thus, when a pte_chain
is fully occupied, each reversemap costs 32/7, or ~4.6 bytes.

>You don't mention whether you're using CONFIG_HIGHPTE.  Probably
>not; I think it was broken in that kernel.
>
>- CONFIG_HIGHPTE will reduce ZONE_NORMAL pressure by moving pagetables
>  into highmem.
>
>- CONFIG_HIGHPTE+CONFIG_HIGHMEM64G will not be as favourable, because
>  struct page gains 4 bytes and the reverse mapping objects double
>  in size.
>
>If your machine has more than 4G (does it?) then you'll need
>CONFIG_HIGHMEM64G=y and CONFIG_HIGHPTE=y.
>
>Please, God: don't make us put pte_chains in highmem as well :(
>

My machine has 4GB RAM and I did not use CONFIG_HIGHPTE.



