From: Bill Davidsen <davidsen@tmr.com>
To: Ray Bryant <raybry@sgi.com>
Cc: Buddy Lumpkin <b.lumpkin@comcast.net>,
	'Con Kolivas' <kernel@kolivas.org>,
	'FabF' <fabian.frederick@skynet.be>,
	'Bernd Eckenfels' <ecki-news2004-05@lina.inka.de>,
	linux-kernel@vger.kernel.org, lse-tech@lists.sourceforge.net,
	linux-mm@kvack.org
Subject: Re: why swap at all?
Date: Wed, 09 Jun 2004 15:24:13 -0400
Message-ID: <40C763DD.7090003@tmr.com>
In-Reply-To: <40C5D7FB.7020402@sgi.com>

Ray Bryant wrote:
> 
> Buddy Lumpkin wrote:
> 
>>  <snip> One method would be to keep the pagecache on its own list,
>> and move pages to the head of the list any time they are modified or
>> referenced, and reclaim from the tail. All pages on this list can be
>> considered as "free memory", because any new memory requests would
>> just cause pages to be evicted from the tail of the list.
>>
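
A minimal user-space sketch of the list described above, in Python
rather than kernel C. The class and method names (PageCacheLRU, touch,
reclaim) are made up for illustration and do not correspond to anything
in the kernel; the point is only the head/tail discipline.

```python
from collections import OrderedDict


class PageCacheLRU:
    """Toy model: page cache on its own list, hottest page at the head."""

    def __init__(self):
        # First item is the head (most recently used), last item is the tail.
        self._pages = OrderedDict()

    def touch(self, page_id):
        """Insert a page, or move it to the head when referenced or modified."""
        self._pages[page_id] = True
        self._pages.move_to_end(page_id, last=False)

    def reclaim(self, count):
        """Evict up to `count` pages from the tail and return their ids."""
        evicted = []
        while self._pages and len(evicted) < count:
            page_id, _ = self._pages.popitem(last=True)
            evicted.append(page_id)
        return evicted

    def __len__(self):
        return len(self._pages)
```

With pages a, b, c touched in that order and a touched again, a call to
reclaim(2) evicts b and then c, leaving a on the list.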
> 
> We have code running on Altix that does exactly this.  (Please note,
> however, that this is for our version of Linux 2.4.21 -- Yeah, it's
> old, but that is what the product runs at the moment -- we are in
> the process of switching over to Linux 2.6 when all of this will
> have to be re-evaluated.)  The changes are in three parts:
> 
> (1)  We added a new page list, the reclaim list.  Pages are put
> onto the reclaim list when they are inserted into the page cache.
> They are removed from the list when they are marked dirty (buffers
> from the page go on to the LRU dirty list) or when the pages are
> mmap'd into an address space, since in either of these situations,
> the pages are not reclaimable.  (This list is per node in our
> NUMA system.)
> 
> (2)  We added code in __alloc_pages() so that if the local node
> allocation is going to fail (remember that Altix is a NUMA machine),
> we call out to a routine to scan the reclaim list on that node and
> to release enough clean buffer cache pages to make the local
> allocation succeed (plus a few pages, for efficiency).  If this
> doesn't work, we most likely end up spilling the allocation over
> to another node.
> 
> (3)  We added code in generic_file_write() to limit the size of
> the page cache on buffered file I/O write operations.  If the
> current size of the page cache is larger than the limit, we
> call the same routine as above to release some page cache pages.
> If we can't free enough pages to get below the limit, we throttle
> the write process by delaying it for a bit.  This was all to
> avoid the problem of a large buffered file I/O request causing
> the page cache to grow to the point where the system would start
> to swap.  (On our large memory systems, dropping into the
> swapping code can cause the system to freeze for tens of seconds,
> and that is something we would like to avoid.)
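
Parts (1) and (2) can be sketched roughly as follows. This is a toy
Python model, not the Altix code: Node, pin, alloc_pages and the
RECLAIM_EXTRA constant are hypothetical names, and releasing the oldest
page first is my assumption, since the description does not say in what
order the reclaim list is scanned.

```python
from collections import OrderedDict

RECLAIM_EXTRA = 2  # hypothetical size of the "plus a few pages" batch


class Node:
    """One NUMA node with a free-page count and its own reclaim list."""

    def __init__(self, node_id, free_pages):
        self.node_id = node_id
        self.free_pages = free_pages
        self.reclaim_list = OrderedDict()  # clean, unmapped page cache pages

    def add_to_page_cache(self, page_id):
        """A page entering the page cache uses a free page and is reclaimable."""
        if self.free_pages == 0:
            raise MemoryError("no free pages on node %d" % self.node_id)
        self.free_pages -= 1
        self.reclaim_list[page_id] = True

    def pin(self, page_id):
        """Page was dirtied or mmap'd: take it off the reclaim list."""
        self.reclaim_list.pop(page_id, None)

    def reclaim(self, wanted):
        """Release up to `wanted` pages from the reclaim list (oldest first)."""
        freed = 0
        while self.reclaim_list and freed < wanted:
            self.reclaim_list.popitem(last=False)
            freed += 1
        self.free_pages += freed
        return freed


def alloc_pages(nodes, local_id, count):
    """Return the id of the node that satisfied the request, or None."""
    local = nodes[local_id]
    if local.free_pages < count:
        # Local allocation would fail: try to free enough, plus a few extra.
        local.reclaim(count - local.free_pages + RECLAIM_EXTRA)
    others = [n for n in nodes.values() if n is not local]
    for node in [local] + others:  # spill to another node only as a last resort
        if node.free_pages >= count:
            node.free_pages -= count
            return node.node_id
    return None
```

The design point is that the local node's clean page cache is given up
before the allocation is allowed to go off-node.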
> 
> (We actually don't enforce the page cache limit unless the amount
> of free memory has dropped below a certain threshold.  This is to
> keep the page cache from being limited if there is lots of free
> memory -- even though we only limit the page cache on writes,
> it turns out that the kernel is constantly writing to the disk,
> so this also effectively causes the page cache to be limited
> for reads as well.)
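
The decision made in the write path, as I read part (3) and the
parenthetical above, looks something like this. It is only a sketch:
the function name, its parameters and the three action strings are
invented here, and the real code works on kernel state rather than
plain integers.

```python
def write_path_action(cache_pages, cache_limit, free_pages,
                      free_threshold, reclaimable_pages):
    """Decide what a buffered write does before growing the page cache.

    Returns (action, pages_to_release), where action is one of
    "proceed", "reclaim" or "throttle".
    """
    # The limit is only enforced once free memory has dropped below
    # the threshold, and only if the cache is actually over its limit.
    if free_pages >= free_threshold or cache_pages <= cache_limit:
        return ("proceed", 0)

    excess = cache_pages - cache_limit
    released = min(excess, reclaimable_pages)
    if released == excess:
        # Enough clean pages can be released to get back to the limit.
        return ("reclaim", released)
    # Not enough reclaimable pages: release what we can and delay the writer.
    return ("throttle", released)
```

For example, a 100-page cache with an 80-page limit, free memory below
the threshold and 30 reclaimable pages releases 20 pages and carries on;
with only 5 reclaimable pages the writer is throttled.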
> 
> This code was also written in response to customer demand.  They
> don't like the fact that the buffer cache grows and grows on our
> Altix systems, and they want old buffer cache pages to be cleared
> out when they are no longer needed.  Since we almost never suffer
> memory pressure on our systems (and if we do, we are likely in
> trouble), kswapd almost never does this.  Buffer cache pages can
> sit around for days with no one removing them.  The above was one
> approach to solve that problem.
> 
> Please note: YMMV.  An Altix is not a desktop system and I make
> no claims that the above approach is appropriate for everyone.
> For us, it turns out to work better to bias storage allocation
> against unbridled growth of the page cache.  Indeed, we have
> spent a lot of time trying to solve problems related to page
> cache on Altix systems.  Assuming we get our OLS paper done
> in time, you can read more about this in our paper at OLS.
> (If not, we intend to post our experiences paper on the
> oss.sgi.com website.)
> 
> Finally, let me reiterate that we are beginning the process of
> evaluating the 2.6 memory manager wrt the same problem as above.
> Before we will propose a change such as above for 2.6, we have
> to convince ourselves that (1) setting vm_swappiness appropriately
> doesn't solve the problem, and (2) that patches such as the ones
> that Nick Piggin has been proposing don't solve the problem
> either, and that (3) there isn't some other mechanism to deal
> with this in 2.6.

I have to admit that the definition of "desktop machine" has changed a
lot in the last few years in terms of hardware, but since the 486 days
I have been choosing machines by asking "what can I build/buy for <$2k
which best fits my overall computing needs?" With the onset of cheap
memory and Opteron, NUMA will in all probability be a factor in the
next few years, and SMP has been one since the dual Pentium systems
were new.

That said, I think that your work will be useful, even if it is used
piecemeal or as inspiration to Nick, Andrea, and others who have been
working in the area. I find Nick's work as of 2.6.7-rc1-mm1 so good I
haven't moved any of my desktop machines beyond it, but it sounds as if
your work addresses the issue I mentioned about limiting buffer usage,
and Rik's comment that the code lacks checks and balances. You seem to
have a balance; I'd love to see it.


-- 
    -bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
  last possible moment - but no longer"  -me
