Re: Ideas for memory management hackers.

linux-mm.kvack.org archive mirror

From: Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
To: "Dr. Werner Fink" <werner@suse.de>
Cc: linux-mm@kvack.org
Subject: Re: Ideas for memory management hackers.
Date: 10 Dec 1997 14:13:34 +0100
Message-ID: <87g1o1nxxd.fsf@atlas.infra.CARNet.hr>
In-Reply-To: "Dr. Werner Fink"'s message of Tue, 9 Dec 1997 17:11:20 +0100

"Dr. Werner Fink" <werner@suse.de> writes:

> > 
> > I have integrated mmap aging in kswapd, without the need for
> > vhand, in 2.1.71 (experimental). As ppp isn't working in 2.1.71
> > I'm back to 2.1.66 now, but I have seen kswapd use over 10% of
> > CPU for short times now :(
> 
> Q: if ageing is now a separate part the CPU usage of freeing a page
>    in kswapd and __get_free_pages should drop, shouldn't it?
> 
> > I think I'll send it to Linus (together with Zlatko's
> > big-order hack) as a bug-fix (we're on feature-freeze after all:)
> > for inclusion in 2.1.72...
> > 
> > opinions please,
> 
> Q2: Is the patch available (ftp/http) for testing/reading?
> 

Due to a heavy demand... :)

Here it comes (comments below):

diff -urN linux-2.1.61/include/linux/swap.h linux/include/linux/swap.h
--- linux-2.1.61/include/linux/swap.h	Sat Oct 18 00:49:15 1997
+++ linux/include/linux/swap.h	Fri Oct 31 19:42:09 1997
@@ -34,6 +34,7 @@

 extern int nr_swap_pages;
 extern int nr_free_pages;
+extern int nr_free_pages_bigorder;
 extern atomic_t nr_async_pages;
 extern int min_free_pages;
 extern int free_pages_low;
diff -urN linux-2.1.61/mm/page_alloc.c linux/mm/page_alloc.c
--- linux-2.1.61/mm/page_alloc.c	Tue Jun 17 01:36:01 1997
+++ linux/mm/page_alloc.c	Fri Oct 31 19:42:10 1997
@@ -30,6 +30,9 @@
 int nr_swap_pages = 0;
 int nr_free_pages = 0;

+/* Number of the free pages in chunks of order 2 and bigger */
+int nr_free_pages_bigorder = 0;
+
 /*
  * Free area management
  *
@@ -118,12 +121,17 @@
 		if (!test_and_change_bit(index, area->map))
 			break;
 		remove_mem_queue(list(map_nr ^ -mask));
+		if (order >= 2)
+			nr_free_pages_bigorder -= 1 << order;
 		mask <<= 1;
+		order++;
 		area++;
 		index >>= 1;
 		map_nr &= mask;
 	}
 	add_mem_queue(area, list(map_nr));
+	if (order >= 2)
+		nr_free_pages_bigorder += 1 << order;

 #undef list

@@ -171,6 +179,8 @@
 				(prev->next = ret->next)->prev = prev; \
 				MARK_USED(map_nr, new_order, area); \
 				nr_free_pages -= 1 << order; \
+			        if (new_order >= 2) \
+				        nr_free_pages_bigorder -= 1 << new_order; \
 				EXPAND(ret, map_nr, order, new_order, area); \
 				spin_unlock_irqrestore(&page_alloc_lock, flags); \
 				return ADDRESS(map_nr); \
@@ -187,6 +197,8 @@
 		area--; high--; size >>= 1; \
 		add_mem_queue(area, map); \
 		MARK_USED(index, high, area); \
+		if (high >= 2) \
+		        nr_free_pages_bigorder += 1 << high; \
 		index += size; \
 		map += size; \
 	} \
diff -urN linux-2.1.61/mm/vmscan.c linux/mm/vmscan.c
--- linux-2.1.61/mm/vmscan.c	Thu Oct 23 22:30:25 1997
+++ linux/mm/vmscan.c	Fri Oct 31 19:42:10 1997
@@ -465,7 +465,8 @@
 			pages = nr_free_pages;
 			if (nr_free_pages >= min_free_pages)
 				pages += atomic_read(&nr_async_pages);
-			if (pages >= free_pages_high)
+			if (pages >= free_pages_high &&
+				nr_free_pages_bigorder >= min_free_pages / 2)
 				break;
 			wait = (pages < free_pages_low);
 			if (try_to_free_page(GFP_KERNEL, 0, wait))
@@ -489,7 +490,7 @@
 	int want_wakeup = 0, memory_low = 0;
 	int pages = nr_free_pages + atomic_read(&nr_async_pages);

-	if (pages < free_pages_low)
+	if (pages < free_pages_low || nr_free_pages_bigorder < min_free_pages / 2)
 		memory_low = want_wakeup = 1;
 	else if (pages < free_pages_high && jiffies >= next_swap_jiffies)
 		want_wakeup = 1;

It was originally developed for 2.1.61, but it works perfectly on
2.1.71 (I just checked). It was posted on the linux-kernel list during
the recent problems (the massive unsubscribe), so many people missed it.

Now some comments on the patch:

I had nasty lockups with all 2.1 kernels. I traced the problem down to
the network stuff, which was trying to allocate pages of order 2 and
constantly failing. The problem was (and still is!) that Linux doesn't
swap pages out to get more free memory if it already has
free_pages_high or more free pages. Of course, that is correct
behaviour, but... sometimes memory is completely fragmented, and all
free chunks are of one or two pages, so there's no way you could get
16KB of contiguous memory (even if you have 512KB free!). Networking
can't proceed without that, and if you're logged in remotely you're in
fact completely disconnected.
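
To make the fragmentation case concrete, here is a minimal sketch in
plain userspace C (not kernel code). The struct free_model type and the
three helper functions are hypothetical names made up for this
illustration. It assumes 4KB pages, so 128 free pages correspond to the
512KB mentioned above and an order-2 chunk is 16KB.

```c
#include <assert.h>

#define NR_ORDERS 6	/* hypothetical: model orders 0..5 only */

/* Hypothetical model: free_blocks[o] = number of free chunks of 2^o pages. */
struct free_model {
	unsigned long free_blocks[NR_ORDERS];
};

/* Total free pages: the quantity that nr_free_pages tracks. */
static unsigned long total_free_pages(const struct free_model *m)
{
	unsigned long pages = 0;

	for (int o = 0; o < NR_ORDERS; o++)
		pages += m->free_blocks[o] << o;
	return pages;
}

/* Free pages held in chunks of order 2 and bigger: the quantity that
 * nr_free_pages_bigorder tracks in the patch. */
static unsigned long bigorder_free_pages(const struct free_model *m)
{
	unsigned long pages = 0;

	for (int o = 2; o < NR_ORDERS; o++)
		pages += m->free_blocks[o] << o;
	return pages;
}

/* A request of the given order can be satisfied only if a free chunk
 * of that order or larger exists (a larger one would be split). */
static int can_allocate(const struct free_model *m, int order)
{
	assert(order >= 0 && order < NR_ORDERS);
	for (int o = order; o < NR_ORDERS; o++)
		if (m->free_blocks[o] > 0)
			return 1;
	return 0;
}
```

With 64 free single pages and 32 free two-page chunks, the model reports
128 free pages, zero big-order pages, and no way to satisfy an order-2
request, even though an order-1 request still succeeds.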

The patch was my initial attempt to solve that problem, but in the
end I found that it had some other problems which I didn't like.
Many people who tried it reported that their machines swapped much
more with the patch applied. And I noticed it myself, too. It is true
that in exactly those cases where Linux swaps out heavily to get bigger
chunks of memory, it would lock up without the patch, but in the end I
didn't like the idea and abandoned work on it.
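
The extra swapping follows directly from the two conditions changed in
mm/vmscan.c. The sketch below restates them as standalone userspace C
so they can be compared side by side. The struct mm_counters type and
the function names are hypothetical; only the comparisons themselves
are taken from the diff above.

```c
#include <assert.h>

/* Hypothetical snapshot of the counters the patched code consults. */
struct mm_counters {
	int pages;		/* nr_free_pages, plus nr_async_pages where the kernel adds it */
	int bigorder;		/* nr_free_pages_bigorder */
	int min_free_pages;
	int free_pages_low;
	int free_pages_high;
};

/* Unpatched exit test of the kswapd freeing loop. */
static int kswapd_done_old(const struct mm_counters *c)
{
	return c->pages >= c->free_pages_high;
}

/* Patched exit test: additionally require enough big-order pages. */
static int kswapd_done_new(const struct mm_counters *c)
{
	assert(c->min_free_pages >= 0);
	return c->pages >= c->free_pages_high &&
	       c->bigorder >= c->min_free_pages / 2;
}

/* Patched memory_low test in the wake-up path. */
static int memory_low_new(const struct mm_counters *c)
{
	assert(c->min_free_pages >= 0);
	return c->pages < c->free_pages_low ||
	       c->bigorder < c->min_free_pages / 2;
}
```

With plenty of free pages but none of them in big chunks, the old test
lets kswapd stop while the new one keeps it freeing pages, and the
wake-up path reports memory as low. That is the lockup fix and the
heavier swapping at the same time.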

My opinion is that the problem is much bigger, and we will need much
more hard work to resolve it in the future.

That shouldn't stop anybody from experimenting with the patch, since
it is simple enough and thoroughly tested, so you won't have any
problems with it. If you don't count heavier swapping, that is. :)

My current workaround against network blockups is in mm/slab.c, where I
explicitly ask the slab allocator to stop using such big memory
chunks for small network buffers (mostly ~1700 bytes in size or
less).
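
As a rough illustration of the trade-off (this is not the actual
mm/slab.c change, which is not shown in this message), the arithmetic
below compares how many 1700-byte buffers fit in a single page versus a
16KB order-2 chunk. It assumes 4KB pages and ignores all slab
management overhead; the helper names are hypothetical.

```c
#include <assert.h>

#define PAGE_SIZE_BYTES 4096UL	/* assumption: 4KB pages */

/* Number of objects of obj_size bytes that fit in a chunk of 2^order
 * pages, ignoring any per-slab bookkeeping. */
static unsigned long objs_per_chunk(int order, unsigned long obj_size)
{
	assert(order >= 0 && obj_size > 0);
	return (PAGE_SIZE_BYTES << order) / obj_size;
}

/* Bytes left unused at the end of such a chunk. */
static unsigned long wasted_bytes(int order, unsigned long obj_size)
{
	assert(order >= 0 && obj_size > 0);
	return (PAGE_SIZE_BYTES << order) % obj_size;
}
```

Under these assumptions a single page holds 2 such buffers with 696
bytes left over, while an order-2 chunk holds 9 with 1084 bytes left
over. The bigger chunk wastes less per buffer, which is presumably why
an allocator would prefer it, but it can only be had when 16KB of
contiguous memory is free.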

It works perfectly, and nobody knows how (or if) it affects
performance, but (so?) I'm happy. :)

Regards,
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
	   What has four legs and an arm? A happy pitbull.
