linux-mm.kvack.org archive mirror
* Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
@ 1999-05-12 10:30 Manfred Spraul
  1999-05-12 18:36 ` Stephen C. Tweedie
  0 siblings, 1 reply; 4+ messages in thread
From: Manfred Spraul @ 1999-05-12 10:30 UTC (permalink / raw)
  To: Rik van Riel, Joseph Pranevich; +Cc: Linux Kernel, Linux MM

>On Tue, 11 May 1999, Joseph Pranevich wrote:
> case 2:
>  error = -EINVAL;
>  if (swap_header->info.nr_badpages > MAX_SWAP_BADPAGES)
>  goto bad_swap;

MAX_SWAP_BADPAGES is a limit of swap format 2,
not a kernel limitation (see include/linux/swap.h).
 
Rik wrote:
>On Tue, 11 May 1999, Joseph Pranevich wrote:
>> set_blocksize(p->swap_device, PAGE_SIZE);
>
>Hmm, haven't we seen this one before? Stephen?


There is another problem with this line:
set_blocksize() also means that the previous block size
doesn't work anymore:
if you accidentally enter 'swapon /dev/hda1' (my root drive)
instead of 'swapon /dev/hda3', then you have to fsck:
sys_swapon sets the blocksize, then it rejects the call
because there is no swap signature, but now ext2
can't access the partition (blocksize 4096, ext2 needs 1024).
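
The ordering problem can be illustrated with a small user-space model.
This is only a sketch: struct model_dev, its fields and both function
names are hypothetical and are not kernel code. It assumes a busy flag
that also covers mounted filesystems.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a block device; not a kernel structure. */
struct model_dev {
    int blocksize;           /* block size currently in effect */
    int busy;                /* assumed: set while mounted or used as swap */
    int has_swap_signature;  /* nonzero if the device holds a swap header */
};

/* Ordering described above: the block size changes before validation. */
int swapon_set_first(struct model_dev *d)
{
    d->blocksize = MODEL_PAGE_SIZE;
    if (!d->has_swap_signature)
        return -EINVAL;      /* rejected, but the block size has changed */
    return 0;
}

/* Alternative ordering: refuse busy devices before touching them. */
int swapon_check_busy_first(struct model_dev *d)
{
    if (d->busy)
        return -EBUSY;       /* device left exactly as it was */
    d->blocksize = MODEL_PAGE_SIZE;
    if (!d->has_swap_signature)
        return -EINVAL;
    return 0;
}

/* Example: a mounted ext2 root partition with 1024-byte blocks. */
void swapon_model_example(void)
{
    struct model_dev root = { 1024, 1, 0 };

    assert(swapon_check_busy_first(&root) == -EBUSY);
    assert(root.blocksize == 1024);
}
```

With the first ordering the same call returns -EINVAL and leaves the
model device at a 4096-byte block size, which is the state that breaks
the mounted filesystem.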

I posted a patch a few weeks ago, but I received no reply.

Are such problems ignored? (The super user can crash the
machine at will, one more crash doesn't matter)

Regards,
    Manfred

--
To unsubscribe, send a message with 'unsubscribe linux-mm my@address'
in the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/


* Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
  1999-05-12 10:30 Swap Questions (includes possible bug) - swapfile.c / swap.c Manfred Spraul
@ 1999-05-12 18:36 ` Stephen C. Tweedie
  1999-05-12 19:45   ` Manfred Spraul
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen C. Tweedie @ 1999-05-12 18:36 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Rik van Riel, Joseph Pranevich, Linux Kernel, Linux MM

Hi,

On Wed, 12 May 1999 12:30:27 +0200, "Manfred Spraul"
<masp0008@stud.uni-sb.de> said:

> There is another problem with this line:
> set_blocksize() also means that the previous block size
> doesn't work anymore:
> if you accidentally enter 'swapon /dev/hda1' (my root drive)
> instead of 'swapon /dev/hda3', then you have to fsck:

Yep, it would make perfect sense to move the set_blocksize to be after
the EBUSY check.

--Stephen

* Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
  1999-05-12 18:36 ` Stephen C. Tweedie
@ 1999-05-12 19:45   ` Manfred Spraul
  0 siblings, 0 replies; 4+ messages in thread
From: Manfred Spraul @ 1999-05-12 19:45 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Manfred Spraul, Rik van Riel, Joseph Pranevich, Linux Kernel, Linux MM

[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Wed, 12 May 1999 12:30:27 +0200, "Manfred Spraul"
> <masp0008@stud.uni-sb.de> said:
> 
> > There is another problem with this line:
> > set_blocksize() also means that the previous block size
> > doesn't work anymore:
> > if you accidentally enter 'swapon /dev/hda1' (my root drive)
> > instead of 'swapon /dev/hda3', then you have to fsck:
> 
> Yep, it would make perfect sense to move the set_blocksize to be after
> the EBUSY check.

Unfortunately that doesn't solve the problem:
The current EBUSY check only verifies that the partition is not
already in use as a swap partition. It doesn't check the VFS, and
it doesn't check whether the RAID driver uses the volume.

I've attached an old patch (vs.2.2.6):
I sent that patch to linux-kernel@vger, Alan (..wait until Linus
returns from vacation..), Linus (no reply).

The patch adds a bitmap to the block cache for EBUSY checks.
Actually, we can use this bitmap for other bits if we use devfs and
dynamic MAJOR/MINOR codes:
we must replace all 'MAJOR==LOOP', 'MAJOR==IDE' etc. if we want to
support dynamic block device MAJOR/MINOR's.

Additionally, we save 6-8 kB kernel memory. (ro_bits was an 8 kB
static array).
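
A minimal user-space sketch of the per-major bitmap idea follows. It is
not the attached patch: it assumes 8-bit minors and 256 majors, uses
calloc() in place of kmalloc(), takes major and minor as separate
arguments instead of a kdev_t, and the helper kdev_bits_get() is a
hypothetical name. Unlike the patch, set_device_busy() here reports
allocation failure to the caller.

```c
#include <assert.h>
#include <stdlib.h>

#define MINORBITS  8     /* assumed: 8-bit minor numbers */
#define MAX_BLKDEV 256   /* assumed number of block majors */

struct kdev_bits {
    unsigned char ro_bits[(1U << MINORBITS) / 8];
    unsigned char busy_bits[(1U << MINORBITS) / 8];
};

/* One pointer per major; the bitmaps are allocated on first use. */
static struct kdev_bits *kdev_info[MAX_BLKDEV];

/* Hypothetical helper: return the bitmaps for a major, allocating lazily. */
static struct kdev_bits *kdev_bits_get(unsigned int major)
{
    if (major >= MAX_BLKDEV)
        return NULL;
    if (kdev_info[major] == NULL)
        kdev_info[major] = calloc(1, sizeof(struct kdev_bits));
    return kdev_info[major];
}

/* Returns 1 if the busy bit is set, 0 otherwise (also for unknown majors). */
int is_device_busy(unsigned int major, unsigned int minor)
{
    if (major >= MAX_BLKDEV || minor >= (1U << MINORBITS))
        return 0;
    if (kdev_info[major] == NULL)
        return 0;
    return (kdev_info[major]->busy_bits[minor >> 3] >> (minor & 7)) & 1;
}

/* Sets or clears the busy bit; returns 0 on success, -1 on failure. */
int set_device_busy(unsigned int major, unsigned int minor, int flag)
{
    struct kdev_bits *bits;

    if (minor >= (1U << MINORBITS))
        return -1;
    bits = kdev_bits_get(major);
    if (bits == NULL)
        return -1;
    if (flag)
        bits->busy_bits[minor >> 3] |= (unsigned char)(1U << (minor & 7));
    else
        bits->busy_bits[minor >> 3] &= (unsigned char)~(1U << (minor & 7));
    return 0;
}

/* Example: mark major 3, minor 1 busy, then release it again. */
void busy_bits_example(void)
{
    assert(set_device_busy(3, 1, 1) == 0);
    assert(is_device_busy(3, 1));
    assert(set_device_busy(3, 1, 0) == 0);
    assert(!is_device_busy(3, 1));
}
```

Under these assumptions each struct kdev_bits is 64 bytes, and it is
only allocated for majors that actually set a bit.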

If you think that the patch is useful, then I'll make a new patch
vs 2.3.0, otherwise I'll wait until devfs is added, and I'll
try to write a larger patch (dynamic MAJOR/MINOR for block cache)
that includes this one.

--
	Manfred

[-- Attachment #2: patch_busy-2.2.6 --]
[-- Type: text/plain, Size: 5707 bytes --]

diff -r -u -P -x CVS -x *,v 2.2.6/drivers/block/ll_rw_blk.c current/drivers/block/ll_rw_blk.c
--- 2.2.6/drivers/block/ll_rw_blk.c	Wed Mar 31 00:56:57 1999
+++ current/drivers/block/ll_rw_blk.c	Thu Apr 22 18:02:20 1999
@@ -16,6 +16,7 @@
 #include <linux/config.h>
 #include <linux/locks.h>
 #include <linux/mm.h>
+#include <linux/slab.h>
 #include <linux/init.h>
 
 #include <asm/system.h>
@@ -241,8 +242,24 @@
 }
 
 /* RO fail safe mechanism */
+/* device busy: (C) Manfred Spraul masp0008@stud.uni-sb.de */
 
-static long ro_bits[MAX_BLKDEV][8];
+struct kdev_bits {
+	unsigned char ro_bits[(1U << MINORBITS)/8];
+	unsigned char busy_bits[(1U << MINORBITS)/8];
+};
+
+static struct kdev_bits* kdev_info[MAX_BLKDEV] = { NULL, NULL };
+
+#define ALLOC_KDEV_BITS(major) \
+	if (kdev_info[major] == NULL) { \
+		kdev_info[major] = kmalloc(sizeof(struct kdev_bits),GFP_KERNEL); \
+		if(kdev_info[major] == NULL) { \
+			printk("ALLOC_KDEV_BITS() failed due to ENOMEM.\n"); \
+			return; \
+		} \
+		memset(kdev_info[major],0,sizeof(struct kdev_bits)); \
+	}
 
 int is_read_only(kdev_t dev)
 {
@@ -251,7 +268,8 @@
 	major = MAJOR(dev);
 	minor = MINOR(dev);
 	if (major < 0 || major >= MAX_BLKDEV) return 0;
-	return ro_bits[major][minor >> 5] & (1 << (minor & 31));
+	if (kdev_info[major] == NULL) return 0;
+     	return kdev_info[major]->ro_bits[minor >> 3] & (1 << (minor & 7));
 }
 
 void set_device_ro(kdev_t dev,int flag)
@@ -261,10 +279,39 @@
 	major = MAJOR(dev);
 	minor = MINOR(dev);
 	if (major < 0 || major >= MAX_BLKDEV) return;
-	if (flag) ro_bits[major][minor >> 5] |= 1 << (minor & 31);
-	else ro_bits[major][minor >> 5] &= ~(1 << (minor & 31));
+	ALLOC_KDEV_BITS(major)
+	if (flag)
+		kdev_info[major]->ro_bits[minor >> 3] |= 1 << (minor & 7);
+	 else
+		kdev_info[major]->ro_bits[minor >> 3] &= ~(1 << (minor & 7));
+}
+
+int is_device_busy(kdev_t dev)
+{
+	int minor,major;
+
+	major = MAJOR(dev);
+	minor = MINOR(dev);
+	if (major < 0 || major >= MAX_BLKDEV) return 0;
+	if (kdev_info[major] == NULL) return 0;
+	return kdev_info[major]->busy_bits[minor >> 3] & (1 << (minor & 7));
 }
 
+void set_device_busy(kdev_t dev,int flag)
+{
+	int minor,major;
+	
+	major = MAJOR(dev);
+	minor = MINOR(dev);
+	if (major < 0 || major >= MAX_BLKDEV) return;
+	ALLOC_KDEV_BITS(major)
+	if (flag)
+		kdev_info[major]->busy_bits[minor >> 3] |= 1 << (minor & 7);
+	 else
+		kdev_info[major]->busy_bits[minor >> 3] &= ~(1 << (minor & 7));
+}
+
+
 static inline void drive_stat_acct(int cmd, unsigned long nr_sectors,
                                    short disk_index)
 {
@@ -731,7 +778,6 @@
 		req->rq_status = RQ_INACTIVE;
 		req->next = NULL;
 	}
-	memset(ro_bits,0,sizeof(ro_bits));
 	memset(max_readahead, 0, sizeof(max_readahead));
 	memset(max_sectors, 0, sizeof(max_sectors));
 #ifdef CONFIG_AMIGA_Z2RAM
diff -r -u -P -x CVS -x *,v 2.2.6/fs/super.c current/fs/super.c
--- 2.2.6/fs/super.c	Tue Apr 20 13:41:57 1999
+++ current/fs/super.c	Thu Apr 22 18:02:20 1999
@@ -131,6 +131,7 @@
 		vfsmnttail->mnt_next = lptr;
 		vfsmnttail = lptr;
 	}
+	set_device_busy(sb->s_dev,1);
 out:
 	return lptr;
 }
@@ -165,6 +166,8 @@
 	kfree(tofree->mnt_devname);
 	kfree(tofree->mnt_dirname);
 	kfree_s(tofree, sizeof(struct vfsmount));
+
+	set_device_busy(dev,0);
 }
 
 int register_filesystem(struct file_system_type * fs)
@@ -873,6 +876,8 @@
 	if (dir_d->d_covers != dir_d)
 		goto dput_and_out;
 
+	if (is_device_busy(dev))
+		goto dput_and_out;
 	/*
 	 * Note: If the superblock already exists,
 	 * read_super just does a get_super().
diff -r -u -P -x CVS -x *,v 2.2.6/include/linux/fs.h current/include/linux/fs.h
--- 2.2.6/include/linux/fs.h	Tue Apr 20 13:41:58 1999
+++ current/include/linux/fs.h	Thu Apr 22 18:02:20 1999
@@ -839,6 +839,8 @@
 extern struct buffer_head * find_buffer(kdev_t dev, int block, int size);
 extern void ll_rw_block(int, int, struct buffer_head * bh[]);
 extern int is_read_only(kdev_t);
+extern int is_device_busy(kdev_t);
+extern void set_device_busy(kdev_t dev, int flag);
 extern void __brelse(struct buffer_head *);
 extern inline void brelse(struct buffer_head *buf)
 {
diff -r -u -P -x CVS -x *,v 2.2.6/kernel/ksyms.c current/kernel/ksyms.c
--- 2.2.6/kernel/ksyms.c	Wed Mar 31 00:56:57 1999
+++ current/kernel/ksyms.c	Thu Apr 22 18:02:20 1999
@@ -47,7 +47,7 @@
 #endif
 
 extern char *get_options(char *str, int *ints);
-extern void set_device_ro(kdev_t dev,int flag);
+extern void set_device_ro(kdev_t dev, int flag);
 extern struct file_operations * get_blkfops(unsigned int);
 extern int blkdev_release(struct inode * inode);
 #if !defined(CONFIG_NFSD) && defined(CONFIG_NFSD_MODULE)
@@ -209,6 +209,8 @@
 EXPORT_SYMBOL(blk_dev);
 EXPORT_SYMBOL(is_read_only);
 EXPORT_SYMBOL(set_device_ro);
+EXPORT_SYMBOL(is_device_busy);
+EXPORT_SYMBOL(set_device_busy);
 EXPORT_SYMBOL(bmap);
 EXPORT_SYMBOL(sync_dev);
 EXPORT_SYMBOL(get_blkfops);
diff -r -u -P -x CVS -x *,v 2.2.6/mm/swapfile.c current/mm/swapfile.c
--- 2.2.6/mm/swapfile.c	Wed Mar 31 00:56:57 1999
+++ current/mm/swapfile.c	Thu Apr 22 18:02:20 1999
@@ -414,6 +414,7 @@
 			filp.f_op->release(dentry->d_inode,&filp);
 			filp.f_op->release(dentry->d_inode,&filp);
 		}
+		set_device_busy(p->swap_device,0);
 	}
 	dput(dentry);
 
@@ -531,6 +532,10 @@
 
 	if (S_ISBLK(swap_dentry->d_inode->i_mode)) {
 		p->swap_device = swap_dentry->d_inode->i_rdev;
+		if(is_device_busy(p->swap_device)) {
+			error = -EBUSY;
+			goto bad_swap;
+		}
 		set_blocksize(p->swap_device, PAGE_SIZE);
 		
 		filp.f_dentry = swap_dentry;
@@ -686,6 +691,8 @@
 		swap_info[prev].next = p - swap_info;
 	}
 	error = 0;
+	if(p->swap_device != 0)
+		set_device_busy(p->swap_device,1);
 	goto out;
 bad_swap:
 	if(filp.f_op && filp.f_op->release)



* Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
       [not found] <Pine.LNX.4.03.9905111114210.19954-100000@baltimore.wwaves.com>
@ 1999-05-11 21:30 ` Rik van Riel
  0 siblings, 0 replies; 4+ messages in thread
From: Rik van Riel @ 1999-05-11 21:30 UTC (permalink / raw)
  To: Joseph Pranevich; +Cc: Linux Kernel, Linux MM

On Tue, 11 May 1999, Joseph Pranevich wrote:

> I've been gradually sifting my way through the kernel source and I
> have a few minor questions about memory management.

linux-mm@kvack.org	(majordomo-managed)
http://www.linux.eu.org/Linux-MM/

> 1) swap.c : page clustering?

> 	else
> 		page_cluster = 4;
> 
> This is fine, but wouldn't it make sense to generalize this, or is
> the benefit not as great with larger amounts of ram?

The swapOUT clustering is only done to a maximum of 32 (2^5)
pages, so it doesn't make much sense to read in more pages
(which are probably unrelated to the current process).
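
To make the numbers concrete, here is a small arithmetic sketch. It
assumes that page_cluster is a base-2 logarithm of the cluster size
(matching the "2^5" notation above) and that pages are 4 kB; both
function names and ASSUMED_PAGE_SIZE are hypothetical.

```c
#include <assert.h>

#define ASSUMED_PAGE_SIZE 4096UL   /* assumption: 4 kB pages */

/* Pages in one cluster, treating page_cluster as a log2 value. */
unsigned long cluster_pages(unsigned int page_cluster)
{
    return 1UL << page_cluster;
}

/* Bytes in one cluster under the assumed page size. */
unsigned long cluster_bytes(unsigned int page_cluster)
{
    return cluster_pages(page_cluster) * ASSUMED_PAGE_SIZE;
}

/* Example: the quoted setting versus the 2^5 swap-out limit. */
void cluster_example(void)
{
    assert(cluster_pages(4) == 16);   /* page_cluster = 4 */
    assert(cluster_pages(5) == 32);   /* 2^5 pages */
}
```

Under these assumptions page_cluster = 4 corresponds to 16 pages
(64 kB) per cluster, half of the 32-page swap-out limit.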

For mmap() reading we might want to switch to a smarter
algorithm though. Not with reading in more pages, but with
reading in the _next_ area while the program is still busy
processing this one. The idea is to have all data in memory
just before the process needs it :)


> 2) swapfile.c : sys_swapon() question 1
> 
> I'm unable to figure out exactly what this code is supposed to be
> doing. Can someone help me out here? I don't understand why we set
> the blocksize twice or what the funniness is with "filp"
> 
> 		p->swap_device = swap_dentry->d_inode->i_rdev;
> 		set_blocksize(p->swap_device, PAGE_SIZE);

We do I/O on this device in chunks of PAGE_SIZE.

> 		filp.f_dentry = swap_dentry;
> 		filp.f_mode = 3; /* read write */

Of course, we want to have our swap device read-write and we
mark it with a magic number so no harm will come to it...

> 		set_blocksize(p->swap_device, PAGE_SIZE);

Hmm, haven't we seen this one before? Stephen?


> I do apologise for the many questions, I'm just trying to get a
> feel for the swapping subsystem. I apologise if this is already
> documented someplace.

AFAIK it's not yet documented. I'd really appreciate it
if you could do that and send me the docs for inclusion
on the Linux-MM site...

cheers,

Rik -- Open Source: you deserve to be in control of your data.
+-------------------------------------------------------------------+
| Le Reseau netwerksystemen BV:               http://www.reseau.nl/ |
| Linux Memory Management site:   http://www.linux.eu.org/Linux-MM/ |
| Nederlandse Linux documentatie:          http://www.nl.linux.org/ |
+-------------------------------------------------------------------+

