* Update shared mappings
@ 1998-11-20 4:10 Zlatko Calusic
1998-11-30 13:52 ` Stephen C. Tweedie
0 siblings, 1 reply; 10+ messages in thread
From: Zlatko Calusic @ 1998-11-20 4:10 UTC (permalink / raw)
To: Linux-MM List
Should this patch be applied to kernel?
Index: 129.2/mm/filemap.c
--- 129.2/mm/filemap.c Thu, 19 Nov 1998 18:20:34 +0100 zcalusic (linux-2.1/y/b/29_filemap.c 1.2.4.1.1.1.1.1 644)
+++ 129.3/mm/filemap.c Fri, 20 Nov 1998 05:07:24 +0100 zcalusic (linux-2.1/y/b/29_filemap.c 1.2.4.1.1.1.1.2 644)
@@ -5,6 +5,10 @@
*/
/*
+ * update_shared_mappings(), 1998 Andrea Arcangeli
+ */
+
+/*
* This file handles the generic file mmap semantics used by
* most "normal" filesystems (but you don't /have/ to use this:
* the NFS filesystem used to do this differently, for example)
@@ -1216,6 +1220,75 @@
return mk_pte(page,vma->vm_page_prot);
}
+static void update_one_shared_mapping(struct vm_area_struct *shared,
+ unsigned long address, pte_t orig_pte)
+{
+ pgd_t *pgd;
+ pmd_t *pmd;
+ pte_t *pte;
+ struct semaphore * mmap_sem = &shared->vm_mm->mmap_sem;
+
+ down(mmap_sem);
+
+ pgd = pgd_offset(shared->vm_mm, address);
+ if (pgd_none(*pgd))
+ goto out;
+ if (pgd_bad(*pgd)) {
+ printk(KERN_ERR "update_shared_mappings: bad pgd (%08lx)\n",
+ pgd_val(*pgd));
+ pgd_clear(pgd);
+ goto out;
+ }
+
+ pmd = pmd_offset(pgd, address);
+ if (pmd_none(*pmd))
+ goto out;
+ if (pmd_bad(*pmd))
+ {
+ printk(KERN_ERR "update_shared_mappings: bad pmd (%08lx)\n",
+ pmd_val(*pmd));
+ pmd_clear(pmd);
+ goto out;
+ }
+
+ pte = pte_offset(pmd, address);
+
+ if (pte_val(pte_mkclean(pte_mkyoung(*pte))) !=
+ pte_val(pte_mkclean(pte_mkyoung(orig_pte))))
+ goto out;
+
+ flush_page_to_ram(page(pte));
+ flush_cache_page(shared, address);
+ set_pte(pte, pte_mkclean(*pte));
+ flush_tlb_page(shared, address);
+
+ out:
+ up(mmap_sem);
+}
+
+static void update_shared_mappings(struct vm_area_struct *this,
+ unsigned long address,
+ pte_t orig_pte)
+{
+ if (this->vm_flags & VM_SHARED)
+ {
+ struct file * filp = this->vm_file;
+ if (filp)
+ {
+ struct inode * inode = filp->f_dentry->d_inode;
+ struct vm_area_struct * shared;
+
+ for (shared = inode->i_mmap; shared;
+ shared = shared->vm_next_share)
+ {
+ if (shared == this)
+ continue;
+ update_one_shared_mapping(shared, address,
+ orig_pte);
+ }
+ }
+ }
+}
static inline int filemap_sync_pte(pte_t * ptep, struct vm_area_struct *vma,
unsigned long address, unsigned int flags)
@@ -1233,6 +1306,7 @@
flush_cache_page(vma, address);
set_pte(ptep, pte_mkclean(pte));
flush_tlb_page(vma, address);
+ update_shared_mappings(vma, address, pte);
page = pte_page(pte);
atomic_inc(&mem_map[MAP_NR(page)].count);
} else {
--
Posted by Zlatko Calusic E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
If you're not confused, you're not paying attention.
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Update shared mappings
1998-11-20 4:10 Update shared mappings Zlatko Calusic
@ 1998-11-30 13:52 ` Stephen C. Tweedie
1998-11-30 15:19 ` Zlatko Calusic
1998-11-30 20:02 ` Andrea Arcangeli
0 siblings, 2 replies; 10+ messages in thread
From: Stephen C. Tweedie @ 1998-11-30 13:52 UTC (permalink / raw)
To: Andrea Arcangeli, Zlatko.Calusic; +Cc: Linux-MM List, Andi Kleen
Hi,
On 20 Nov 1998 05:10:01 +0100, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
said:
> Should this patch be applied to kernel? [Andrea's
> update_shared_mappings patch]
No.
> Index: 129.2/mm/filemap.c
> --- 129.2/mm/filemap.c Thu, 19 Nov 1998 18:20:34 +0100 zcalusic (linux-2.1/y/b/29_filemap.c 1.2.4.1.1.1.1.1 644)
> +++ 129.3/mm/filemap.c Fri, 20 Nov 1998 05:07:24 +0100 zcalusic (linux-2.1/y/b/29_filemap.c 1.2.4.1.1.1.1.2 644)
> @@ -5,6 +5,10 @@
> */
> +static void update_one_shared_mapping(struct vm_area_struct *shared,
> + unsigned long address, pte_t orig_pte)
> +{
> + pgd_t *pgd;
> + pmd_t *pmd;
> + pte_t *pte;
> + struct semaphore * mmap_sem = &shared->vm_mm->mmap_sem;
> +
> + down(mmap_sem);
The mmap_semaphore is already taken out _much_ earlier on in msync(), or
the vm_area_struct can be destroyed by another thread. Is this patch
tested? Won't we deadlock immediately on doing this extra down()
operation?
The only reason that this patch works in its current state is that
exit_mmap() skips the down(&mm->mmap_sem). It can safely do so only
because if we are exiting the mmap, we know we are the last thread and
so no other thread can be playing games with us. So, exit_mmap()
doesn't deadlock, but a sys_msync() on the region looks as if it will.
Other than that, it looks fine. One other thing occurs to me, though:
it would be easy enough to add a condition (atomic_read(&page->count) >
2) on this to disable the update-mappings call entirely if the page is
only mapped by one vma (which will be a very common case). We already
access the count field, so we are avoiding the cost of any extra cache
misses if we make this check.
Comments?
--Stephen
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Update shared mappings
1998-11-30 13:52 ` Stephen C. Tweedie
@ 1998-11-30 15:19 ` Zlatko Calusic
1998-11-30 20:02 ` Andrea Arcangeli
1 sibling, 0 replies; 10+ messages in thread
From: Zlatko Calusic @ 1998-11-30 15:19 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Andrea Arcangeli, Linux-MM List, Andi Kleen
"Stephen C. Tweedie" <sct@redhat.com> writes:
> Hi,
>
> On 20 Nov 1998 05:10:01 +0100, Zlatko Calusic <Zlatko.Calusic@CARNet.hr>
> said:
>
> > Should this patch be applied to kernel? [Andrea's
> > update_shared_mappings patch]
>
> No.
:)
> The mmap_semaphore is already taken out _much_ earlier on in msync(), or
> the vm_area_struct can be destroyed by another thread. Is this patch
> tested? Won't we deadlock immediately on doing this extra down()
> operation?
You're right. And that is the exact reason why I had locks with
StarOffice in down_sem().
Andrea already contacted me, and I think this now concludes our
conversation regarding that problem. :)
>
> The only reason that this patch works in its current state is that
> exit_mmap() skips the down(&mm->mmap_sem). It can safely do so only
> because if we are exiting the mmap, we know we are the last thread and
> so no other thread can be playing games with us. So, exit_mmap()
> doesn't deadlock, but a sys_msync() on the region looks as if it will.
>
> Other than that, it looks fine. One other thing occurs to me, though:
> it would be easy enough to add a condition (atomic_read(&page->count) >
> 2) on this to disable the update-mappings call entirely if the page is
> only mapped by one vma (which will be a very common case). We already
> access the count field, so we are avoiding the cost of any extra cache
> misses if we make this check.
>
> Comments?
>
You're probably right.
Hopefully, Andrea will resend his patch with necessary fixes, so after
testing it gets included in kernel.
Regards,
--
Posted by Zlatko Calusic E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
Life would be easier if I had the source code.
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Update shared mappings
1998-11-30 13:52 ` Stephen C. Tweedie
1998-11-30 15:19 ` Zlatko Calusic
@ 1998-11-30 20:02 ` Andrea Arcangeli
1998-12-01 15:03 ` Stephen C. Tweedie
1 sibling, 1 reply; 10+ messages in thread
From: Andrea Arcangeli @ 1998-11-30 20:02 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Zlatko.Calusic, Linux-MM List, Andi Kleen
On Mon, 30 Nov 1998, Stephen C. Tweedie wrote:
>The mmap_semaphore is already taken out _much_ earlier on in msync(), or
>the vm_area_struct can be destroyed by another thread. Is this patch
Infact the down you see is done on a different mmstruct. It has to be a
diffent mm struct.
>tested? Won't we deadlock immediately on doing this extra down()
Sure.
>operation?
No.
>The only reason that this patch works in its current state is that
>exit_mmap() skips the down(&mm->mmap_sem). It can safely do so only
I guess you have not read the patch well...
>because if we are exiting the mmap, we know we are the last thread and
>so no other thread can be playing games with us. So, exit_mmap()
>doesn't deadlock, but a sys_msync() on the region looks as if it will.
???
I reproduced the StarOffice deadlock here also without my patch. And the
guy that said me that my patch was deadlocking staroffice then said me
that now staroffice was working also with my patch applyed...
Stephen I can' t see the obvious deadlocking you are tolking about. The
mmap semphore can be held for many processes but not two times for the
same one and never for the current one. The code should work fine also
with CLONE_VM. I have no pending bug reports btw.
I am using the patch from day 0 and I never deadlocked. The only proggy I
used specifically to try my update_shared_mappings() code is this though:
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
/* file size, should be half of the size of the physical memory */
#define FILESIZE (5 * 1024 * 1024)
int main(void)
{
char *ptr;
int fd, i;
char c = 'A';
pid_t pid;
if ((fd = open("foo", O_RDWR | O_CREAT | O_EXCL, 0666)) == -1) {
perror("open");
exit(1);
}
lseek(fd, FILESIZE - 1, SEEK_SET);
/* write one byte to extend the file */
write(fd, &fd, 1);
ptr = mmap(0, FILESIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (ptr == NULL) {
perror("mmap");
exit(1);
}
/* dirty every page in the mapping */
for (i = 0; i < FILESIZE; i += 4096)
ptr[i] = c;
while (1) {
if ((pid = fork())) { /* parent, wait */
waitpid(pid, NULL, 0);
} else { /* child, exec away */
msync(ptr, FILESIZE, MS_SYNC);
}
sleep(5);
}
}
Let me know if the code still need fixing. A proggy that trigger the bug
would be helpful btw ;)
Andrea Arcangeli
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Update shared mappings
1998-11-30 20:02 ` Andrea Arcangeli
@ 1998-12-01 15:03 ` Stephen C. Tweedie
1998-12-01 17:48 ` Andrea Arcangeli
0 siblings, 1 reply; 10+ messages in thread
From: Stephen C. Tweedie @ 1998-12-01 15:03 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Stephen C. Tweedie, Zlatko.Calusic, Linux-MM List, Andi Kleen
Hi,
On Mon, 30 Nov 1998 21:02:05 +0100 (CET), Andrea Arcangeli
<andrea@e-mind.com> said:
>> The only reason that this patch works in its current state is that
>> exit_mmap() skips the down(&mm->mmap_sem). It can safely do so only
> I guess you have not read the patch well...
I think I have: I can reliably deadlock machines with this patch.
> Stephen I can' t see the obvious deadlocking you are tolking about. The
> mmap semphore can be held for many processes but not two times for the
> same one and never for the current one. The code should work fine also
> with CLONE_VM. I have no pending bug reports btw.
No. When you scan vmas to update, you down() the semaphore on their
mm. You skip the current vma, sure, but it is quite possible to have
the same inode mapped more than once in a mm. If that happens, then
you *will* deadlock (I've got a console window open right now on a test
machine which is deadlocked in msync).
> Let me know if the code still need fixing. A proggy that trigger the bug
> would be helpful btw ;)
OK, see below.
--Stephen
----------------------------------------------------------------
/*
* wmem.c
*
* Test msync of shared write mappings
*
* (C) Stephen C. Tweedie, 1998
*/
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <sys/fcntl.h>
void try(const char *where, int why)
{
if (why) {
perror(where);
exit(1);
}
}
int pagesize;
int main(int argc, char *argv[])
{
int fd;
int err;
char * map1, * map2;
pagesize = getpagesize();
fd = open("/tmp/testfile", O_RDWR|O_CREAT, 0666);
try ("open", fd < 0);
ftruncate(fd, pagesize);
map1 = mmap(0, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
try ("mmap", map1 == MAP_FAILED);
map2 = mmap(0, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
try ("mmap", map2 == MAP_FAILED);
map1[0] = 0;
map1[100] = 0;
err = msync(map1, pagesize, 0);
try ("msync", err < 0);
err = msync(map2, pagesize, 0);
try ("msync", err < 0);
exit(0);
}
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Update shared mappings
1998-12-01 15:03 ` Stephen C. Tweedie
@ 1998-12-01 17:48 ` Andrea Arcangeli
1998-12-02 16:21 ` Stephen C. Tweedie
0 siblings, 1 reply; 10+ messages in thread
From: Andrea Arcangeli @ 1998-12-01 17:48 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Zlatko.Calusic, Linux-MM List, Andi Kleen
On Tue, 1 Dec 1998, Stephen C. Tweedie wrote:
>I think I have: I can reliably deadlock machines with this patch.
s/patch/proggy/ side effect of lots of kernel developing ;)
andrea@dragon:/tmp$ egcc -O2 shared_map.c
andrea@dragon:/tmp$ ./a.out
andrea@dragon:/tmp$
No deadlock at all. Are you sure you are using my _latest_ patch in
arca-39? Some weeks ago I fixed this:
static void update_shared_mappings(struct vm_area_struct *this,
unsigned long address,
pte_t orig_pte)
{
if (this->vm_flags & VM_SHARED)
{
struct file * filp = this->vm_file;
if (filp)
{
struct inode * inode = filp->f_dentry->d_inode;
struct vm_area_struct * shared;
for (shared = inode->i_mmap; shared;
shared = shared->vm_next_share)
{
if (shared->vm_mm == this->vm_mm)
^^^^^ ^^^^^
continue;
update_one_shared_mapping(shared, address,
orig_pte);
}
}
}
}
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: Update shared mappings
1998-12-01 17:48 ` Andrea Arcangeli
@ 1998-12-02 16:21 ` Stephen C. Tweedie
1998-12-02 18:32 ` Andrea Arcangeli
0 siblings, 1 reply; 10+ messages in thread
From: Stephen C. Tweedie @ 1998-12-02 16:21 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Stephen C. Tweedie, Zlatko.Calusic, Linux-MM List, Andi Kleen
Hi,
On Tue, 1 Dec 1998 18:48:44 +0100 (CET), Andrea Arcangeli
<andrea@e-mind.com> said:
> No deadlock at all. Are you sure you are using my _latest_ patch in
> arca-39? Some weeks ago I fixed this:
> if (shared->vm_mm == this->vm_mm)
> ^^^^^ ^^^^^
Ah right, I was working from the version Zlatko posted here a week or so
ago. This fix will indeed prevent the instant deadlock.
However, it is still susceptible to deadlock, because in msync() you
now hold the current mm semaphore while trying to take out somebody
else's mm semaphore. If you have two processes doing that to each other
(ie. two processes mapping the same file r/w and doing msyncs), then you
can most certainly still deadlock.
--Stephen
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Update shared mappings
1998-12-02 16:21 ` Stephen C. Tweedie
@ 1998-12-02 18:32 ` Andrea Arcangeli
1998-12-03 5:44 ` Eric W. Biederman
1998-12-03 11:56 ` Stephen C. Tweedie
0 siblings, 2 replies; 10+ messages in thread
From: Andrea Arcangeli @ 1998-12-02 18:32 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: Zlatko.Calusic, Linux-MM List, Andi Kleen
On Wed, 2 Dec 1998, Stephen C. Tweedie wrote:
>else's mm semaphore. If you have two processes doing that to each other
>(ie. two processes mapping the same file r/w and doing msyncs), then you
>can most certainly still deadlock.
The thing would be trivially fixable if it would exists a down_trylock()
that returns 0 if the semaphore was just held. I rejected now the
update_shared_mappings from my tree in the meantime though.
I have a question. Please consider only the UP case (as if linux would not
support SMP at all). Is it possible that while we are running inside
sys_msync() and another process has the mmap semaphore held?
Stephen I read some emails about a PG_dirty flag. Could you tell me some
more about that flag?
Andrea Arcangeli
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Update shared mappings
1998-12-02 18:32 ` Andrea Arcangeli
@ 1998-12-03 5:44 ` Eric W. Biederman
1998-12-03 11:56 ` Stephen C. Tweedie
1 sibling, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 1998-12-03 5:44 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Stephen C. Tweedie, Zlatko.Calusic, Linux-MM List, Andi Kleen
>>>>> "AA" == Andrea Arcangeli <andrea@e-mind.com> writes:
AA> On Wed, 2 Dec 1998, Stephen C. Tweedie wrote:
>> else's mm semaphore. If you have two processes doing that to each other
>> (ie. two processes mapping the same file r/w and doing msyncs), then you
>> can most certainly still deadlock.
AA> The thing would be trivially fixable if it would exists a down_trylock()
AA> that returns 0 if the semaphore was just held. I rejected now the
AA> update_shared_mappings from my tree in the meantime though.
AA> I have a question. Please consider only the UP case (as if linux would not
AA> support SMP at all). Is it possible that while we are running inside
AA> sys_msync() and another process has the mmap semaphore held?
AA> Stephen I read some emails about a PG_dirty flag. Could you tell me some
AA> more about that flag?
It used to be one of the flags defined for struct page. It doesn't exist currently.
But more precisely it refers to handling tracking dirty pages by page, instead
of where they got dirty.
This could be done using using PG_dirty and having shrink_mmap (or a similiar function)
write out the pages. The problem is that you lose all locality of reference in
the written pages. A simple linked list order by time the page gets dirty does
much better.
I have been working on this off and on for a while, but the code freeze went in
before I had anything I would be willing to integrate. Also my code focuses on the
requirements on the filesystem code, and not on the requirements of process pages, so
we need a little more work.
Currently I'm leaning towards adding a struct mapping, that would perform
some of the work of the current struct vm_area_struct, and hold a list
of the vm_area_structs, that implement that mapping.
inode->i_mmap would be replaced with inode->i_mappings which would
list the mappings.
For dirty page handling I think walking the page tables instead of the
all of memory would give us easier access to dirty bits, and remove
the need to do reverse page table entries. This assumes walking the
page tables won't be too expensive.
A process on a timer that kicks off periodically should be able to
handle updating all of the dirty bits, removing the dirty status from
the page tables, and then writing some dirty pages out to disk, before
it's good work is undone.
This idea needs to be explored with some actuall code, and isn't for
2.2 but with a little care hopefully we can have a working
implemenation op PG_dirty or similiar.
Eric
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Update shared mappings
1998-12-02 18:32 ` Andrea Arcangeli
1998-12-03 5:44 ` Eric W. Biederman
@ 1998-12-03 11:56 ` Stephen C. Tweedie
1 sibling, 0 replies; 10+ messages in thread
From: Stephen C. Tweedie @ 1998-12-03 11:56 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Stephen C. Tweedie, Zlatko.Calusic, Linux-MM List, Andi Kleen
Hi,
On Wed, 2 Dec 1998 19:32:56 +0100 (CET), Andrea Arcangeli
<andrea@e-mind.com> said:
> I have a question. Please consider only the UP case (as if linux would not
> support SMP at all). Is it possible that while we are running inside
> sys_msync() and another process has the mmap semaphore held?
No, because sys_msync() takes the mm semaphore first thing. Another
process _can_ hold the mm semaphore of a different vma on the same
region of the file, however.
> Stephen I read some emails about a PG_dirty flag. Could you tell me some
> more about that flag?
When it gets implemented it will have whatever semantics we choose to
give it. :)
--Stephen
--
This is a majordomo managed list. To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~1998-12-03 11:58 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1998-11-20 4:10 Update shared mappings Zlatko Calusic
1998-11-30 13:52 ` Stephen C. Tweedie
1998-11-30 15:19 ` Zlatko Calusic
1998-11-30 20:02 ` Andrea Arcangeli
1998-12-01 15:03 ` Stephen C. Tweedie
1998-12-01 17:48 ` Andrea Arcangeli
1998-12-02 16:21 ` Stephen C. Tweedie
1998-12-02 18:32 ` Andrea Arcangeli
1998-12-03 5:44 ` Eric W. Biederman
1998-12-03 11:56 ` Stephen C. Tweedie
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox