On Sat, Mar 31, 2012 at 2:44 AM, Arnd Bergmann wrote:
> We've had a discussion in the Linaro storage team (Saugata, Venkat and me,
> with Luca joining in on the discussion) about swapping to flash based media
> such as eMMC. This is a summary of what we found and what we think should
> be done. If people agree that this is a good idea, we can start working
> on it.
>
> The basic problem is that Linux without swap is sort of crippled and some
> things either don't work at all (hibernate) or not as efficient as they
> should (e.g. tmpfs). At the same time, the swap code seems to be rather
> inappropriate for the algorithms used in most flash media today, causing
> system performance to suffer drastically, and wearing out the flash hardware
> much faster than necessary. In order to change that, we would be
> implementing the following changes:
>
> 1) Try to swap out multiple pages at once, in a single write request. My
> reading of the current code is that we always send pages one by one to
> the swap device, while most flash devices have an optimum write size of
> 32 or 64 kb and some require an alignment of more than a page. Ideally
> we would try to write an aligned 64 kb block all the time. Writing aligned
> 64 kb chunks often gives us ten times the throughput of linear 4kb writes,
> and going beyond 64 kb usually does not give any better performance.
>

That makes sense.
I think we can batch pages that are about to be swapped out in
shrink_page_list if they have been assigned contiguous swap slots.

> 2) Make variable sized swap clusters. Right now, the swap space is
> organized in clusters of 256 pages (1MB), which is less than the typical
> erase block size of 4 or 8 MB. We should try to make the swap cluster
> aligned to erase blocks and have the size match to avoid garbage collection
> in the drive. The cluster size would typically be set by mkswap as a new
> option and interpreted at swapon time.
>

If we can find such big contiguous swap slots easily, it would be good.
But I am not sure how often we can get such big slots, and we may have to
improve the search method for finding such big empty clusters.

>
> 3) As Luca points out, some eMMC media would benefit significantly from
> having discard requests issued for every page that gets freed from
> the swap cache, rather than at the time just before we reuse a swap
> cluster. This would probably have to become a configurable option
> as well, to avoid the overhead of sending the discard requests on
> media that don't benefit from this.
>

This is the opposite of 2). I don't know how many such eMMC media there are.
Normally, per-page discard isn't useful on most eMMC media. I am not sure we
should implement per-page discard for such a minority of devices, given the
extra code complexity it would bring because of locking issues.

>
> Does this all sound appropriate for the Linux memory management people?
>
> Also, does this sound useful to the Android developers? Would you
> start using swap if we make it perform well and not destroy the drives?
>
> Finally, does this plan match up with the capabilities of the
> various eMMC devices? I know more about SD and USB devices and
> I'm quite convinced that it would help there, but eMMC can be
> more like an SSD in some ways, and the current code should be fine
> for real SSDs.
>
> Arnd
>

--
Kind regards,
Minchan Kim
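
To make the batching idea in 1) a bit more concrete, here is a rough
user-space sketch of the grouping step. It is not kernel code: the names
struct swap_batch and group_swap_slots() are hypothetical, the input is
assumed to be an already sorted array of swap slot offsets, and a real
change in shrink_page_list would also have to handle locking, page state
and building the actual write request. A batch is cut whenever the next
slot is not contiguous or would cross an aligned chunk boundary
(16 pages = 64 KB with 4 KB pages).

```c
#include <assert.h>
#include <stddef.h>

/* One write request: contiguous swap slots inside one aligned chunk. */
struct swap_batch {
	unsigned long first_slot;	/* first swap slot offset in the batch */
	size_t nr_pages;		/* number of contiguous slots */
};

/*
 * Group a sorted array of swap slot offsets into batches.
 * out[] must have room for nr_slots entries (worst case: no merging).
 * Returns the number of batches written to out[].
 */
size_t group_swap_slots(const unsigned long *slots, size_t nr_slots,
			size_t chunk_pages, struct swap_batch *out)
{
	size_t nr_batches = 0;

	assert(chunk_pages > 0);

	for (size_t i = 0; i < nr_slots; i++) {
		struct swap_batch *cur = nr_batches ? &out[nr_batches - 1] : NULL;

		if (cur &&
		    slots[i] == cur->first_slot + cur->nr_pages &&
		    slots[i] / chunk_pages == cur->first_slot / chunk_pages) {
			/* Contiguous and in the same aligned chunk: extend. */
			cur->nr_pages++;
		} else {
			/* Gap or chunk boundary: start a new write request. */
			out[nr_batches].first_slot = slots[i];
			out[nr_batches].nr_pages = 1;
			nr_batches++;
		}
	}
	return nr_batches;
}
```

With chunk_pages = 16, the slots 16, 17, 18, 20, 30, 31, 32, 33 end up in
four batches: (16, 3 pages), (20, 1 page), (30, 2 pages) and (32, 2 pages).
The run 30..33 is split at slot 32 so that no write crosses a 64 KB boundary.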