Thursday, November 20, 2008

Where has the memory gone?

So you just installed Solaris 10 U6. Pretty exciting, this ZFS boot, no?

But you ran vmstat (or installed top on the system) and noticed you had only 200MB free. Worse, you tried starting your Oracle database with a large SGA and it failed because it couldn't allocate the memory. "What? This machine has 16GB and barely anything running!" I hear you scream. Where has the memory gone?

# echo "::memstat" | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1717128             13415   83%
Anon                       238964              1866   12%
Exec and libs               23450               183    1%
Page cache                  19039               148    1%
Free (cachelist)            19243               150    1%
Free (freelist)             40453               316    2%

Total                     2058277             16080
Physical                  2054336             16049

The kernel is using 13GB?? Yes. You are hitting a default setting that's been around since ZFS was introduced to Solaris 10: the ZFS ARC. But don't complain too much, because it is now easy to fix. When we first hit this issue way back when, we had to use mdb to set values at boot time; you couldn't just set something in the /etc/system file.

So what is the ZFS ARC (Adaptive Replacement Cache)? In simple terms, it is memory that ZFS uses for cache. By default the cache can grow up to total memory minus 1GB. It is supposed to free up memory when applications in user space request it, but in practice it doesn't do this fast enough. Plus you end up with fragmented memory, which is a huge problem for shared memory (SHM), where Oracle puts the SGA.

In general, I reserve 2GB for the OS and my apps. If I run Oracle and/or Sun App Server, I'd also set aside the SGA and/or the Java memory. Add it all up. Let's say you need 4GB total that you don't want touched by ZFS, and you have 8GB: then you would set the maximum size for the ARC to 4GB.

What if you don't run Oracle and the like? Still, if you run a graphical desktop or a Sun Ray server on your machine, leave 2GB untouched. So if you have just 4GB total, set the ARC to 2GB.
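The sizing rule above is just a subtraction, so here is a minimal sketch of it as a shell function. The name arc_cap_gb and the sample figures are my own illustration, not anything that ships with Solaris, and it only works in whole GB.

```shell
#!/bin/sh
# Hypothetical helper: ARC cap = total RAM minus the memory you want
# left alone for the OS and applications (both in whole GB).
arc_cap_gb() {
    total_gb=$1
    reserved_gb=$2
    cap=$((total_gb - reserved_gb))
    # A zero or negative cap means the reservation does not fit in RAM.
    if [ "$cap" -le 0 ]; then
        echo "reserved memory must be smaller than total memory" >&2
        return 1
    fi
    echo "$cap"
}

arc_cap_gb 8 4   # 8GB box, 4GB reserved: prints 4
arc_cap_gb 4 2   # 4GB desktop, 2GB reserved: prints 2
```

If the function complains, you are asking for more reserved memory than the box has, and no ARC setting will help.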

How?

Edit /etc/system and add:

* Restrict ZFS ARC to 8GB
set zfs:zfs_arc_max = 8000000000


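Now this is actually less than 8GB (8,000,000,000 bytes is about 7.45GB when a GB is counted as 1024 x 1024 x 1024 bytes), but it is easier to read 8 followed by 9 zeros than 8 x 1024 x 1024 x 1024. So for 2GB use 2000000000, and for 4GB use 4000000000.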
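If you would rather have the exact figure, the arithmetic is easy to script. This is a rough sketch: arc_max_line is a made-up helper name, and it assumes your shell does 64-bit arithmetic (true for the usual shells on 64-bit systems). It prints the /etc/system line for either the round decimal value used above or the exact 1024-based one.

```shell
#!/bin/sh
# Hypothetical helper: print the /etc/system line for an ARC cap given in whole GB.
# Mode "decimal" (default) uses 10^9 bytes per GB, as in this post;
# mode "binary" uses 1024 x 1024 x 1024 bytes per GB.
arc_max_line() {
    gb=$1
    mode=${2:-decimal}
    if [ "$mode" = "binary" ]; then
        bytes=$((gb * 1024 * 1024 * 1024))
    else
        bytes=$((gb * 1000000000))
    fi
    echo "set zfs:zfs_arc_max = $bytes"
}

arc_max_line 8          # set zfs:zfs_arc_max = 8000000000
arc_max_line 8 binary   # set zfs:zfs_arc_max = 8589934592
```

Either value works as a cap; the decimal one just leaves a little more memory outside the ARC.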

This will require a reboot.

Once rebooted you can verify it took the change by executing:

# kstat -m zfs
module: zfs                             instance: 0
name:   arcstats                        class:    misc
        c                               8000000000
        c_max                           8000000000
        c_min                           1000000000
...
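To check just the one number instead of reading the whole dump, kstat's parseable output (-p) can be narrowed to a single statistic. Below is a small sketch: kstat_value and check_arc_max are helper names I made up, and the sample line is only an illustration of the "kstat -p" format so the snippet can be tried away from the Solaris box.

```shell
#!/bin/sh
# Pull the value (last whitespace-separated field) out of one line of
# "kstat -p" output read from stdin.
kstat_value() {
    awk '{ print $NF }'
}

# Hypothetical check: compare the reported c_max with the value put in /etc/system.
check_arc_max() {
    expected=$1
    actual=$2
    if [ "$actual" = "$expected" ]; then
        echo "ok: c_max is $actual"
    else
        echo "mismatch: expected $expected, got $actual"
        return 1
    fi
}

# On the Solaris host you would run (not executed here):
#   check_arc_max 8000000000 "$(kstat -p zfs:0:arcstats:c_max | kstat_value)"

# Offline demonstration with a sample line in "kstat -p" format:
sample=$(printf 'zfs:0:arcstats:c_max\t8000000000\n' | kstat_value)
check_arc_max 8000000000 "$sample"
```

A mismatch after the reboot usually means the line in /etc/system has a typo or was not picked up.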

1 comment:

Michael Martin said...

Nice post. Thanks for the information. I'm hoping it solves my morning out of memory issues. I referenced your post in my blog.