SQL Server Storage
At its heart, SQL Server is just a place to keep data. There's a lot of fancy logic for putting data in quickly and finding the data you need quickly, but the bottom line is that it's just a bunch of databases stored on a bunch of disks (or other storage). That means that the key to understanding and
maintaining SQL Server is understanding the way it stores data. There have been hundreds of articles, books,
posts, and tweets on this topic in the years of SQL Server’s existence so you
might well ask why I’m posting yet another one.
The answer is that in spite of all the available information, over half
of the SQL Server infrastructures I work on have made significant mistakes in
storage management so I am writing this so I have something to point to when I
run into DBAs who manage databases with storage issues. This isn’t intended to be a comprehensive
treatment of storage but rather a summary of the things I see problems with
most often.
Windows Filesystem
Since SQL Server relies on Windows to manage the actual storage, it's necessary to talk a little bit about how Windows stores
data. Warning – this is an
over-simplified high level view of Windows.
Windows data is stored on volumes.
A volume may be a disk drive, a RAID array, or a LUN on a SAN, but for our
purposes a volume is a bunch of bytes of storage Windows uses to keep files
on. There’s a directory that keeps track
of what files have been written to the volume and where they are stored. Ideally, each file occupies one contiguous
chunk of bytes but it’s common for a file to have its data spread across many
isolated chunks of storage. The chunks
of storage that aren’t being used by files are entered on a list of unallocated
chunks of storage called a free list.
When Windows needs space to write something it goes through
the free list until it finds a good place to write it. For example, if it needs to write 1MB it
might find an entry on the free list for a 1.5 MB chunk. It will then write the 1MB, put a pointer to
what it wrote in the directory and then put a .5MB entry back on the free
list. When a file is deleted, the space it occupied is put back on the free list. It's not hard to see that over time, the free
list becomes very large, with pointers to many small chunks of storage. This has two impacts. First, Windows takes longer to allocate new space because it has to search through a longer list, making space allocation a bottleneck. Second, Windows often can't find a big enough chunk of free space to write a file, so the file gets written as small chunks spread across the disk, resulting in slow IO performance when reading or updating the file. This can lead to major performance problems, so Windows has a defragmentation utility that moves data around so that files occupy a single contiguous space as much as possible and free space is consolidated into a few large chunks.
So what does all this mean for SQL Server? SQL Server tries to avoid fragmentation by
allocating all the space specified for a SQL Server file when the file is
created. This works as long as the
storage isn’t fragmented when the file is created so tip one is to defragment
the volume before creating the files for a database. This fragmentation avoidance strategy also
assumes that you create the files big enough to satisfy the database's need for
space for an extended period of time – maybe six months or a year. If you grow the files in small increments,
each increment is a separate space allocation for Windows so the disk volume
can become fragmented. I often see SQL
Server installations where the database files have been created with the
default size specifications. That means
the file starts at 2MB and grows in 1MB increments. A 1TB database grown with these settings
would have almost a million space allocations.
In addition to the significant impact that growing this many times has on SQL Server, which we will discuss later, this many small file growths will
badly fragment the Windows storage. This
leads to the frequently asked question “should I defragment the Windows volumes
where my databases are stored”? My
answer is that if you’re doing a good job of managing storage you probably
don’t need to but you should check for fragmentation of Windows storage
periodically and defragment if the fragmentation goes above 10 or 15%. Notice that I am talking about fragmentation
of the Windows storage that Windows uses to store SQL Server files. We’ll talk about SQL Server index
fragmentation later and that’s a very different thing.
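Putting those two tips together (allocate the space you expect to need up front, and make growth increments large and infrequent), here is a rough sketch of what explicit file sizing looks like at database creation time. The names, paths, and sizes below are placeholders for illustration, not recommendations:

-- Sketch only: names, paths, and sizes are placeholders, not recommendations.
-- The point is to allocate the space you expect to need up front, in one piece,
-- and make the growth increment large enough that growth events are rare.
CREATE DATABASE Sales
ON PRIMARY
(
    NAME = Sales_data,
    FILENAME = 'D:\SQLData\Sales_data.mdf',
    SIZE = 200GB,          -- sized for six months to a year of expected growth
    FILEGROWTH = 10GB      -- large, infrequent growths instead of the 1MB default
)
LOG ON
(
    NAME = Sales_log,
    FILENAME = 'L:\SQLLogs\Sales_log.ldf',
    SIZE = 40GB,
    FILEGROWTH = 8GB
);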
SQL Server Filesystem
One of the least understood parts of SQL Server storage is
how the Windows file system relates to the SQL Server file system. When you are setting up Windows storage you
first partition a disk drive (or RAID array) to create disk volumes. You then format these volumes which creates
a Windows file system on the volume.
Once the Windows file system is created you can create and delete files
to your heart’s content. When you create
a file the amount of free space in the volume decreases and when you delete a
file the amount of free space increases.
The size of the volume doesn’t change.
This makes perfect sense to all storage administrators but when you try
to apply the same logic to the SQL Server filesystem, they often refuse to
believe it.
When you use SQL Server to create a SQL Server file, SQL
Server will ask Windows for a file of the size specified and build a SQL Server
filesystem in the file. When you create
SQL Server objects – tables, indexes, procedures, etc. – the amount of free
space in the file decreases. As you add
rows to the table, the amount of free space decreases more. When you delete rows from a table or drop a
table or index, the amount of free space increases. This works just like files in a Windows file
system. Just as the disk volume doesn’t
grow and shrink when you add and remove files, the SQL Server files don’t grow
and shrink when you add and delete SQL data.
Expecting a SQL Server file to shrink when you delete rows is like
expecting a Windows volume to shrink when you delete files. No storage admin would think a LUN that shrank every time you deleted a file was a good idea, yet I very seldom run into one who applies that same logic to SQL Server files.
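If you want to see how this plays out in one of your own databases, a quick query shows how much of each file is actually in use. This is just a sketch to run in the context of the database you're curious about; size and SpaceUsed are both reported in 8KB pages, so dividing by 128 gives MB:

-- How full is each file in the current database?
SELECT  name                                              AS logical_file_name,
        size / 128.0                                      AS file_size_mb,
        FILEPROPERTY(name, 'SpaceUsed') / 128.0           AS space_used_mb,
        (size - FILEPROPERTY(name, 'SpaceUsed')) / 128.0  AS free_space_mb
FROM    sys.database_files;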
At this point I'm sure some of you are shouting at your screen, "but wait, SQL Server files do grow!" While this is true, it
wasn’t always true and it isn’t true for many other databases. In the early days of SQL Server, data files
worked like Windows volumes. When you
used up all the space, you couldn’t write anymore. One of the design principles of SQL Server
7.0 was to reduce the amount of routine work you need to do to keep SQL Server
operating. One of those routine
operations was expanding the SQL Server filesystem when it runs low on
space. People often fail to notice it’s
running low on space so it eventually runs out completely and SQL Server just
stops until the DBA gets out of bed at 3:00 Sunday morning and expands the SQL
Server files. To avoid this, the SQL
Server team invented “autogrow”. If
autogrow is turned on, when SQL Server runs out of space in a file it
automatically grows the file so processing can continue. While this is cool because it lets the DBA
sleep at night, it should not be the normal way for SQL Server files to
grow. While autogrow is growing the
file, all updates that require space to execute have to stop until the growth
is complete. That means you don’t want
the growth increment too large so the transactions don’t have to wait too long. On the other hand if the growth increment is
pretty small, growth will happen pretty often.
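If you want to see how often autogrow has been firing on your own server, the default trace records each growth event. The query below is a sketch; it assumes the default trace is still enabled (it is unless someone has turned it off):

-- List recent autogrow events from the default trace.
DECLARE @TracePath nvarchar(260);
SELECT @TracePath = path FROM sys.traces WHERE is_default = 1;

SELECT  StartTime,
        DatabaseName,
        FileName,
        Duration / 1000 AS DurationMs,        -- Duration is reported in microseconds
        IntegerData * 8 / 1024 AS GrowthMB    -- IntegerData is the growth in 8KB pages
FROM    sys.fn_trace_gettable(@TracePath, DEFAULT)
WHERE   EventClass IN (92, 93)                -- 92 = Data File Auto Grow, 93 = Log File Auto Grow
ORDER BY StartTime DESC;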
I worked with a customer a couple months ago where we calculated that
one of their files had been growing 15-20 times a day for two years. Autogrowth often looks like a hang to a user
so a database that hangs a couple times an hour is not a good thing. The right thing to do is to stay ahead of the
growth requirements and manually grow the file at a convenient time every few
months. I look at any autogrowth as a
failure of the DBA to manage space correctly.
Failing every couple hours for months at a time doesn’t look good on the
performance review. Of course, some SQL Server installations don't have a DBA, so in that case making sure the file sizes are set to reasonable values and the growth increment is big enough that growth events are fairly rare is probably the best option.
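When you do grow a file on your own schedule, it's a single ALTER DATABASE statement. The database name, file name, and sizes below are placeholders:

-- Grow the data file ahead of demand, at a quiet time you choose.
-- Note that SIZE is the new total size of the file, not the amount to add.
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_data, SIZE = 300GB);

-- Keep a modest autogrow increment as a safety net, not as the growth plan.
ALTER DATABASE Sales
MODIFY FILE (NAME = Sales_data, FILEGROWTH = 10GB);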
I guess for completeness I have to say that it’s possible to
shrink a SQL Server file but there are very few situations where shrinking is
appropriate. Shrinking is pretty
expensive because space can only be freed from the end of a Windows file so the
shrink function must move data from the end of the file into other parts of the
file where there is free space. This not
only takes a lot of IO operations to do, but can badly fragment the SQL data in
the file. Because of the bad side-effects, I avoid
shrinking files unless I’m sure that data has gone away permanently and the
space won’t be needed again. For
example, if a company moves to a new ERP system and you just deleted the entire
old inventory, it might be acceptable to shrink the inventory files. I often compare shrinking files to Amazon building
a new warehouse to handle the Christmas rush and then tearing it down in
January because the rush is over. This
makes sense only if Christmas isn’t coming next year – or you plan on shutting
down the company before next Christmas.
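If you do land in one of those rare legitimate cases, the shrink itself is a DBCC command. The logical file name and target size here are placeholders, and be prepared to rebuild indexes afterwards because of the fragmentation the shrink causes:

-- Shrink a data file down to a 50GB target (rare, deliberate cases only).
-- DBCC SHRINKFILE moves pages from the end of the file toward the front,
-- which takes a lot of IO and badly fragments the indexes it relocates.
DBCC SHRINKFILE (N'OldERP_data', 51200);   -- target size is specified in MB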
My favorite was the DBA who did a shrink on all the files before he did
the backups because he was short of backup space – SQL Server doesn't back up unallocated space, so this really didn't save anything.
Well, this is probably a good stopping place for now. Next time we’ll talk about how the SQL Server
file system stores data and how knowing that will help you improve your
database designs.