Wednesday, July 24, 2013

SQL Server Space Allocation

Space Layout

In my last post I talked about how SQL Server uses Windows to store data in files and how you have to manage those files to maintain database performance.  In this post I’ll cover how SQL Server manages the space within the Windows files that it uses to store data.

SQL Server divides its storage space into 8192 byte (8KB) pages.  Having everything stored in pages of the same size makes space management much more efficient.  When SQL Server needs to store some new data it looks for a page that isn’t in use.  Because all the free spaces are the same size, any free page will do. 

If a table went to the SQL OS every time it needed a page, the SQL OS would soon be overwhelmed with requests.  For this reason, the SQL OS allocates storage in 64KB chunks called extents.  The table logic gets 8 pages at a time from the SQL OS file system so it doesn’t have to go back for more space until it has filled up 8 pages.

Keeping in mind that SQL Server was first developed when a gigabyte was a lot of disk; the SQL OS has a way to avoid allocating a whole 64KB of space to a table that may have only a few rows.  The first 8 pages in a table are allocated one at a time from Shared Extents.  The 8 pages in a Shared Extent are allocated one at a time instead of all being allocated to the same SQL Server object.  This means that to populate the first 8 pages of a table SQL Server has to make 8 times as many calls to allocate space as it does for the next 8 pages.  Keep this in mind when we discuss allocation maps next.

Allocation Maps

SQL Server keeps track of which extents and pages have been allocated using allocation maps.  The first allocation map is the Global Allocation Map (GAM).  Each bit in the GAM maps to an extent.  A 0 in the GAM indicates that an extent is available and a 1 indicates that it has been allocated.  Allocating an extent is just a matter of finding a bit in the map that is set to 0 and setting it to 1.  Obviously, the GAM must be locked when this is happening so that two threads don’t allocate the same extent.  If enough threads are trying to allocate space at the same time, the lock on the GAM can be a bottleneck.  There are about 64k bits in a GAM with each bit representing 64KB of disk space so a GAM can handle allocations for 4GB of storage.  The first GAM in a SQL Server data file is on page 2 and the subsequent GAMs occur every 4GB after that.

Page 3 in a SQL Server file and every 4GB after that is an SGAM page – Shared GAM.  A 1 bit in the SGAM means the corresponding page is a shared extent with pages available to be allocated.  A 0 bit means either the page is a uniform extent or it is a shared extent with no pages available.

The final part of the allocation picture is the Page Free Space (PFS) page.  A PFS page has one Byte for each page in its map.  Since each byte represents a page, there must be a PFS page every 8088 pages in the file.  The first PFS page is page 1 of the file.  The PFS byte corresponding to a page indicates whether the page is allocated and if the page is used in a blob or heap, the rest of the byte indicates how full the page is.  The fullness information is used to find pages with available space.  Page fullness information isn’t required for clustered or non-clustered indexes because an insert has to happen at a particular page whether there is room in the page or not (if there isn’t room, the data in the page is spread over the current page and a newly allocated page – a page split).

Allocating a page in a shared extent can be pretty involved.  First the SGAM is searched for a shared extent with pages free.  If there isn’t one, a page is allocated from the GAM (bit set to 1) and turned into a shared extent (1 bit set in the SGAM).  The corresponding PFS must then be updated to indicate that the page has been allocated and then the page number is returned to the task that needed the page.  The SGAM has to be locked during this process to keep another thread from allocating the same shared page.

Allocation Contention Issues

The locks on GAM, SGAM and PFS pages can be a bottleneck if many SQL Server threads are trying to allocate space at the same time.  This contention for the allocation locks is especially acute in tempdb because a large number of temporary tables are created and deleted and most temp tables are small enough that they mainly use pages from shared extents.  It also doesn’t help that all the databases in a SQL Server instance share the same tempdb.  There are a few ways to address this:

·         Starting in SQL Server 2005, temp tables aren’t dropped when they go out of scope.  Instead they are trimmed down to the first 2 pages and recycled so the next request for the temp table can use the storage is already allocated.  This reduces the number of allocations required for temp tables so the contention is reduced.

·         Trace flag 1118 will change the allocation algorithm so that all allocations are for uniform extents.  Because shared extents aren’t used, the SGAM is significantly reduced as a source of contention (there are still things that use mixed extents so the SGAM contention isn’t totally eliminated)


·         The earliest fix for this issue was to create a tempdb data file for every processor that the server has.  For example if you have an 8 core server you would create 8 tempdb data files.  Each file has its own set of GAM and SGAM pages so the probability of contention on a GAM or SGAM is greatly reduced.  Two things have happened to modify this recommendation – first the fixes we already mentioned were implemented so the amount of contention is significantly reduced and second modern servers can have 80 processors or more so managing that many data files incurs more overhead than the allocation contention did.  The current recommendation is to create 1/3 to ½ of a data file for each processor – for example for a 12 core server create 4 to 6 files.  The total number of files created shouldn’t be more than 8 because managing more files than that incurs a lot of overhead.

·         I usually use all three approaches in a heavily loaded machine.  Even though the problem has been addressed in the code, each new version of SQL Server seems to put more load on tempdb so reducing contention is still necessary.

While tempdb usually experiences more allocation contention than any other database, normal databases aren’t immune to contention.  I generally create at least three identical files for each file group to reduce contention.  Make sure that the files are the same size because SQL Server uses an algorithm called proportional fill for it files so if one of the files is twice as big as the others, it will get two allocations for every one allocation in the other files.