Katmai, the code name for Microsoft's imminent SQL Server 2008 release,
comes from an Alaskan territory know for volcanoes, which may
not be the best symbol for a database. So far, however, Katmai
hasn't blown up on me. And the lower-profile Katmai seems like
a good follow-on to Yukon, the code name for the gigantic SQL Server 2005
release.
Building on the sweeping, enterprise-oriented improvements
in SQL Server 2005
(see review), Katmai sports very nice new features for large
deployments. Among the more touted attributes in the database
engine are data and backup compression, sparse columns, and
compressed and filtered indexes, all of which are geared to
saving storage space, as well as Change Data Capture, which
captures changes to production data in tables that can be used
to update a data warehouse.
These are just the tip of the iceberg, or volcano, and there
are of course many other new features -- such as policy-based
management -- that will appeal to large and small shops alike.
Every aspect of the product has been touched significantly.
More data, less storage
For starters, there are two types of data compression: row
and page. They do, in fact, compress data in different ways, so
it's important to understand the benefits of each, as well as
how they work. Row compression is true compression, whereby the
engine removes unused spaces at the ends of columns and, thus,
saves space. This is the same technique SQL Server already uses
for vardecimal compression; Microsoft has just expanded the use
to other data types.
Page compression does what's known as dictionary
compression, in that it normalizes the data on each page and
keeps a lookup pointer. This is essentially the same trick used
in Oracle Database 11g, which Oracle calls Oracle Advanced
Compression. Without getting too much into the pros and cons of
each, it's worth noting that SQL Server's page compression
includes the lower-level row compression. In other words, if
you have page compression turned on, you automatically get row
compression.
Microsoft has included a couple of stored procedures to help
you estimate both the level of savings you'll get with each
method before you compress, and how much expansion will result
if you uncompress the database later. This is an important and
really thoughtful feature because you need to know not only if
compression will be worth your time, but also if your disk can
handle the uncompressed data should you need to revert. Just
keep in mind that the procedures work on a small yet
statistically significant random sampling of the data. You
could get some bad estimates if the query happens to hit a poor
representation of your data.
Plus, the way Microsoft implements compression spares more
than storage resources. The data stays compressed in memory and
only gets decompressed when read, meaning that you can fit more
data pages into memory. This should save disk fetches, and the
CPU it takes to decompress will be far less expensive than the
disk seek would have been.
The sparse columns feature allows you to store null values
without taking up any physical space. If you have a large table
with a lot of null values in a column, you can waste ample disk
space keeping track of those nulls. Storing nulls in sparse
columns takes zero space, so your storage requirement goes way
down.
Find more out
here.