I have two tables, a bit simplified, one represents a temperature meter and the other represents reading values from the temperature meter. Something like this
TemperatureMeter
--
int ID (PK)
TemperatureMeterReading
--
int TemperatureMeterID (FK)
float ReadingValue
DateTime ReadingDate
In TemperatureMeterReading, all three columns make out the PK, so they are a composite unique index.
Most of my questions on the TemperatureMeterReading table will look something like this
SELECT *
FROM TemperatureMeterReading
WHERE TemperatureMeterID = [SomeInt]
AND ReadingDate > [FromDate]
AND ReadingDate < [ToDate]
The TemperatureMeterReading table will contain LARGE amounts of rows (hondreds of milions), so configuring the indexes correct will be of great value. But the more indexes I add, the larger the DB-file will grow...
So' my question is: Will the PK-index be of any help for my type of question? If not, what kind of other index would you recommend me to add?
Regards Andreas Brosten
Here are my thoughts and some conclusions.
Unless you take multiple reading simulataneously from the same meter I can see no reason for including the reading in the unique key. If you do insist on all three then for reasons below I would put date before reading.
When you say simplified I assume that there are other columns that you are not telling us about? In general I would recommend not using * in a production select but specifically specifying the columns. This reduces the amount of data transferred and also can avoid maintenance problems if additional columns are added etc. (or cause the problem to be immediately obvious if a required column is deleted).
Are you clustering on these primary keys? If so then assuming you are inserting reading as they are taken (or in batches of sequential readings) then you have insertions points in the index for each meter (and as currently reading is before date you are not simply inserting an increasing set but are jumping around depending upon the reading (it is sorting the index by reading before date). This is why you should have date before reading in the index then the insertion is just a chain for each meter (and should consider removing reading).
As your main query is the readings for a single meter (over a date range). Putting the meterID first in the index is good as this restricts the IO needed to perform the search. Also this means it can easily be used to provide the foreign key join with the meter table. Also this suggests that clustering on this index will group the actual record data for this style of query as well - further reducing IO.
If you place more fields in the reading table but only wish to query the meter, date, and reading most of the time then it might be worth using the new in SQL Server 2005 INCLUDE syntax to include the reading value in the primary index (as this will allow queries to be completely resolved just using the index without touching the leaf data pages). It is however worth doing it without first and then adding it to see the difference as it does increase the index size and may not be worth the small saving. It is less important on clustered indexes as those are already close to the leaf pages, and also less useful where the data required is most or all of the record.
IN CONCLUSION
Order the primary index on the reading table Meter, Date, (Reading - if you still want to put it in the index).
If there are other columns on the reading table (and you only want to select meter, date, and reading) then you might try adding Reading as an included column in the index (rather than an index column). Otherwise clustering on the index will provide the record data with the minimum IO.
No comments:
Post a Comment