Star Schema Design Question
Posted: Thu Feb 03, 2011 9:54 pm
Hello,
I am new to OLAP, I hope this question makes sense. I am looking for advice on how to handle a situation where there are two related "measures" or "facts," of interest, one of which can not be added across many dimensions.
For example, suppose I am running a science experiment to estimate various insects populations in the wild. I send out field researchers to collect and identify insects in a variety of locations, and they come back for each location with the following data:
1) Insect species - quantity observed.
2) Location size in square miles.
So, for example I might have the following data:
Location, Species, Count, Area
Park, Ant, 100, 1
Park, Spider, 14, 1
Office Building, Fly, 8, .25
Beach, Flea, 2002, .5
Beach, Ant, 90, .5
For my research, I'm generally interested in:
1) The total number of bugs located.
2) The density of bugs: bugs per square mile in various locations.
And I want to be able to slice and dice this data by location and insect species, and possibly by area.
How would you design a schema to allow you to calculate the density of bugs? I keep running into the trouble that if "Insect Count" is my fact, then I don't know how to determine the area they were found in - as the two rows of data above for "Park" cover the same 1 square mile - the insect density is (100+14)/1 = 114 bugs per square mile, not (100+14)/(1+1) = 77 bugs per square mile.
I am new to OLAP, I hope this question makes sense. I am looking for advice on how to handle a situation where there are two related "measures" or "facts," of interest, one of which can not be added across many dimensions.
For example, suppose I am running a science experiment to estimate various insects populations in the wild. I send out field researchers to collect and identify insects in a variety of locations, and they come back for each location with the following data:
1) Insect species - quantity observed.
2) Location size in square miles.
So, for example I might have the following data:
Location, Species, Count, Area
Park, Ant, 100, 1
Park, Spider, 14, 1
Office Building, Fly, 8, .25
Beach, Flea, 2002, .5
Beach, Ant, 90, .5
For my research, I'm generally interested in:
1) The total number of bugs located.
2) The density of bugs: bugs per square mile in various locations.
And I want to be able to slice and dice this data by location and insect species, and possibly by area.
How would you design a schema to allow you to calculate the density of bugs? I keep running into the trouble that if "Insect Count" is my fact, then I don't know how to determine the area they were found in - as the two rows of data above for "Park" cover the same 1 square mile - the insect density is (100+14)/1 = 114 bugs per square mile, not (100+14)/(1+1) = 77 bugs per square mile.