Page 1 of 1

Star Schema Design Question

Posted: Thu Feb 03, 2011 9:54 pm
by mhayward
Hello,
I am new to OLAP, I hope this question makes sense. I am looking for advice on how to handle a situation where there are two related "measures" or "facts," of interest, one of which can not be added across many dimensions.

For example, suppose I am running a science experiment to estimate various insects populations in the wild. I send out field researchers to collect and identify insects in a variety of locations, and they come back for each location with the following data:
1) Insect species - quantity observed.
2) Location size in square miles.

So, for example I might have the following data:
Location, Species, Count, Area
Park, Ant, 100, 1
Park, Spider, 14, 1
Office Building, Fly, 8, .25
Beach, Flea, 2002, .5
Beach, Ant, 90, .5

For my research, I'm generally interested in:
1) The total number of bugs located.
2) The density of bugs: bugs per square mile in various locations.
And I want to be able to slice and dice this data by location and insect species, and possibly by area.

How would you design a schema to allow you to calculate the density of bugs? I keep running into the trouble that if "Insect Count" is my fact, then I don't know how to determine the area they were found in - as the two rows of data above for "Park" cover the same 1 square mile - the insect density is (100+14)/1 = 114 bugs per square mile, not (100+14)/(1+1) = 77 bugs per square mile.

Re: Star Schema Design Question

Posted: Sat Feb 05, 2011 10:35 am
by Marcus Scherer
nice subject! Until today I thought counting "bugs" is done per lines_of_code not area? :D

If the relation between location and area is 1:1 - what your example is suggesting - a numeric attribute 'area' for your location dimension would be enough in PALO. You would then calculate density per rules, looking up the area.
If the relation is different, please post again.

Re: Star Schema Design Question

Posted: Tue Feb 08, 2011 6:34 pm
by mhayward
Hello,
Thanks for the help - I think I now understand how to use attributes in Palo to achieve this.

As a more general question, if I were laying out tables in a schema, how would this intent be captured then? As a attribute of the Location table, which would be a dimension table?

Location
----------
Location_id
Name
Area

Bug_Count
-------------
Location_id
Species_id
...
15

And then I would have my business intelligence engine just understand, through some configuration, that the "measure" of density was to be calculated as the sum of Bug_Count / the sum of the Area attribute from Location_id
?

Re: Star Schema Design Question

Posted: Tue Feb 08, 2011 8:16 pm
by Marcus Scherer
As a more general question, if I were laying out tables in a schema, how would this intent be captured then? As a attribute of the Location table, which would be a dimension table?
correct. An attribute in your location table. Your Attribute "cube" in Palo is representing this table - it is a two dimensional "cube".

In your fact table Bug_Count you join the dimension tables over id s and one measure is 'count'. All measures you collect in one dimension in Palo.
And then I would have my business intelligence engine just understand, through some configuration, that the "measure" of density was to be calculated as the sum of Bug_Count / the sum of the Area attribute from Location_id?
In Palo you would add an element in your measure dimension 'density' and create a rule for the cube. This measure is then calculated on the fly, similar to: ( the following is TM1 code, you need to adjust for Palo).

['density'] = N: ['count'] / DB('}ElementAttributes_location',!location,'area');
bugs.jpg
bugs.jpg (32.26 KiB) Viewed 6961 times