Star Schema Design Question

Not related to a specific OLAP tool. (Includes forum policies and rules).
Post Reply
mhayward
Posts: 2
Joined: Thu Feb 03, 2011 9:42 pm
OLAP Product: Palo
Version: 3.2
Excel Version: Office 2007

Star Schema Design Question

Post by mhayward » Thu Feb 03, 2011 9:54 pm

Hello,
I am new to OLAP, I hope this question makes sense. I am looking for advice on how to handle a situation where there are two related "measures" or "facts," of interest, one of which can not be added across many dimensions.

For example, suppose I am running a science experiment to estimate various insects populations in the wild. I send out field researchers to collect and identify insects in a variety of locations, and they come back for each location with the following data:
1) Insect species - quantity observed.
2) Location size in square miles.

So, for example I might have the following data:
Location, Species, Count, Area
Park, Ant, 100, 1
Park, Spider, 14, 1
Office Building, Fly, 8, .25
Beach, Flea, 2002, .5
Beach, Ant, 90, .5

For my research, I'm generally interested in:
1) The total number of bugs located.
2) The density of bugs: bugs per square mile in various locations.
And I want to be able to slice and dice this data by location and insect species, and possibly by area.

How would you design a schema to allow you to calculate the density of bugs? I keep running into the trouble that if "Insect Count" is my fact, then I don't know how to determine the area they were found in - as the two rows of data above for "Park" cover the same 1 square mile - the insect density is (100+14)/1 = 114 bugs per square mile, not (100+14)/(1+1) = 77 bugs per square mile.

Marcus Scherer
Community Contributor
Posts: 125
Joined: Sun Jun 29, 2008 9:33 am
OLAP Product: TM1
Version: 9.5 10.1 10.2
Excel Version: 2007
Location: Karlsruhe

Re: Star Schema Design Question

Post by Marcus Scherer » Sat Feb 05, 2011 10:35 am

nice subject! Until today I thought counting "bugs" is done per lines_of_code not area? :D

If the relation between location and area is 1:1 - what your example is suggesting - a numeric attribute 'area' for your location dimension would be enough in PALO. You would then calculate density per rules, looking up the area.
If the relation is different, please post again.

mhayward
Posts: 2
Joined: Thu Feb 03, 2011 9:42 pm
OLAP Product: Palo
Version: 3.2
Excel Version: Office 2007

Re: Star Schema Design Question

Post by mhayward » Tue Feb 08, 2011 6:34 pm

Hello,
Thanks for the help - I think I now understand how to use attributes in Palo to achieve this.

As a more general question, if I were laying out tables in a schema, how would this intent be captured then? As a attribute of the Location table, which would be a dimension table?

Location
----------
Location_id
Name
Area

Bug_Count
-------------
Location_id
Species_id
...
15

And then I would have my business intelligence engine just understand, through some configuration, that the "measure" of density was to be calculated as the sum of Bug_Count / the sum of the Area attribute from Location_id
?

Marcus Scherer
Community Contributor
Posts: 125
Joined: Sun Jun 29, 2008 9:33 am
OLAP Product: TM1
Version: 9.5 10.1 10.2
Excel Version: 2007
Location: Karlsruhe

Re: Star Schema Design Question

Post by Marcus Scherer » Tue Feb 08, 2011 8:16 pm

As a more general question, if I were laying out tables in a schema, how would this intent be captured then? As a attribute of the Location table, which would be a dimension table?
correct. An attribute in your location table. Your Attribute "cube" in Palo is representing this table - it is a two dimensional "cube".

In your fact table Bug_Count you join the dimension tables over id s and one measure is 'count'. All measures you collect in one dimension in Palo.
And then I would have my business intelligence engine just understand, through some configuration, that the "measure" of density was to be calculated as the sum of Bug_Count / the sum of the Area attribute from Location_id?
In Palo you would add an element in your measure dimension 'density' and create a rule for the cube. This measure is then calculated on the fly, similar to: ( the following is TM1 code, you need to adjust for Palo).

['density'] = N: ['count'] / DB('}ElementAttributes_location',!location,'area');
bugs.jpg
bugs.jpg (32.26 KiB) Viewed 2587 times

Post Reply