Understanding Dimensional Queries

Dimensional queries are designed to enable multiple-fact querying. The basic goals of multiple-fact querying are:

Preserve data when fact data does not perfectly align across common dimensions, such as when some fact data is missing for certain dimensional values but other facts are present.
Prevent double-counting when fact data exists at different levels of granularity by ensuring that each fact is represented in a single query with appropriate grouping. Determinants may need to be created for the underlying query subjects in some cases.

Single Fact Query

A query on a star schema group results in a single fact query.

In this example, the Sales fact is the focus of any query that is written. The dimensional query subjects provide attributes and descriptions to make the data in Sales fact more meaningful. All relationships between dimensions and the fact are 1-n.

When you filter on the month and product, the result is as follows.

Month	Product	Quantity
January	TrailChef Water Bag	62,791
January	TrailChef Canteen	27,229
January	TrailChef Kitchen Kit	20,785
January	TrailChef Cup	29,670

Multiple-fact, Multiple-grain Query on Conformed Dimensions

A query on multiple facts and conformed dimensions respects the cardinality between each fact table and its dimensions and writes SQL to return all the rows from each fact table.

For example, Sales and Product forecast are both facts.

Note that this is a simplified representation and not an example of how this would appear in a model built using Cognos modeling recommendations.

The Result

Individual queries on Sales and Product forecast by Month and Product yield the following results. The data in Sales is actually stored at the day level.

Month	Product	Sales target
Opening balance
January	TrailChef Water Bag	4,053,900
January	TrailChef Canteen	4,053,900
January	TrailChef Kitchen Kit	4,053,900
January	TrailChef Cup	4,053,900

A query on Sales and Product forecast respects the cardinality between each fact table and its dimensions and writes SQL to return all the rows from each fact table. The fact tables are matched on their common keys, month and product, and, where possible, are aggregated to the lowest common level of granularity. In this case, days are rolled up to months. Nulls are often returned for this type of query because a combination of dimensional elements in one fact table may not exist in the other.

Month	Product name	Quantity	Expected volume
April 2007	Aloe Relief	1,410	1,690
April 2007	Course Pro Umbrella	132	125
February 2007	Aloe Relief	270	245
February 2007	Course Pro Umbrella		1
February 2006	Aloe Relief	88	92

Note that in February 2007, Course Pro Umbrellas were in the forecast but there were no actual sales. The data in Sales and Product forecast exist at different levels of granularity. The data in Sales is at the day level, and Product forecast is at the month level.

The SQL

The SQL generated by Cognos 8, known as a stitched query, is often misunderstood. A stitched query uses multiple subqueries, one for each star, brought together by a full outer join on the common keys. The goal is to preserve all dimensional members occurring on either side of the query.

The following example was edited for length and is used to illustrate the main features of stitched queries.

select
	coalesce(D2.MONTH_NAME,D3.MONTH_NAME) as MONTH,coalesce(D2.PRODUCT_NAME,D3.PRODUCT_NAME) as PRODUCT, 
	D2.EXPECTED_VOLUME as SALES_TARGET,
	D3.QUANTITY as QUANTITY
from (select TIME.MONTH_NAME as MONTH,
	PRODUCT_LOOKUP.PRODUCT_NAME as PRODUCT,
	XSUM(PRODUCT_FORECAST_FACT.EXPECTED_VOLUME for  
	TIME.CURRENT_YEAR,TIME.QUARTER_KEY,TIME.MONTH_KEY,
	PRODUCT.PRODUCT_LINE_CODE, PRODUCT.PRODUCT_TYPE_CODE,
	PRODUCT.PRODUCT_KEY) as SALES_TARGET
from 
	(select TIME.CURRENT_YEAR as CURRENT_YEAR,
	TIME.QUARTER_KEY as QUARTER_KEY,
	TIME.MONTH_KEY as MONTH_KEY,
	XMIN(TIME.MONTH_NAME for TIME.CURRENT_YEAR,
	TIME.QUARTER_KEY,TIME.MONTH_KEY) as MONTH
	from TIME_DIMENSION TIME
	group by TIME.MONTH_KEY) TIME
	join PRODUCT_FORECAST_FACT PRODUCT_FORECAST_FACT
	on (TIME.MONTH_KEY = PRODUCT_FORECAST_FACT.MONTH_KEY)
	join PRODUCT PRODUCT on (PRODUCT.PRODUCT_KEY =
	PRODUCT_FORECAST_FACT.PRODUCT_KEY)
where
	(PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro
Umbrella')) and(TIME.MONTH_NAME in ('April 2007','February 2007','February 2006')) 
group by 
	TIME.MONTH_NAME,
	PRODUCT_LOOKUP.PRODUCT_NAME
) D2
 full outer join 
(select TIME.MONTH_NAME as MONTH_NAME,
	PRODUCT_LOOKUP.PRODUCT_NAME as PRODUCT_NAME,
	XSUM(SALES_FACT.QUANTITY for TIME.CURRENT_YEAR,
	TIME.QUARTER_KEY, TIME.MONTH_KEY, 
	PRODUCT.PRODUCT_LINE_CODE, PRODUCT.PRODUCT_TYPE_CODE, 
	PRODUCT.PRODUCT_KEY ) as QUANTITY
from 
select TIME.DAY_KEY,TIME.MONTH_KEY,TIME.QUARTER_KEY,
	TIME.CURRENT_YEAR,TIME.MONTH_EN as MONTH_NAME 
	from TIME_DIMENSION TIME) TIME
	join SALES_FACT SALES_FACT
	on (TIME.DAY_KEY = SALES_FACT.ORDER_DAY_KEY)
	join PRODUCT PRODUCT on (PRODUCT.PRODUCT_KEY = SALES_FACT.PRODUCT_KEY)
where 
	PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro Umbrella'))
	and (TIME.MONTH_NAME in ('April 2004','February 2004','February
2006'))
 group byTIME.MONTH_NAME,PRODUCT.PRODUCT_NAME 
) D3
 on ((D2.MONTH_NAME = D3.MONTH_NAME) and(D2.PRODUCT_NAME = D3.PRODUCT_NAME))

What Is the Coalesce Statement?

A coalesce statement is simply an efficient means of dealing with query items from conformed dimensions. It is used to accept the first non-null value returned from either subquery that is built for each detected fact table. This statement allows a full list of keys with no repetitions when doing a full outer join.

Why Is There a Full Outer Join?

A full outer join is necessary to ensure that all the data from each fact table is retrieved. An inner join gives results only if an item in inventory was sold. A right outer join gives all the sales where the items were in inventory. A left outer join gives all the items in inventory that had sales. A full outer join is the only way to learn what was in inventory and what was sold.

Modeling 1-n Relationships as 1-1 Relationships

If a 1-n relationship exists in the data but is modeled as a 1-1 relationship, SQL traps cannot be avoided because the information provided by the metadata to Cognos 8 is insufficient.

The most common problems that arise if 1-n relationships are modeled as 1-1 are the following:

Double-counting for multiple-grain queries is not automatically prevented.
Cognos 8 cannot detect facts and then generate a stitched query to compensate for double-counting, which can occur when dealing with hierarchical relationships and different levels of granularity across conformed dimensions.
Multiple-fact queries are not automatically detected.
Cognos 8 will not have sufficient information to detect a multiple-fact query. For multiple-fact queries, an inner join is performed and the loop join is eliminated by dropping the last evaluated join. Dropping a join is likely to lead to incorrect or unpredictable results depending on the dimensions and facts included in the query.

If the cardinality were modified to use only 1-1 relationships between query subjects or dimensions, the result of a query on Product forecast and Sales with Time or Time and Product generates a single Select statement that drops one join to prevent a circular reference.

The example below shows that the results of this query are incorrect when compared with the results of individual queries against Sales or Product forecast.

The results of individual queries are as follows.

Month	Product	Quantity
January	TrailChef Water Bag	62,791
January	TrailChef Canteen	27,229
January	TrailChef Kitchen Kit	20,785
January	TrailChef Cup	29,670

Month	Product	Sales target
January	TrailChef Water Bag	4,053,900
January	TrailChef Canteen	4,053,900
January	TrailChef Kitchen Kit	4,053,900
January	TrailChef Cup	4,053,900

When you combine these queries into a single query, the results are as follows.

Month	Product	Quantity	Sales target
January	TrailChef Water Bag	62,791	4,053,900
January	TrailChef Canteen	27,229	4,053,900
January	TrailChef Kitchen Kit	20,785	4,053,900
January	TrailChef Cup	29,670	4,053,900

The SQL

If you look at the SQL, you can see that, because Cognos 8 detected that a circular join path exists in the model, it did not include one of the relationships that was not necessary to complete the join path. In this example, the relationship between Time and Product forecast was dropped.

A circular join path rarely results in a query that produces useful results.

select 
	TIME_.MONTH_NAME as MONTH,
	PRODUCT_LOOKUP.PRODUCT_NAME as PRODUCT,
	XSUM(SALES_FACT.QUANTITY for
	TIME_.CURRENT_YEAR, TIME_.QUARTER_KEY, TIME_.MONTH_KEY,
	PRODUCT.PRODUCT_LINE_CODE, PRODUCT.PRODUCT_TYPE_CODE,
	PRODUCT.PRODUCT_KEY ) as QUANTITY,
	XSUM(PRODUCT_FORECAST_FACT.EXPECTED_VOLUME for TIME_.CURRENT_YEAR, 
	TIME_.QUARTER_KEY, TIME_.MONTH_KEY, PRODUCT.PRODUCT_LINE_CODE,
	PRODUCT.PRODUCT_TYPE_CODE, PRODUCT.PRODUCT_KEY ) as SALES_TARGET
from 
	(select TIME.DAY_KEY,TIME.MONTH_KEY, TIME.QUARTER_KEY,
	TIME.CURRENT_YEAR,TIME.MONTH_EN as MONTH 
	from TIME_DIMENSION TIME) TIME 
	joinSALES_FACT on (TIME_.DAY_KEY = SALES_FACT.ORDER_DAY_KEY)joinPRODUCT_FORECAST_FACT on (TIME_.MONTH_KEY =PRODUCT_FORECAST_FACT.MONTH_KEY)joinPRODUCT (PRODUCT.PRODUCT_KEY = PRODUCT_FORECAST_FACT.PRODUCT_KEY) 
where 
	(PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro Umbrella'))
and
	(TIME_.MONTH_NAME in ('April 2004','February 2004','February 2006'))
group by 
	TIME_.MONTH_NAME, PRODUCT.PRODUCT_NAME

You can deal with these issues in Report Studio by creating an inner or outer join between queries or creating calculations that compensate for differences in granularity. This workaround requires the report author to have detailed knowledge of the data. There is no workaround available in Query Studio or Analysis Studio.

Please note that this approach can result in a query that joins across multiple fact tables in a single subquery, which can have a negative impact on performance.

Multiple-fact, Multiple-grain Query on Non-Conformed Dimensions

If a non-conformed dimension is added to the query, the nature of the result returned by the stitched query is changed. It is no longer possible to aggregate records to a lowest common level of granularity because one side of the query has dimensionality that is not common to the other side of the query. The result returned is really two correlated lists, which are related to each other only through the shared dimensional information.

The Result

The results of individual queries on the respective star schemas look like this.

Month	Product	Order method dimension	Quantity
February 2007	Aloe Relief	Telephone	55
February 2007	Aloe Relief	Email	41
March 2007	Aloe Relief	Web	1310
March 2007	Course Pro Umbrella	Fax	525
April 2007	Aloe Relief	Sales Visit	65
April 2007	Course Pro Umbrella	Sales Visit	46

Month	Product name	Quantity	Expected volume
April 2007	Aloe Relief	1,410	1,690
April 2007	Course Pro Umbrella	132	125
February 2007	Aloe Relief	270	245
February 2007	Course Pro Umbrella		1
February 2006	Aloe Relief	88	92

Querying the same items from both star schemas yields the following result.

Month	Product name	Order method dimension	Quantity	Expected volume
February 2007	Aloe Relief	Telephone	55	100,800
February 2007	Aloe Relief	Email	41	100,800
March 2007	Aloe Relief	Web	1310	113,200
March 2007	Course Pro Umbrella	Fax	525	1,554,800
April 2007	Aloe Relief	Sales Visit	65	113,200
April 2007	Course Pro Umbrella	Sales Visit	46	109,600

In this result, the lower level of granularity for records from Sales results in more records being returned for each month and product combination. There is now a 1-n relationship between the rows returned from Product forecast and those returned from Sales.

When you compare this to the result returned in the example of the multiple-fact, multiple grain query on conformed dimensions, you can see that more records are returned and that Expected volume results are repeated across multiple Order Methods. Adding Order Method to the query effectively changes the relationship between Quantity data and Expected volume data to a 1-n relationship. It is no longer possible to relate a single value from Expected volume to one value from Quantity.

Grouping on the Month key demonstrates that the result in this example is based on the same data set as the result in the multiple-fact, multiple-grain query but with a greater degree of granularity.

The SQL

The stitched SQL generated for this example is very similar to the SQL generated in the multiple-fact, multiple-grain query . The main difference is the addition of Order Method. Order Method is not a conformed dimension and affects only the query against the Sales fact table.

select 
	D2.QUANTITY as QUANTITY,
	D3.EXPECTED_VOLUME as SALES_TARGET, 
	coalesce(D2.PRODUCT_NAME,D3.PRODUCT_NAME) as PRODUCT,coalesce(D2.MONTH_NAME,D3.MONTH_NAME) as MONTH, 
	D2.ORDER_METHOD as ORDER_METHOD
from 
	(select 
	PRODUCT.PRODUCT_NAME as PRODUCT,
	TIME.MONTH_NAME as MONTH,
	ORDER_METHOD.ORDER_METHOD as ORDER_METHOD,
	XSUM(SALES_FACT.QUANTITY for TIME.CURRENT_YEAR,TIME.QUARTER_KEY,
	TIME.MONTH_KEY,PRODUCT.PRODUCT_LINE_CODE,PRODUCT.PRODUCT_TYPE_CODE,
	PRODUCT.PRODUCT_KEY,ORDER_METHOD_DIMENSION.ORDER_METHOD_KEY) as
QUANTITY
from 
	PRODUCT_DIMENSION PRODUCT
	join
	SALES_FACT SALES_FACT
	on (PRODUCT.PRODUCT_KEY = SALES_FACT.PRODUCT_KEY)
	join 
	ORDER_METHOD_DIMENSION ORDER_METHOD
	on (ORDER_METHOD.ORDER_METHOD_KEY = SALES_FACT.ORDER_METHOD_KEY)
	join TIME_DIMENSION TIME
	on ( TIME.DAY_KEY = SALES_FACT.ORDER_DAY_KEY)
where 
	(PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro Umbrella'))
and 
	( TIME.MONTH_NAME in ('April 2004','February 2004','February 2006'))
 group byPRODUCT.PRODUCT_NAME,TIME.MONTH_NAME,ORDER_METHOD.ORDER_METHOD) D2full outer join 
(select 
	PRODUCT.PRODUCT_NAME as PRODUCT,
	TIME.MONTH_NAME as MONTH,
	XSUM(PRODUCT_FORECAST_FACT.EXPECTED_VOLUME for TIME.CURRENT_YEAR,
	TIME.QUARTER_KEY,TIME.MONTH_KEY,PRODUCT.PRODUCT_LINE_CODE,
	PRODUCT.PRODUCT_TYPE_CODE,PRODUCT.PRODUCT_KEY) as EXPECTED_VOLUME
from 
	PRODUCT_DIMENSION PRODUCT
	join 
	PRODUCT_FORECAST_FACT PRODUCT_FORECAST_FACT
	on (PRODUCT.PRODUCT_KEY = PRODUCT_FORECAST_FACT.PRODUCT_KEY)
	join 
(select 
	TIME.CURRENT_YEAR as CURRENT_YEAR,
	TIME.QUARTER_KEY as QUARTER_KEY,
	TIME.MONTH_KEY as MONTH_KEY,
	XMIN(TIME.MONTH_NAME for TIME.CURRENT_YEAR, TIME.QUARTER_KEY,
	TIME.MONTH_KEY) as MONTH_NAME
from 
	TIME_DIMENSION TIME
group by 
	TIME.CURRENT_YEAR,
	TIME.QUARTER_KEY,
	TIME.MONTH_KEY
	) TIME
	on (TIME.MONTH_KEY = PRODUCT_FORECAST_FACT.MONTH_KEY)
where 
	(PRODUCT.PRODUCT_NAME in ('Aloe Relief','Course Pro Umbrella'))
and 
	(TIME.MONTH_NAME in ('April 2004','February 2004','February 2006')) 
 group by PRODUCT.PRODUCT_NAME,TIME.MONTH_NAME) D3on ((D2.PRODUCT_NAME = D3.PRODUCT_NAME) and(D2.MONTH_NAME = D3.MONTH_NAME))