I have the following query which takes more than 3 minutes to run. There are just 10,000 records in the Fact table. The largest dimension has about 2000 records in it. How can I optimize it?
SELECT NON EMPTY { [Measures].[Productivity Time in hours - Edited Docs],
[Measures].[Productivity Time in Hours - Typed Docs],
[Measures].[Productivity Time in Hours], [Measures].[Lines Per 1000],
[Measures].[Hours Saved],
[Measures].[Time Saved %],
[Measures].[Typed Lines],
[Measures].[Speech Utilization] } ON COLUMNS,
NON EMPTY {
([Dim Customer].[Customer Name].[Customer Name].ALLMEMBERS *
[Dim Customer].[Dictator Site Name].[Dictator Site Name].ALLMEMBERS *
[Dim MT].[Creator First Name].[Creator First Name].ALLMEMBERS
* [Dim Date].[The Year].[The Year].ALLMEMBERS ) }
WHERE ( [Dim Date].[The Year].&[2003] : [Dim Date].[The Year].&[2004],
[Dim Date].[The Month].&[January] :[Dim Date].[The Month].&[December])
|||
The problem is with the calculated measures that you use in your query. Please provide the expressions for all of the calculated measures:
[Measures].[Productivity Time in hours - Edited Docs]
[Measures].[Productivity Time in Hours - Typed Docs]
[Measures].[Productivity Time in Hours]
[Measures].[Lines Per 1000]
[Measures].[Hours Saved]
[Measures].[Time Saved %]
[Measures].[Typed Lines]
[Measures].[Speech Utilization]
|||
These are some of the calculations. [Productivity Time] and [Text LC] are base measures.
CREATE MEMBER CURRENTCUBE.[MEASURES].[Productivity Time in Hours]
AS ([Doc Rev Count].[Doc Rev Count].&[1],[Measures].[Productivity Time]) /3600,
FORMAT_STRING = "#",
NON_EMPTY_BEHAVIOR = { [Productivity Time] },
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Productivity Time in hours - Edited Docs]
AS ([Dim Speech].[New Speech Chargable Value].&[Edited],[Doc Rev Count].[Doc Rev Count].&[1],[Measures].[Productivity Time])/3600,
FORMAT_STRING = "#",
NON_EMPTY_BEHAVIOR = { [Productivity Time] },
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Productivity Time in Hours - Typed Docs]
AS ([Dim Speech].[New Speech Chargable Value].&[Typed],[Doc Rev Count].[Doc Rev Count].&1],[Measures].[Productivity Time])/3600,
FORMAT_STRING = "#",
NON_EMPTY_BEHAVIOR = { [Productivity Time] },
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Typed Lines]
AS ([Measures].[Text LC],[Dim Speech].[New Speech Chargable Value].&[Typed]),
FORMAT_STRING = "#,#.00",
NON_EMPTY_BEHAVIOR = { [Text LC] },
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Hours taken to type]
AS [Measures].[Text LC]/[Measures].[Typed Lines/Hr over 2 years],
FORMAT_STRING = "#,#.00",
NON_EMPTY_BEHAVIOR = { [Text LC] },
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Typed Lines/Hr over 2 years]
AS [Measures].[Typed Lines over 2 years]/[Measures].[Productivity Time in Hours over 2 years - Typed Docs],
FORMAT_STRING = "#,#.00",
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Typed Lines Over 2 years]
AS SUM(LASTPERIODS(24,[Dim Date].[Month Year]),([Doc Rev Count].[Doc Rev Count].&[1],[Dim Speech].[New Speech Chargable Value].&[Typed],[Measures].[Text LC])),
FORMAT_STRING = "#,#.00",
NON_EMPTY_BEHAVIOR = { [Text LC] },
VISIBLE = 1 ;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Productivity Time in Hours over 2 years - Typed Docs]
AS SUM(LASTPERIODS(24,[Dim Date].[Month Year]),([Dim Speech].[New Speech Chargable Value].&[Typed],[Doc Rev Count].[Doc Rev Count].&[1],[Measures].[Productivity Time]) /3600),
FORMAT_STRING = "#,#.00",
VISIBLE = 1 ;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Time Saved %]
AS ([Measures].[Hours Saved]/[Measures].[Hours taken to type]) * 100,
FORMAT_STRING = "Percent",
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Speech Edited for Rev 1]
AS ([Dim Speech].[New Speech Chargable Value].&[Edited],[Doc Rev Count].[Doc Rev Count].&[1],[Measures].[Text LC]),
FORMAT_STRING = "#,#.00",
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Hours Saved]
AS ([Measures].[Speech Edited for Rev 1] / [Measures].[Typed Lines/Hr over 2 years])
/ ([Measures].[Typed Lines Over 2 years] / [Measures].[Speech Edited Lines/Hr]),
FORMAT_STRING = "#,#.00",
VISIBLE = 1;
CREATE MEMBER CURRENTCUBE.[MEASURES].[Lines Per 1000]
AS ([Doc Rev Count].[Doc Rev Count].&[1],[Measures].[Text LC] )/ 1000,
FORMAT_STRING = "#,#.00",
NON_EMPTY_BEHAVIOR = { [Text LC] },
VISIBLE = 1 ;
|||
I am afraid I have bad news for you. Many of the calculated measures above have their NON_EMPTY_BEHAVIOR defined incorrectly (for example Productivity Time in Hours, Productivity Time in hours - Edited Docs and others), and it therefore should be removed to guarantee correctness of the results. This will make performance worse.
|||
I had initially put the NON_EMPTY_BEHAVIOR only on those measures where [Measures].[Text LC] or [Measures].[Productivity Time] appeared in the denominator, but I still had performance issues when the number of dimensions increased.
|||
Yes - as I tried to say in the previous message, there doesn't seem to be a solution to your performance problem. NON_EMPTY_BEHAVIOR improves the performance of queries similar to yours, but unfortunately it cannot be applied in your case.
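To illustrate with one of the definitions you posted (just a sketch of the problem, using your member names): [Productivity Time in hours - Edited Docs] forces the coordinates to [Edited] and [Doc Rev Count].&[1], so there can be cells where [Measures].[Productivity Time] at the current coordinates is empty while the tuple the expression actually reads is not. The NON_EMPTY_BEHAVIOR hint tells the engine it may skip exactly those cells, which is why it has to be removed even though that hurts performance:
CREATE MEMBER CURRENTCUBE.[MEASURES].[Productivity Time in hours - Edited Docs]
AS ([Dim Speech].[New Speech Chargable Value].&[Edited],
    [Doc Rev Count].[Doc Rev Count].&[1],
    [Measures].[Productivity Time]) / 3600,
FORMAT_STRING = "#",
-- NON_EMPTY_BEHAVIOR removed: the expression can return a value in cells
-- where [Productivity Time] at the current coordinates is empty
VISIBLE = 1;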
|||
Is the cross join of too many dimensions causing the performance deterioration? Each dimension at present has about 2000 records, so 2000 * 2000 * 2000 * 2000 does come up to a huge number.
What is the general dimension size that Analysis Services can handle, especially when cross joining multiple dimensions?
The production data warehouse would have many more records than the current fact table and dimensions do.
Any tips on what can be done to improve performance?
|||
Is there a particular business need for selecting that large of a crossjoin of members? It would seem that even if only a small percentage of the tuples created by the crossjoin have data, that would still produce a cellset that has too much data to be useful (in a report, for example).
Given that I don't know anything about your business requirements, I can't really add much to the discussion. But in a lot of cases, when faced with a "requirement" like this, I push back on the requirement based on the usability (or unusability) of the result, hoping to get the requirement changed in a way that helps the performance of the resulting query (adding additional filters to reduce the crossjoin, for example).
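Something along these lines, for instance (only a sketch; [Your Cube] is a placeholder and the choice of [Text LC] as the filter measure is an assumption on my part), keeps only customers that actually have lines recorded, which can shrink the crossjoin considerably:
SELECT NON EMPTY { [Measures].[Productivity Time in Hours] } ON COLUMNS,
NON EMPTY
  ( FILTER( [Dim Customer].[Customer Name].[Customer Name].MEMBERS,
            [Measures].[Text LC] > 0 ) *            -- keep only customers with data
    [Dim Date].[The Year].[The Year].MEMBERS ) ON ROWS
FROM [Your Cube]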
Dave F.
|||
These reports are basically for the management of the organization I work in. They want drilldown reports with percentages and totals regarding documents transcribed for all customers, all sites, all worktypes, all transcriptionists and all doctors for a selected period. I did suggest breaking it down into more manageable chunks, but the argument put forth is that Analysis Services is meant to handle precisely these kinds of scenarios with large amounts of data.
|||
What type of front-end tool are you using? If it is a pivot-table type of tool (like Excel 2007 or ProClarity), and your users start at a high level (like the All level for each of the dimensions listed in your query), I doubt they will ever get down to the level of detail your query seems to indicate (expanding every dimension, for example). So your performance will likely be okay.
If, however, you are using a reporting tool like Reporting Services, then I'd suggest you implement this as a series of reports to "simulate" the drilldown behavior they might be looking for. This could be done in a number of ways with linked reports, subreports, etc. Would that be possible?
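For example, the customer-level report could link to a site-level report that takes the selected customer as a parameter, so each query only crossjoins what that report actually shows (a rough sketch; @CustomerName and [Your Cube] are placeholders, and the SSRS MDX query designer would generate something similar):
SELECT NON EMPTY { [Measures].[Productivity Time in Hours],
                   [Measures].[Typed Lines] } ON COLUMNS,
NON EMPTY
  ( [Dim Customer].[Dictator Site Name].[Dictator Site Name].MEMBERS *
    [Dim Date].[The Year].[The Year].MEMBERS ) ON ROWS
FROM [Your Cube]
-- @CustomerName is passed from the parent (customer-level) report
WHERE ( STRTOMEMBER(@CustomerName, CONSTRAINED) )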
Yes, Analysis Services is built to handle large data sets, aggregations, etc. But your query appears to be closer to a relational query (pulling back a large amount of very detailed data) than a dimensional query...
Dave Fackler
|||
I'm using Reporting Services.
I will give linked reports or subreports a try. That seems to be the only way out.
The data I want is not the details of each record in the data warehouse but the aggregates for each customer/site/worktype/doctor at each level.
Does the performance of Analysis Services decrease when more dimensions are involved in a query?
What is the size of dimensions, or of a cross join between them, that Analysis Services handles efficiently?
|||
> Does the performance of Analysis Services decrease when more dimensions are involved in a query?
It depends on the scenario. In your scenario - yes, performance decreases as the size of the axis grows.
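If most of the time goes into evaluating NON EMPTY over the calculated measures, one thing you could try (an untested sketch; [Your Cube] is a placeholder and [Text LC] is used only because you said it is a base measure) is to reduce the row set first with NonEmpty() against a base measure, and only then crossjoin with the year:
SELECT NON EMPTY { [Measures].[Productivity Time in Hours],
                   [Measures].[Lines Per 1000] } ON COLUMNS,
NON EMPTY
  ( NONEMPTY( [Dim Customer].[Customer Name].[Customer Name].MEMBERS *
              [Dim Customer].[Dictator Site Name].[Dictator Site Name].MEMBERS *
              [Dim MT].[Creator First Name].[Creator First Name].MEMBERS,
              [Measures].[Text LC] ) *              -- drop combinations with no fact rows
    [Dim Date].[The Year].[The Year].MEMBERS ) ON ROWS
FROM [Your Cube]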
|||
Dave Fackler wrote: But your query appears to be closer to a relational query (pulling back a large amount of very detailed data) than a dimensional query...
Dave Fackler
I am in the same situation as Reena33. Would you suggest that I change my MDX query to a classical SQL query?
In this case, how can I obtain calculations like 'SUM( PERIODSTODATE([Time].[Hierarchy].[Year],... )' or 'PARALLELPERIOD(...)'?
Some of my queries take more than 5 hours to execute...
Julien
|||
I noticed that using WHERE was slower than using SELECT FROM SELECT (a subselect):
FROM ( SELECT
( { [Dim Date].[The Year].&[2003], [Dim Date].[The Year].&[2004] } )
ON COLUMNS
FROM [your Cube] )
The range [Dim Date].[The Month].&[January] : [Dim Date].[The Month].&[December] is not necessary because you want every month anyway.
Another tip: if you are using partitions, you can try using Exists(), like this:
SELECT NON EMPTY { [Measures].[Productivity Time in hours - Edited Docs],
[Measures].[Productivity Time in Hours - Typed Docs],
[Measures].[Productivity Time in Hours], [Measures].[Lines Per 1000],
[Measures].[Hours Saved],
[Measures].[Time Saved %],
[Measures].[Typed Lines],
[Measures].[Speech Utilization] } ON COLUMNS,
NON EMPTY {
(
Exists([Dim Customer].[Customer Name].[Customer Name].ALLMEMBERS *
[Dim Customer].[Dictator Site Name].[Dictator Site Name].ALLMEMBERS *
[Dim MT].[Creator First Name].[Creator First Name].ALLMEMBERS, , 'Your partition')
* [Dim Date].[The Year].[The Year].ALLMEMBERS ) }
Be careful to verify that the results are still the same!
(I didn't include the Time dimension in the Exists function; in my case that wouldn't be correct, because I'm using calculated members over time.)
I hope this helps.