April 16, 2014 Leave a comment
A project I worked on for some time went live months ago. My role as “tech lead” had a foot in many camps: solution architecture, business analysis, scrum master, data modeling, BI evangelism and BI development. We built a data warehouse of over half a terabyte with a handful of tables edging half a billion rows. It is a major component of the data-services layer for the organization. Thousands of applications/reports (primarily Excel/Access apps on network shares) that were tightly-coupled with systems of record were either replaced by a corporate BI & reporting solution, or rebuilt to consume data from reusable data services. We retired other competing “data layers” (legacy DW and a data-service + BI layer with a mix of technologies including multidimensional cubes) and migrated their consumers to the new architecture. We also enhanced limited MDM capabilities resulting in greater business ownership of master data, rather than the MDM repository being an unnecessary duplication of data. A data governance model has been established to ensure growth and longevity of the solution.
As is the case with many enterprise organizations that have built a culture of encouraging (or even training) business users to access systems of record directly, such a project requires a fundamental change in business processes, and faces some level of resistance. It was not enough to tell stakeholders that a reusable data layer would provide a “single version of the truth”, promote consistent decisions across the organization, lower the total cost of ownership due to addressing inefficiencies where the same business calculations are being applied in hundreds of places, inconsistent terminology, etc. What forced the issue was that the biggest system of record in the organization (containing roughly 50% of the data anyone is interested in) was being replaced. Given that thousands of apps were tightly coupled with this SoR and therefore exposed, they would all break and the business was at risk of grinding to a halt. This was enough to convince key stakeholders of the benefits of a reusable data layer coupled with governance model, in part to protect them from this situation in the future. So we went live alongside this major SoR migration, replacing or integrating countless apps, cleaning up the architecture of the organization, introducing hundreds of users to business intelligence as a concept, and transforming the way data is accessed for decision making. I personally trained over 200 users on how to use the new BI layer to access corporate data with minimal technical skills. My motto throughout has been “clicky clicky draggy droppy” data access. I think it’s fair to say the project was a success.
Focusing on SSAS …
We started with multidimensional and had performance issues. Despite optimization, multidimensional struggled. We moved to tabular. I did not find it difficult to get 50 to 100 GB of memory on the servers, which was more than enough for tabular when limited to the data set required for BI. However, as pointed out by others in the blogosphere, the better performance of tabular did not come for free …
Personally, I did not find limited functionality in tabular to be a major issue. As I mentioned in my Multidimensional or Tabular post, most functionality can be delivered even if using a “workaround” (often DAX instead of decorating the metadata of the model in multidimensional). True show stoppers like unary operators and write back were not required.
The bigger issues were with the tabular development experience. The tabular-model designer inherited from Power Pivot is sluggish for models with lots of tables, and buggy. The Excel-like DAX formula bar is, to put it politely, annoying. Without saying anything too controversial, MS corporate BI has played second fiddle lately.
Here are a couple of tips/techniques. This is not a comprehensive list.
Author measures in a query-authoring tool and copy/paste them into the tabular model
Authoring measures in the tabular model designer is not effective because the workspace database contains hardly any data. Unfortunately, my laptop doesn’t have 50 GB of memory. Setting the workspace database to point at the Dev instance is not an option because it would exacerbate the sluggish development experience, not to mention screwing up partitions on the Dev instance, etc. The workspace database is basically redundant in an enterprise development scenario.
If you prefer DAX queries, author measures in DAX Studio (querying the Dev instance of course) and copy/paste the measure definitions into the tabular model designer when you’re done.
If you prefer MDX queries, author measures in Management Studio and copy/paste the measure definitions into the tabular model designer when you’re done. You can author DAX measures in the WITH clause of an MDX query. I personally do this a lot because I find it less typing for an MDX SELECT statement than a DAX EVALUATE statement. Also I can refer to user hierarchies in the query (if not the measures).
For large models, use a subset in the tabular model designer
The tabular model designer inherited from Power Pivot may work well for a typical Power Pivot model. This may not be the case in SSAS tabular. The largest of the models I work with has about 80 tables, many with lots of columns. There are facts at various levels of granularity reducing the denormalization opportunity. Including internal measures not visible to users, there are over 800 measures (thankfully organized using display folders)! Due to business requirements to report measures side by side, I cannot split it into multiple tabular models.
Even in Manual Calculations mode (Model > Calculation Options > Manual Calculation), this model freezes the IDE for 10 to 15 seconds every time I do anything trivial. For significant modeling enhancements, this is unworkable. I simply don’t have the patience to sit there while the IDE is “Finishing …” all the time.
For significant enhancements, use BISM Normalizer to export a subset of the model to a temporary tabular project. Do the enhancements in there where there is a responsive development experience, and then BISM Normalizer the enhancements back to the main model when you’re done. This saves time and minimizes hair loss.
Place calculation logic at the appropriate level of the architecture
The tabular storage engine is clearly much faster than that of multidimensional. The same cannot be said for the formula engine.
This may sound obvious, but place calculation logic at the appropriate level of the architecture. Some calculations (e.g. traditional aggregated level calcs) are much better suited to a cube/tabular model than the ETL. In other cases there may be a considerable performance gain by performing a calculation in the ETL rather than the tabular model, thereby avoiding the formula engine.
This also promotes reusability of business logic because non-BI applications that submit SQL instead of MDX/DAX can leverage the same calculations.