Experiences & Tabular Tips

A project I worked on for some time went live months ago. My role as “tech lead” had a foot in many camps: solution architecture, business analysis, scrum master, data modeling, BI evangelism and BI development. We built a data warehouse of over half a terabyte, with a handful of tables edging half a billion rows. It is a major component of the data-services layer for the organization. Thousands of applications/reports (primarily Excel/Access apps on network shares) that were tightly coupled with systems of record were either replaced by a corporate BI & reporting solution or rebuilt to consume data from reusable data services. We retired other competing “data layers” (a legacy DW and a data-service + BI layer with a mix of technologies including multidimensional cubes) and migrated their consumers to the new architecture. We also enhanced previously limited MDM capabilities, resulting in greater business ownership of master data rather than the MDM repository being an unnecessary duplication of data. A data governance model has been established to ensure the growth and longevity of the solution.

As is the case with many enterprise organizations that have built a culture of encouraging (or even training) business users to access systems of record directly, such a project requires a fundamental change in business processes and faces some level of resistance. It was not enough to tell stakeholders that a reusable data layer would provide a “single version of the truth”, promote consistent decisions across the organization, and lower the total cost of ownership by addressing inefficiencies such as the same business calculations being applied in hundreds of places and inconsistent terminology. What forced the issue was that the biggest system of record in the organization (containing roughly 50% of the data anyone is interested in) was being replaced. Given that thousands of apps were tightly coupled with this SoR and therefore exposed, they would all break, and the business was at risk of grinding to a halt. This was enough to convince key stakeholders of the benefits of a reusable data layer coupled with a governance model, in part to protect them from this situation in the future. So we went live alongside this major SoR migration, replacing or integrating countless apps, cleaning up the architecture of the organization, introducing hundreds of users to business intelligence as a concept, and transforming the way data is accessed for decision making. I personally trained over 200 users on how to use the new BI layer to access corporate data with minimal technical skills. My motto throughout has been “clicky clicky draggy droppy” data access. I think it’s fair to say the project was a success.

Focusing on SSAS …

We started with multidimensional and had performance issues. Despite optimization, multidimensional struggled. We moved to tabular. I did not find it difficult to get 50 to 100 GB of memory on the servers, which was more than enough for tabular when limited to the data set required for BI. However, as pointed out by others in the blogosphere, the better performance of tabular did not come for free …

Personally, I did not find limited functionality in tabular to be a major issue. As I mentioned in my Multidimensional or Tabular post, most functionality can be delivered even if using a “workaround” (often DAX instead of decorating the metadata of the model in multidimensional). True showstoppers like unary operators and writeback were not required.

The bigger issues were with the tabular development experience. The tabular-model designer inherited from Power Pivot is sluggish for models with lots of tables, and buggy. The Excel-like DAX formula bar is, to put it politely, annoying. Without saying anything too controversial, MS corporate BI has played second fiddle lately.

Here are a couple of tips/techniques. This is not a comprehensive list.

Author measures in a query-authoring tool and copy/paste them into the tabular model

Authoring measures in the tabular model designer is not effective because the workspace database contains hardly any data. Unfortunately, my laptop doesn’t have 50 GB of memory. Setting the workspace database to point at the Dev instance is not an option because it would exacerbate the sluggish development experience, not to mention screwing up partitions on the Dev instance, etc. The workspace database is basically redundant in an enterprise development scenario.

If you prefer DAX queries, author measures in DAX Studio (querying the Dev instance of course) and copy/paste the measure definitions into the tabular model designer when you’re done.
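
For example, prototype the measure in a DEFINE MEASURE block and, once the numbers check out, copy only the expression into the model. A minimal sketch, assuming hypothetical ‘Internet Sales’ and ‘Date’ tables:

    -- Prototype a measure against the Dev instance in DAX Studio.
    -- Only the expression inside DEFINE MEASURE gets copied into the
    -- tabular model once it returns the right numbers.
    DEFINE
        MEASURE 'Internet Sales'[Avg Sales Amount] =
            AVERAGE ( 'Internet Sales'[Sales Amount] )
    EVALUATE
    ADDCOLUMNS (
        VALUES ( 'Date'[Calendar Year] ),
        "Avg Sales Amount", [Avg Sales Amount]
    )
    ORDER BY 'Date'[Calendar Year]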

If you prefer MDX queries, author measures in Management Studio and copy/paste the measure definitions into the tabular model designer when you’re done. You can author DAX measures in the WITH clause of an MDX query. I personally do this a lot because an MDX SELECT statement is less typing than a DAX EVALUATE statement. Also, I can refer to user hierarchies in the query (though not in the measure definitions themselves).
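
A minimal sketch of the same idea in MDX, again with hypothetical table, hierarchy and model names; the MEASURE definition in the WITH clause is plain DAX:

    -- DAX measure hosted in the WITH clause of an MDX query.
    -- The SELECT can slice by the Calendar user hierarchy, which the
    -- DAX measure definition itself cannot reference.
    WITH
        MEASURE 'Internet Sales'[Avg Sales Amount] =
            AVERAGE ( 'Internet Sales'[Sales Amount] )
    SELECT
        { [Measures].[Avg Sales Amount] } ON COLUMNS,
        [Date].[Calendar].[Calendar Year].MEMBERS ON ROWS
    FROM [Model]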

For large models, use a subset in the tabular model designer

The tabular model designer inherited from Power Pivot may work well for a typical Power Pivot model. This is not necessarily the case in SSAS tabular. The largest of the models I work with has about 80 tables, many with lots of columns. There are facts at various levels of granularity, which reduces the opportunity for denormalization. Including internal measures not visible to users, there are over 800 measures (thankfully organized using display folders)! Due to business requirements to report measures side by side, I cannot split it into multiple tabular models.

Even in Manual Calculation mode (Model > Calculation Options > Manual Calculation), this model freezes the IDE for 10 to 15 seconds every time I do anything trivial. For significant modeling enhancements, this is unworkable. I simply don’t have the patience to sit there while the IDE is “Finishing …” all the time.

For significant enhancements, use BISM Normalizer to export a subset of the model to a temporary tabular project. Do the enhancements in there, where there is a responsive development experience, and then use BISM Normalizer to merge the enhancements back into the main model when you’re done. This saves time and minimizes hair loss.

Place calculation logic at the appropriate level of the architecture

The tabular storage engine is clearly much faster than that of multidimensional. The same cannot be said for the formula engine.

This may sound obvious, but place calculation logic at the appropriate level of the architecture. Some calculations (e.g. traditional aggregated-level calcs) are much better suited to a cube/tabular model than to the ETL. In other cases there may be a considerable performance gain from performing a calculation in the ETL rather than the tabular model, thereby avoiding the formula engine.
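
As an illustrative sketch (hypothetical Sales table and column names): a row-level calculation can either be evaluated in the model on every query, or materialized once in the ETL so the measure collapses to a plain aggregation:

    -- Option 1: computed in the model. The expression is evaluated
    -- over the Sales rows at query time.
    Margin Amount :=
    SUMX (
        'Sales',
        'Sales'[Quantity] * ( 'Sales'[Unit Price] - 'Sales'[Unit Cost] )
    )

    -- Option 2: the ETL materializes a Margin Amount column on the
    -- fact table, and the measure becomes a simple aggregation.
    Margin Amount :=
    SUM ( 'Sales'[Margin Amount] )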

This also promotes reusability of business logic because non-BI applications that submit SQL instead of MDX/DAX can leverage the same calculations.

BISM Normalizer: Version 1.3.12 Released!

Download it from the BISM Normalizer Visual Studio Gallery page.

Enhancements in 1.3.12.3

  • Support for Visual Studio 2013 and SQL Server 2014. To run Visual Studio 2010/2012/2013 on the same machine, point them at the same local Addin folder in Visual Studio options.
  • Using a new (very simple) InstallShield installer. This will hopefully resolve the issue with the installation path pointing at a remote Addin folder based on target machine settings. I am unable to test this.

Update 4/4/2014: fixes in 1.3.12.4

  • Fix for the BISM Normalizer window re-initializing when it loses focus (Visual Studio 2012 and 2013 only).

Video of BISM Normalizer

Revised video of BISM Normalizer.

BISM Normalizer: Version 1.3.11 Released!

Download it from the BISM Normalizer Visual Studio Gallery page.

Enhancements in Version 1.3.11.1

Support for tabular objects created by BIDS Helper.

  • Actions are first-class objects listed in the differences grid.
  • Display folders are attributes of other objects (tables with columns/hierarchies, measures) and visible in those objects’ definitions.
  • Translations are attributes of other objects (tables with columns/hierarchies, measures, perspectives, actions) and visible in those objects’ definitions.

It is necessary to enable the features in the BISM Normalizer options dialog (accessible from the Connections dialog).

BISM Normalizer Options

BISM Normalizer is 2 Years Old!

BISM Normalizer is 2 years old today. I launched it on Christmas Day 2011 (way before SQL Server 2012 hit RTM). Its arrival was marked by a wise man writing a blog post.

Having done various presentations about BISM Normalizer at the PASS Community Summit, SQLBits and other events, I gradually became more aggressive in selling its benefits – especially challenging the SSAS Deployment Wizard.

I have considered opening up the source code on Codeplex. I also had discussions about selling the source code (the executable is obfuscated), but they didn’t work out. I don’t know the future of BISM Normalizer, but the most likely outcome is I continue to give it away for free on the VS Gallery as I have done till now. The benefits of giving it away (for me) are that it raises my profile both as a consultant and in the SQL community.

So consider BISM Normalizer a Christmas gift of enterprise code management for Analysis Services, facilitated deployment, and promotion of a “single version of the truth” for business definitions covered by BI models.

Merry Christmas – or Happy Holidays – to you and your family!

BISM Normalizer: Version 1.3.10 Released!

Download it from the BISM Normalizer Visual Studio Gallery page.

Enhancements in Version 1.3.10.3

  • Measure/column formats included in object definitions for comparison.  Note: column datatypes have always been included.
  • Measure/column/table visibility included in object definitions for comparison.
  • Expiration date extended to June 30th 2014. It is unclear what will happen to BISM Normalizer after this time, but I have no intention of charging license fees.

Multidimensional or Tabular

This post is not a list of multidimensional features unsupported by tabular. That has been documented already by various reliable sources, including blog posts and PASS presentations.

The first thing I would like to say is that I agree there is a lot of work for tabular to catch up to the feature-rich multidimensional. The tabular-model designer inherited from Power Pivot is sluggish for models with lots of tables, and buggy. The Excel-like DAX formula bar is, to put it politely, annoying. Without saying anything too controversial, MS corporate BI has been playing second fiddle lately.

However, tabular does make sense for many customers today. For most customers, having fast performance is more important than the unsupported features – which invariably either have “workarounds” or are fringe use cases.

On the workarounds: if the same functionality can be delivered to the business, they don’t care whether we technical people see it as a “workaround” because it’s not delivered the same way we are used to. And the business people are the ones that matter. This applies to many-to-many relationships, parent-child hierarchies, role-playing dimensions (you can create multiple instances of the same table), and various other items.
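
As an example of such a workaround, role-playing can also be handled with an inactive relationship activated per measure via USERELATIONSHIP, rather than importing the table twice. A minimal sketch with hypothetical table and column names, assuming an inactive ShipDateKey relationship alongside the active OrderDateKey one:

    -- Activate the inactive ShipDateKey relationship for this measure
    -- only, so the single Date table plays the "ship date" role.
    Shipped Sales Amount :=
    CALCULATE (
        SUM ( 'Internet Sales'[Sales Amount] ),
        USERELATIONSHIP ( 'Internet Sales'[ShipDateKey], 'Date'[DateKey] )
    )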

For what I’m calling the fringe use cases, the supportability of some of these features does not make sense for many customers. Hand the support team a solution that uses MDX stored procs and extensive scoped-cell assignments, and they will struggle. How many implementations use these features because the developer thought they were cool rather than to meet any real business need? Quite a few, I think.

Other use cases, like unary operators and writeback, may be genuine showstoppers, but not for the majority of implementations.

Scoped-cell assignments are on the potential-showstopper list too. In most cases, though, calculation logic can be pushed to the ETL layer to avoid the formula engine; that is where it belongs unless it plays to one of the strengths of the cube/tabular model (aggregated-level calcs, calculations that would cause a data-explosion problem at the relational level, etc.). With that done, DAX is a pretty capable and powerful language for the calculations built into the tabular model.

On the tabular memory limitation: many customers I’ve talked to are worried they won’t fit into memory when they are actually nowhere near the upper limit of what they can relatively easily get on a server (especially when limited to the data set required for BI). Also, more memory will only become more viable in the future. For a multi-terabyte data warehouse implementation like Yahoo.com’s, then yes, memory is a genuine constraint, but again this is a fringe use case.

As noted elsewhere,

  • When a project chooses multidimensional or tabular, it is not possible to change your mind without starting development again.
  • I think it is widely accepted that Microsoft is more likely to build new features and put future development investment into tabular than multidimensional.

It is a valid statement that a project may need some of the features unsupported by tabular at a later date, which could be a problem. Conversely, a project may encounter performance issues with multidimensional that were not anticipated at the start of the project. One could also argue that, as new features are built into tabular, a project could be stuck with multidimensional and unable to leverage better tabular capabilities in the future.

Update Nov 13 2013 – post from Marco Russo: Updates about Multidimensional vs Tabular
