BISM Normalizer: Version 3.0 Released for SSAS 2016!!!

This is a major new version of BISM Normalizer. It can be downloaded from a different Visual Studio Gallery page. It will run for free till December 31st 2016.

Visual Studio 2010/2012/2013 (SQL Server 2012/2014) are not supported by BISM Normalizer 3. For these, you can still use BISM Normalizer 2.  See the Purchase page for more info on version compatibility.

Enhancements in Version 3.0

Support for SQL Server Analysis Services 2016 and Visual Studio 2015

SSAS compatibility levels 1200, 1103 and 1100 are all supported.

JSON definitions with improved difference highlighting

Diff

Processing on database deployment

Processing

Like SSDT, processing options include “Default”, “Do Not Process” and “Full”.  There is also an option to process only the tables affected by the comparison (Create and Update), to avoid unnecessary processing.

ProcessingOptions

When deploying to a database (not project), BISM Normalizer will check connections set up for impersonation, and prompt for username and password (like deployment from SSDT).

Impersonation Credentials

Support for DirectQuery models with compatibility level 1200

Both source and target models must have the DirectQuery Mode property set to On for successful comparison.

Switch source and target in Connections dialog

Switch Connections

Improved scripting for JSON and XMLA

BISM Normalizer will use the appropriate editor if available. If not found, the option to save to a file is provided instead. Note: the JSON editor is not installed as part of a SSDT-only installation with the Visual Studio Integrated Shell.

Improved Scripting

Improved Scripting Xmla

Code behind BSMN file

View or modify contents of BSMN files.

View Code

Associate BSMN file types with Visual Studio

Open BSMN files from Windows Explorer with Visual Studio as the default application. Also display the BISM Normalizer icon in Solution Explorer. To enable this, right click a BSMN file in Solution Explorer and select the Install Solution Explorer Icon context menu. Note: administrator permissions are required to even see the menu option. This can’t be set up automatically with standard VSIX deployment from the Visual Studio Gallery because of the admin requirement.

Install Solution Explorer Icon

BISM Normalizer: Version 2.0 Released!

Enhancements in Version 2.0

  • BISM Normalizer has been migrated to a VS package; it is no longer a VS add-in.
    • Add-ins are being deprecated in VS.  In the CTP of VS 2015, the Add-in Manager no longer exists.
    • VS packages provide richer functionality than add-ins.  Microsoft project templates (e.g. C# projects) are built using VS packages.
  • BISM Normalizer is now a custom editor.  This provides the following benefits.
    • Multiple comparisons are supported at the same time.
    • Comparisons can be saved to a BSMN file.
      • Skip selections, connections and options are retained when reopened.  This means it is not necessary to make the same manual selections when deploying new versions of the same tabular model.
      • BSMN files can be added to tabular projects using the VS New Item dialog.
      • In a future release (not 2.0), the BSMN file will parameterize command-line deployments using BISM Normalizer.
  • VSIX deployment direct from the VS Gallery.
    • Notifications of new releases in the VS Notification tool window.
    • Upgrade to new versions from within VS in the Tools > Extensions and Updates window.
  • Support for Visual Studio 2010, 2012 & 2103.
  • Numerous usability and look-and-feel enhancements.

Notes

  • Remember to uninstall the previous addin version of BISM Normalizer from Control Panel > Programs and Features before installing the new version.  If you see the big old BISM Normalizer toolbar, you are using the old version.

SSAS Locking: CommitTimeout and ForceCommitTimeout

There are already plenty of good posts out there on this topic:

As mentioned by Andrew Calvett, it is possible to set the CommitTimeout for a Process command.  Here is how to do it.

<Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
  <Command>

    <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Type>ProcessFull</Type>
      <Object>
        <DatabaseID>AdventureWorks</DatabaseID>
        <DimensionID>Date_d2c7ec3d-c72c-435d-bd43-8283714cc2dd</DimensionID>
      </Object>
    </Process>

  </Command>
  <Properties>
    <PropertyList>
       <CommitTimeout>20000</CommitTimeout>
   </PropertyList>
  </Properties>
</Execute>

 

Setting CommitTimeout to 20 seconds (20000 milliseconds) means it will kick in before the server-level default ForceCommitTimeout of 30 seconds. To try this out, run the following query, which takes about 50 seconds on my laptop.  As soon as the query starts running, execute the process command.  It should rollback the process command and allow the query to run to completion.

DEFINE
MEASURE 'Date'[Bad Measure] =
   COUNTROWS(GENERATE(ALL('Internet Sales'), VALUES('Date'[Day Of Year])))
EVALUATE
ADDCOLUMNS(
   VALUES('Date'[Calendar Year])
   ,"Bad Measure", [Bad Measure]
)

 

Change CommitTimeout to 40 seconds (40000 milliseconds) and the default ForceCommitTimeout of 30 seconds will kick in instead.  The query will fail – instead of the Process command – “because of locking conflicts”.

Experiences & Tabular Tips

A project I worked on for some time went live months ago. My role as “tech lead” had a foot in many camps: solution architecture, business analysis, scrum master, data modeling, BI evangelism and BI development. We built a data warehouse of over half a terabyte with a handful of tables edging half a billion rows. It is a major component of the data-services layer for the organization. Thousands of applications/reports (primarily Excel/Access apps on network shares) that were tightly-coupled with systems of record were either replaced by a corporate BI & reporting solution, or rebuilt to consume data from reusable data services. We retired other competing “data layers” (legacy DW and a data-service + BI layer with a mix of technologies including multidimensional cubes). We also enhanced limited MDM capabilities resulting in greater business ownership of master data, rather than using the MDM repository as an unnecessary duplication of data. A data-governance model has been established to ensure growth and longevity of the solution.

As is the case with many enterprise organizations that have built a culture of encouraging (or even training) business users to query corporate systems of record directly, such a project required a fundamental change in business processes, and faced some resistance. It was not enough to tell stakeholders that a reusable data layer would provide a “single version of the truth”, promote consistent decisions across the organization, lower the total cost of ownership due to addressing inefficiencies where the same business calculations are being applied in hundreds of places, inconsistent terminology, etc. What forced the issue was that the biggest system of record in the organization (containing roughly 50% of the data anyone is interested in) was being replaced. Given that thousands of apps were tightly coupled with this SoR and therefore exposed, they would all break and the business was at risk of grinding to a halt. This was enough to convince key stakeholders of the benefits of a reusable data layer coupled with governance model, in part to protect them from this situation in the future. So we went live alongside this major SoR migration, replacing or integrating countless apps, cleaning up the architecture of the organization, introducing hundreds of users to business intelligence as a concept, and transforming the way data is accessed for decision making. I personally trained over 200 users on how to use the new BI layer to access corporate data with minimal technical skills. My motto throughout has been “clicky clicky draggy droppy” data access. I think it’s fair to say the project was a success.

Focusing on SSAS …

We started with multidimensional and had performance issues. Despite optimization, multidimensional struggled. We moved to tabular. I did not find it difficult to get 50 to 100 GB of memory on the servers, which was more than enough for tabular when limited to the data set required for BI. However, as pointed out by others in the blogosphere, the better performance of tabular did not come for free …

Personally, I did not find limited functionality in tabular to be a major issue. As I mentioned in my Multidimensional or Tabular post, most functionality can be delivered even if using a “workaround” (often DAX instead of decorating the metadata of the model in multidimensional). True show stoppers like unary operators and write back were not required.

The bigger issues were with the tabular development experience. The tabular-model designer inherited from Power Pivot is sluggish for models with lots of tables. The Excel-like DAX formula bar is, to put it politely, annoying. Without saying anything too controversial, MS corporate BI has played second fiddle lately.

Here are a couple of tips/techniques. This is not a comprehensive list.

Author measures in a query-authoring tool and copy/paste them into the tabular model

Authoring measures in the tabular model designer is not effective because the workspace database contains hardly any data. Unfortunately, my laptop doesn’t have 50 GB of memory. Setting the workspace database to point at the Dev instance is not an option because it would exacerbate the sluggish development experience, not to mention screwing up partitions on the Dev instance, etc. Aside from providing a modeling scratchpad, the workspace database is basically redundant in an enterprise development scenario.

If you prefer DAX queries, author measures in DAX Studio (querying the Dev instance of course) and copy/paste the measure definitions into the tabular model designer when you’re done.

If you prefer MDX queries, author measures in Management Studio and copy/paste the measure definitions into the tabular model designer when you’re done. You can author DAX measures in the WITH clause of an MDX query. I personally do this a lot because I find it less typing for an MDX SELECT statement than a DAX EVALUATE statement. Also I can refer to user hierarchies in the query (if not the measures).

For large models, use a subset in the tabular model designer

The tabular model designer inherited from Power Pivot may work well for a typical Power Pivot model. This is not always the case in SSAS tabular. The largest of the models I work with has about 80 tables, many with lots of columns. There are facts at various levels of granularity reducing the denormalization opportunity. Including internal measures not visible to users, there are over 800 measures (thankfully organized using display folders)! Due to business requirements to report measures side by side, I cannot split it into multiple tabular models.

Even in Manual Calculations mode (Model > Calculation Options > Manual Calculation), this model freezes the IDE for 10 to 15 seconds every time I do anything trivial. For significant modeling enhancements, this is unworkable. I simply don’t have the patience to sit there while the IDE is “Finishing …” all the time.

For significant enhancements, use BISM Normalizer to export a subset of the model to a temporary tabular project. Do the enhancements in there where there is a responsive development experience, and then BISM Normalizer the enhancements back to the main model when you’re done. This saves time and minimizes hair loss.

Place calculation logic at the appropriate level of the architecture

The tabular storage engine is clearly much faster than that of multidimensional. The same cannot be said for the formula engine.

This may sound obvious, but place calculation logic at the appropriate level of the architecture. Some calculations (e.g. traditional aggregated level calcs) are much better suited to a cube/tabular model than the ETL. In other cases there may be a considerable performance gain by performing a calculation in the ETL rather than the tabular model, thereby avoiding the formula engine.

This also promotes reusability of business logic because non-BI applications that submit SQL instead of MDX/DAX can leverage the same calculations.

Video of BISM Normalizer

Video of BISM Normalizer 1.

BISM Normalizer: Version 1.3.11 Released!

Download it from the BISM Normalizer Visual Studio Gallery page.

Enhancements in Version 1.3.11.1

Support for tabular objects created by BIDS Helper.

  • Actions are first-class objects listed in the differences grid.
  • Display folders are attributes of other objects (tables with columns/hierarchies, measures) and visible in those objects’ definitions.
  • Translations are attributes of other objects (tables with columns/hierarchies, measures, perspectives, actions) and visible in those objects’ definitions.

It is necessary to enable the features in the BISM Normalizer options dialog (accessible from the Connections dialog).

BISM Normalizer Options

Multidimensional or Tabular

This post is not a list of multidimensional features unsupported by tabular. This has been documented already by various reliable sources including the following posts, as well as PASS presentations.

First thing I would like to say is I agree that there is lots of work for tabular to catch up to the feature-rich multidimensional. The tabular-model designer inherited from Power Pivot is sluggish for models with lots of tables. The Excel-like DAX formula bar is, to put it politely, annoying. Without saying anything too controversial, MS corporate BI has been playing second fiddle lately.

However, tabular does make sense for many customers today. For most customers, having fast performance is more important than the unsupported features – which invariably either have “workarounds” or are fringe use cases.

On the workarounds, if the same functionality can be delivered to the business, they don’t care if we technical people see it as a “workaround” because it’s not delivered the same way we are used to. And the business people are the ones that matter. This applies to many-to-many relationships, parent-child hierarchies, role-playing dimensions (can create multiple instances of same table), and various other items.

For what I’m calling the fringe use cases, the supportability of some of these does not make sense for many customers. Hand a solution to support that uses MDX stored procs, extensive scoped-cell assignments and they will struggle. How many implementations use these features because the developer thought they were cool rather than having any real business need? I think quite a few.

Other use cases may be showstoppers like unary operators and writeback, but not for the majority of implementations.

Scoped-cell assignments are in the potential showstopper list too, but in most cases if calculation logic is pushed to the ETL layer (where it belongs if not one of the strengths of the cube/tabular model like aggregated level calcs, or those which would cause a data explosion problem at the relational level, etc) to avoid the formula engine where possible, then DAX is a pretty capable and powerful language for calculations built into the tabular model.

On the tabular memory limitation, many customers I’ve talked to are worried they won’t fit into memory when they are actually nowhere near the upper limit of what they can relatively easily get on a server (especially when limited to the required data). Also, more memory will only become more viable in the future. For a multi-terabyte data warehouse implementation like a Yahoo.com, then yes but again this is a fringe use case.

As noted elsewhere,

  • When a project chooses multidimensional or tabular, it is not possible to change your mind without starting development again.
  • I think it is widely accepted that Microsoft is more likely to build new features and put future development investment into tabular than multidimensional.

It is a valid statement that a project may need some of the features unsupported by tabular at a later date, which could be a problem. Conversely a project may encounter performance issues with multidimensional that were not anticipated at the start of the project.  One could also argue that, as new features are built into tabular, could be stuck with multidimensional and unable to leverage better tabular capabilities in the future.

Update Nov 13 2013 – post from Marco Russo: Updates about Multidimensional vs Tabular

Deploy Tabular Models and Retain Partitions

Kasper de Jonge asked a question on Twitter that I would like to answer here to a) give it more exposure, and b) I pretty quickly ran out of 140 characters in my Twitter reply.

I initially tweeted: Problem when deploy tabular models to Dev server from SSDT & lose dynamic partitions? Deploy with BISM Normalizer & retain partitions

Kasper tweeted: Hey Christian, what do you mean by dynamic partitions?

And here is my response …

Hi Kasper,

We often partition the large tables in a tabular model to speed up processing times (as I’m sure you know).  For example, we might partition by month and then process only the most recent couple of months every night – which is a lot quicker than processing the whole table.  Normally this is done in AMO code that is called from a SSIS package – so it can be managed by the ETL to increment/drop partitions, etc.  The version of the model in source control does not contain the partition objects.

While developing in SSDT, we frequently deploy to a dev server (obviously not test/production).  This is necessary because all the dev reports are pointing at the server – not the workspace database on the developer machine.  If the dev ETL has built partitions, they are lost upon deployment from SSDT (using right-click deploy) – and they take a long time to rebuild/reprocess.

When creating a BISM Normalizer difference comparison, we have the option to “include partitions in table definitions”.  If we leave this unchecked, partitions will not be considered when comparing tables.  BISM Normalizer will treat such tables as equal and will not mess with them – thereby avoiding reprocessing.

Include Partitions Option

The SSAS Deployment Wizard is the only other deployment method for tabular models that supports “retain partitions” functionality.  However, you would not use it to deploy from SSDT to the dev server when making a change to a DAX calc or something.  BIDS Helper‘s Deploy MDX Script feature serves a similar purpose for multidimensional.

For regular Test/Production deployments, we can use the deployment wizard. Alternatively, we can use BISM Normalizer to (more easily) create an xmla script for the release to apply on Test/Production.

Another use case is bug fixes and partial deployments.  BISM Normalizer can create a script to apply on Test/Production – which updates only the calculation(s) that need to be fixed without a full release cycle – and does not mess with everything else that is already there.  The deployment wizard only supports “all-or-nothing” deployments, which are not appropriate for bug fixes and partial deployments.

Cheers!

Christian

Why not BISM Normalizer work directly with PowerPivot workbooks?

Gerhard Brueckl asked a question on the BISM Normalizer page Q&A section that I would like to answer here to give it more exposure.

Gerhard wrote …

Hi Christian,

first of all – thanks for that awesome tool!
Is it somehow possible to compare a Server/Project to a PowerPivot Workbook and vice-versa?
Office 2013 introduced a new object model for PowerPivot where you can access the PowerPivot – is it possible to extend BISM Normalizer to make use of this?

greets,
gerhard

My response …

Hi Gerhard,

Thanks for the feedback! Others have asked the same question including Grant Paisley.

I would love to have BISM Normalizer work directly with PowerPivot. Not only allow selection of an Excel file to act as a source/target model, but also have an Excel Addin version of BISM Normalizer to support non-Visual Studio users.

I agree there is a strong use case for this. PowerPivot enables users to easily pull data from uncommon sources – and often mash it up with corporate data. So inheriting (some or all of) another tabular model’s metadata would be useful to say the least.

PowerPivot now has the API you mentioned, but it would be much easier if Microsoft provided a single API for both Analysis Services and PowerPivot. The easiest way for me would be if Microsoft allowed access to the PowerPivot AMO interface. Instead, Microsoft has chosen to keep this interface private.

I built an object model on top of AMO for BISM Normalizer. It contains classes for all the objects that BISM Normalizer displays on the differences grid and abstracts calls to AMO. But even so, under the abstraction layer, BISM Normalizer basically just copies AMO objects from source to target. It does not need to rebuild them, which would be required to get them from AMO to DataModel OM in Excel 2013 and vice versa.

I raised this Connect item for this purpose. Feel free to vote for it.

Thanks,
Christian

IF ISCROSSFILTERED performs better than IF HASONEVALUE

See this post for the functional difference between ISCROSSFILTERED and HASONEVALUE.

This DAX query on AdventureWorks runs in less than a second (SP1).

DEFINE
MEASURE 'Date'[Bad Measure] =
   COUNTROWS(GENERATE(ALL('Internet Sales'), VALUES('Date'[Week Of Year])))
MEASURE 'Date'[Conditional Evaluation] =
   IF(ISCROSSFILTERED('Product'[Product Name]), [Bad Measure], 0)
//  IF(HASONEVALUE('Product'[Product Name]), [Bad Measure], 0)
EVALUATE
ADDCOLUMNS(
   VALUES('Date'[Calendar Year])
   ,"Conditional Evaluation", [Conditional Evaluation] //('Product'[Product Name] = "Mountain Pump")
)
ORDER BY 'Date'[Calendar Year]

Uncomment the product filter [‘Product'[Product Name] = “Mountain Pump”] to include it. Now it takes about 8 seconds. Looking at the query, this makes sense because it has to call the “Bad Measure”. This suggests that IF ISCROSSFILTERED is optimized to perform conditional logic only when necessary. This behavior is similar to short-circuit evaluation, but not quite the same thing.

Now comment out the line containing ISCROSSFILTERED and uncomment the line containing HASONEVALUE. It takes about 8 seconds whether the product filter is applied or not. This suggests that using IF HASONEVALUE is not optimized; it does perform conditional logic unnecessarily.

Having watched Marco Russo’s Inside DAX Query Plans session from the PASS 2012 Summit, I ran Profiler traces to confirm this theory. For the above query at least, this does seem to be the case.

  • Even when the product filter is not applied, IF HASONEVALUE executes a Vertipaq SE Query on ‘Internet Sales'[RowNumber]. RowNumber is an internal, unique column on every table.
  • The physical DAX query plan includes the following operation, which I believe means it is iterating 19.2 million rows for the formula engine to deal with.
Spool_Iterator<Spool>: IterPhyOp IterCols(0, 108, 136)('Date'[Calendar Year], 'Internet Sales'[RowNumber], 'Date'[Week Of Year]) #Records=19206564 ...

Note: 19,206,564 = 60,398 (number of rows in Internet Sales) * 53 (number of distinct Week of Year values) * 6 (number of rows in the resultset)

Needless to say, none of this happens with IF ISCROSSFILTERED when the product filter is not applied. There are less Vertipaq SE Queries, and the largest Spool_Iterator operation in the physical plan is for 6 rows (the number of rows in the resultset).