Comparing and Syncing Data With Data Inspector

Earlier this month Pragmatic Works released Workbench. Workbench combines the powerful tools many of you are already using with a whole host of new, powerful features that are part of DBA xPress.

DBA xPress features tools designed to make designing, maintaining, and administering your SQL Server databases easier than before. Now you can easily do things like visually navigate your databases, visualize schema dependencies, compare, synchronize, script, and navigate schemas, and more.

One of my personal favorite tools included with DBA xPress is the Data Inspector. Data Inspector allows you to create a comparison between the data stored in two different databases. Not only can you compare the data between databases, but you can also copy the data from one database to another.

With Data Inspector, there are three ways to sync data between databases: sync the data directly between databases with the Data Inspector Synchronization Wizard, generate a script that will synchronize the data for you, or use a hybrid of the first two options that syncs the data directly and generates the script for you.
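
To give you an idea of what that generated-script option amounts to, the heart of a data-sync script is essentially a statement like the hypothetical MERGE below (the table and column names are made up purely for illustration); the Synchronization Wizard works all of this out for you.

-- Hypothetical sync: make the target table match the source table.
MERGE TargetDB.dbo.Customer AS tgt
USING SourceDB.dbo.Customer AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED AND (tgt.FirstName <> src.FirstName OR tgt.LastName <> src.LastName)
    THEN UPDATE SET tgt.FirstName = src.FirstName, tgt.LastName = src.LastName
WHEN NOT MATCHED BY TARGET
    THEN INSERT (CustomerID, FirstName, LastName)
         VALUES (src.CustomerID, src.FirstName, src.LastName)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;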

I’m very excited about this tool and am looking forward to using it to speed up my data warehouse development projects. Head over to PragmaticWorks.com and check out DBA xPress.

Implementing Security With SSAS

Pragmatic Works just published a video, put together by yours truly, on their YouTube channel about implementing security in your SQL Server Analysis Services cube.

The video covers implementing basic dimension security and cell security, as well as an extended look at implementing dynamic, data-driven security. The video is about an hour long, so grab a bag of popcorn, sit back, and hopefully learn how to make your cube as secure as if it were guarded by a squad of Segway-riding special forces commandos.

So check out my video Implementing Security With SSAS and feel free to post any questions or comments!

I’m Speaking at SQL Saturday #168: BI Edition!

SQL Saturday #168: BI Edition is coming up on November 17th in Tampa, Florida, and yours truly will be presenting! This SQL Saturday event will be entirely Business Intelligence focused! All the sessions will be ETL, analytics, or reporting related, with a miscellaneous track focusing on professional development and DBA topics. Get registered ASAP for this awesome BI-focused SQL Saturday!

I’ll be co-presenting with Mitchell Pearson (blog | twitter) on an introduction to SSIS 2012 called SSIS 2012 In Yo’ Face: Introduction to SSIS. We’re going to cover commonly used SSIS tasks and transforms, the SSIS expression language, and how to create powerfully dynamic packages.

So head over to the SQL Saturday #168 site and get registered. I look forward to seeing everyone in our session! It’s going to be a blast!

Generating Junk/Test Data For SSIS

Every once in a while I'll run across a requirement to generate a bunch of junk data for the sole purpose of performance testing or benchmarking. There are plenty of ways I could generate my test data, including clever T-SQL scripts or a Data Flow Script Component used as a source. Those methods work well, but they're somewhat limited if I need to generate Unicode string data for testing. But that's where Task Factory comes in.
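
For reference, here's roughly what the T-SQL route looks like: a minimal sketch that writes 100,000 rows of integers, dates, and Unicode strings into a hypothetical dbo.JunkData table. It works, but notice how much ceremony is involved compared to dropping a single source component into a data flow.

-- Generate 100,000 rows of junk data into a hypothetical dbo.JunkData table.
;WITH Numbers AS
(
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
SELECT
    n AS RowId,
    ABS(CHECKSUM(NEWID())) % 10000 AS RandomInt,                  -- pseudo-random integer
    DATEADD(DAY, n % 3650, '20000101') AS SomeDate,               -- dates spread across ten years
    CAST(N'JunkValue_' + CAST(n AS NVARCHAR(10)) AS NVARCHAR(50)) AS UnicodeString
INTO dbo.JunkData
FROM Numbers;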

Included in the latest release of Task Factory is the Placeholder Source. The beauty of the Placeholder Source is that I can quickly and easily generate test data for my package.

Here you can see the editor for the Placeholder Source. On the Component Properties tab, I can specify how many rows of junk data I’d like to generate. The default is 10,000 rows, but I can set it to whatever I like. I’ve tested the source component with 100 million records before with no problem.

The Input and Output Properties tab is where you specify which columns and data types you'd like to include in your junk data.

By default, the Placeholder Source creates 5 output columns of various data types, but you can add more fields of whatever data type you like. Just click Add Column and then select the data type you want. Here I've added an additional column and selected Integer as the data type.

All that’s left to do is wire up my package and click execute. Instantly tons of junk data is created for us to test or play with. If you’re interested in expanding your SSIS arsenal, head over to PragmaticWorks.com and download Task Factory. The best part is that Task Factory is free to download. A license is only required to run the Task Factory components remotely. So why not check it out?

SSIS Advanced Execute Package Task with Task Factory

In the last few months, I've been previewing some of the new advanced SSIS tasks and transforms coming in the next version of Task Factory. Well, Task Factory 3.0 was finally released last month, and it's packed with a ton of really cool tasks and transforms that are going to make SSIS development easier and will increase the performance of your packages.

One of the newest SSIS tasks released in Task Factory 3.0 is the Advanced Execute Package task. The really cool part about this task is the MUCH improved interface. It’s now incredibly simple to set up variable mappings from a parent package to a child package.

Here you can see the editor for the Advanced Execute Package Task. There are three areas of the editor I want to take a look at.

[Figure 1: Advanced Execute Package Task editor]

The first area, highlighted in green, is where you specify your connection to the child package. If the package is in the file system, you simply browse to the location of the package. If the package is located on a SQL Server, you need to select the SSIS connection manager to use. In either case, if the package has a password, you’ll need to enter that.

The second area of the Advanced Execute Package Task, highlighted in yellow, is where you create mappings between variables in the parent package and variables in the child package. In the Assignment Direction drop down list, you have a few different options.

We can read a variable from the child package, write a variable to the child package, or do both simultaneously. In my case, I’m passing a variable to the child package so I will select “Write variable to child package”. Then I just need to select the Child Package Variable and the Parent Package Variable.

Once I’ve done that, I can just click “Add Mapping”. Now we can see the mapping created between the two package variables in the last area (highlighted in pink in figure 1). Using this method, we can create multiple variable mappings between the parent package and the child package.

As you can see, the Advanced Execute Package Task from Task Factory is very intuitive and simple. If you've used the native SSIS Execute Package Task, you'll recognize that the new Advanced Execute Package Task is much simpler and easier to configure.

Pragmatic Works is Hiring! Mid/Senior B.I. and DBA Peeps!

If four years ago you had told me that I'd be working for a company like Pragmatic Works, I wouldn't have believed you. Simply put, working at Pragmatic Works and doing what I do is the best job I've ever had, for many reasons. Here are a few of them.

The culture at Pragmatic Works is awesome. You might think that with a rock star cast of experienced B.I. and SQL professionals like we have here at PW, the culture would be cutthroat, every man/diva for himself, but it's quite the opposite. The guys here are humble and focused on helping each other learn and grow. I've seen it happen time and time again where somebody will hit a wall with a project or come up against a roadblock, and several guys on our team will take time out of their schedule, possibly after hours, to help overcome the issue. This isn't the exception, either; this behavior is a regular occurrence. Although we are a consulting firm, we are very much a team! Check out our Company Culture points at PragmaticWorks.com.

One of the other very cool things about working at Pragmatic Works is that we are constantly encouraged to speak at community events. We even receive compensation for speaking at awesome events like PASS Summit, SQL Rally, or SQL Saturdays. It’s pretty cool to be able to be a part of the SQL community and help others learn. This is one of the passions of the PW team.

If you’re in the job market for a Mid or Senior Business Intelligence Consultant or Senior DBA position, get in touch with me ASAP! There is work to be done!

If you’re not following me on Twitter, do it now!

NON EMPTY vs Nonempty(): To the Death!

So what is the difference between NON EMPTY and Nonempty()? I've had this question asked several times and have been meaning to blog about it for a while, so here I am finally getting around to it. Let's jump right in.

We have two queries we're going to take a look at in order to better understand the difference between NON EMPTY and Nonempty(). Behold our first very boring query:

SELECT
([Date].[Calendar Year].&[2005]) ON 0,
NON EMPTY ([Date].[Calendar Quarter of Year].members) ON 1
FROM [Adventure Works]
WHERE [Measures].[Reseller Sales Amount];

Fig. 1a

Now here are the results:

[Fig. 1b: query results]

As you can see, Q1 and Q2 are excluded from the results because the cells are empty. The NON EMPTY keyword essentially says, “After we decide which members go where and which cells are going to be returned, get rid of the empty cells.” If we take a look at the execution tree using MDX Studio, we can see the 2005 partition is the only partition being hit because NON EMPTY is being applied at the top level as the very last step in the query execution. The 0 axis is taken into account before evaluating which cells are empty.

[Fig. 1c: MDX Studio execution tree]

Also, it's important to note that the NON EMPTY keyword can only be used at the axis level. I used it on the 1 axis, but I could have used it on each axis in my query. I should also mention that the Nonempty() function accepts a second argument, and it's very important that you specify this second argument even though the function doesn't strictly require it.

Now let's take a look at our second query:

SELECT
([Date].[Calendar Year].&[2005]) ON 0,
NONEMPTY([Date].[Calendar Quarter of Year].members,[Measures].[Reseller Sales Amount]) ON 1
FROM [Adventure Works]
WHERE [Measures].[Reseller Sales Amount];

Fig. 2a

This time I'm using the Nonempty() function. Because Nonempty() is in fact a function, we can use it anywhere in the query: on rows, columns, sub-selects, or in the Where clause. I just happen to be using it in the set defined on the row axis. Anyway, check out the results:

[Fig. 2b: query results]

What’s this?! Empty cells! You may be asking yourself, “Self, what gives?”. I’ll give you a hint. Take a look at the query results if we execute the same query across all years rather than just for 2005. Here’s the query:

SELECT
([Date].[Calendar Year].children) ON 0,
NONEMPTY([Date].[Calendar Quarter of Year].members, [Measures].[Reseller Sales Amount]) ON 1
FROM [Adventure Works]
WHERE [Measures].[Reseller Sales Amount];

Fig. 2c

And the results:

[Fig. 2d: query results]

Because there are cells for other years outside of 2005, Nonempty() does not eliminate Q1 and Q2, as seen in Fig. 2b. Unlike the NON EMPTY keyword, Nonempty() is not evaluated as the last step. The Nonempty() function is evaluated when SSAS determines which members will be included in the axis. So before it knows that the query is limited to 2005, Nonempty() has already determined which cells are going to be excluded and included. In this case, no rows are eliminated. Just take a look at the execution tree:

[Fig. 2e: MDX Studio execution tree]

We can see all partitions are hit because of the Nonempty() function even though our results only display 2005.

With these facts in mind, it's important to use the NONEMPTY keyword and the Nonempty() function correctly because, used carelessly, they could get you into trouble. In the case of the query shown above, the NONEMPTY keyword is probably the best bet because only the necessary partition is scanned and fewer cells are evaluated. But what about in the case of the following query?

Here’s the query:

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS
,{
Filter
(
CrossJoin
(
[Customer].[City].MEMBERS
,[Date].[Calendar].[Date].MEMBERS
)
,
[Measures].[Internet Sales Amount] > 10000
)
} ON ROWS
FROM [Adventure Works];

Fig. 3a

Here’s the count of cells returned:

[Fig. 3b: cell count]

Should we use the NONEMPTY keyword or the Nonempty() function? Let’s try NONEMPTY first.

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS
,NON EMPTY
{
Filter
(
CrossJoin
(
[Customer].[City].MEMBERS
,[Date].[Calendar].[Date].MEMBERS
)
,
[Measures].[Internet Sales Amount] > 10000
)
} ON ROWS
FROM [Adventure Works];

Fig. 3c

And the cell count:

[Fig. 3d: cell count]

You can see the exact same cell set was returned. In this case, NON EMPTY didn't do anything for us because the Filter clause still has to evaluate the empty cells; NON EMPTY hasn't been applied at that point. So let's try the Nonempty() function. Here's the query:

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS
,{
Filter
(
NonEmpty
(
CrossJoin
(
[Customer].[City].MEMBERS
,[Date].[Calendar].[Date].MEMBERS
),
[Measures].[Internet Sales Amount]
)
,
[Measures].[Internet Sales Amount] > 10000
)
} ON ROWS
FROM [Adventure Works];

Fig. 3e

But take a look at the cell count:

[Fig. 3f: cell count]

Only 40 rows this time! In the case of the query in Fig. 3a, the Nonempty() function was the optimal solution. My point is that it's important to understand the differences between the NONEMPTY keyword and the Nonempty() function and to use them properly. I hope you found this useful.

Conclusions

The bottom line difference between the NON EMPTY keyword and the NonEmpty() function is in when the empty space is evaluated.

  • NON EMPTY keyword: Empty space is evaluated as the very final step in determining which tuples to return.
  • NonEmpty() function: Empty space is evaluated when the set is determined in the axis, sub-select, or Where clause.

The NonEmpty() function also gives you a little more granular control over how empty space is evaluated via its second argument.

Depending on your requirements for the query, you may use both NON EMPTY and NonEmpty() or only one of the two.

Resources

Learn more about the NonEmpty() function.

Learn more about the NON EMPTY keyword and working with empty space.

Feedback?

If you found this post useful, please share it! And if you have any questions, please leave a comment or send me a message on Twitter.

Sneak Peek: Advanced Conditional Split Transform in Task Factory 3.0

You may have seen my previous blog post a couple weeks back highlighting some of the neat things the dev team at Pragmatic Works is including in the latest build of Task Factory. Well, I did some more sneaking around and was able to snag these screenshots of the new and still experimental Advanced Conditional Split Transform!

I think one of the biggest and coolest improvements is the new set of functions made available to us in the Advanced Conditional Split Transform. You'll notice the editor for this transform is very similar to the Advanced Derived Column Transform we looked at last time. Now you have access to a ton more functions and expressions that allow you to do some very cool things, like data cleansing with Regex, encrypting/decrypting fields, and accessing parts of file names with an expression. The SSIS expression language has been vastly improved and expanded in these transforms and is now more intuitive and easier to use than ever before.

Another new capability of the Advanced Conditional Split transform in Task Factory 3.0 is the ability to test and validate expressions within the editor. Not only can you validate your expressions in the editor before run time, you can actually test the performance of the expression you have written by specifying a number of iterations to run it with. This should give you a good idea of how the expression will perform at run time. This is a feature we haven't had before. Now those of us who are SSIS developers will be able to test, fine tune, and retest our SSIS expressions!

This new generation of SSIS tasks and transforms is going to change the way I build my SSIS packages thanks to the new flexibility and power. Stay tuned for future posts on some of the incredible improvements and additions coming in the next version of Task Factory by Pragmatic Works.

Sneak Peek: Super Advanced Derived Column Transform In Next Release of Task Factory

Don’t ask how I was able to acquire these screenshots or what favors I may or may not have performed in order to snag these pics of the next release of Task Factory. I’m not proud of what I’ve become. But the fact of the matter is that Task Factory 3.0 is going to be insane!

Task Factory 3.0 has about 10 or so new tasks and transforms that radically expand the power of SSIS. One of the new Data Flow transforms included in Task Factory 3.0 is the Advanced Derived Column Transform, which includes a ridiculous number of new expressions and functions. Sources tell me to expect around 180 or so total functions in the Advanced Derived Column Transform.

I snagged these screenshots which show some of the new functions. You can see what appears to be a host of Regular Expression functions available for use in the transform! Yes! It’s about time!

Also, it appears there are a bunch of pretty interesting functions having to do with parsing file paths and names, such as GetDirectoryName, GetFileName, and GetFullPath, among others.

One of the coolest features of the Advanced Derived Column Transform that I’m really excited about is the Expression Validation pane. Not only can you validate your expression and preview the expression results, you can also test your expression for performance!

This new version of Task Factory is going to be sick. Keep an eye on PragmaticWorks.com for news of the release, which I'm told should be sometime in the next month. I'll keep you posted, so keep checking back here for more information.

SSRS Performance Monitoring With BI xPress 3.5

Earlier this month the latest version of BI xPress was released. There are several new features in this release, but one of the coolest is the SSRS Performance Monitor.

The BI xPress SSRS Performance Monitor has four performance monitoring panels.

  • Average Run Time

    The Average Run Time panel displays the average run times of your reports, making it a great place to figure out which reports are taking the longest to execute and the place to start if you're trying to figure out why a report is slow. There are three very helpful metrics displayed in the panel (if you'd rather pull these timings yourself, see the query sketch after this list):

    • Avg. Data Retrieval Time: This number shows how long it took to retrieve the data from the source system. If the Avg. Data Retrieval time is the majority of the report run time, take a look at the source query and find ways to optimize it.
    • Avg. Processing Time: This is the time the SSRS server takes to calculate and process the retrieved source data. Any formulas or expressions in the report are calculated during this time.
    • Avg. Rendering Time: This is the amount of time it takes to paint the graphics, charts, text, etc.
  • Longest Running Reports on Average

    This panel provides a chart you can read quickly and easily to see whether a report's execution is taking too long, which could negatively affect the performance of the entire SSRS server.

  • Frequent Reports

    The Frequent Reports panel displays the reports that are run most frequently. If your reports are being executed more than expected, this could lead to scalability issues. Tracking the number of executions is a good way to begin to determine if it's time to scale up or out.

  • Active Users

    The Active Users panel displays report executions by users.
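
If you'd like to eyeball the raw numbers yourself, the same timings live in the report server's execution log. Here's a minimal T-SQL sketch, assuming SSRS 2008 R2 or later and the default ReportServer catalog name, that computes similar averages from the ExecutionLog3 view; BI xPress's own implementation may well differ.

-- Average report timings (in milliseconds) from the SSRS execution log.
SELECT
    ItemPath,
    COUNT(*) AS Executions,
    AVG(TimeDataRetrieval) AS AvgDataRetrievalMs,
    AVG(TimeProcessing) AS AvgProcessingMs,
    AVG(TimeRendering) AS AvgRenderingMs
FROM ReportServer.dbo.ExecutionLog3
WHERE ItemAction = 'Render'
GROUP BY ItemPath
ORDER BY AVG(TimeDataRetrieval + TimeProcessing + TimeRendering) DESC;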

This newest feature is one I've been waiting on for a long time, ever since I first saw it during the development phase. To read more about BI xPress and the numerous other features included, head over to PragmaticWorks.com and check it out.

Dustin Ryan is a Data Specialist interested in Azure, SQL Server, and Power BI.
