All posts by Dustin Ryan

Dustin is a Data Platform Solutions Architect on the Education team at Microsoft. You can find Dustin blogging and speaking at events like SQL Saturday, Code Camp, and SQL Rally. Follow Dustin on Twitter @SQLDusty.

SSIS Advanced Execute Package Task with Task Factory

In the last few months, I’ve been giving a preview of some of the new advanced SSIS tasks and transforms coming in the next version of Task Factory. Well, Task Factory 3.0 was finally released last month, and it’s packed with a ton of really cool tasks and transforms that are going to make SSIS development easier and increase the performance of your packages.


One of the newest SSIS tasks released in Task Factory 3.0 is the Advanced Execute Package task. The really cool part about this task is the MUCH improved interface. It’s now incredibly simple to set up variable mappings from a parent package to a child package.

Here you can see the editor for the Advanced Execute Package Task. There are three areas of the editor I want to take a look at.

[Figure 1: the Advanced Execute Package Task editor]

The first area, highlighted in green, is where you specify your connection to the child package. If the package is in the file system, you simply browse to the location of the package. If the package is located on a SQL Server, you need to select the SSIS connection manager to use. In either case, if the package has a password, you’ll need to enter that.

The second area of the Advanced Execute Package Task, highlighted in yellow, is where you create mappings between variables in the parent package and variables in the child package. In the Assignment Direction drop-down list, you have a few different options.


We can read a variable from the child package, write a variable to the child package, or do both simultaneously. In my case, I’m passing a variable to the child package so I will select “Write variable to child package”. Then I just need to select the Child Package Variable and the Parent Package Variable.


Once I’ve done that, I can just click “Add Mapping”. Now we can see the mapping created between the two package variables in the last area (highlighted in pink in Figure 1). Using this method, we can create multiple variable mappings between the parent package and the child package.


As you can see, the Advanced Execute Package Task from Task Factory is very intuitive and simple. If you’ve used the native SSIS Execute Package Task, you’ll find the new Advanced Execute Package Task much simpler and easier to configure.

Pragmatic Works is Hiring! Mid/Senior B.I. and DBA Peeps!


If four years ago you had told me that I’d be working for a company like Pragmatic Works, I wouldn’t have believed you. Simply put, working at Pragmatic Works and doing what I do is the best job I’ve ever had, for many reasons. Here are a few of them.

The culture at Pragmatic Works is awesome. You might think that with a rock star cast of experienced B.I. and SQL professionals like we have here at PW, the culture would be cutthroat, every man/diva for himself, but it’s quite the opposite. The guys here are humble and focused on helping each other learn and grow. I’ve seen it happen time and time again: somebody hits a wall or comes up against a roadblock on a project, and several guys on our team take time out of their schedules, sometimes after hours, to help overcome the issue. This isn’t the exception, either; it’s a regular occurrence. Although we are a consulting firm, we are very much a team! Check out our Company Culture points at PragmaticWorks.com.

One of the other very cool things about working at Pragmatic Works is that we are constantly encouraged to speak at community events. We even receive compensation for speaking at awesome events like PASS Summit, SQL Rally, or SQL Saturdays. It’s pretty cool to be able to be a part of the SQL community and help others learn. This is one of the passions of the PW team.

If you’re in the job market for a Mid or Senior Business Intelligence Consultant or Senior DBA position, get in touch with me ASAP! There is work to be done!

If you’re not following me on Twitter, do it now!

NON EMPTY vs Nonempty(): To the Death!

So what is the difference between NON EMPTY and Nonempty()? I’ve been asked this question several times and have been meaning to blog about it for a little while, so here I am finally getting around to it. Let’s jump right in.

We have two queries we’re going to take a look at in order to better understand the difference between NON EMPTY and Nonempty(). Behold our first very boring query:

SELECT
    ([Date].[Calendar Year].&[2005]) ON 0,
    NON EMPTY ([Date].[Calendar Quarter of Year].members) ON 1
FROM [Adventure Works]
WHERE [Measures].[Reseller Sales Amount];

Fig. 1a

Now here are the results:

[Fig. 1b: query results]

As you can see, Q1 and Q2 are excluded from the results because the cells are empty. The NON EMPTY keyword essentially says, “After we decide which members go where and which cells are going to be returned, get rid of the empty cells.” If we take a look at the execution tree using MDX Studio, we can see the 2005 partition is the only partition being hit because NON EMPTY is being applied at the top level as the very last step in the query execution. The 0 axis is taken into account before evaluating which cells are empty.

[Fig. 1c: execution tree in MDX Studio]

Also, it’s important to note that the NON EMPTY keyword can only be used at the axis level. I used it on the 1 axis, but I could have used it on each axis in my query. I should also mention that the Nonempty() function accepts a second argument, and it’s important that you specify it: the argument is technically optional, but if you omit it, emptiness is evaluated against the current query context (such as the default measure), which may not be what you intend.

Now let’s take a look at our second query:

SELECT
    ([Date].[Calendar Year].&[2005]) ON 0,
    NONEMPTY([Date].[Calendar Quarter of Year].members, [Measures].[Reseller Sales Amount]) ON 1
FROM [Adventure Works]
WHERE [Measures].[Reseller Sales Amount];

Fig. 2a

This time I’m using the Nonempty() function. Because Nonempty() is in fact a function, we can use it anywhere in the query: on rows, on columns, in sub-selects, or in the Where clause. I just happen to be using it in the set defined on the row axis. Anyway, check out the results:

[Fig. 2b: query results]

What’s this?! Empty cells! You may be asking yourself, “Self, what gives?”. I’ll give you a hint. Take a look at the query results if we execute the same query across all years rather than just for 2005. Here’s the query:

SELECT
    ([Date].[Calendar Year].children) ON 0,
    NONEMPTY([Date].[Calendar Quarter of Year].members, [Measures].[Reseller Sales Amount]) ON 1
FROM [Adventure Works]
WHERE [Measures].[Reseller Sales Amount];

Fig. 2c

And the results:

[Fig. 2d: query results]

Because Q1 and Q2 contain data in years other than 2005, Nonempty() does not eliminate them, as seen in Fig. 2b. Unlike the NON EMPTY keyword, Nonempty() is not evaluated as the last step; it is evaluated when SSAS determines which members will be included on the axis. So before the query has been limited to 2005, Nonempty() has already determined which cells to include and exclude. In this case, no rows are eliminated. Just take a look at the execution tree:

[Fig. 2e: execution tree in MDX Studio]

We can see all partitions are hit because of the Nonempty() function even though our results only display 2005.

With these facts in mind, it’s important to use the NON EMPTY keyword and the Nonempty() function carefully, because they can get you into trouble. In the case of the query shown above, the NON EMPTY keyword is probably the best bet because only the necessary partition is scanned and fewer cells are evaluated. But what about the following query?

Here’s the query:

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS
,{
Filter
(
CrossJoin
(
[Customer].[City].MEMBERS
,[Date].[Calendar].[Date].MEMBERS
)
,
[Measures].[Internet Sales Amount] > 10000
)
} ON ROWS
FROM [Adventure Works];

Fig. 3a

Here’s the count of cells returned:

[Fig. 3b: cell count returned]

Should we use the NONEMPTY keyword or the Nonempty() function? Let’s try NONEMPTY first.

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS
,NON EMPTY
{
Filter
(
CrossJoin
(
[Customer].[City].MEMBERS
,[Date].[Calendar].[Date].MEMBERS
)
,
[Measures].[Internet Sales Amount] > 10000
)
} ON ROWS
FROM [Adventure Works];

Fig. 3c

And the cell count:

[Fig. 3d: cell count returned]

You can see the exact same cell set was returned; in this case, NON EMPTY didn’t do anything for us. That’s because the Filter clause still evaluates the empty cells, since NON EMPTY has not yet been applied at that point. Now let’s try the Nonempty() function. Here’s the query:

SELECT
{[Measures].[Internet Sales Amount]} ON COLUMNS
,{
Filter
(
NonEmpty
(
CrossJoin
(
[Customer].[City].MEMBERS
,[Date].[Calendar].[Date].MEMBERS
),
[Measures].[Internet Sales Amount]
)
,
[Measures].[Internet Sales Amount] > 10000
)
} ON ROWS
FROM [Adventure Works];

Fig. 3e

But take a look at the cell count:

[Fig. 3f: cell count returned]

Only 40 rows this time! In the case of the query in Fig. 3a, the Nonempty() function was the optimal solution. My point is that it’s important to understand the differences between the NON EMPTY keyword and the Nonempty() function and to use them properly. I hope you found this useful.

Conclusions

The bottom-line difference between the NON EMPTY keyword and the NonEmpty() function is when the empty space is evaluated.

  • NON EMPTY keyword: empty space is evaluated as the very final step in determining which tuples to return.
  • NonEmpty() function: empty space is evaluated when the set is determined on the axis, in the sub-select, or in the Where clause.

The NonEmpty() function also gives you more granular control over how empty space is evaluated: its second argument specifies the set (typically one or more measures) against which emptiness is checked.

Depending on your requirements for the query, you may use both NON EMPTY and NonEmpty() or only one of the two.

Resources

Learn more about the NonEmpty() function.

Learn more about the NON EMPTY keyword and working with empty space.

Feedback?

If you found this post useful, please share it! And if you have any questions, please leave a comment or send me a message on Twitter.

Sneak Peek: Advanced Conditional Split Transform in Task Factory 3.0


You may have seen my previous blog post a couple weeks back highlighting some of the neat things the dev team at Pragmatic Works is including in the latest build of Task Factory. Well, I was doing some more sneaking around and was able to snag these screenshots of the new and still experimental Advanced Conditional Split Transform!

I think one of the biggest and coolest improvements is the new set of functions made available in the Advanced Conditional Split Transform. You’ll notice the editor for this transform is very similar to the Advanced Derived Column Transform we looked at last time. Now you have access to a ton more functions and expressions that allow you to do some very cool things, like data cleansing with Regex, encrypting/decrypting fields, and accessing parts of file names with an expression. The SSIS expression language has been vastly improved and expanded in these transforms and is now more intuitive and easier to use than ever before.

One of the other new things in the Advanced Conditional Split transform included in Task Factory 3.0 is the ability to test and validate expressions within the editor. Not only can you validate your expressions in the editor before run time, you can actually test the performance of an expression you have written by specifying a number of iterations to run it. This should give you a good idea of how the expression will perform at run time. This is a feature we haven’t had before; now those of us who are SSIS developers will be able to test, fine-tune, and retest our SSIS expressions!

This new generation of SSIS tasks and transforms is going to change the way I build my SSIS packages due to its new flexibility and power. Stay tuned for future posts on some of the incredible improvements and additions coming in the next version of Task Factory by Pragmatic Works.

Sneak Peek: Super Advanced Derived Column Transform In Next Release of Task Factory

Don’t ask how I was able to acquire these screenshots or what favors I may or may not have performed in order to snag these pics of the next release of Task Factory. I’m not proud of what I’ve become. But the fact of the matter is that Task Factory 3.0 is going to be insane!

Task Factory 3.0 has about 10 or so new tasks and transforms that radically expand the power of SSIS. One of the new Data Flow transforms included in Task Factory 3.0 is the Advanced Derived Column Transform, which includes a ridiculous number of new expressions and functions. Sources tell me to expect around 180 or so total functions in the Advanced Derived Column Transform.

I snagged these screenshots, which show some of the new functions. You can see what appears to be a host of Regular Expression functions available for use in the transform! Yes! It’s about time!

[Screenshot: Regular Expression functions]

Also, it appears there are a bunch of pretty interesting functions for parsing file paths and names, such as GetDirectoryName, GetFileName, and GetFullPath, among others.

[Screenshot: file path and name functions]

One of the coolest features of the Advanced Derived Column Transform that I’m really excited about is the Expression Validation pane. Not only can you validate your expression and preview the expression results, you can also test your expression for performance!

[Screenshot: Expression Validation pane]

This new version of Task Factory is going to be sick. Keep an eye on PragmaticWorks.com for news of the release, which I’m told should come sometime in the next month. I’ll keep you posted, so keep checking back here for more information.

SSRS Performance Monitoring With BI xPress 3.5

Earlier this month the latest version of BI xPress was released. There are several new features in this release, but one of the coolest is the SSRS Performance Monitor.

The BI xPress SSRS Performance Monitor has four performance monitoring panels.

  • Average Run Time

    The Average Run Time panel displays the average run times of your reports. This is the place to start if you’re trying to figure out which reports are taking the longest to execute and why. There are three very helpful metrics displayed in the panel (see the query sketch after this list):

    • Avg. Data Retrieval Time: This number shows how long it took to retrieve the data from the source system. If data retrieval makes up the majority of the report run time, take a look at the source query and find ways to optimize it.
    • Avg. Processing Time: This is the time the SSRS server takes to calculate and process the retrieved source data. Any formulas or expressions in the report are calculated during this time.
    • Avg. Rendering Time: This is the amount of time it takes to paint the graphics, charts, text, etc.

  • Longest Running Reports on Average

    This panel provides a chart that can be read quickly and easily, letting you see if a report’s execution is taking too long, which could negatively affect the performance of the entire SSRS server.

  • Frequent Reports

    The Frequent Reports panel displays the reports that are run most frequently. If your reports are being executed more often than expected, this could lead to scalability issues. Tracking the number of executions is a good way to begin determining if it’s time to scale up or out.

  • Active Users

    The Active Users panel displays report executions by user.
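
The numbers behind these panels come from the report server’s execution log. If you want to sanity-check the dashboard or just poke at the raw data yourself, here’s a minimal T-SQL sketch of how the Average Run Time and Frequent Reports figures could be reproduced (my own approximation for illustration, not BI xPress’s actual implementation; it assumes SSRS 2008 R2 or later and the default catalog database name, ReportServer):

-- Average run-time breakdown and execution counts per report,
-- straight from the SSRS execution log (times are in milliseconds).
SELECT
    ItemPath,
    COUNT(*)               AS Executions,
    AVG(TimeDataRetrieval) AS AvgDataRetrievalMs,
    AVG(TimeProcessing)    AS AvgProcessingMs,
    AVG(TimeRendering)     AS AvgRenderingMs,
    AVG(TimeDataRetrieval + TimeProcessing + TimeRendering) AS AvgTotalMs
FROM ReportServer.dbo.ExecutionLog3
GROUP BY ItemPath
ORDER BY AvgTotalMs DESC;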

This newest feature is one I’ve been waiting on for a long time, ever since I first saw it during the development phase. To read more about BI xPress and the numerous other features included, head over to PragmaticWorks.com and check it out.

SQL Rally 2012 Recap In Yo’ Face!

This past Thursday and Friday I attended my first PASS SQL Rally event in Dallas, Texas. The week was full of some pretty amazing sessions presented by some even more amazing speakers. If you were a DBA looking for performance tuning tips or other ways to perfect your profession, there were tons of great sessions for you. If you were a developer looking for great development tips and tricks to make your life easier, there were some really informative sessions given. And if you were a business intelligence developer, like myself, hoping to take your skills to the next level, there was definitely some amazing material to take advantage of.

I focused heavily on the business intelligence space and spent most of my time attending those sessions. I learned some great tips, increased my knowledge in several areas, and also networked a little with some really bright and great all-around people.

So What Sessions Did I Attend And What Did I Learn?

Performance Tuning SSAS Processing with John Welch

The first session I attended was Performance Tuning SSAS Processing, given by John Welch (blog | twitter). There’s no doubt about it, John knows his stuff. I learned a ton from his session. Here are a few of the tips I picked up:

    • Instead of bringing entire tables into your DSV, use views. Views allow you to tweak the SQL and use query hints to improve the performance of the Select statement (see the sketch after this list).
    • When Analysis Services processes data, it does so one buffer at a time. If AS detects duplicate rows in a buffer, it aggregates them. Because of this, you can save space and increase performance by ordering the data in your partitions’ source queries.
    • Tune SSAS processing in 3 steps:
      • Dimension Processing
      • Partition ProcessData
      • Partition ProcessIndexes
    • Set the AttributeHierarchyOptimizedState property to NotOptimized for attributes that are not used in queries often or that have a high cardinality and a near 1:1 relationship with the key attribute.
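
To make the first two tips concrete, here’s a minimal T-SQL sketch using AdventureWorksDW’s FactResellerSales table (the view name and the partition’s date range are made up for illustration):

-- Tip 1: point the DSV at a view rather than the raw table so the SQL can
-- be tuned later without touching the cube definition. Table hints such as
-- NOLOCK are legal inside a view; OPTION-clause query hints are not, so
-- those belong in the partition query below.
CREATE VIEW dbo.vwFactResellerSales
AS
SELECT OrderDateKey, ProductKey, ResellerKey, SalesAmount
FROM dbo.FactResellerSales WITH (NOLOCK);
GO

-- Tip 2: order the partition's source query so duplicate rows land in the
-- same processing buffer, letting Analysis Services pre-aggregate them.
SELECT OrderDateKey, ProductKey, ResellerKey, SalesAmount
FROM dbo.vwFactResellerSales
WHERE OrderDateKey BETWEEN 20050101 AND 20051231
ORDER BY OrderDateKey, ProductKey, ResellerKey;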

The Report Part Library with Jessica Moss

The next session I attended was The Report Part Library with Jessica Moss (blog | twitter). Jessica is an amazing speaker and, without a doubt, an SSRS guru. Her wonderful sense of humor and bubbly personality made the session enjoyable and exciting. We covered how to create report parts, share them across multiple reports, and then update them. Jessica also covered how to organize and manage a large Report Part Library. It was really great to finally meet Jessica in person.

Stop! Consolidate and Listen! with Jorge Segarra

After lunch, I attended Jorge Segarra’s (blog | twitter) session, “Stop! Consolidate and Listen!”. Jorge covered some of the different methods of consolidation and dug into virtualization. Between Jorge’s sad rap skills and Patrick LeBlanc’s even weaker beatboxing, I managed to pick up a few pointers.

Some of the pros of virtualization:

    • Great isolation!
    • Flexibility
      • Snapshotting virtual machines (so if something goes horribly wrong, you can just roll back to the last snapshot)
      • Creating and dropping virtual machines is easy as pie
      • VMs can be migrated across servers without having to be shut down!

Some of the gotchas:

    • Behind the scenes, resources could actually be shared, so make sure you understand the systems your VMs are running on.

Also, check out the free MAP toolkit from Microsoft. Jorge gave a great intro to the tool and how to use it to determine which machines are prime candidates for consolidation and virtualization.

Data Mining with Devin Knight & Adam Jorgensen

The last session I attended was Data Mining with Devin Knight (blog | twitter) and Adam Jorgensen (blog | twitter). Being far less than a master of data mining, I really looked forward to this session.

Data mining does three things:

  • Explores the data
  • Identifies patterns in the data
  • Performs predictions based on those identified patterns

There are several algorithms available for us to use:

  • Decision tree
    • Shows how someone could come to a decision
  • Clustering
    • Measures the distances between a cluster of points on a graph and an outlier
    • This method is often used for fraud detection and data validation
  • Time Series
    • Predicts a number of units (days, months, years, etc.) into the future
    • Based on historical values
  • Sequence Clustering
    • Used to determine somebody’s next step
    • Where will a person click next on a web site?
  • Association Rules
    • Market basket analysis
    • Amazon uses this: people who bought this also bought that
  • Naïve Bayes
    • A classification algorithm based on Bayes’ theorem, which is literally hundreds of years old
    • Could be used to determine the differences between customers who buy a product and customers who do not
  • Neural Networks
    • Typically used with FICO scores
    • Takes multiple factors into account but can’t really point to one or two things to tell you how it came to a decision

Zero To Cube: Fast Track To Analytics with Adam Jorgensen & Dustin Ryan

I also had the wonderful pleasure of presenting on SSAS with Adam Jorgensen (blog | twitter). Our session was titled, “Zero To Cube: Fast Track To Analytics”. We had volunteers (some were volunteered by others) drive through creating a cube from the ground up. Within an hour and 15 minutes, we created a connection to our data source, created the data source view, made some enhancements to it, created our cube, built a couple of hierarchies, fleshed out the dimensions, and even built a report in Excel with a pivot table and a chart. We had a great, very participative audience and we had a blast! Thank you so much to all those who came to our session!

All in all, this year’s SQL Rally was an amazing event. I learned a lot of new things, made some great friends, and had a great time. If you weren’t able to make the event, you missed out big time. Definitely put next year’s SQL Rally on your calendar and make attending a priority. You won’t be sorry.

Intro to MDX Session Slide Deck and MDX Script

Thanks to everyone who attended my session at SQL Saturday #130 in Jacksonville, FL a couple weeks back. I apologize for posting this so late, but better late than never. To download my session materials, just click this link. In it you’ll find my PowerPoint slide deck and the MDX script I used in the class. If you have any questions about what we went over or any questions regarding the materials, feel free to leave me a comment or shoot me an email. Thanks again to all those who attended… except the guy who gave me a “Did not meet expectations” on my presentation for “talking about SSAS too much”. Last I checked, SSAS and MDX were kind of related, right?

Intro to MDX at SQL Saturday #130 – Jacksonville, FL

This weekend is SQL Saturday #130 in Jacksonville, FL. On Friday, April 27th, there’s a great pre-con session on Performance Tuning SQL Server 2008 R2 by SQL Server MVP Kevin Kline. Go here to get registered for this phenomenal opportunity to learn from an unquestioned expert.

Also, I’ll be presenting a session on MDX called MDX 101: An Introduction to MDX. I’ll be giving an introduction to the Multidimensional Expressions language used to query SSAS cubes. In this session, we’ll learn the basic MDX query syntax, how to navigate a hierarchy, how to create dynamic time calculations, and more.

Head here to get registered! There are already 500 people signed up for this massive event, so if you’re thinking about going, you’d better register now before they lock down registrations!

Create Date Dimension Table Script

On a recent project I created a Date dimension script that I thought I’d share (mostly so I would have access to it from anywhere). I based this script on one written by Mike Davis, but I included a few changes, such as first and last day of month flags, an open flag, relative time period attributes, and a few others. Enjoy!

Continue reading Create Date Dimension Table Script
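
The full script is in the post itself, but here’s a minimal sketch of the month-boundary flag logic described above (my own illustration, not an excerpt from the actual script):

-- Flag the first and last day of the month for a given date. Avoids
-- EOMONTH so it also works on versions before SQL Server 2012.
DECLARE @d datetime;
SET @d = '20120515';

SELECT
    @d AS [Date],
    CASE WHEN DAY(@d) = 1 THEN 1 ELSE 0 END AS FirstDayOfMonthFlag,
    -- It's the last day of the month when the next day falls in a new month.
    CASE WHEN MONTH(DATEADD(DAY, 1, @d)) <> MONTH(@d) THEN 1 ELSE 0 END AS LastDayOfMonthFlag;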