Are You Using SSIS Best Practices?

I’ve done a lot of consulting work over the last several years, and one thing I’ve noticed is that every single client is interested in using best practices. Whether we’re talking about building SSRS reports, designing a data warehouse, building a 2 TB cube, or designing an ETL solution for a large data warehouse, everyone wants to use best practices. But it’s easier said than done. How do you enforce best practice rules across your organization? BI xPress attempts to remedy that problem with its SSIS Best Practices Analyzer.

The great part of the BI xPress SSIS Best Practices Analyzer is that you can create your very own custom rules to enforce across your organization. Of course, the tool comes packed with a list of rules put together by our team of ETL developers, but you can add to and improve upon that list with rules of your own. Very cool stuff!

The way the tool works is very simple: BI xPress reads the properties of the packages you wish to enforce your organization’s best practices on and tests those properties against the rules you’ve defined. For example, imagine your organization follows a specific naming convention for each and every component in your SSIS packages. Simply create a rule that checks for the proper naming convention using the Rule Create dialog.

After you’ve created rules based on your team’s best practices, it’s very easy to check your existing SSIS packages to verify that they are following those best practices.

You can use the custom rules your team has developed, or, if you’re looking for best practice recommendations, you can use the included rules put together by our team of consultants. If you’re looking to standardize your SSIS packages and development life cycle, I’d encourage you to take a look at the BI xPress SSIS Best Practices Analyzer. Head over to PragmaticWorks.com for a free trial download.

Building Your First Tabular Model Webinar Recording and Questionnaire Now Available

Thanks to everyone who attended my webinar on the ins and outs of building your first Tabular Model. I hope everyone learned a little something and maybe picked up a couple of tips and tricks along the way.

 Click Here To View The Recording

And thanks to everyone who had questions! I apologize for not being able to get to the questions during the webinar, so here are answers to a few of the questions posted in the chat window.

Question: “Don’t use joins” – that includes views that have joins under the covers, right?
Answer: That’s correct. That’s still going to create additional stress on the relational engine at query time and the last thing you want is to run into a locking/blocking issue during processing because you have complex queries behind the tables in your Tabular Model. If you need to join to other tables to pick up other columns, I would suggest getting with your ETL people and materializing the desired columns in the table so you can eliminate the joins.
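
For example, here’s a minimal T-SQL sketch of the idea, using hypothetical table and column names:

    -- Before: the model reads a view that performs the join at processing time
    CREATE VIEW dbo.vwFactSales AS
    SELECT f.SalesKey, f.SalesAmount, p.ProductCategory
    FROM dbo.FactSales AS f
    INNER JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey;

    -- After: have the ETL process land the column on the table itself
    ALTER TABLE dbo.FactSales ADD ProductCategory VARCHAR(50) NULL;

    UPDATE f
    SET f.ProductCategory = p.ProductCategory
    FROM dbo.FactSales AS f
    INNER JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey;

With the column materialized, the Tabular table can point straight at dbo.FactSales and skip the join entirely.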

Question: Could you please explain the difference between a Perspective and a Role?
Answer: A Role is used to define member permissions for your model. The Role defines what actions a user can perform on the model, such as read or process. A Perspective is a viewable subset of the model, similar to a SQL Server View on a Table. It can be used to display a subset of the model to a business unit in order to make the model a little simpler to navigate.
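
For what it’s worth, a Role can also carry row-level security, expressed as a DAX filter on a table. A hypothetical sketch that restricts role members to a single country (table and column names are made up):

    ='DimSalesTerritory'[SalesTerritoryCountry] = "United States"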

Question: What is the difference between Active & Inactive relationships?
Answer: In a Tabular Model, multiple relationships can exist between two tables, but to create accurate calculations, we need a single path from one table to another. This means that even though multiple relationships exist between the two tables, only one of them can be active. The inactive relationships between the tables are not used by default, though a DAX measure can invoke one explicitly, as shown in the sketch below.
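
A minimal DAX sketch of that: the USERELATIONSHIP function activates an inactive relationship for the duration of one calculation. The table and column names here are hypothetical.

    -- Uses the inactive ShipDate relationship instead of the active OrderDate one
    Sales by Ship Date :=
    CALCULATE (
        SUM ( FactInternetSales[SalesAmount] ),
        USERELATIONSHIP ( FactInternetSales[ShipDateKey], DimDate[DateKey] )
    )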

Question: Can a tabular “object” once built be put in source control and deployed by IT like SSIS packages and SSAS cubes?
Answer: Yes, Tabular Models can be deployed using the SSAS Deployment Wizard. The Create Script can also be generated from a deployed copy of the database and used to deploy to another server. For more information on SSAS Tabular deployment methods, see here.
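
As a rough sketch of how IT might script that, the Deployment Wizard also runs from the command line; the paths below are hypothetical:

    REM Generate an XMLA deployment script from the build output (.asdatabase)
    Microsoft.AnalysisServices.Deployment.exe "C:\Builds\MyModel.asdatabase" /s:"C:\Builds\deploy.log" /o:"C:\Builds\deploy.xmla" /d

The resulting .xmla script can be checked into source control and executed against the target server by your DBA team.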

Question: Is there a way to change the database name and remove the ugly GUID?
Answer: The GUID is only appended to the database on the workspace server. The database can be deployed with whatever name you like. You can set the database name by going to the project properties and setting the Database property. Whatever you enter as the Database name will be the name of the database once you deploy, and it will not include the yucky GUID, unless your company standards are to use the GUID :).

Questions: Does a Tabular Model have superior performance over OLAP-based cubes? Also, can a Tabular Model be used for time series data?
Answers: I would say, generally speaking, that a Tabular Model will have much faster query response on average, which is definitely one of the strengths of an in-memory database. OLAP cubes have the ability to contain much more data and can scale out better, but in my experience queries against Tabular Models are usually much faster than queries against Multidimensional databases. Tabular Models also allow for time calculations. For more information on the kinds of time intelligence functions and calculations you can create with DAX, check this out.
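
As a small taste of DAX time intelligence, here’s roughly what a year-to-date measure looks like, assuming a proper date table (table and column names are hypothetical):

    -- Year-to-date sales using the TOTALYTD time intelligence function
    Sales YTD :=
    TOTALYTD (
        SUM ( FactSales[SalesAmount] ),
        DimDate[FullDate]
    )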

Once again, thank you so much to everyone who attended my webinar. If I didn’t get to answer your question, I apologize. Feel free to post your question in the comments or on the forums of BIDN.com.

And don’t forget to follow me on Twitter!

Unit Testing Your SSIS Packages

One of the challenges SSIS developers often face is being able to meaningfully test their SSIS packages or the components within those packages. Pragmatic Works has released a new component for BI xPress that allows you to easily create a suite of tests to perform on an SSIS package.

Unit Tests can be performed on entire packages or individual components in a package. And those Unit Tests are grouped into Test Suites, so think of a Test Suite as a series of unit tests designed to test the functionality of a set of components in a package or a series of packages.

Setup steps can be configured to make retesting very easy. If a file needs to be created or a T-SQL command executed before a test can be performed, those steps can be configured within the BI xPress unit testing interface. And these setup steps can be performed once for the Test Suite or once for each Unit Test within the Test Suite.

Once the packages have been added to the Test Suite, creating the Unit Test is as simple as selecting the package or task within a package to execute. Then specify the expected results.

Then it’s time to create the Unit Test Assertions, which are the steps that check the execution of the packages or package components. An Assertion could be anything from checking a table for a certain number of rows to checking for the existence of a file or checking a property within the package, among many others.
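
BI xPress configures Assertions through its interface, but conceptually a row-count Assertion boils down to a check like this hypothetical T-SQL (the table name and expected count are made up):

    -- Conceptual row-count assertion: did the package load the expected rows?
    DECLARE @Expected INT = 100, @Actual INT;
    SELECT @Actual = COUNT(*) FROM dbo.StagingCustomer;
    IF @Actual = @Expected
        PRINT 'Assertion passed';
    ELSE
        PRINT 'Assertion failed: expected ' + CAST(@Expected AS VARCHAR(10))
            + ', got ' + CAST(@Actual AS VARCHAR(10));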

The entire Test Suite can be executed, or an individual Unit Test can be fired. The test results are displayed in an easy-to-read format.

Take a look at SSIS Unit Tests with BI xPress if you’re looking for a way to streamline testing of your SSIS packages. I’ve personally used this tool during client engagements, and it’s a great way to standardize testing of your SSIS packages.

Performance Tuning of Tabular Models in SSAS 2012 Whitepaper Now Available

Via MSDN, there’s now a great whitepaper called Performance Tuning of Tabular Models in SSAS 2012 available for your viewing pleasure. There’s a treasure trove of great information in this whitepaper, and I highly recommend that anyone developing or reporting on Tabular Models take a moment to download and read it.

While you download this historic piece of literature, here are three little tidbits of knowledge I picked up during my initial skim-through:

Partitions Don’t Help Query Performance

Partitions do not improve processing time or query time. In a Tabular Model, partitioning a table only serves to allow the administrator to selectively refresh smaller subsets of data, as is the case with an incremental load of a fact table. If your fact table is incrementally loaded, you can save processing time and make your processing window by processing only the affected partitions, but partitioning alone will not improve processing time if you’re still processing all partitions.
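
For example, an incremental load might target only the newest partition with an XMLA ProcessAdd command. A hedged sketch with hypothetical object IDs (roughly speaking, in a 2012 Tabular database a table surfaces through XMLA as a measure group of the model’s cube):

    <!-- Add only the new rows to the current-month partition -->
    <Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Object>
        <DatabaseID>SalesTabular</DatabaseID>
        <CubeID>Model</CubeID>
        <MeasureGroupID>FactSales</MeasureGroupID>
        <PartitionID>FactSales Current Month</PartitionID>
      </Object>
      <Type>ProcessAdd</Type>
    </Process>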

Partitions in a table are processed serially, unlike partitions in a measure group of a Multidimensional database, which are processed in parallel. In a Tabular Model, however, since each table is independent of the others, tables can be processed in parallel even if the partitions within a given table aren’t.

Unlike in a Multidimensional database, dimension tables in a Tabular Model can be partitioned. This opens the door for incremental processing of those dimensions as well as some unique dimension partitioning strategies.

Memory Is Crucial, Duh!

Obviously, with a Tabular Model you need to be able to fit the model in memory to fully utilize the magical power that is Tabular. But if you have a 10 GB model, how much memory do you need? The correct answer is about 30 GB of memory. Why 30 GB, you ask? During a Process Full of your Tabular Model, the existing database is kept online until the transaction for the processing operation is committed. That means that for the given 10 GB model, you need to be able to hold two copies in memory: 10 GB for the old data and 10 GB for the new data. Then you’ll likely need around 5 GB to 10 GB for various processing overheads. So keep in mind that you could need significantly more memory than you might think is necessary to support a single Tabular Model.

Table Queries Are Unaltered During Processing

In a Multidimensional database, the heavy lifting for dimensions is transferred from SSAS to the relational database by executing “Select Distinct” statements for each attribute. For measure groups in a Multidimensional database, the query is wrapped in a subselect with only the necessary columns returned. This also means that using CTEs, Order By statements, and procedures is not an option in a Multidimensional database. In a Tabular Model, however, the query for the table is unaltered, which means that using CTEs, procedures, Order Bys, and various other T-SQL features is now possible. But this also means that the responsibility for returning only the required columns in the query is now the developer’s. Without removing the unnecessary columns from the query, processing will be less efficient and could take longer.
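
In practice, that means trimming the table’s source query down to just the columns the model needs. Since the query reaches the relational engine unaltered, even a CTE works fine; a hypothetical T-SQL example:

    -- Return only the columns the model actually needs; the CTE arrives at SQL Server as-is
    WITH SalesSlim AS (
        SELECT SalesKey, ProductKey, OrderDateKey, SalesAmount
        FROM dbo.FactSales
    )
    SELECT SalesKey, ProductKey, OrderDateKey, SalesAmount
    FROM SalesSlim
    ORDER BY OrderDateKey;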

Like I said earlier, this is a great whitepaper and I highly recommend that you check it out!

The Whaty-What of Data Mining Algorithms in SSAS

SQL Server Analysis Services features nine different data mining algorithms that look for specific types of patterns and trends in order to make predictions about your data. This is a potentially very powerful tool, and since I’ve been learning more about data mining recently, I figured I’d put together a little bit of the information and research I’ve done on these algorithms for my own reference as well as for the benefit of others.

Below you will see a list of the data mining algorithms included in SSAS 2008 through 2012. I’ve included the type of each algorithm, what it does, and an example or two of when one might decide it’s an appropriate algorithm for your data and requirements. I’ve also included links to TechNet for more information.

Decision Tree
Type: Classification

What: This is probably the most popular data mining algorithm simply because the results are very easy to understand. Decision Tree tries to predict the value of a column or columns based on the relationships between the columns you have identified. Decision Tree also determines which input columns are most highly correlated to your prediction column(s).
When: Use the Decision Tree algorithm when you want to predict whether a customer will buy your product or what characteristics may make a person a potentially good customer. For example, if you want to determine which customers you should send coupons to, use Decision Tree to determine whether a person has the qualities and characteristics of your past customers.
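
To make that concrete, here’s a minimal DMX sketch that defines a decision tree model predicting a bike purchase from demographics (the column names are hypothetical):

    -- Define a mining model that predicts BikeBuyer using decision trees
    CREATE MINING MODEL CustomerBikeBuyer
    (
        CustomerKey LONG KEY,
        Age         LONG CONTINUOUS,
        Gender      TEXT DISCRETE,
        BikeBuyer   LONG DISCRETE PREDICT
    )
    USING Microsoft_Decision_Trees;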

Linear Regression
Type: Regression

What: This algorithm is used to predict continuous variables using continuous input variables. Linear Regression is a simpler version of Decision Tree without the splits in the tree.
When: Use the Linear Regression algorithm when you want to compute a trend for your sales data.

Naïve Bayes
Type: Classification

What: The Naïve Bayes algorithm is based on Bayes’ theorem and is good for performing predictive analytics. This algorithm calculates a probability based on input columns you define.
When: An example of when you might use the Naïve Bayes algorithm would be when you want to predict how likely a customer is to respond to an email blast or how likely a patient is to get sick or contract a disease based on their demographic information.

Neural Network
Type: Classification/Regression

What: The Neural Network algorithm may be one of the least used data mining algorithms because it is the most difficult to interpret. Basically Neural Network combines each possible input attribute value with each predictable attribute value to determine probabilities. These probabilities can be used for classification or regression, but you may have a difficult time determining how the algorithm reached a particular conclusion.
When: Use this when you want to analyze the relationships between many complex inputs to determine a certain output, such as predicting stock movements or currency fluctuations.

Logistic Regression
Type: Regression

What: The Logistic Regression algorithm is a statistical method for determining the contribution of specified inputs to a particular set of outcomes. This algorithm is similar to Neural Network in the way that it models the relationships between the various inputs.
When: An example of when you might use this algorithm would be to predict what characteristics make a customer a repeat customer or if a convicted criminal is likely to become a repeat offender.

Clustering
Type: Segmentation

What: The Clustering algorithm is probably right behind Decision Tree among the most frequently used data mining algorithms, simply because, like Decision Tree, it is very easy to understand. The Clustering algorithm uses the input columns to group cases in a data set into clusters of cases with similar characteristics.
When: This algorithm is great for detecting fraud or anomalies in your data because it is very easy to see which data does not fit into a cluster.

Sequence Clustering
Type: Sequence Analysis

What: Sequence Clustering is similar to Clustering, except that instead of finding clusters based on the similarity of characteristics, the clusters are based on sequences of events. The algorithm groups cases whose sequences of events are similar.
When: An example of when you might use this algorithm would be when you want to determine which sequence of events are likely to lead to hardware failure in your environment.

Association Rules
Type: Association

What: The Association Rules algorithm defines combinations of items in a set by scanning the data in the specified inputs and identifying frequent combinations of items.
When: Use this algorithm when you want to identify opportunities for cross selling. For example, the Association Rules may determine that if a customer purchases milk there may be an opportunity to market cookies to the customer.

Time Series
Type: Regression

What: The Time Series algorithm is useful for predicting or forecasting continuous variables over time, such as sales metrics. Time Series does not require additional input data to make predictions about the future, and it can forecast a specified number of units into the future, such as 3 or 5 periods.
When: Use the Time Series algorithm when you want to predict sales for the next 3 or 5 months.
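
In DMX, that forecast is a PredictTimeSeries call. A hedged example asking a hypothetical trained model for the next 5 periods:

    -- Forecast the next 5 periods of Amount from a trained time series model
    SELECT FLATTENED
        PredictTimeSeries ( [SalesForecast].[Amount], 5 )
    FROM [SalesForecast];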

Are You Using SSIS Best Practices?


Earlier this month, Pragmatic Works released a new tool for their Pragmatic Works Workbench toolbox: The Best Practice Analyzer. The guys and gals over at Pragmatic Works have come up with a tool that analyzes your existing SSIS packages and compares them with a standardized set of best practice guidelines produced by the expert team of consultants and developers you’ve come to know and love over the past several years.

The Best Practice Analyzer combs through the SSIS packages you have selected and produces a report outlining all the best practice violations, categorized by severity: “Warning”, “Error”, “Performance”, and “Informational”.

First, select the packages you’d like to investigate for adherence to best practices.

Next, select the best practices to be included in the report.

The end result is an easy-to-read report identifying any violations of best practices.

Violations could include everything from leaving a component description blank to using a fully blocking transform in a Data Flow Task or not enabling error logging, plus many more. Head over to PragmaticWorks.com to get more information on the SSIS Best Practice Analyzer.

Migrating Away From Informatica To SSIS? You’ll Need Workflow Profiler


If you’ve been paying attention to this blog or PragmaticWorks.com, you probably picked up on the fact that Pragmatic Works has put together a new service offering for those companies that currently run Informatica as their ETL tool of choice and are sick and tired of paying hundreds of thousands of dollars a year in licensing (and in some cases millions of dollars! o_O!!). Pragmatic Works is now offering to quickly and efficiently migrate these clients from Informatica to SQL Server. One of the ways Pragmatic Works makes this process less painful is with their tool, Workflow Profiler.

The primary goal of Workflow Profiler is to fully document your Informatica PowerCenter metadata environment very quickly, giving you a better idea of just how much effort could be required during the migration process. This information is displayed in a series of detailed reports viewable in Workflow Profiler.

Reports, such as the Widget Distribution Summary Report, give you a detailed analysis of your Widget distribution.

Reports like the Workflow Detail Report, Mapping Summary Report, Mapping Detail Report, and many others display all types of information regarding your Workflows, including widget usage, what tasks are used in the workflows, which workflows use parameter files and more.

There is also the Workflow Execution Summary Report. This report displays all kinds of performance data and metrics, including execution status, performance numbers for each workflow and even individual tasks, and success/failed row counts for sources and targets for each session task.

If you’re looking for more insight into your Informatica environment and would like to know more about what kind of performance you’re getting out of your Workflows, I recommend checking out Workflow Profiler. Download the free trial here.

Thanks For Attending My SQLSat192 Session!

Thanks to all who attended my session, MDXplosion! Intro to MDX, at SQL Saturday 192 in Tampa this past weekend! It was a great event and I had a blast presenting, networking, and hanging out with some great friends. And a special thanks to all the sponsors, volunteers, and Pam Shaw for making this great event possible.

If you’d like to download my slide deck and code examples from the presentation, you can get that here. If you have more questions about MDX and want to learn more, you can find my content on MDX here. And, of course, you can email me at dryan (at) pragmaticworks.com or find me on Twitter here.

SQL Saturday Is Coming To Jacksonville, FL and I’m Speaking!

On April 27th next month, SQL Saturday #215 will be here in beautiful, sunny Jacksonville, Florida and none other than yours truly will be speaking, bright and early at 8:30 in the morning!

I’ll be presenting MDXplosion! Intro to MDX. The MDX query language is a powerful tool for extracting data from your cube but can often seem confusing and overwhelming to the untrained eye. But have no fear! It’s easier than you might think. In this session, I’m going to walk you through the basic MDX query syntax, how to navigate a hierarchy, how to create dynamic time calculations, and how to add calculations to your cube.
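
To give you a taste before the session, here’s about the simplest useful MDX query there is, written against the Adventure Works sample cube:

    -- Sales by calendar year: measures on columns, years on rows
    SELECT
        [Measures].[Internet Sales Amount] ON COLUMNS,
        [Date].[Calendar Year].[Calendar Year].Members ON ROWS
    FROM [Adventure Works];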

Also, don’t forget to check out the awesome Pre-Con being offered by Brian Knight and Devin Knight on Building a SQL Server 2012 BI Platform. This Pre-Con will cover everything from designing a data warehouse and loading it with SSIS to building an Analysis Services Cube leveraged by Reporting Services and SharePoint BI. You’ll definitely want to be a part of that, so register here for this amazing Pre-Con.

If you’re not already registered for this awesome event, get on the ball! SQL Saturday Jacksonville is always a huge event and a ton of fun so get signed up ASAP! And don’t forget to check out my session: MDXplosion! Intro to MDX.

Pragmatic Works Now Offering New Service Migrating From Informatica To SSIS

Today for the first time, Brian Knight, founder of Pragmatic Works, announced that Pragmatic Works will be offering a new service aimed at migrating their clients from Informatica to SSIS!

This is a very exciting offering for Pragmatic Works and especially for those companies running Informatica and looking to break away from the expensive licensing model. Check out this video in which Pragmatic Works experts Brian Knight and Devin Knight discuss some of the differences between Informatica and SSIS. Stay until the end of the clip to catch the announcement of the new service!

For more information on migrating from Informatica to SSIS, head to http://www.PragmaticWorks.com or call (904) 638-5743.

Dustin Ryan is a Data Specialist interested in Azure, SQL Server, and Power BI.
