All posts by Dustin Ryan

Dustin is a Data Platform Solutions Architect on the Education team at Microsoft. You can find Dustin blogging and speaking at events like SQL Saturday, Code Camp, and SQL Rally. Follow Dustin on Twitter @SQLDusty.

The Whaty-What of Data Mining Algorithms in SSAS

SQL Server Analysis Services features nine different data mining algorithms that looks for specific types of patterns in trends in order to make predictions about your data. This is a potentially very powerful tool and since I’ve been learning more about data mining recently I figured I’d put together a little bit of information and research I’ve done on these algorithms for my own reference as well as for the benefit of others. 

Below you will see a list of the Data Mining Algorithms included in SSAS 2008 to 2012. I’ve included the type of the algorithm, what it does, and an example or two of when one might decide its an appropriate algorithm for your data and requirements. I’ve also included links to TechNet for more information.

Decision Tree
Type: Classification

What: This is probably the most popular data mining algorithm simply because the results are very easy to understand. Decision Tree tries to predict the value of a column or columns based on the relationships between the columns you have identified. Decision Tree also determines which input columns are most highly correlated to your prediction column(s).
When: Use the Decision Tree algorithm when you want to try to predict if a customer will buy your product or what characteristics may make a person a potentially good customer. For example, if you want to determine which customers you should send coupons, use Decision Tree to determine if a person has the qualities and characteristics of past customers.

Linear Regression
Type: Regression

What: This algorithm is used to predict continuous variables using continuous input variables. Linear Regression is a simpler version of Decision Tree without the splits in the tree.
When: Use the Linear Regression algorithm when you want to compute a trend for your sales data.

Naïve Bayes
Type: Classification

What: The Naïve Bayes algorithm is based on Bayes’ theorems and good for performing predictive analytics. This algorithm calculates a probability based on input columns you define.
When: An example of when you might use the Naïve Bayes algorithm would be when you want to predict how likely a customer is likely to respond to an email blast of how likely a patient is to get sick or contract a disease based on their demographic information.

Neural Network
Type: Classification/Regression

What: The Neural Network algorithm may be one of the least used data mining algorithms because it is the most difficult to interpret. Basically Neural Network combines each possible input attribute value with each predictable attribute value to determine probabilities. These probabilities can be used for classification or regression, but you may have a difficult time determining how the algorithm reached a particular conclusion.
When: Use this when you want to analyze the relationships between many complex inputs to determine a certain output, such as predicting stock movements or currency fluctuations.

Logistic Regression
Type: Regression

What: The Logistic Regression algorithm is a statistical method for determining the contribution of specified inputs to a particular set of outcomes. This algorithm is similar to Neural Network in the way that it models the relationships between the various inputs.
When: An example of when you might use this algorithm would be to predict what characteristics make a customer a repeat customer or if a convicted criminal is likely to become a repeat offender.

Clustering
Type: Segmentation

What: The Clustering algorithm is probably very close to Decision Tree as far as data mining algorithms that are used most frequently simply because, like Decision Tree, it is also very easy to understand. The Clustering algorithm groups cases in a data set using the input columns into groups or clusters of cases with similar characteristics.
When: This algorithm is great for detecting fraud or anomalies in your data because it is very easy to see which data does not fit into a cluster.

Sequence Clustering
Type: Sequence Analysis

What: Sequence Clustering is similar to clustering except instead of looking for clusters based on the similarity of characteristics, the clusters are based on a model. The algorithm groups sequences of events that are identical.
When: An example of when you might use this algorithm would be when you want to determine which sequence of events are likely to lead to hardware failure in your environment.

Association Rules
Type: Association

What: The Association Rules algorithm defines combinations of items in a set by scanning the data in the specified inputs and identifying frequent combinations of items.
When: Use this algorithm when you want to identify opportunities for cross selling. For example, the Association Rules may determine that if a customer purchases milk there may be an opportunity to market cookies to the customer.

Time Series
Type: Regression

What: The Time Series algorithm is useful in predicting or forecasting continuous variables over time, such as sales metrics. Time Series does not require additional data to make predictions about the future. Time Series makes predictions 3 or 5 units into the future.
When: Use the Time Series algorithm when you want to predict sales for the next 3 or 5 months.

Are You Using SSIS Best Practices?

image

Earlier this month, Pragmatic Works released a new tool for their Pragmatic Works Workbench toolbox: The Best Practice Analyzer. The guys and gals over at Pragmatic Works have come up with a tool that analyzes your existing SSIS packages and compares them with a standardized set of best practice guidelines produced by the expert team of consultants and developers you’ve come to know and love over the past several years.

The Best Practice Analyzer combs through the SSIS packages you have selected and produces a report outlining all the best practice violations including violations of varying severities, such as “Warning”, “Error”, “Performance”, and “Informational”.

First, select the packages you’d like to investigate for adherence to best practices.

Next, select the best practices to be included in the report.

The end result is an easy-to-read report identifying any violations of best practices.

Violations could include everything between leaving a component description blank, using a fully block transform in a Data Flow Task, not enabling error logging, plus many more. Head over to PragmaticWorks.com to get more information on the SSIS Best Practice Analyzer.

Migrating Away From Informatica To SSIS? You’ll Need Workflow Profiler

image

If you’ve been paying attention to this blog or PragmaticWorks.com, you probably picked up on the fact that Pragmatic Works has put together a new service offering for those companies currently running Informatica as their ETL tool of choice and are sick and tired of paying hundreds of thousands of dollars a year in licensing (and in some cases millions of dollars! o_O!!). Pragmatic Works is now offering to quickly and efficiently migrate their clients from Informatica to SQL Server. One of the ways Pragmatic Works makes this process less painful is by using their tool, Workflow Profiler.

The primary goal of Workflow Profiler is to fully document your Informatica PowerCenter MetaData environment very quickly, thus giving you a better idea of just how much effort could be required during the migration process. This information is displayed in a series of detailed reports viewable in Workflow Profiler.

Reports, such as the Widget Distribution Summary Report, give you a detailed analysis of your Widget distribution.

Reports like the Workflow Detail Report, Mapping Summary Report, Mapping Detail Report, and many others display all types of information regarding your Workflows, including widget usage, what tasks are used in the workflows, which workflows use parameter files and more.

There is also the Workflow Execution Summary Report. This report displays all kind of performance data and metrics including execution status, performance numbers for each workflow and even individual tasks, and success/failed row counts for sources and targets for each session task.

If you’re looking for more insight into your Informatica environment and would like to know more about what kind of performance you’re getting out of your Workflows, I recommend checking out Workflow Profiler. Download the free trial here.

Thanks For Attending My SQLSat192 Session!

SQLSat192Thanks to all who attended my session MDXplosion! Intro to MDX at SQL Saturday 192 in Tampa over the past weekend! It was a great event and I had a blast presenting, networking, and hanging out with some great friends. And a special thanks to all the sponsors, volunteers, and Pam Shaw for making this great event possible. 

If you’d like to download my slide deck and code examples from the presentation, you can get that here. If you have more questions about MDX and want to learn more, you can find my content on MDX here. And, of course, you can email me at dryan (at) pragmaticworks.com or find me on Twitter here.

SQL Saturday Is Coming To Jacksonville, FL and I’m Speaking!

On April 27th next month, SQL Saturday #215 will be here in beautiful, sunny Jacksonville, Florida and none other than yours truly will be speaking, bright and early at 8:30 in the morning!

I’ll be presenting MDXplosion! Intro to MDX. The MDX query language is a powerful tool for extracting data from your cube but can often seem confusing and overwhelming to the untrained eye. But have no fear! It’s easier than you might think. In this session, I’m going to walk you through the basic MDX query syntax, how to navigate a hierarchy, how to create dynamic time calculations, and how to add calculations to your cube.

Also, don’t forget to check out the awesome Pre Con being offered by Brian Knight and Devin Knight on Building a SQL Server 2012 BI Platform. This pre con will cover everything from designing a data warehouse and loading it with SSIS to building an Analysis Services Cube leveraged by Reporting Services and SharePoint BI. You’ll definitely want to be a part of that, so register here for this amazing Pre Con.

If you’re not already registered for this awesome event, get on the ball! SQL Saturday Jacksonville is always a huge event and a ton of fun so get signed up ASAP! And don’t forget to check out my session: MDXplosion! Intro to MDX.

Pragmatic Works Now Offering New Service Migrating From Informatica To SSIS

Today for the first time, Brian Knight, founder of Pragmatic Works, announced that Pragmatic Works will be offering a new service aimed at migrating their clients from Informatica to SSIS!

This is a very exciting offering for Pragmatic Works and especially those companies running Informatica and looking to break away from the expensive licensing model. Check out this video with Pragmatic Works experts Brian Knight and Devin Knight discuss some of the differences between Informatica and SSIS. Stay until the end of the clip to catch the announcement of the new service!

For more information on migrating from Informatica to SSIS, head to http://www.PragmaticWorks.com or call (904) 638-5743.

I’m Speaking At SQL Saturday 192 In Tampa

sqlsat192_speaking

Coming this 2nd day of March, I will be speaking at SQL Saturday #192 in Tampa, Florida! It’s going to be off the chain no doubt, as the kids say.

I’ll be presenting my session, MDXplosion! Intro To MDX. The MDX query language is a powerful tool for extracting data from your cube but can often seem confusing and overwhelming to the untrained eye. But have no fear! It’s easier than you might think. In this session, we’ll go over the basic MDX query syntax, how to navigate a hierarchy, how to create dynamic time calculations, and how to add calculations to your cube. It’s going to be a blast and I’m really looking forward to it!

If you’re in the area or can make it down, get registered for this event! It’s going to be a great time and with the opportunity for free training at this amazing quality from industry pro’s, it’d be crazy to pass on this event!

Browsing SQL Server Databases With DBA xPress

image

With the latest release of the Pragmatic Workbench, we are exposed to the Object Browser. The Object Browser is a very cool feature that is party of the DBA xPress package of tools designed with the database administrator in mind. This is the first time I’m exploring this tool, so I wanted to walk through the feature with you and give my take on it (even though I’m not a DBA, but nonetheless…).

When you first launch Object Browser, you must first point to the server where your database exists. So far so good.

image

After connecting, we’re immediately treated to a nifty tree view of our database and the objects within that database. In this case, of course, I’ve connected to the Adventure Works DW 2008 R2 database.

image

The browser is extremely responsive and very quick. Drilling down and exploring the database is slick and fast. Once you’ve connected to the database, you can browse it and search for a single object using the search function. All I had to do is type in my search term and the tree view filtered faster than Google.

image

One of the really cool features is that you can see the SQL script to create the object you’ve selected. whether it’s a table, a view, a column in a table, a procedure, or a role. Very handy! We can also hover over the objects and view the meta data for that object.

image

Another neat feature is the ability to take a snapshot of the database. Click the Take Snapshot button at the top of the Object Browser. A nice, clean XML file is the output and can be saved to the location of your choosing.

image

To learn more about DBA xPress and download the free trial, head here.

Create XML Files With SSIS

imageIf you’ve ever tried to write data to an XML file with SSIS you know there’s not a slick, easy way to do it. You could use a Script Component in a Data Flow Task, as illustrated by fellow Pragmatic Works employee, Jason Strate. But there is an easier way. In comes Task Factory with its new XML Generator Transform Data Flow component.

image

The Task Factory XML Transform component allows you to specify the XSD file, which contains the schema information for the XML data to be output. Once you’ve selected the XSD file, any column from the source data that should be included in the XML document needs to be mapped to a matching field from the XSD file.

image

Next, you need to specify the relationship between the parent and child elements. Then map the field to establish the relationship.

Send the data to a Flat File Destination pointed at an XML file and you’re in business. Download the free trial of Task Factory to check out the XML Generator Transform and the 36 other tasks and transforms.

Import Native SSIS Performance Data With BI xPress

imageOne of the most powerful features of BI xPress is the Auditing Framework, which allows you to apply a standardized and robust auditing framework to multiple packages in just a few clicks. I’ve blogged about the BI xPress Auditing Framework before because its an extremely impressive tool that has saved me and the teams I’ve worked with countless hours. But now that SQL Server 2012 has been released, we have a whole slew of execution and performance data available to us natively within the 2012 SSIS Catalog. Wouldn’t it be great if we could view the native execution and performance data within the BI xPress Monitoring Console even if the BI xPress Auditing Framework has not been applied?

Your prayers have been answered. If you’re running SSIS 2012, you can now easily import the native execution and performance data into the BI xPress database for even more in depth reporting on the execution of your SSIS packages. Of course, this feature only works if you’re running 2012 :).

To import the SSIS Catalog data into the BI xPress database, click the SSIS Catalog Import button located on the Pragmatic Workbench home screen.

image

Next specify the location of the BI xPress database that is your target for the import. Then add the SSIS Catalog as the source for the import.

image

imageAnd the really nice part is that you can turn on Auto Importing so that the native performance and execution data will be automatically imported into the BI xPress database!

Now that the data has been imported into the BI xPress database, any native SSIS 2012 package stored in the SSIS Catalog can now monitored and measured in the BI xPress Monitoring Console. Head over to PragmaticWorks.com for more information and a trial download.