The old times
If you needed a platform for data movement or data transformation, you probably ended up using Microsoft SQL Server Integration Services (SSIS).
It did (and still does) a great job of solving these kinds of problems. But it is an on-premises solution, and Integration Services itself doesn’t exist in the Microsoft cloud. So how can we move data to the cloud, or within the cloud from one service to another?
The Azure Data Factory
Is the Data Factory a cloud version of Microsoft Integration Services? No.
Data Factory is a completely new product on the Azure platform. You will notice that some of the more complex things that are possible with Integration Services are not possible with Data Factory. Still, Data Factory is an awesome product that lets us do similar things.
A Data Factory is made of different building blocks:
- Dataset: a named reference to the data you want to process (a file, a folder or a table in a data store)
- Activity: performs an action on your data. It can take multiple datasets as input and produces at least one output dataset
- Pipeline: a logical grouping of activities for manipulating data
- Linked Service: the connection to a data source (both in the cloud and on-premises)
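To make the relationship between these building blocks more concrete, here is a minimal sketch in plain Python. It is not the Data Factory API or its JSON format; every class, field and name below (for example `MyBlobStorage` or `dbo.Customers`) is made up for illustration. It only shows how a pipeline groups activities, how an activity points to datasets, and how a dataset points to a linked service.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinkedService:
    """Connection information for a data store (cloud or on-premises)."""
    name: str
    kind: str  # illustrative label, not an official type name


@dataclass
class Dataset:
    """Named reference to data that lives behind a linked service."""
    name: str
    linked_service: LinkedService
    location: str  # file path, folder or table name


@dataclass
class Activity:
    """One action that reads input datasets and writes output datasets."""
    name: str
    inputs: List[Dataset]
    outputs: List[Dataset]


@dataclass
class Pipeline:
    """Logical grouping of activities."""
    name: str
    activities: List[Activity] = field(default_factory=list)


# Hypothetical example: copy one text file from blob storage into a SQL table.
blob = LinkedService("MyBlobStorage", "AzureBlobStorage")
sql = LinkedService("MySqlDatabase", "AzureSqlDatabase")

source = Dataset("InputFile", blob, "input/customers.txt")
target = Dataset("CustomerTable", sql, "dbo.Customers")

copy = Activity("CopyBlobToSql", inputs=[source], outputs=[target])
pipeline = Pipeline("CopyPipeline", [copy])

for activity in pipeline.activities:
    sources = ", ".join(d.name for d in activity.inputs)
    targets = ", ".join(d.name for d in activity.outputs)
    print(f"{activity.name}: {sources} -> {targets}")
```

The point of the sketch is the direction of the references: connection details live only in the linked service, so several datasets can share one connection.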
A Sample – The Copy Wizard
Once you have created an Azure Data Factory, you will find a “Copy Data” button. Click it and a new window opens with a wizard that walks you through the configuration.
On the first page you have to provide some general information:
- Name of the Task
- Description of the task
- Whether the task runs on a schedule or only once
Secondly, you need a source to copy from. This can be an on-premises data source (for which you need to install a Data Management Gateway) or a cloud service. In this example I am choosing a text file in my Azure Blob Storage. For each type of source you have to give the wizard the connection information, which can be very specific to that kind of data source.
For a blob storage you can choose an individual file or a folder.
If you choose a folder, Data Factory iterates over all files in that folder and loads everything. I’m only using one file because then I get a preview of the data in it.
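The difference between picking a file and picking a folder can be sketched in a few lines of Python. This is only an illustration of the idea, not how Data Factory is implemented: the function `select_blobs` and the blob names are hypothetical, and the sketch assumes that everything under the folder prefix is picked up, including subfolders.

```python
from typing import List


def select_blobs(blob_names: List[str], path: str) -> List[str]:
    """Return the blobs a copy would read for a file path or a folder path."""
    if path in blob_names:
        return [path]  # the path names one specific file
    # Otherwise treat the path as a folder and collect everything below it.
    prefix = path.rstrip("/") + "/"
    return sorted(name for name in blob_names if name.startswith(prefix))


# Hypothetical container content.
blobs = [
    "input/2016/customers.txt",
    "input/2016/orders.txt",
    "archive/customers.txt",
]

print(select_blobs(blobs, "input/2016/customers.txt"))  # a single file
print(select_blobs(blobs, "input/2016"))                # every file in the folder
```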
Do the same for the destination. I’m choosing an Azure SQL database as the destination.
Make sure the table already exists in the database, because Data Factory won’t create one for you.
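The "table must already exist" rule can be tried out locally with a small sketch. Note the assumptions: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for Azure SQL, the helper names `table_exists` and `copy_rows` are made up, and the table `Customers` and its rows are sample data. It only mimics the behaviour described above: the copy fails until the destination table has been created.

```python
import csv
import io
import sqlite3


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    """Check the SQLite catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    return row is not None


def copy_rows(conn: sqlite3.Connection, text: str, table: str) -> int:
    """Load comma-separated rows into an existing table; never creates it."""
    if not table_exists(conn, table):
        raise RuntimeError(f"Destination table {table!r} does not exist")
    rows = [tuple(r) for r in csv.reader(io.StringIO(text)) if r]
    if not rows:
        return 0
    placeholders = ", ".join("?" for _ in rows[0])
    # The table name is checked against the catalog above before it is
    # placed in the SQL text; the values themselves go in as parameters.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the real destination
source_text = "1,Alice\n2,Bob\n"    # stand-in for the text file in blob storage

try:
    copy_rows(conn, source_text, "Customers")
except RuntimeError as err:
    print(err)  # the table is missing, so nothing is copied

conn.execute("CREATE TABLE Customers (Id INTEGER, Name TEXT)")
print(copy_rows(conn, source_text, "Customers"))  # number of rows copied
```

Against a real Azure SQL database the existence check would query the server's own catalog instead of `sqlite_master`, but the order of the steps stays the same: create the table first, then run the copy.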
Summary
So Data Factory can be used to move and transform data between on-premises and cloud services. The Copy Data wizard is a good fit for straightforward copies without transformations. Data Factory is made to crunch huge amounts of data in a short amount of time.
More posts about Data Factory will follow, so stay tuned for details about data transformation within Data Factory.