Introduction To SSIS
What is data integration and why do we need it?
-> Data integration is carried out using various tools. Let's take an example to understand the need for data integration.
-> Suppose I am in a company with different departments. Each department has its own data and chooses a database based on its requirements. For example, my accounting team chooses SAP to store all its analytical data, and my sales team uses Salesforce CRM to manage all the customer details. Similarly, other teams use different databases: my marketing team uses Oracle, manufacturing uses DB2, and so on. All of this depends on the requirements of the company.
-> Now I have different databases storing different types of data. Let's say the manager asks me to analyze all the departments and tell him which team brings in the most revenue. What do we do now? Any ideas, guys? Can you suggest something to help with this problem?
-> We can connect the databases, and that is absolutely right. But connecting databases is not free: you need a connection object or adapter for each one. Dealing with all these connected databases also creates more complexity. If you have a lot of data, say a hundred databases, connecting all of them will consume a lot of time. So what would the solution be? A simple solution is data integration.
-> By data integration I mean that you integrate all the data present in different databases and combine it on a single platform.
-> Data integration is a process you follow to get data from multiple sources. Your data can be in any form: it can be heterogeneous or homogeneous. By this I mean the data can be structured, semi-structured, or unstructured, so these are dissimilar data. If these dissimilar data are combined into meaningful and valuable information, wouldn't that be great? That is exactly what data integration is.
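As a rough sketch of the revenue question above, here is how data from three differently shaped sources could be brought into one common form and then analyzed. The sample data, field names, and the functions `integrate` and `revenue_by_team` are all made up for illustration; a real setup would read from the actual systems instead.

```python
import csv
import io
import json

# Hypothetical extracts from three departmental systems, each in its own format.
SALES_CSV = "team,revenue\nsales,120000\nsales,30000\n"
MARKETING_JSON = '[{"dept": "marketing", "amount": 90000}]'
MANUFACTURING_ROWS = [("manufacturing", 200000.0)]  # e.g. rows fetched from a database


def integrate():
    """Bring every source into one common shape: (team, revenue)."""
    unified = []
    for row in csv.DictReader(io.StringIO(SALES_CSV)):
        unified.append((row["team"], float(row["revenue"])))
    for item in json.loads(MARKETING_JSON):
        unified.append((item["dept"], float(item["amount"])))
    unified.extend(MANUFACTURING_ROWS)
    return unified


def revenue_by_team(rows):
    """Total the revenue per team on the combined data."""
    totals = {}
    for team, revenue in rows:
        totals[team] = totals.get(team, 0.0) + revenue
    return totals


totals = revenue_by_team(integrate())
best_team = max(totals, key=totals.get)
print(best_team, totals[best_team])
```

Once everything is in one shape, the manager's question becomes a single aggregation instead of a separate query per database.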
-> Data integration has gained attention recently, but people were using it before that too. Once people realized its potential, they used different methods to achieve it. Here are a few ways to achieve data integration: data modelling, where you first create a model and then perform operations on it, and data profiling, where you take a sample of the data and check it for inconsistencies, errors, or variations.
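To make the profiling idea concrete, here is a minimal sketch that takes a small sample and counts a few kinds of problems. The sample records, the field names, and the `profile` function are assumptions for illustration; real profiling tools check far more than this.

```python
from collections import Counter

# Hypothetical sample of customer records taken from a larger source.
sample = [
    {"id": 1, "country": "US", "age": "34"},
    {"id": 2, "country": "us", "age": ""},
    {"id": 2, "country": "DE", "age": "abc"},
]


def profile(rows, key_field="id", numeric_field="age", text_field="country"):
    """Report missing values, non-numeric values, duplicate keys and casing variants."""
    missing = sum(1 for r in rows if r[numeric_field] == "")
    non_numeric = sum(
        1 for r in rows if r[numeric_field] != "" and not r[numeric_field].isdigit()
    )
    key_counts = Counter(r[key_field] for r in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    # Values that differ only by upper/lower case count as one casing variation.
    values = {r[text_field] for r in rows}
    casing_variants = len(values) - len({v.upper() for v in values})
    return {
        "rows": len(rows),
        "missing": missing,
        "non_numeric": non_numeric,
        "duplicate_keys": duplicate_keys,
        "casing_variants": casing_variants,
    }


report = profile(sample)
print(report)
```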
-> The advantages of data integration: first, it reduces complexity. Data integration is all about managing complexity, streamlining connections, and making it easy to deliver data to any system. The second advantage is data integrity. Integrity plays a major role in data integration: it basically deals with cleansing and validating your data, since all of us need our data to be high quality and robust.
What is SSIS? How does SSIS work?
-> SSIS stands for SQL Server Integration Services, which is a service from Microsoft.
-> SSIS is a Microsoft service that basically performs data integration, or you can say merging of data from different data sources: it can be a flat file, an Excel file, or anything else.
-> It is used to perform a broad range of data integration and data transformation tasks, so on the whole you can say it performs data migration. SSIS is a platform for data integration and workflow applications.
-> By data integration, as we already know, data is retrieved and combined into a structure that gives a unified view. Next we have workflow. A workflow can do several things: sometimes you need a series of steps, or a choice of path at execution that is based on a time period or on a parameter that is passed in or queried from the database. After identifying it, you can choose whichever path you want to take.
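The idea of choosing a path from a time period or a passed parameter can be sketched as follows. This is plain Python, not SSIS itself, and the step names plus the "full reload on the first day of the month" rule are assumptions made up for this example.

```python
from datetime import date


def full_load():
    return "full load"


def incremental_load():
    return "incremental load"


def choose_path(run_date, force_full=False):
    """Pick a workflow branch from a passed parameter or from the run date."""
    # Assumption for this sketch: do a full reload on the first day of each month.
    if force_full or run_date.day == 1:
        return full_load
    return incremental_load


step = choose_path(date(2024, 3, 1))
print(step())
```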
-> We've already discussed that SSIS is a platform for data integration and workflow applications. Both are carried out using an SSIS package, which we'll talk about in more detail later.
-> There are three major components: first the operational data, followed by an ETL process, and then the data warehouse. Let's understand each of them in detail. The first component is operational data, or an ODS, which stands for operational data store: a database used to integrate data from multiple sources. One key point is that, unlike a master data store, the data is not passed back to the operational systems. It may be passed on for further operations and to the data warehouse for reporting, but it is not passed back to the operational systems.
-> ETL (extract, transform, load) is the process responsible for pulling data out of the source, which can be in any format (for example Excel or a flat file), and placing it into a data warehouse. An ETL process also ensures that the data stored in the warehouse is relevant, useful to the business users, accurate, high quality, and easy to access, so that the warehouse is used efficiently and effectively. This helps the organization make meaningful, data-driven decisions by interpreting and transforming large amounts of structured and unstructured data.
-> Even though ETL is a three-word concept, it is actually divided into four phases:
(a). The first phase is capture, also known as the extract phase. It takes the source data or metadata, which can be present in any format.
(b). The next phase is scrub. Scrubbing identifies errors in your original data. To check for these errors and inconsistencies it uses some artificial intelligence techniques, and it verifies whether the required data quality is met.
(c). The third phase is transformation, where your source data is converted to the format you require. Transformation means modeling or changing your data to meet the requirements. It can be with respect to the number of rows and columns: if you want to increase the number of rows or columns, you can transform the data accordingly.
(d). The final phase is load and index. In this phase it loads the data and validates that the number of rows processed meets the required number of rows. Once loading is done, indexing helps you track the number of rows, or the amount of data, you are loading into the warehouse. It basically checks the data through indexing and identifies whether the data is in the required format.
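The four phases above can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the source data, table name, and function names are made up, the scrub step uses simple rules rather than any artificial intelligence technique, and the index is an ordinary database index, which is just one reading of the "index" step.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the last two rows are deliberately bad.
SOURCE = "id,name,amount\n1,alice,10.5\n2,bob,20\n3,,7\n4,dave,oops\n"


def capture(text):
    """Capture/extract: read the raw source rows."""
    return list(csv.DictReader(io.StringIO(text)))


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def scrub(rows):
    """Scrub: drop rows with a missing name or a non-numeric amount."""
    return [r for r in rows if r["name"] and is_number(r["amount"])]


def transform(rows):
    """Transform: convert each row to the target types and format."""
    return [(int(r["id"]), r["name"].title(), float(r["amount"])) for r in rows]


def load_and_index(conn, rows):
    """Load the rows, index them, and check the loaded row count."""
    conn.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_id ON sales (id)")
    conn.commit()
    loaded = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    if loaded != len(rows):
        raise RuntimeError("row count mismatch after load")
    return loaded


conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
loaded = load_and_index(conn, transform(scrub(capture(SOURCE))))
print(loaded)
```

Keeping each phase as its own function makes it easy to check the row count between phases, which is the kind of validation the load step describes.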
What is a data warehouse?
-> A data warehouse is a single, complete, and consistent store of data, formed by combining data from various sources. When we say data from different sources is combined, it does not mean you simply go and take data from different sources and put it together; there has to be a purpose for it.
-> As an analyst or consultant, once you understand what the big system is, all of this becomes secondary and will come to you naturally.
-> Data warehousing is a technique where you pull or assemble data from various sources and combine it.
-> A data warehouse is structured for analytics: various queries can be fired against it, and you get faster query responses compared to a regular database.