Data Import from Amazon S3 SSIS bucket using an integration service SSIS package
October 28, 2019 by Rajendra Gupta

This article explores data import into SQL Server from a CSV file stored in an Amazon S3 bucket, using an SSIS package.

Introduction
The SSIS package is an ETL tool for extracting, transforming, and loading data from various sources such as OLE DB, ODBC, ADO.NET, and flat-file connections. Nowadays, many organizations run a hybrid infrastructure model, a combination of on-premises and cloud infrastructure, so it is essential for DBAs and developers to be familiar with cloud solutions as well. Recently, I had a requirement to import S3 bucket file data with the following source and destination:

- Source: CSV file stored in an AWS S3 bucket
- Destination: On-premises SQL Server database table

First, let's take an overview of the AWS S3 bucket.

Overview of Amazon S3
AWS S3 stands for Amazon Web Services Simple Storage Service. It is highly scalable, cost-effective cloud storage for data storage and archival. We can upload a file with any extension to an S3 bucket, although a single object cannot exceed 5 TB in size. Users can control file properties and accessibility, such as public or private access. We can store an unlimited amount of data in an S3 bucket, and the best part is that you pay only for the storage you use.
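For readers who prefer to explore S3 programmatically, here is a minimal Python sketch using the boto3 SDK that lists the buckets in an account; boto3 resolves credentials on its own, and no names are assumed here beyond what your account contains.

```python
import boto3

# Create an S3 client; boto3 picks up credentials from the environment,
# ~/.aws/credentials, or an attached IAM role.
s3 = boto3.client("s3")

# List all S3 buckets visible to this account, with creation dates.
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"], bucket["CreationDate"])
```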
AWS Free Tier account

The starting point of the AWS journey is the AWS Console and creating an AWS Free Tier account. Go to the AWS Free Tier URL, and it opens the Free Tier page. We can create a free account, which stays free for 12 months, and use it as a starting point to learn AWS. A few vital features of the Free Tier account in AWS are as follows:

- 750 hours per month of free Amazon EC2 instance usage
- 750 hours per month of free Amazon RDS t2.micro usage
- 30 GB per month of Amazon Elastic Block Store usage
- 5 GB of free AWS S3 storage

You can go through the URL to read more about the free account in AWS. Click on Create a Free Account and go through the process by entering your personal information such as name, country, and email address. Here is a catch: you need to provide a valid debit or credit card. AWS deducts a small verification amount (2 rupees in India) from the card; do not worry, you get it back in a few days. It is a mandatory step to sign up for a free account. Amazon does not deduct any money until you have exhausted the limits of the Free Tier account.

Amazon S3 configuration
To create an S3 bucket, do the following:

1. Open the AWS web console and log in with an IAM user that has permissions to create, access, upload, and import data into the S3 bucket
2. Navigate to Services and click on S3 in the Storage section. It lists all existing S3 buckets, showing each bucket's name, access level, region, and creation date
3. Click on Create bucket and provide the S3 bucket name. Amazon S3 is a region-specific resource; therefore, you need to select the region from the drop-down list. Then, click on Create
4. It creates the S3 bucket with default configurations and shows it in the bucket list. This bucket belongs to the Asia Pacific (Mumbai) region
5. Open this S3 bucket, drag the CSV file from the source system, drop it into the S3 bucket, and click the Upload button

You can now see the CSV file in the S3 bucket.
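If you would rather script these console steps, the following boto3 sketch creates a bucket in the Asia Pacific (Mumbai) region and uploads a CSV file into it; the bucket and file names are hypothetical placeholders.

```python
import boto3

# Client pinned to the Mumbai region, matching the console walkthrough.
s3 = boto3.client("s3", region_name="ap-south-1")

# Outside us-east-1, the target region must be passed explicitly.
s3.create_bucket(
    Bucket="my-demo-csv-bucket",  # hypothetical; bucket names are globally unique
    CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
)

# Upload the local source CSV into the new bucket.
s3.upload_file("employees.csv", "my-demo-csv-bucket", "employees.csv")
```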
Amazon S3 SSIS data upload

As described earlier, we need to import data from a CSV file (stored in an AWS S3 bucket) into a SQL Server table.

Question: How do you connect an SSIS package with an AWS S3 bucket?

By default, the SSIS package does not allow you to connect to an AWS S3 bucket, but that does not mean we have no mechanism to do so. We can browse the Visual Studio Marketplace and use the Amazon S3 SSIS components (SSIS Productivity Pack) for our requirement. It enables integration between an SSIS package and an Amazon S3 bucket without writing any specific code, so you can complete the task effectively and efficiently.

Note: It is trial-version software, and you can use it to test the software functionality with development tools such as SSDT; you do not require a license in this case. Read more about licensing and the available licenses using the vendor's link.

Download the appropriate 64-bit or 32-bit installer. Double-click on the installer and follow the installation wizard to install the SSIS Productivity Pack with default configurations.

Amazon S3 SSIS package configuration
Once you have installed the SSIS Productivity Pack, launch Visual Studio (SSDT). I am using SSDT for Visual Studio 2015 in this article.

S3 bucket connection
The first step is to create a connection pointing to the Amazon S3 bucket. Right-click in the connection manager area and choose New connection. It opens a list of available connection managers. We need to connect to the AWS S3 bucket; therefore, select Amazon S3 and click on Add. It opens the Amazon S3 Connection Manager.

On the General page of the connection manager, we specify the connection properties and login credentials for the AWS S3 bucket.

Access key and secret key: We specify the access key and secret key of an IAM user. We get these keys from the IAM console in AWS web services:

1. Log on to the AWS Console and navigate to IAM in the Services menu
2. Navigate to Users and click on a specific user
3. In the user properties, go to the Security credentials tab
4. Choose the access key and download the secret key in a CSV file

You need to store the access key and secret key in a secure location. Specify them in the Amazon S3 Connection Manager. It authenticates the user account to Amazon services using these keys, and you can then see the bucket name in the drop-down list.

Click on Test Connection, and it checks the connection to the Amazon S3 bucket. Once the connection is successful, you get a confirmation message. Click OK, and we can see the AWS S3 connection in the connection manager.
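To sanity-check the same access key and secret key outside SSIS, a short boto3 sketch like the one below can confirm that they authenticate against the bucket; the key values and bucket name are placeholders, and in practice the keys should come from a secure store rather than source code.

```python
import boto3
from botocore.exceptions import ClientError

# Placeholder credentials; never hard-code real keys in source files.
s3 = boto3.client(
    "s3",
    aws_access_key_id="AKIAXXXXXXXXXXXXXXXX",      # placeholder access key
    aws_secret_access_key="xxxxxxxxxxxxxxxxxxxx",  # placeholder secret key
    region_name="ap-south-1",
)

try:
    # head_bucket succeeds only if the keys are valid and
    # the IAM user can access the bucket.
    s3.head_bucket(Bucket="my-demo-csv-bucket")
    print("Test Connection succeeded")
except ClientError as err:
    print(f"Test Connection failed: {err}")
```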
Add a Data Flow task

In the next step, add a Data Flow task to the SSIS package for the Amazon S3 data import, and rename it AWS S3 Data Import. Double-click on this task, and it takes you to the data flow screen. In the SSIS toolbox on the Data Flow tab, you can see the following options for the Amazon S3 bucket:

- Amazon S3 Destination
- Amazon S3 Source
- Premium Flat File Source
- Premium Flat File Destination

Drag a Premium Flat File Source into the data flow task from the SSIS toolbox. Double-click on the Premium Flat File Source, and it opens the editor window for configuration:

- Connection Manager: Select the existing connection to the S3 bucket from the drop-down. If you do not have an existing connection to the S3 bucket, click on New; you need to specify the key (access key and secret key) information, as shown in the previous steps
- Source File Path: It is the CSV file path in the S3 bucket. Click on the ellipsis (…) and browse to the CSV file path. You might have multiple files in the S3 bucket; the source file path should point to a valid CSV file. We can see the CSV file in the Source item path column
- First row has the header: My CSV file's first row contains the column headers. Remove this check if the CSV file does not contain headers
- Skip Empty Rows: A CSV file might contain empty rows. Put a check here to skip the empty rows

Click on the Columns tab to view the CSV file columns. We may uncheck the columns we do not want to import into the SQL Server table. It also detects the data types and lengths automatically, and we can change the column data types and their properties in this window. For this demonstration, let's go with the default column properties in the Amazon S3 SSIS package.

Click OK and add an OLE DB destination. This OLE DB destination should point to the SQL Server instance as per our requirement. Right-click and configure the destination with the SQL Server connection and the SQL Server table. You also need to map the source and destination columns. Once the OLE DB destination connection is successful, you can see the complete package.

Before executing the Amazon S3 SSIS package, rename the tasks as follows:

- Premium Flat File Source -> AWS S3 Bucket CSV file
- OLE DB Destination -> SQL Server destination table

Execute the SSIS package, and it transfers 100 rows from the source CSV file to the SQL Server table. Verify the records in the SQL Server table.
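To verify the load programmatically, a minimal pyodbc sketch like this one counts the imported rows; the server, database, and table names are hypothetical and should be replaced with your own.

```python
import pyodbc

# Placeholder connection details; adjust the server, database, and
# table names to match your environment.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=DemoDB;Trusted_Connection=yes;"
)

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dbo.EmployeeData;")  # hypothetical table
row_count = cursor.fetchone()[0]
print(f"Rows imported from the S3 CSV file: {row_count}")  # expect 100 for this demo

conn.close()
```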
Conclusion

In this article, we explored a method to integrate an SSIS package with an Amazon S3 bucket. We can also use the component described in this article to import and export data between on-premises and cloud systems.

About the author

Hi! I am Rajendra Gupta, a Database Specialist and Architect with over 14 years of experience, helping organizations implement Microsoft SQL Server, Azure, Couchbase, and AWS solutions fast and efficiently, fix related issues, and tune performance. I am the author of the book "DP-300 Administering Relational Database on Microsoft Azure" and have published more than 650 technical articles on MSSQLTips, SQLShack, Quest, CodingSight, and SeveralNines.
I am the creator of one of the biggest free online collections of articles on a single topic, with my 50-part series on SQL Server Always On Availability Groups.
Based on my contributions to the SQL Server community, I have been recognized as Best Author of the Year at SQLShack in 2019, 2020, and 2021 (2nd rank), and I received the MSSQLTips Champion award in 2020.
Personal Blog: https://www.dbblogger.com
I am always interested in new challenges, so if you need consulting help, reach me at [email protected]