Enhanced PolyBase SQL 2019 - External tables SQL Server, Catalog view and PushDown

SQLShack

November 6, 2018 by Rajendra Gupta

This article is part 4 of the series on SQL Server 2019 Enhanced PolyBase. Let us quickly recap the previous articles.

Part 1: We installed the SQL Server 2019 PolyBase feature along with Azure Data Studio and the SQL Server 2019 preview extension to explore its features.
Part 2: We learned to create an external table for an Oracle data source using the Azure Data Studio ‘External table wizard’.
Part 3: We explored useful features of external tables, such as joins, and created an external table for an Oracle database using T-SQL instead of the GUI.

We learned earlier that PolyBase in SQL Server 2019 Preview allows access to various data sources such as SQL Server, Oracle, MongoDB, Teradata, and ODBC-based sources. The External table wizard in the Azure Data Studio SQL Server 2019 preview extension currently supports only SQL Server and Oracle data sources.

In this article, we will create an external table for SQL Server and explore some more features around it. Launch Azure Data Studio and connect to the SQL Server 2019 preview instance. Right-click on the database and launch ‘Create External Table’.
This opens the wizard to create external tables. Recently, I faced an issue where the wizard got stuck at step 1: the progress icon kept rotating without showing any error message or any progress.

After some time, we get the error message ‘Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.’ This is a generic error message and does not point to the actual cause. I closed and relaunched the wizard multiple times, but the behavior stayed the same and gave nothing further to troubleshoot. Later, during the investigation, I figured out that the PolyBase services were stopped:

SQL Server PolyBase Data Management Service
SQL Server PolyBase Engine

Let us start these services and launch the ‘Create External Table’ wizard in Azure Data Studio again. This time the wizard starts successfully. Therefore, check the status of these services before launching the wizard to avoid this issue.
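Before opening the wizard, it can also help to confirm from T-SQL that PolyBase is installed and enabled on the instance. The following is a minimal sketch, assuming a release build of SQL Server 2019 where the ‘polybase enabled’ configuration option is available. It does not report whether the two Windows services are running, so they still need to be checked in SQL Server Configuration Manager or the Services console.

```sql
-- Returns 1 when the PolyBase feature is installed on this instance, 0 otherwise.
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Shows the instance-level PolyBase setting; run_value = 1 means enabled.
-- Assumption: the 'polybase enabled' option exists on this build.
EXEC sp_configure 'polybase enabled';
```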

External table for SQL Server

In this section, we will use the below source and destination instances.

Source Instance (here we will create the external table): SQL Server 2019 (Named instance – SQL2019)
Destination Instance (the external table will point here): SQL Server 2019 (Default instance – MSSQLSERVER)

Click on ‘SQL Server’ as the data source type in the wizard and proceed to the next step. In the next step, create the Database Master Key to secure the credentials used by the external data source. We should use a complex password with a combination of lower case, upper case, numeric and special characters.

Go to the next step and create the data source connection. Server Name should be in the format of [Instance Name IP Address].[Port]. The credential should have permission on the SQL Server instance that the external tables will point to. Connect to that instance, create a login, and give the user read permission on the WideWorldImporters database.

    -- Create the login that the external data source will connect with.
    CREATE LOGIN [DemoSQL2019] WITH PASSWORD=N'f3EzbtBSXu7iNaKdtRXd+soU0ab6Pwu6BSfMOI7jqms=',
        DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english],
        CHECK_EXPIRATION=OFF, CHECK_POLICY=OFF
    GO
    -- Map the login to a user in WideWorldImporters and grant read access.
    USE [WideWorldImporters]
    GO
    CREATE USER [DemoSQL2019] FOR LOGIN [DemoSQL2019]
    GO
    ALTER ROLE [db_datareader] ADD MEMBER [DemoSQL2019]
    GO

Now we can proceed, and no login failure occurs. Select the table from the database. The mapping is as below:

Source Table: Sales.Invoices
Destination table: dbo.invoices

The wizard automatically selects dbo as the destination schema since the sales schema does not exist in our database. Therefore, let us create the schema and refresh the schema list so that it appears here. Select the sales schema from the drop-down, click Next to view the summary, and click Create to configure the external table. Once the external table is created, we can access the data from it.
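The schema step and a first query against the new external table could look like the sketch below. It assumes that SQLShackDemo (the database name used in the queries later in this article) is the database that holds the external table, and it selects only three of the columns for brevity.

```sql
-- Assumption: SQLShackDemo is the database that will hold the external table.
USE [SQLShackDemo];
GO

-- Create the destination schema only if it does not exist yet.
-- CREATE SCHEMA must be the first statement in its batch, hence the EXEC.
IF SCHEMA_ID(N'sales') IS NULL
    EXEC (N'CREATE SCHEMA [sales]');
GO

-- After the wizard has created the external table, query it like a local table.
SELECT TOP (10) [InvoiceID], [CustomerID], [InvoiceDate]
FROM [sales].[Invoices];
```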

Catalog views for PolyBase

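We can view the external tables using the catalog view sys.external_tables, which lists all the external tables in the current database. To see the requests that PolyBase has executed, we can query the DMV sys.dm_exec_distributed_requests. The query below returns each distributed request with its status, statement text and total elapsed time, longest-running first:

    SELECT execution_id, status, st.text, dr.total_elapsed_time
    FROM sys.dm_exec_distributed_requests dr
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) st
    ORDER BY total_elapsed_time DESC;
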
We can also get information about the data source using the catalog view sys.external_data_sources. The query below returns the name, location and type of each data source (the location contains the database vendor and the instance IP address along with the port number):

    SELECT name, location, type FROM sys.external_data_sources
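The two catalog views can also be joined to see which remote object and data source each external table points to. This is a small sketch using the documented columns of sys.external_tables and sys.external_data_sources; the column aliases are my own.

```sql
-- List every external table in the current database with its data source.
SELECT
    SCHEMA_NAME(t.schema_id) AS table_schema,
    t.name                   AS external_table,
    t.location               AS remote_object,
    ds.name                  AS data_source,
    ds.location              AS data_source_location
FROM sys.external_tables AS t
JOIN sys.external_data_sources AS ds
    ON ds.data_source_id = t.data_source_id
ORDER BY table_schema, external_table;
```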

Understand the script generated by the wizard

The Create External Table wizard can also generate a script for the whole process. Let us understand the script generated by Azure Data Studio for the external table in the above example by breaking it into multiple parts.

The first part creates the database master key, which provides the encryption.

The second part creates the database scoped credential.

The third part creates the external data source pointing to SQL Server. In this statement, we need to specify the location in the format <vendor>://<server>[:<port>]. Since we are creating the external table for SQL Server, we specify the vendor as ‘sqlserver’. The port number follows the server name, separated by a colon. In this example, we specified the port number as 5290.

The last part creates the external table. We define the external table like a relational database table, with its columns and their properties. We also need to specify the location of the remote object along with the data source. In this example, we specify the location as [WideWorldImporters].[Sales].[Invoices] and DATA_SOURCE as [SQLServer].
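The four parts described above can be sketched as follows. This is not the wizard's exact output: the credential name, the passwords and the server address are placeholders I have assumed, and only three columns are shown. The script produced by the wizard lists the columns of Sales.Invoices with their data types, so treat the column list here as abbreviated.

```sql
-- 1. Database master key; it protects the credential secret stored below.
--    Replace the placeholder with a strong password.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPasswordHere>';
GO

-- 2. Database scoped credential holding the remote login and its password.
--    [SqlServerCredential] is a hypothetical name.
CREATE DATABASE SCOPED CREDENTIAL [SqlServerCredential]
WITH IDENTITY = 'DemoSQL2019', SECRET = '<LoginPasswordHere>';
GO

-- 3. External data source in the form <vendor>://<server>[:<port>].
--    192.0.2.10 is a placeholder address; 5290 is the port from the example.
CREATE EXTERNAL DATA SOURCE [SQLServer]
WITH (
    LOCATION = 'sqlserver://192.0.2.10:5290',
    CREDENTIAL = [SqlServerCredential]
);
GO

-- 4. External table that points at the remote Sales.Invoices table.
CREATE EXTERNAL TABLE [sales].[Invoices]
(
    [InvoiceID]   INT  NOT NULL,
    [CustomerID]  INT  NOT NULL,
    [InvoiceDate] DATE NOT NULL
    -- remaining columns of Sales.Invoices omitted in this sketch
)
WITH (
    LOCATION = '[WideWorldImporters].[Sales].[Invoices]',
    DATA_SOURCE = [SQLServer]
);
GO
```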

PUSHDOWN in PolyBase

We normally use predicates in a query to get a subset of the rows from a table; the WHERE clause defines the conditions that the rows must satisfy. As per the docs, these predicates can use the following:

Binary comparison operators ( <, >, =, !=, <>, >=, <= ) for numeric, date, and time values.
Arithmetic operators ( +, -, *, /, % ).
Logical operators (AND, OR).
Unary operators (NOT, IS NULL, IS NOT NULL).

In PolyBase, we can use pushdown to improve query performance for an external table. This feature is mostly used in scale-out configurations, where it can improve query performance significantly. In this example, we will use a standalone PolyBase configuration.

When we create an external data source, we can specify PUSHDOWN as ON or OFF to choose whether the computation is moved to the source system. The default value is ON, so we do not need to specify it if we want pushdown enabled. The syntax for an external data source with pushdown is as below:

    CREATE EXTERNAL DATA SOURCE [DataSourceName]
    WITH (
        LOCATION = 'sqlserver://SqlServer',
        -- PUSHDOWN = ON | OFF,
        CREDENTIAL = Credentials
    );

We have already created a data source with the default value (PUSHDOWN = ON) for the external table pointing to another SQL Server instance. Therefore, we will run a query with a predicate both with and without pushdown. To disable pushdown for a single query, we can add the query hint OPTION (DISABLE EXTERNALPUSHDOWN). Similarly, if pushdown was disabled when the external data source was created, we can enable it for a query with OPTION (FORCE EXTERNALPUSHDOWN). Let us run the query and see the difference in performance.

Execute the query with a predicate and pushdown enabled. In this query, we do not specify OPTION (FORCE EXTERNALPUSHDOWN) since pushdown is enabled by default in the data source.

    SELECT [InvoiceID], [CustomerID], [BillToCustomerID], [OrderID]
        , [DeliveryMethodID], [ContactPersonID], [AccountsPersonID], [SalespersonPersonID]
        , [PackedByPersonID], [InvoiceDate], [CustomerPurchaseOrderNumber], [IsCreditNote]
        , [CreditNoteReason], [Comments], [DeliveryInstructions], [InternalComments]
        , [TotalDryItems], [TotalChillerItems], [DeliveryRun], [RunPosition]
        , [LastEditedBy], [LastEditedWhen]
    FROM [SQLShackDemo].[sales].[Invoices]
    WHERE InvoiceID > 250
    ORDER BY CustomerID

Query with predicate without pushdown: in this query, we disable pushdown with the query hint OPTION (DISABLE EXTERNALPUSHDOWN):

    SELECT [InvoiceID], [CustomerID], [BillToCustomerID], [OrderID]
        , [DeliveryMethodID], [ContactPersonID], [AccountsPersonID], [SalespersonPersonID]
        , [PackedByPersonID], [InvoiceDate], [CustomerPurchaseOrderNumber], [IsCreditNote]
        , [CreditNoteReason], [Comments], [DeliveryInstructions], [InternalComments]
        , [TotalDryItems], [TotalChillerItems], [DeliveryRun], [RunPosition]
        , [LastEditedBy], [LastEditedWhen]
    FROM [SQLShackDemo].[sales].[Invoices]
    WHERE InvoiceID > 250
    ORDER BY CustomerID
    OPTION (DISABLE EXTERNALPUSHDOWN)

The query without pushdown took 30.524 seconds, while the query with pushdown took 19.754 seconds, a reduction of about 35% in elapsed time. PUSHDOWN moves the computation to the source system, which is where this improvement comes from.
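To repeat this comparison on your own instance, one option is to run both variants in the same session with SET STATISTICS TIME ON and compare the elapsed times reported in the Messages output. The sketch below shortens the column list and uses FORCE EXTERNALPUSHDOWN explicitly, which is only needed when the data source was created with PUSHDOWN = OFF. Timings will differ from the figures above depending on the network, caching and data volume.

```sql
SET STATISTICS TIME ON;   -- report CPU and elapsed time for each statement

-- Variant 1: ask PolyBase to push the filter to the remote SQL Server.
SELECT [InvoiceID], [CustomerID], [InvoiceDate]
FROM [SQLShackDemo].[sales].[Invoices]
WHERE InvoiceID > 250
ORDER BY CustomerID
OPTION (FORCE EXTERNALPUSHDOWN);

-- Variant 2: same query, with pushdown disabled for this statement only.
SELECT [InvoiceID], [CustomerID], [InvoiceDate]
FROM [SQLShackDemo].[sales].[Invoices]
WHERE InvoiceID > 250
ORDER BY CustomerID
OPTION (DISABLE EXTERNALPUSHDOWN);

SET STATISTICS TIME OFF;
```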

Conclusion

In this latest article in our series, we learned to create an external table for a SQL Server data source using both the Azure Data Studio Create External Table wizard and T-SQL. We also learned how PUSHDOWN moves query computation to the source system. In the next article of the series, we will explore PolyBase with other data sources.

Table of contents

Enhanced PolyBase SQL 2019 – Installation and basic overview
Enhanced PolyBase SQL 2019 – External tables for Oracle DB
Enhanced PolyBase SQL 2019 – External tables using t-SQL
Enhanced PolyBase SQL 2019 – External tables SQL Server Catalog view and PushDown
Enhanced PolyBase SQL 2019 – MongoDB and external table
