Enhanced PolyBase SQL 2019 External tables SQL Server Catalog view and PushDown
Enhanced PolyBase SQL 2019 - External tables SQL Server, Catalog view and PushDown
This opens up the wizard to create the external tables. Recently, I faced an issue where the wizard stuck in the ‘step 1’. Progress bar icon keeps rotating and does not show any error message or any progress.
After some time, we get the error message ‘Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.’ This is the general error message and does not point to clear error message. I tried to close the wizard multiple times and launch it again but it remains the same. It does not show any error message to troubleshoot it further. Later, during the investigation, I figured out that PolyBase services were in stopped condition. SQL Server PolyBase Data Management Service SQL Server PolyBase Engine Let us start these services. Now again launch the ‘Create External Table’ wizard in Azure Data Studio. This starts the wizard successfully. Therefore, monitor the service status before launching the wizard to avoid any issues.
We can also get the information about the data source using the catalog view sys.external_data_sources. Using below query, we can see the name of the data source, location (location contains database and instance IP address along with the instance port address): 1 SELECT name, location, type FROM sys.external_data_sources
://[:]. Since we are creating the external table for SQL Server, we need to specify the vendor as ‘sqlserver’. We also need to specify the port address with the colon. For example, in below query, we specified port number as 5290. In the below section, we will create an external table. We need to create an external table similar to the relational database table with the column properties. We also need to specify a location for the object along with the data source. For example, in below query, we specify the location as [WideWorldImporters].[Sales].[Invoices] and DATA_SOURCE as [SQLServer]
Author Recent Posts Rajendra GuptaHi! I am Rajendra Gupta, Database Specialist and Architect, helping organizations implement Microsoft SQL Server, Azure, Couchbase, AWS solutions fast and efficiently, fix related issues, and Performance Tuning with over 14 years of experience.
I am the author of the book "DP-300 Administering Relational Database on Microsoft Azure". I published more than 650 technical articles on MSSQLTips, SQLShack, Quest, CodingSight, and SeveralNines.
I am the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups.
Based on my contribution to the SQL Server community, I have been recognized as the prestigious Best Author of the Year continuously in 2019, 2020, and 2021 (2nd Rank) at SQLShack and the MSSQLTIPS champions award in 2020.
Personal Blog: https://www.dbblogger.com
I am always interested in new challenges so if you need consulting help, reach me at [email protected]
View all posts by Rajendra Gupta Latest posts by Rajendra Gupta (see all) Copy data from AWS RDS SQL Server to Azure SQL Database - October 21, 2022 Rename on-premises SQL Server database and Azure SQL database - October 18, 2022 SQL Commands to check current Date and Time (Timestamp) in SQL Server - October 7, 2022
SQLShack
SQL Server training EspañolEnhanced PolyBase SQL 2019 – External tables SQL Server Catalog view and PushDown
November 6, 2018 by Rajendra Gupta This article is part 4 of the series for SQL Server 2019 Enhanced PolyBase. Let quickly recap the previous articles. Part 1: We installed SQL Server 2019 PolyBase feature along with Azure Data Studio and SQL Server 2019 preview extension to explore its features Part 2: In this part, we learned to create an External table using Azure Data Studio ‘External table wizard’ for the Oracle data source Part 3: We learned the useful features of External tables like joins and created an external table using t-SQL instead of the GUI mode for Oracle database in this series article We have learned earlier that PolyBase in SQL Server 2019 Preview allows access to various data sources such as SQL Server, Oracle, MongoDB, Teradata, and ODBC based sources etc. Azure Data Studio SQL Server 2019 preview extension currently supports for SQL Server and Oracle data sources only from the External table wizard. In this series, we will create an external table for SQL Server and explore some more features around it. Launch Azure Data Studio and connect to the SQL Server 2019 preview instance. Right click on the database and launch ‘Create External Table’.This opens up the wizard to create the external tables. Recently, I faced an issue where the wizard stuck in the ‘step 1’. Progress bar icon keeps rotating and does not show any error message or any progress.
After some time, we get the error message ‘Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.’ This is the general error message and does not point to clear error message. I tried to close the wizard multiple times and launch it again but it remains the same. It does not show any error message to troubleshoot it further. Later, during the investigation, I figured out that PolyBase services were in stopped condition. SQL Server PolyBase Data Management Service SQL Server PolyBase Engine Let us start these services. Now again launch the ‘Create External Table’ wizard in Azure Data Studio. This starts the wizard successfully. Therefore, monitor the service status before launching the wizard to avoid any issues.
External table for SQL Server
In this section, we will use the below source and destination instances. Source Instance (here we will create external table): SQL Server 2019 (Named instance – SQL2019) Destination Instance (External table will point here): SQL Server 2019 (Default instance – MSSQLSERVER) Click on the ‘SQL Server’ in the data source type of wizard and proceed to the next step. In the next step, create the Database Master Key to secure the credentials used by the external data source. We should use a complex password with a combination of lower case, upper case, alphanumeric and special characters. Go to the next step and create data source connection. Server Name should in the format of [Instance Name IP Address].[Port] This credential should have permission onto the SQL Server where we will point out external tables. Connect to the instance, create a login, and provide read permission to the user on WideWorldImporters database. 1234567 CREATE LOGIN [DemoSQL2019] WITH PASSWORD=N'f3EzbtBSXu7iNaKdtRXd+soU0ab6Pwu6BSfMOI7jqms=', DEFAULT_DATABASE=[master], DEFAULT_LANGUAGE=[us_english], CHECK_EXPIRATION=OFF, CHECK_POLICY=OFFGOCREATE USER [DemoSQL2019] FOR LOGIN [DemoSQL2019]GOALTER ROLE [db_datareader] ADD MEMBER [DemoSQL2019]GO Now we can go further and see that no login failure issue occurred. Select the table from the database. We see here that the mapping as below: Source Table: Sales.Invoices Destination table: dbo.invoices It automatically selects the destination schema as dbo since sales schema does not exist in our database. Therefore, let us create the schema and refresh the schema list to show it here: Select the sales schema from the drop-down. Click Next to view summary: Click on Create to configure an external table. Once the external table is created, we can access the data from it.Catalog views for PolyBase
We can view the external tables in using the catalog view sys.external_tables. It shows all the external tables in the current database: 1234 SELECT execution_id, status,st.text, dr.total_elapsed_time FROM sys.dm_exec_distributed_requests dr cross apply sys.dm_exec_sql_text(sql_handle) st ORDER BY total_elapsed_time DESC;We can also get the information about the data source using the catalog view sys.external_data_sources. Using below query, we can see the name of the data source, location (location contains database and instance IP address along with the instance port address): 1 SELECT name, location, type FROM sys.external_data_sources
Understand the script generated by the wizard
The Create an External table wizard can also be used to create scripts for the whole process. Let us understand the script generated by the Azure Data Studio for creating an external table in the above example by breaking the query into multiple parts. Below query creates the Master Key Encryption In this step, it created the database scoped credential. Create external data source pointing to SQL Server. In this external data source query, we need to specify the location in the format ofPUSHDOWN in PolyBase
We normally use predicates in the query in order to get a subset of the rows from the table. This subset allows pulling the records based on the conditions defined with where clause. These predicates can be as following as per the docs. Binary comparison operators ( <, >, =, !=, <>, >=, <= ) for numeric, date, and time values. Arithmetic operators ( +, -, *, /, % ). Logical operators (AND, OR). Unary operators (NOT, IS NULL, IS NOT NULL). In PolyBase, we can use pushdown to improve the performance of the query for the external table. Mostly, we use this feature for the scale-out cluster cases where we can see significant improvement of the query performance. In this example, we will be using the standalone PolyBase configuration. When we create an external data source for external table, we have the option to specify the value for PUSHDOWN as ON or OFF. The default value for pushdown is ON. Therefore, we do not need to specify a pushdown value if we want to enable it. Using PUSHDOWN, we can choose to move the computation to source system or not. The syntax for an external data source with pushdown is as below: 123456 CREATE EXTERNAL DATA SOURCE [DataSourceName]WITH ( LOCATION = sqlserver://SqlServer,-- PUSHDOWN = ON OFF, CREDENTIAL = Credentials); We have already created a data source with a default value (Pushdown=ON) for the external table pointing to another SQL Server instance. Therefore, we will run the query with predicate with and without pushdown. To disable, pushdown we can use the predicate OPTION (DISABLE EXTERNAL PUSHDOWN) in the query. Similarly, while creating the external data source if we disabled the pushdown, we can enable it while running the query as OPTION (FORCE EXTERNALPUSHDOWN); Let us run the query and see the difference in performance. Execute query with predicate and enabling Pushdown: In this query, we do not specify OPTION (FORCE EXTERNALPUSHDOWN)since it is by default enabled in the data source. 12345678910 SELECT [InvoiceID],[CustomerID],[BillToCustomerID],[OrderID] ,[DeliveryMethodID],[ContactPersonID],[AccountsPersonID],[SalespersonPersonID] ,[PackedByPersonID] ,[InvoiceDate],[CustomerPurchaseOrderNumber],[IsCreditNote] ,[CreditNoteReason],[Comments],[DeliveryInstructions],[InternalComments] ,[TotalDryItems] ,[TotalChillerItems] ,[DeliveryRun] ,[RunPosition],[LastEditedBy] ,[LastEditedWhen] FROM [SQLShackDemo].[sales].[Invoices] where InvoiceID>250 order by CustomerID Query with predicate without pushdown: In this query, we disabled the pushdown with predicates OPTION (DISABLE EXTERNALPUSHDOWN): 1234567891011 SELECT [InvoiceID],[CustomerID],[BillToCustomerID],[OrderID] ,[DeliveryMethodID],[ContactPersonID],[AccountsPersonID],[SalespersonPersonID] ,[PackedByPersonID] ,[InvoiceDate],[CustomerPurchaseOrderNumber],[IsCreditNote] ,[CreditNoteReason],[Comments],[DeliveryInstructions],[InternalComments] ,[TotalDryItems] ,[TotalChillerItems] ,[DeliveryRun] ,[RunPosition],[LastEditedBy] ,[LastEditedWhen] FROM [SQLShackDemo].[sales].[Invoices] where InvoiceID>250 order by CustomerID OPTION(DISABLE EXTERNALPUSHDOWN) We can see here the query without pushdown took 30.524 seconds while query with pushdown took 19.754 seconds so there is a significant performance improvement with this approach. PUSHDOWN allows moving computation source, which we can see improvement in performance.Conclusion
In this latest article in our series, we have learned to create an external table for SQL Server data source with the Azure Data Studio Create external table wizard along with T-SQL as well. We also learned about the PushDown approach for computation queries. In the next series of the article, we will explore more on PolyBase for different data sources.Table of contents
Enhanced PolyBase SQL 2019 – Installation and basic overview Enhanced PolyBase SQL 2019 – External tables for Oracle DB Enhanced PolyBase SQL 2019 – External tables using t-SQL Enhanced PolyBase SQL 2019 – External tables SQL Server Catalog view and PushDown Enhanced PolyBase SQL 2019 – MongoDB and external tableAuthor Recent Posts Rajendra GuptaHi! I am Rajendra Gupta, Database Specialist and Architect, helping organizations implement Microsoft SQL Server, Azure, Couchbase, AWS solutions fast and efficiently, fix related issues, and Performance Tuning with over 14 years of experience.
I am the author of the book "DP-300 Administering Relational Database on Microsoft Azure". I published more than 650 technical articles on MSSQLTips, SQLShack, Quest, CodingSight, and SeveralNines.
I am the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups.
Based on my contribution to the SQL Server community, I have been recognized as the prestigious Best Author of the Year continuously in 2019, 2020, and 2021 (2nd Rank) at SQLShack and the MSSQLTIPS champions award in 2020.
Personal Blog: https://www.dbblogger.com
I am always interested in new challenges so if you need consulting help, reach me at [email protected]
View all posts by Rajendra Gupta Latest posts by Rajendra Gupta (see all) Copy data from AWS RDS SQL Server to Azure SQL Database - October 21, 2022 Rename on-premises SQL Server database and Azure SQL database - October 18, 2022 SQL Commands to check current Date and Time (Timestamp) in SQL Server - October 7, 2022