execsql.py is a Python module and program that allows you to apply a SQL query script stored in a text file to an Access database. Special commands can be embedded in SQL comments that will write text and export query results to the console or to disk files. In addition, the output of the last SQL query in the script can be automatically saved in a CSV file or routed to—and then viewed in—an Excel worksheet.
Contents
- Capabilities
- Syntax and options
- Usage notes
- Creating temporary queries
- SQL syntax notes
- Metacommands
- Using script files
- Requirements
- Examples
- Copyright and license
Capabilities
You can use the execsql program to:
- Execute a complex query script that selects and summarizes data (see the examples), and capture the output in a comma-separated-value (CSV) or tab-delimited text file. The CSV file can then be used as input to other software, including statistical, spreadsheet, and database applications.
- Execute a query script that selects and summarizes data, and view the results directly in Excel.
- Execute SQL Data Definition Language (DDL) and Data Manipulation Language (DML) commands, stored in a text file, to create database tables and alter data in the database. Using execsql and other tools such as batch files and the Windows task scheduler, these operations can be set up so that they occur automatically without the need for any interactive use of Access.
- Execute "CREATE QUERY..." and "CREATE TEMPORARY QUERY..." DDL statements, which are not natively supported by Access. This feature allows you to maintain a standard set of queries, or task-specific queries, in a text file and easily add them to any Access database. Support for "CREATE TEMPORARY QUERY..." DDL is also important to allow non-trivial data summarization scripts; use of this feature is described in more detail below.
- Write messages to the console or to a file during the processing of a SQL script, using metacommands embedded in SQL comments within the script.
- Export data, write messages, or halt script processing based on the results of SQL statements using conditional metacommands embedded in SQL statements.
Syntax and Options
Usage Notes
- If the program is run without any arguments it will print a help message simiar to the usage description above.
- Script files can contain comments, which are identified by two dashes ("--") at the start of a line.
- With the exception of the "CREATE TEMPORARY QUERY..." statement, the execsql program does not parse or interpret SQL syntax in any way. The program recognizes a SQL statement as consisting of any sequence of non-comment lines that ends with a line ending with a semicolon. SQL syntax used in the script must conform to that recognized by the Jet database engine.
- Only the results of the final query in the script will be saved as CSV or routed to Excel. If the final query includes the "CREATE TEMPORARY QUERY..." prefix, then no output will be produced regardless of any output specification that is provided on the command line. If the final query is a DDL command, or a DML command other than a SELECT, UNION, or TRANSFORM statement, the output may not be produced, or may be empty or meaningless.
- The program will not automatically add the ".csv" file extension to the output file name given. Specifying an output file name with a different extension (e.g., ".xls") will not alter the format of the output from CSV.
- If Excel output is desired, the output specification must be literally "Excel"—proper case matters. If, for example, the output specification is given as "excel", then a CSV file named "excel" will be created on disk. If query output is sent to Excel, the Excel workbook that is created will be activated (i.e., made visible) after the query script is complete. The Excel workbook will not be automatically saved as a file. If the Excel output format is used, bear in mind that Excel has row and column limitations, whereas CSV files do not.
- Metacommands can be embedded in SQL comments to export data and carry out other actions during the course of the script. These metacommands are identified by the token "!x!" immediately following the SQL comment characters at the beginning of a line. The special commands that are available are listed in the section titled Metacommands.
- The COM library for Excel is very slow. Unless only a very small amount of query output is expected, creating a CSV file and then opening it in Excel will generally be faster than routing the data directly to Excel in memory.
- Scripts that use temporary queries will result in those queries being created in the Access database, and then removed, every time the scripts are run. This can lead to a gradual increase in the size of the Access database file. Access' "Compact and Repair" tool can be run to reclaim the space that is added to the .mdb file when temporary queries are created.
Support for CREATE TEMPORARY QUERY DDL
The syntax of the "CREATE TEMPORARY QUERY" DDL supported by execsql is:
The "TEMPORARY" specification is optional: if it is included, the query will be deleted after the entire script has been executed, and if it is not, the query will remain defined in the database after the script completes. If a query of the same name is already defined in the Access database when the script runs, the existing query will be deleted before the new one is created—no check is performed to determine whether the new and old queries have the same definition, and no warning is issued by execsql that a query definition has been replaced.
The keyword "VIEW" can be used in place of the keyword "QUERY". This alternative provides compatibility with the "CREATE TEMPORARY VIEW" command in PostgreSQL, and minimizes the need to edit any scripts that are intended to be run against both Access and PostgreSQL databases.
SQL Syntax Notes
The version of SQL that is used by the Jet engine when accessed via DAO or ODBC, and thus that must be used in the script files executed with execsql, is generally equivalent to that used within Access itself, but is not identical, and is also not the same in all respects as standard SQL. There are also differences in the SQL syntax accepted by the DAO and ODBC interfaces. To help avoid inconsistencies and errors, here are a few points to keep in mind when creating SQL scripts for use with execsql:
- The Jet engine can fail to correctly parse multi-table JOIN expressions. In these cases you will need to give it some help by parenthesizing parts of the JOIN expression. This means that you have some responsibility for constructing optimized (or at least acceptably good) SQL.
- Visual Basic for Applications (VBA) functions that you might use in Access are not all available via DAO or ODBC. Sometimes these can be worked around with slightlier lengthier code. For example, the 'Nz()' function is not available, but it can be replaced with an expression such as 'Iif([Column] is null, 0, [Column])'.
- Literal string values in SQL statements should be enclosed in single quotes, not double quotes. Although Access allows double quotes to be used, the SQL standard and the connector libraries used for execsql require that single quotes be used.
- Expressions that should produce a floating-point result ('Double') sometimes do not, with the output being truncated or rounded to an integer. A workaround is to multiply and then divide the expression by the same floating-point number; for example: '1.00000001 * <expression> / 1.00000001'.
- The wildcard character used with the LIKE expression must be "*" when creating temporary queries and "%" otherwise. The COM interface, like Access itself, requires "*", whereas the SQL standard and the ODBC interface require "%".
- The BEGIN TRANSACTION statement (and COMMIT, END, and ROLLBACK statements) are not supported as individual SQL statements.
Metacommands
The execsql program supports a limited number of special commands that allow the following actions to be taken at certain points within the script file:
- Include the contents of another SQL command file.
- Write arbitrary text out to the console or to a file.
- Export a view (query or table) to the console or a file in either CSV or tab-delimited format.
- Execute a view (query).
- Stop script processing.
The metacommands must be embedded in SQL comments, and are identified by the token "!x!" immediately following the comment characters at the beginning of the line. Each of the metacommands must be completely on a single line. These commands are as follows.
The SQL commands in the named file will be evaluated in the context of the 'include' statement. All 'include' metacommands are evaluated before any SQL statements or other metacommands, so if all of the 'include' commands cannot be carried out (e.g., if a named file does not exist), then no SQL statements or metacommands will be executed.
String replacement will occur on all following lines of the file before they are evaluated in any other way. Every occurrence of the <match_str>, when immediately preceded and followed by two exclamation points ("!!"), will be replaced by <repl_str>. Substitutions are processed in the order in which they are defined.
The text to be written must be enclosed in double quotes. If no filename is specified, the text will be written to the console. Text will always be appended to any existing file of the given name.
The view name given must be an existing table or query, though it may be one that was just created by a preceding SQL statement. The output filename specified will be overwritten if it exists unless the APPEND keyword is included. If the output name is given as "stdout", the data will be sent to the console instead of to a file. The format specification controls whether the table is written as CSV, tab-delimited format, or text with data aligned in columns.
Executes the specified view (query). This command is intended specifically to be used within a conditional statement, to allow SQL commands to be executed only under specific conditions. The view referenced in this command should be an INSERT, UPDATE, or DELETE statement—executing a SELECT statement in this context would have no purpose. See the 'IF' metacommands listed below.
Script processing is halted, and the execsql.py program terminates.
Execution of the commands to write text or export a data table can be dependent on a test of whether a particular view (table or query) has a non-zero number of rows.
The specified metacommand is executed only if the previous statement generated an error. Errors will result from badly-formed SQL, lack of permissions, or database locks. A query (e.g., an update query) that does not do exactly what you expect it to will not necessarily cause an error to occur that can be evaluated with this statement.
Examples of the use of these commands are shown in Example 2. Future versions of the execsql program will expand on the number and type of commands and conditional tests that are available.
Using Script Files
Using script files to store and execute task-specific queries has a number of advantages over using separate individual queries within the Access database itself:
- Maintenance and management of queries is easier and more reliable. Because all of the SQL needed for a specific data summarization task is kept together in one text file, there is little or no risk that one of the invididual queries needed to complete a specific task will be either deleted or altered. The clutter of queries that can accumulate in an Access database can be substantially reduced or eliminated.
- Creation of a set of queries for a new task is much easier: the script file can be simply copied and edited. Search and replace operations can be used to easily and reliably make changes throughout the entire set of queries that is needed for a particular task.
- Complete documentation can (and should!) be included in the script files, so that the purpose, assumptions, limitations, and history of changes can be easily reviewed by anybody who might consider using or modifying the query script.
- The query script can be more easily preserved to document the way in which data were selected or summarized. For example, a query script that is used for a particular analysis may be kept in a directory with other files associated with that analysis. Query scripts can be easily archived or backed up independently of the database. Script files can be made read-only so that they cannot easily or accidentally be modified after the script for a particular task has been finalized.
- The ability of execsql to automatically generate CSV or Excel output reduces the amount of time that might otherwise be required to interactively open the database, run the appropriate query (not to mention verifying that the query, or any queries that it depends on, have not been altered), and export the result. If the query output will be further processed or used in another scriptable application (e.g., to produce graphics or statistics using R), the execsql program can be combined with other programs in a batch file to further automate the data summarization and analysis process.
- If a database must be maintained in two different formats (e.g., in PostgreSQL for ordinary use, but downloaded to Access for use when a network connection is not available), one script file can potentially be used to carry out exactly the same data selection and summarization operations on both formats of the database.
Requirements
The execsql program uses the Common Object Model (COM) to communicate with Access (for the purpose of creating and deleting temporary queries) and Excel. Python does not include a standard library that supports COM, and so execsql uses Mark Hammond's pywin32 extension for Windows to provide COM support. A version of pywin32 compatible with your version of Python must be installed so that execsql can be used.
This program also uses an ODBC connection to communicate with Access because the COM connection does not provide sufficient information about errors generated by the Jet database engine that occur when it cannot process or complete a SQL command. The pyodbc library is used for this connection, and so it must also be installed.
Examples
Example 1: Use Temporary Queries to Select and Summarize Data
This example illustrates a script that makes use of several temporary queries to select and summarize data, and a final query that prepares the data for export or further use. This example is intended to be run against an EQuIS database.
During the execution of this script, the temporary queries will be created in the database. When the script concludes, the temporary queries will be removed. Nothing except the data itself need be kept in the database to use a script like this one.
Example 2: Execute a Set of QA Queries and Capture the Results
This example illustrates a script that creates several temporary queries to check the codes that are used in a set of staging tables against the appropriate dictionary tables, and, if there are unrecognized codes, writes them out to a text file.
Copyright and License
Copyright (c) 2007-2009, R.Dreas Nielsen
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License is available at http://www.gnu.org/licenses/.