execsql.py is a Python module and program that allows you to apply a SQL query script stored in a text file to an Access database. The output of the last SQL query in the script can be automatically saved in a CSV file or routed to—and then viewed in—an Excel worksheet.
Capabilities
You can use the execsql program to:
- Execute a complex query script that selects and summarizes data (see the example), and capture the output in a comma-separated-value (CSV) file. The CSV file can then be used as input to other software, including statistical, spreadsheet, and database applications.
- Execute a query script that selects and summarizes data, and view the results directly in Excel.
- Execute SQL Data Definition Language (DDL) and Data Manipulation Language (DML) commands, stored in a text file, to create database tables and alter data in the database. Using execsql and other tools such as batch files and the Windows task scheduler, these operations can be set up so that they occur automatically without the need for any interactive use of Access.
- Execute "CREATE QUERY..." and "CREATE TEMPORARY QUERY..." DDL statements, which are not natively supported by Access. This feature allows you to maintain a standard set of queries, or task-specific queries, in a text file and easily add them to any Access database. Support for "CREATE TEMPORARY QUERY..." DDL is also important to allow non-trivial data summarization scripts; use of this feature is described in more detail below.
Syntax and Options
Notes
- If the program is run without any arguments it will print a help message simiar to the usage description above.
- Script files can contain comments, which are identified by two dashes ("--") at the start of a line.
- With the exception of the "CREATE TEMPORARY QUERY..." statement, the execsql program does not parse or interpret SQL syntax in any way. The program recognizes a SQL statement as consisting of any sequence of non-comment lines that ends with a line ending with a semicolon. SQL syntax used in the script must conform to that recognized by the Jet database engine.
- Only the results of the final query in the script will be saved as CSV or routed to Excel. If the final query includes the "CREATE TEMPORARY QUERY..." prefix, then no output will be produced regardless of any output specification that is provided on the command line. If the final query is a DDL command, or a DML command other than a SELECT, UNION, or TRANSFORM statement, the output may not be produced, or may be empty or meaningless.
- The program will not automatically add the ".csv" file extension to the output file name given. Specifying an output file name with a different extension (e.g., ".xls") will not alter the format of the output from CSV.
- If Excel output is desired, the output specification must be literally "Excel"—proper case matters. If, for example, the output specification is given as "excel", then a CSV file named "excel" will be created on disk. If query output is sent to Excel, the Excel workbook that is created will be activated (i.e., made visible) after the query script is complete. The Excel workbook will not be automatically saved as a file. If the Excel output format is used, bear in mind that Excel has row and column limitations, whereas CSV files do not.
- The COM library for Excel is very slow. Unless only a very small amount of query output is expected, creating a CSV file and then opening it in Excel will generally be faster than routing the data directly to Excel in memory.
- Scripts that use temporary queries will result in those queries being created in the Access database, and then removed, every time the scripts are run. This can lead to a gradual increase in the size of the Access database file. Access' "Compact and Repair" tool can be run to reclaim the space that is added to the .mdb file when temporary queries are created.
Requirements
The execsql program used the Common Object Model (COM) to communicate with Access and Excel. Python does not include a standard library that supports COM, and so execsql uses Mark Hammond's pywin32 extension for Windows to provide COM support. A version of pywin32 compatible with your version of Python must be installed so that execsql can be used.
Support for CREATE TEMPORARY QUERY DDL
The syntax of the "CREATE TEMPORARY QUERY" DDL supported by execsql is:
The "TEMPORARY" specification is optional: if it is included, the query will be deleted after the entire script has been executed, and if it is not, the query will remain defined in the database after the script completes. If a query of the same name is already defined in the Access database when the script runs, the existing query will be deleted before the new one is created—no check is performed to determine whether the new and old queries have the same definition, and no warning is issued by execsql that a query definition has been replaced.
The keyword "VIEW" can be used in place of the keyword "QUERY". This alternative provides compatibility with the "CREATE TEMPORARY VIEW" command in PostgreSQL, and minimizes the need to edit any scripts that are intended to be run against both Access and PostgreSQL databases.
Using Script Files
Using script files to store and execute task-specific queries has a number of advantages over using separate individual queries within the Access database itself:
- Maintenance and management of queries is easier and more reliable. Because all of the SQL needed for a specific data summarization task is kept together in one text file, there is little or no risk that one of the invididual queries needed to complete a specific task will be either deleted or altered. The clutter of queries that can accumulate in an Access database can be substantially reduced or eliminated.
- Creation of a set of queries for a new task is much easier: the script file can be simply copied and edited. Search and replace operations can be used to easily and reliably make changes throughout the entire set of queries that is needed for a particular task.
- Complete documentation can (and should!) be included in the script files, so that the purpose, assumptions, limitations, and history of changes can be easily reviewed by anybody who might consider using or modifying the query script.
- The query script can be more easily preserved to document the way in which data were selected or summarized. For example, a query script that is used for a particular analysis may be kept in a directory with other files associated with that analysis. Query scripts can be easily archived or backed up independently of the database. Script files can be made read-only so that they cannot easily or accidentally be modified after the script for a particular task has been finalized.
- The ability of execsql to automatically generate CSV or Excel output reduces the amount of time that might otherwise be required to interactively open the database, run the appropriate query (not to mention verifying that the query, or any queries that it depends on, have not been altered), and export the result. If the query output will be further processed or used in another scriptable application (e.g., to produce graphics or statistics using R), the execsql program can be combined with other programs in a batch file to further automate the data summarization and analysis process.
- If a database must be maintained in two different formats (e.g., in PostgreSQL for ordinary use, but downloaded to Access for use when a network connection is not available), one script file can potentially be used to carry out exactly the same data selection and summarization operations on both formats of the database.
A Few Comments on SQL used with Jet and DAO
The version of SQL that is used by the Jet engine and Microsoft's Data Access Objects (DAO) methods, and thus that must be used in the script files executed with execsql, is generally equivalent to that used within Access itself, but is not identical, and is also not the same in all respects as standard SQL; it also differs in some respects from the SQL that can be used with Access via ODBC (which is more standards-compliant). A few points to bear in mind when creating SQL scripts for use with Access:
- The Jet engine can fail to correctly parse multi-table JOIN expressions. In these cases you will need to give it some help by parenthesizing parts of the JOIN expression. This means that you have some responsibility for constructing optimized (or at least acceptably good) SQL.
- Visual Basic for Applications (VBA) functions that you might use in Access are not all available via DAO. Sometimes these can be worked around with slightlier lengthier code. For example, the 'Nz()' function is not available, but it can be replaced with an expression such as 'Iif([Column] is null, 0, [Column])'.
- Expressions that should produce a floating-point result ('Double') sometimes do not, with the output being truncated or rounded to an integer. A workaround is to multiply and then divide the expression by the same floating-point number; for example: '1.00000001 * <expression> / 1.00000001'.
- The wildcard character used with the LIKE expression must be "*" rather than "%"—this is a case in which the COM interface corresponds to Access rather than to the SQL standard or the ODBC interface.
- The BEGIN TRANSACTION statement (and COMMIT, END, and ROLLBACK statements) are not supported as individual SQL statements.
Example
This example illustrates a script that makes use of several temporary queries to select and summarize data, and a final query that prepares the data for export or further use. This example is intended to be run against an EQuIS database.
During the execution of this script, the temporary queries will be created in the database. When the script concludes, the temporary queries will be removed. Nothing except the data itself need be kept in the database to use a script like this one.
Copyright and License
Copyright (c) 2008, R.Dreas Nielsen
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The GNU General Public License is available at http://www.gnu.org/licenses/.