Sql Comparing two tables: Complete grammar for SQL 2000 and 2005?

I would like to build a TSQL parser and lexer for various reasons. One of them involves complex data import procedures where we need to ensure that columns are populated before they are used as data sources or criteria for other columns. There are actually many very cool and useful things that could be done with a good TSQL parser.

My question is this: are there any grammars available online for DDL, DML, whatever you call the DBCC statements, etc.? I should think Microsoft would have to have something like an EBNF for their own parsers; would it be possible to make them public?
No. The grammar for TSQL is not public information. Why does data import procedures need to look at the code? You can verify data post or pre or during processing or declaratively using foreign keys, constraints, indexed views etc.|||Declarative programming has limits that we have exceeded. Constraints simply won't catch all of the errors that can and have happened to us. The alternative to what I describe is painstaking checking of code, work that can and therefore should be done automatically.

Another use for the parser would be to build a little editor that provides intellisense and also shows better error messages, such as in cases where a column name is ambiguous (why not use intellisense to suggest one of the valid aliases/table names?). Usage of indexes in queries could be indicated by coloring of column names, extended properties of columns could be displayed using tooltips, etc. Basically, I have a desire to bring the sort of power of Visual Studio 2005 offers C# developers to SQL developers. It would be much easier to do this with a grammar from Microsoft, instead of building my own via trial and error. To be honest, I should think Microsoft would be in favor of such tools, as they would provide additional incentive for developers to use SQL Server.

|||

I agree that a parser will help for providing more productive environment. But I just do not see how it helps to check data purity for instance. In most cases, you will have to analyze complex procedural logic which may or may not be entirely on the server-side. So it is probably best to catch those at the source or destination.

Anyway, I don't know of any plans in the near future to release the grammar for TSQL or SQL implementation. If you are using ANSI SQL compatible coding standards then you can get the grammar for free on the internet. This should help you for the SQL part not TSQL syntax.

|||The data import process works like the following. I first preprocess a single text file I receive from a client and generate several tab-delimited files consumable by BCP. Along with the files, I create format files based on metadata. I run BCP, dumping the data in a primary table. After that, I run numerous scripts against the data to converting text codes to foreign keys, verify data adhere to specified formats, categorize data using fairly complex SQL statements, etc. Some statements depend on the values assigned by previous statements. Once the complexity of this interdependence gets to a certain level, it gets to be a pain to maintain. Verifying correctness would be trivial if I could parse SQL and see which fields are assigned to, which fields are assigned from, and which fields are used in criteria. The alternative is to manually recreate these three collections of fields.

The above is not the only use I have for TSQL parsing. Our current application is based of a metadata architecture used to describe databases so that our client has absolutely zero database-specific code. This involves, among other things, a query generation engine that probably equals if not exceeds the abilities of the .NET 3.5 Entity Framework. We now have two versions of the query generation engine: a VB6 one and a C# one. It would be invaluable to test the C# one by ensuring that it generates the same effective SQL as the VB6 one. I would rather not enforce character-for-character equality -- my time can be spent in better ways that completely reverse engineering the old algorithm so that the new one matches the old character-for-character. Being able to treat TSQL code as data, Lisp style, is the best solution.

I don't want to come off as sounding antagonistic, although it may be too late. I just don't understand why Microsoft wants to make my life (and by proxy, of many of their loyal developers) harder by not providing a completely derivable piece of data. I wouldn't be competing with them by having a full grammar; I would be enhancing one of their existing products, making it more lucrative. Surely Microsoft understands the power of treating code as data -- there is the CodeDom for generating .NET code, and the .NET GC was originally prototyped in Lisp, a language renowned for its treatment of code as data. DLINQ depends on Expression<T>, which can be generated by lambda functions, which are code converted to data.

I have a feeling all of the above is for nought. My impression is that Microsoft can be completely opaque if it wants and a little guy like me has no chance in getting any sort of insightful answer from them to a question like this. I would absolutely love to be wrong!

Microsoft understands how much of a big deal intellisense is.

Sunday, February 19, 2012

Complete grammar for SQL 2000 and 2005?

No comments:

Post a Comment

Sql Comparing two tables

Blog Archive

About Me