pg2sql Whitepaper

OpenACS data model in Transact SQL ("pg2tsql").

The Big Picture

During some recent downtime, I did some work porting the ACS kernel datamodel and plpgsql API to the Microsoft SQL Server 2000 platform. Paying work has picked up for me, and progress on this project has stalled quite a bit. My goal was to get to the point where I could run the kernel regression tests, but that's not going to happen any time soon with my current schedule. It's not that I think it would take a lot of my time, but I think it require the kind of focused, consecutive hours which I don't have right now. Anyway, I'm notifying the community to see if there's any interest in completing the port. I need some help manually translating the statements that the parser can't handle. I don't think it's a lot of work, but you can see for yourself at the project web site.

What

I have ported only about 8 statements from pl/pgsql to transact sql by hand. Real exciting, huh? Well, for the rest of the DDL statements and pl/pgsql functions in the kernel (> 500), I've used an automated code translation suite written in perl. It handles most subtle differences in SQL-92 implementations, sequences, overloaded functions, Transact SQL's kludgy distinction between functions and stored procedures, cursors, and a bunch of other minor incompatibilities. It does not handle triggers or dynamic SQL, but I think these implementations are so platform specific and divergent that it's not worth attempting.

But regardless, the code does a lot. And it does alot by cheating -> in some places it relies on ACS function and variable naming conventions, and in others the decisions it makes about how to translate a particular line of code have been captured to config files which can be hand tweaked, or from metadata captured by previous translation passes.

It is not a general purpose plpgsql2transact-sql translator *by any means*. I only implement a small subset of SQL-92, but for the most part so does OpenACS. Naturally, automated code translation has it's limitations, and I have no doubts some of the auto-translated DML or functions will have to be tuned. Even so, there's a surprisingly large amount of rote code (select statements with subtle platform-specific SQL-92 differences) that can be translated effectively with some simple translation rules. For extra credit, I could also translate the all files of all individual modules without too much work, but I wouldn't really have a good way to validate or take advantage of the translations without wiring the whole thing up to the TCL (or Java etc.) API.

Why?

My short term motivation is quite simply that I like the OpenACS data model, and would like the opportunity to leverage it more often on client projects. Since a great deal of the work people hire me to do involves MS-SQL Server, and I know the platform pretty well, that seemed like the right target. My objective with this project has not been to port OpenACS to the MS stack (IIS/etc). Just the data model. It's appealing to think of an W2K/AolServer/ODBC/MSSqlServer solution as an option for some Windows-bound clients, but it's not something I can do on my own. I don't know the guts of the request processor very well, and I'm not skilled at debugging ODBC driver source code, and troubleshooting lob handling is not high on my list of fun things to do. However, my understanding is that going the extra mile wouldn't be too much work for the people with those skills. This also suggests my longer term motivation. I'd like to see potential OpenACS users not be forced to choose between a toolset with limited mind/market share and the ones they use now. More work than I've hinted at above would have to be done to make this ambition a reality, but I don't think it's too much more work and hopefully what I've done might make it more likely.

Why didn't I target another platform at my translator's output, say Oracle? Well, for a couple of reasons. It's very unlikely that the code generation stuff could do as good a job as the human porters, and for the most part the human porters have already done their job. I could see it being useful as a 'first pass' type of thing for modules that are first created on Pg, catching simple find/replace translations for concatenation, date functions etc. That can still be done, but I haven't optimized my work for multiple, highly different outputs (in other words, there's no clean separation between the parse and translate stages of my translator). But in pondering next steps, I think a more likely target would be Sybase - the marginal effort would be pretty small, it runs on *nix, and AOLServer drivers exist for it.

How?

A friend of mine worked for a Y2K company that had implemented a COBOL parser in perl. It emitted Y2K compliant COBOL code where it could, and flagged the statements it couldn't translate for hand coding. The idea of hand-porting a bunch of pl/pgsql code just for amusement was not really interesting to me, but I figured an approach like that would allow me to learn about grammars and lexers, which was interesting to me. Also, doing it in an automated fashion is the only realistic way to try to keep this branch of code in sync with the core effort, without devoting major resources.

On to specifics. I was able to leverage the source code of Postgres to a great degree. I combined the pl/pgsql grammar file and the SQL 92 grammar file from their CVS tree, and ported it to a grammar file that Damian Conway's brilliant Parse::RecDescent module could read. I added in some driver code to walk the directory tree, and some more code to validate the statements that had been parsed by running them against a SQL 2000 database.

Sounds fairly straightforward, right? Well, the resulting code is a bit on the scary side. I bet myself that I could do it in a week, but it took more like 2.5. Although I cleaned some of it up, there's plenty of evidence of QADAD (quick and dirty application development) and since this is perl, well... Let's just say I went out of my way to take advantage of many perl idioms and leave it at that.

Coda

Anyway, like most in the community, I've bought into the AOLServer/TCL combo as more scalable, robust and simple than the indstry-leading alternatives offered by Sun and MS (viva Phil!). Tack on Postgres and you have unbeatable price/performance. That's terrific. But it doesn't change the fact that most larger firms have already made a database decision, it's not open source, and they're not really open to considering alternatives for major initiatives. While I hope this project furthers awareness of OpenACS at places like that, pragmatically I'm just hoping to achieve datamodel reusability in the near future.

This document was generated using AFT v5.078b


	Home pg2tsql Oasis VM