Note: this unsurprisingly carries on from My Coding History - Part 1
In January 1989 I began employment at the same organisation where I still work in 2022. Initially I was quite keen not to get involved in the organisation's computer technology, which was quite easy as there wasn't much. Walking in on my first day, I had never seen so much paper in my life! I remember saying to my first manager that yes, I knew about computers, but I was a bit "over all that" and wasn't sure if I wanted to do it "at work" again. I just wanted to be a plain office worker.
There was however a "stand alone" computer with a database on it, which was being used for new entries as a gradual way to replace a large card index.
Now, I'm going to have to skip a lot of details here, both for not exposing office details but also because I could write a very large number of words about the rapid increase of computers in the office at this period. So I will try to restrict myself to aspects relevant to databases and programming. Note also, that to avoid saying "the organisation" over and over, I'll just use "the office".
The only thing I did at the office that was even vaguely programming in those times was playing with Microsoft Word version 4 (that's the for-DOS version number) and its limited type of macros.
Their new database was something made in a product called R:Base (yes, with a colon in the name) which was new to me. It was DOS based - at that time they had no usage of Windows at all. R:Base used SQL internally and had a built in query builder. This was an interesting new tangent for me to follow.
Their application in R:Base had a simple search function but I quickly started using the query builder to do more direct and useful finding of data. Being inquisitive and remembering my work with dBase I soon started looking under the hood to see how it worked.
Two aspects of R:Base remain in my memory of that time.
- it had a simple scripting system that allowed making menus and series of actions to perform when menu items were selected. It was actually more limited than the dBase I'd used back on CP/M.
- its simple way of doing relational links was automatic and based on the joining columns having the exact same name. This was powerfully simple but I eventually found this was also quite limiting.
As an example of the proficiency I gained: after about a year, as this new database grew, I decided to try rewriting it to improve its search response. Bear in mind that I was just a regular clerical employee at that point.
By experimenting I determined that the "index" lookup in R:Base only used the first four bytes of a field. While the R:Base control language was very limited in what it could do, I was still able to craft a way so that input data would be massaged before being stored. Notably, I could have it recognise a stock set of title beginnings - e.g. "The " - and pull them out into a separate "prefix" field so that the first four bytes of the title field would become more unique. Rather than impose this design on the existing database application I wrote a whole new one. Thankfully it worked a treat and significantly improved search speed - even if it did mean teaching the users to search for "Thing" instead of "The Thing" as well as check that the prefixing was being done correctly.
- Actually there was a deep lesson in that. Users will quickly learn, adopt and (crucially) tell each other about a feature that they like. In that case the speed improvement was its own perpetuating impetus.
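In today's terms, that input massaging might be sketched like this (a Python re-creation of the idea only; the real thing was written in R:Base's control language, and the actual prefix list is my guess):

```python
# Hypothetical re-creation of the R:Base input massaging: pull a stock set
# of title beginnings into a separate "prefix" field so that the first four
# bytes of the stored title are more distinctive for the index lookup.

STOCK_PREFIXES = ("The ", "An ", "A ")  # assumed list; the real one is lost to time

def split_title(title: str) -> tuple[str, str]:
    """Return (prefix, remainder) for a raw title entry."""
    for prefix in STOCK_PREFIXES:
        if title.startswith(prefix):
            return prefix.rstrip(), title[len(prefix):]
    return "", title

print(split_title("The Thing"))  # → ('The', 'Thing')
print(split_title("Duel"))       # → ('', 'Duel')
```

Stored that way, "The Thing" and "The Third Man" no longer collide on the bytes "The " - the index sees "Thin" and "Thir" instead.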
Word soon got around that I knew my way around R:Base and this led to me being offered work in a new section being set up to assist with "modernising" the office using computers. So after less than two years of pure clerical work, I moved into providing internal IT services, specialising in databases - and that's roughly what I've been doing ever since. Quite soon I was fixing, amending or writing new R:Base applications all over the place.
I recall that at that time there was a small amount of dBase III and then dBase IV usage - and even some use of Clipper. At the same time, able to afford them again, I was back to avidly reading lots of computer magazines. That way, I gained a new awareness of the database developer market. My favourite magazine was "Data Based Advisor" and I became familiar with the term xBase for the world of dBase clones.
I also took advantage of being in the city and visited the state reference library, reading my way through back issues of Byte and Scientific American (for the ones I'd missed in my lean income years).
There was a point when the office declared dBase IV as the preferred database solution and that R:Base was to be deprecated. I took another look at dBase IV and quickly established that, apart from a couple of SQL-like additions for connecting tables, it was still not a "relational" product. I quietly ignored the edict and kept maintaining the R:Base databases as they were much better solutions. I suspect I wasn't alone in that view because I never saw anyone else use dBase IV for any solutions.
I remember correcting a major flaw with a security database - that handled the user IDs allocated to staff and which had led to the same ID being one person on the mainframe but another person in a local branch site. Fixing the application was easy but sorting out the data mess was more .. interesting. My first attempt failed as I quickly discovered some people were too important to be forced to change their ID.
Another was adapting an existing R:Base application that held the branch staff list as used by the personnel area (yes, the term "HR" came much later) so that the data could be prepped and corrected for uploading to a replacement mainframe application.
Later when the office relocated from its use of about eight different office buildings in the central city to two large offices, I made the database that was used as a manifest for the entire relocation.
Another was making a replacement database for use by the three switchboard operators - and for this I replayed my earlier first-four-bytes trick to make look-ups lightning fast.
During this time the office was installing a lot more computers with an eventual target of having one per staff member, thereby scaling up from about one computer among sixty people when I'd joined. As part of this I was one of several people who ran classes to train all the staff in using Windows 3 with this new thing, a mouse. I know this was before the big relocation because I remember in which room I gave class after class after class after class. I remember feeling like an actor in a play by the end.
My focus after the relocation was in migrating the R:Base database applications from "stand alone" computers to having them run on the new networked diskless Windows 3 based machines.
There's a whole other topic here about our organisation having been well ahead of Microsoft in finding how to make scaled network use of a Windows platform. Eventually, Microsoft put sufficient features into their platform that only some curious echoes remain of the home-grown arrangements.
Coincident with the big relocation was the merging of the modernising team with the existing computing support area, with the new brand of being "I.T." and the new building even had a literal "help desk". To go with this, my job now broadened and I became a general do anything/everything support person, but who specialised in database support.
At home, having cleared my debts I was able to consider replacing my CP/M home computer. I bought a no-name 80486-based laptop. I quickly migrated my Pascal coding from the CP/M world to the MS-DOS one (albeit under Windows 3).
Oddly I don't recall what version of Pascal I initially used at that point. I know I bought "Borland Pascal", which was probably "Borland Pascal with Objects 7.0 © 1992" as noted here, so maybe that was the only one I had.
The copies of Pascal programs I now have are a mix: some still as written for the CP/M Microbee, others adapted for MS-DOS.
Where I had need of holding data at home, I was still storing it in text files. If I wanted to do something automated with that data, I would write a Pascal program to do it. The data though, just stayed in text files.
Via the magazines I was aware of a new database product: Microsoft Access. I bought a copy of version 1.0 and this became the next seismic shift in my data story.
I now imported my data text files into Access and started building relational databases with forms and reports. The most immediate thing I noticed was that I didn't need the joining columns to have the same name as required in R:Base. I simply enjoyed using Access to experiment with joins and queries and thereby effectively taught myself a lot of the principles that I still work with as my basic relational knowledge.
My oldest data text file was my list of Compact Discs and so my first real database (at home) was soon exploring all the ways to map CD album and track listings into table structures. This is a more complex matter than you might perhaps realise - especially for classical music. This intersection of the human understandings of information and the relational model became my pet kind of puzzle to play with.
It was around this time that one of my work colleagues was showing interest in becoming a programmer. As part of that, he was starting to teach himself Visual Basic (version 3 perhaps). I can recall a discussion we had about the relative merits of doing applications as programs versus as customised usage of database applications (such as Access). The other side of that discussion was about whether doing that kind of thing was compatible with being in IT support. I didn't really have any advice in that respect, but it was this discussion that made me realise that personally I had no wish to be programming full time as a job. Before long he made his choice and left for a position as a VB programmer with a small firm.
- by the way, in the course of editing all this, I did a net search for my former colleague - in short, he seems to have done well and now lists himself as a senior React developer.
This is perhaps the point where my life changed pattern about programming at home. While I played around a little with Access at home that was mostly about "databasing" rather than coding. As I was coding at work whenever that proved useful I found I didn't feel like doing it much at home. I was then about 30 and with a steady income again was exploring all the culture: music, theatre etc that I'd missed out during my moneyless years.
At work my coding would involve whatever tools the office had on tap, mainly to do various amounts of system administration. While our servers were based on OS/2 this meant Rexx, an IBM scripting language. When the servers became Windows NT 4, there was no Rexx but there was Perl, so I taught myself Perl 4 and converted my scripts over to it.
I was still the resident "database" go-to person so that continued to be my main activity when I wasn't doing shifts on the Help Desk. The R:Base applications were gradually replaced in various ways. Some scaled back to being Excel spreadsheets, perhaps with some automation. Some were replaced by mainframe applications. One was replaced by a Paradox database.
There were no web applications that I can recall, web browsers only really being used on the office library standalone computers with a modem to reach a local ISP and thereby the Internet.
I was however, also the go-to for various other things, one of which was dealing with the couple of (Australian) anti-virus products we used, along with being the person who would disinfect the portable computers when they got infected.
An interesting point here is that because I had never been scared of "going into the hex", I was able to fix things that other people could not. For example, if an R:Base database went corrupt (computers in those days had degrees of unreliability that no-one would even imagine nowadays) then I would use a hex editor to find and repair the broken chains of 32 bit links in the raw file. Later I got the office to buy a product to make this less manual but by then corruptions had become rare (thankfully).
As it happened I used similar "hacking" techniques to disinfect the portable computers, teaching myself all about "boot sectors" and the PC boot process in order to do it. My signature "only I could do it" fix was to do this even for the portables that had encrypted drives.
Interestingly, it is now hard to find the virus names that I would have encountered. Most current web pages that give potted histories of malware don't say much about the years of boot sector viruses. You can get an idea of how they interfered with the boot arrangements from this Wikipedia page about the Stoned virus and its variants.
At some point I was able to implore the office to buy some MS Access licences for making things. This gradually expanded as I made more of these and those usages then required more licences. Presumably I wasn't alone, and as the separate Microsoft "office" products became "Microsoft Office", at some point it was chosen to include Access in the whole-of-office licence deal. This was a game changer as I could now comfortably make a solution in Access and be sure that I could give the internal client just the database file as the database solution.
Now that I was using MS Access a lot I started to explore how to make it do even more things by looking at its built-in dialect of Basic. As with the rest of the Microsoft Office suite, this had taken a slow migration from being its own dialect - e.g. MS Word had "WordBasic" - into being Visual Basic for Applications across the whole suite. It's still the case that:
- the VBA dialect in Access is still slightly distinct from the rest of Office, although by now only in some obscure details
- the VBA dialect has also stayed quite distinct from Visual Basic the product for "developers".
When I started writing code inside MS Access it was "AccessBasic". It was quite a long time before I had any cause to try writing much with the other variants in Office. I can recall doing some work with Macro Sheets in Excel 4 rather than as some kind of Basic and also doing some things in WordBasic.
At that time Word for Windows had some nasty bugs in its Save function so someone else in our organisation had written a replacement "Save" function and forced Word to use that instead. From memory, the Save could fail if anywhere along the path was off limits to the current user. With networked "drives" it was quite common to lock users out of being able to write to folders closer to the root. A slight mistake with such settings could easily have a folder in the middle of a path with no read permissions.
A part of 1992 saw me working in a team that was making a Windows based front end tool for use by contact area staff. The product they were using was Matesys ObjectScript (later subsumed by the same producer's ObjectView), but there's very little information to be found about that online nowadays. I'm guessing I was loaned to the area for a while (something that has happened many times in my career). I probably did a mixture of coding in Basic and handling data.
I remember another one of my stints of being loaned-to-an-area saw me using Excel v4 with its pre-VBA "macro sheets" as an automated switchable calculation. The requirement was to have a set of data that could be interpreted two ways - so as to show the two extremes of how a court case might resolve. Which was basically: all that could be proven beyond reasonable doubt (to establish guilt) versus all that could be raised as a liability (after a guilty finding).
Thus passed the nineteen nineties and the last couple of years of that decade were busy with preparations and fixes and work-arounds for the millennium bug. As you might expect, this meant checking everything that we could think of checking. I got the job of checking all the remaining standalone computers around the place. Most merely did things for which the date wasn't used, but some required some boot patches. This was all support work for me, I don't recall that it needed any programming or database work.
The organisation rewarded our Y2K work by .. outsourcing all of IT.
This was clearly going to involve redundancy for most of the IT staff outside of the managerial hub on the other side of the country. When the process began I was partway through getting a promotion in the very area being outsourced but luckily the timing allowed me to get through before the contracts were signed. The rules around redundancy meant that I could claim a sideways transfer to a position at my level if there was someone only "acting" in it. While my hand had been forced, this was not something I felt good about doing.
It did however, mean that overnight I went from supporting database technology to being someone who would actually have to work with the data itself. From then on, my profession was "data analyst".
The immediate change was that I now worked by writing SQL for querying a Teradata data warehouse. Of course, I'd been using dialects of SQL in R:Base and MS Access for years. The massive shock though was just how many tables of data the organisation had (and has) on its data warehouse. And notably, this was all the back end data, for systems whose front-ends I had little knowledge about.
In my favour, I had spent so much of my technology life in learning by reverse-engineering, that doing this for data was not daunting. It was just a twist on the same old thing. Also with access to the data and a query language, I could prove or disprove my guesses at understanding.
Thus began my multi-decade life of writing SQL and dealing with very large amounts of data - and for that matter, large amounts of SQL analysing and doing things with that data.
I had been used to two varieties of "query by example" in R:Base and Access but was now expected to mainly craft SQL as script. Helping a bit was a tool called "BI Query" (short for Business Intelligence) which operated as a pair of tools: one was used to setup "models" and the other was then used to query those models with a point and click builder.
With MS Access to hand and familiar, it was quite common for me to mock up some tables in it and toy with various queries until I had a relational concept that I was happy with. Then I would write out the Teradata SQL equivalent.
Gradually as I wrote more and more scripts and as the solutions I was asked to provide called for more automation, I started using MS Access to organise them. At some point, these experiments took the step of having Access connect itself to the remote data warehouse - I remember an early one that assumed there were named "macros" created on the Teradata.
- Tech note: for a long time there were no stored procedures on Teradata. Its "macro" concept could be passed parameters but was quite limited, for example it couldn't mix DDL and DML statements. Eventually Teradata added stored procedures so this is just one of many obsolete historical footnotes that I still know about.
Around 2002, yet more experiments led me to start making a fresh tool - this time as a general purpose Access application for managing, analysing and submitting SQL scripts to the data warehouse. While there are some other significant side projects that I've built in Access, this one tool is the place where I tend to add features as I see a need for them. So by now, describing the various concepts and techniques that I've learned or developed over the years is largely a description of things I've done with that one tool. It is now complex enough to be quite daunting to describe.
Also, a large part of that has involved a lot of coding in VBA. This means that I also have a large bank of VBA functions and modules. Alas, being written at my workplace I'm not at complete liberty to freely copy them to the outside world.
Somewhere in all this I had a realisation, that for years I would call - "my relational epiphany". Curiously, for something that I thought was so pivotal, I don't recall quite when or how it came about.
It was probably after I began writing Teradata SQL because I remember that the early examples for that dialect included the concept of a "cursor". This idea struck me as clunky compared to the elegance of when SQL is at its best.
Maybe it came from some reading of Codd and/or Date - two of the great relational theorists. Maybe it came from some comments made on the old Teradata Forum about ways to adapt data solutions from other platforms to the Teradata - exhortations to: think differently.
I will perhaps write a whole other piece about this, but in short, the epiphany is when someone from a procedural background realises they must invert their thinking:
- rather than seeing the relational handling of tables as something that goes away and does all the things you ask via a lot of loops that you don't see,
- instead, to think of the table operations as the atoms of what you intend to do, allowing each of them to act simultaneously upon all rows in the tables.
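As a tiny illustration of that inversion (using SQLite and invented table names, nothing from the office), compare fetching and updating rows one at a time - cursor thinking - with letting a single statement act on every row at once:

```python
import sqlite3

# Set-level thinking: one UPDATE is the atom, and it acts on all rows
# simultaneously - no visible loop, no per-row round trips.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE titles (id INTEGER, title TEXT, prefix TEXT)")
con.executemany("INSERT INTO titles VALUES (?, ?, '')",
                [(1, "The Thing"), (2, "Duel"), (3, "The Car")])

# The cursor habit would be: SELECT each row, test it in code, UPDATE it.
# The relational habit is one statement for the whole table:
con.execute("""
    UPDATE titles
    SET prefix = 'The ',
        title  = substr(title, 5)
    WHERE title LIKE 'The %'
""")

print(con.execute("SELECT title FROM titles ORDER BY id").fetchall())
# → [('Thing',), ('Duel',), ('Car',)]
```

The WHERE clause plays the role of the per-row "if" test, but the engine applies it across the whole table in one conceptual step.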
Once I had myself a tool that could orchestrate and run whole sequences of SQL, I started to use that ability with my mental shift about what could be atomic. I created a set of structures to provide a kind of "program" controlling actions on other tables. Each iteration would use the next control value to direct a CASE expression across all the rows. The overall action sequences were stored and manipulated like a stack - indeed the step logic was fashioned as an en-masse form of Reverse Polish Notation.
At the time, Teradata had a very poor set of string functions - quite woeful really - so this method was used for wholesale pattern detection in title-description fields. That was probably around 2002 or so.
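A much-simplified sketch of the idea (in Python with SQLite and invented names; the real thing ran as Teradata SQL orchestrated from Access, with stack-style step handling): a control table acts as the "program", and each step is applied to every data row at once via a CASE expression:

```python
import sqlite3

# Sketch of control-table-driven SQL: each iteration takes the next
# "instruction" row and applies it to ALL data rows in one statement,
# rather than looping over the data rows themselves.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (id INTEGER, descr TEXT)")
con.executemany("INSERT INTO data VALUES (?, ?)",
                [(1, "RED WIDGET MK2"), (2, "BLUE WIDGET"), (3, "RED GADGET")])

# The "program": an ordered set of patterns to detect in the description field.
con.execute("CREATE TABLE steps (seq INTEGER, pattern TEXT, tag TEXT)")
con.executemany("INSERT INTO steps VALUES (?, ?, ?)",
                [(1, "WIDGET", "W"), (2, "RED", "R")])

con.execute("ALTER TABLE data ADD COLUMN tags TEXT DEFAULT ''")
for seq, pattern, tag in con.execute(
        "SELECT seq, pattern, tag FROM steps ORDER BY seq").fetchall():
    # One UPDATE per step; the CASE decides the action for every row at once.
    con.execute("""
        UPDATE data
        SET tags = tags || CASE WHEN instr(descr, ?) > 0 THEN ? ELSE '' END
    """, (pattern, tag))

print(con.execute("SELECT id, tags FROM data ORDER BY id").fetchall())
# → [(1, 'WR'), (2, 'W'), (3, 'R')]
```

The loop here is over control values, not data rows - the row count never affects the number of statements issued, which is what made the technique viable for wholesale pattern detection.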
Along with the epiphany, I have often guided people new to data into the mysteries of SQL and relational thinking. Usually at some point I have had to explain that the clear ideas of "relational" get made messy and confusing by the poor design of the SQL language.
SQL is the QWERTY keyboard of data work. It's there, it's the "standard" that everyone varies from, and it's what you'll need to learn to use. It is good enough to get on with, but truly problematic in places.
My stock quip for this is: "SQL is a crap language" and I stand by it. It is an unending shame that the computer and data industries have never replaced it with something better.
I won't give the details for this opinion here, seek on the web for people who have written about this very thing over the years. It is flawed, we forgive its flaws and get on with our work, but let's not pretend that it is not flawed.
- p.s. sad as it is to report, the "language" is getting worse rather than better, but this is also a long other story for some other place.
One of the good things about a long career in a large organisation is the variety of people you meet in the course of the work. The sad part is that not everyone lasts the journey and there are colleagues I've had whose funerals I've attended and whose lives have ended not long after illness had forced them to retire.
Only this last week, I was pruning out old code from one of my tools and there was a module of ADODB code that one of them had given me when I needed to convert my remote ODBC operations from DAO. Honestly, I think I'd always left it in there as a reminder of him - somehow, over ten years after he'd gone, it felt right to finally remove that module.
I don't have any code from the others but there was a no-bullshit honesty and integrity about data that we all shared, and if that's now an excuse I use to cling to those values I'll happily wear that. Vale Laurie, Lisa and Alison - I remember you.
Some years ago the organisation started looking for alternatives to Teradata for handling large scale data. By then Hadoop had settled from its early phase into something reasonably stable and so an investment was made and some data put into that instead.
Later again I was tasked with working with this new setup and thus found myself learning all about Hadoop. Luckily I was sent on a one-week intensive training course.
By that point the "map-reduce" phase of Hadoop was passing and the new tools for data analysts to use on it were SQL-like layers, notably Hive.
Not covered in the course, but also present was Spark so I did an online course in that. Having written one solution in HiveQL, for an exercise I wrote a parallel implementation for part of it in Spark.
For various reasons, the way I got to use Spark was as PySpark, which is essentially a customised Python invocation that inherently knows it has access to the Spark libraries. The online Spark course that I did covered use from both PySpark and Scala, which makes sense as Spark is written in Scala.
Around the same time, some related Hadoop work was being done by a crew of contract programmers who were writing in Java. As it was anticipated that I might have a support role for that project - e.g. after they completed it and moved on - I found myself doing another online course to learn Java.
So here's the thing. I have always been interested in programming languages and how they differ. Ditto for the various programming paradigms. My tech reading since I got permanent employment has been quite continuous.
However, that's never the same as actually doing things with those. Among other things, there are too many languages, environments etc for any one person to be proficient in them all. But mainly, as already noted, apart from reading about them, I had pretty much stopped coding at home.
At work, while I'm lucky to often be able to self-direct the detail of my work, and thus able to spend time making tools and new capacities, I don't do that without some kind of actual work impetus.
This means that I've known about Python, and Java, and Hadoop for almost as long as they have been on the scene. But, unless it is made available to me at work, and I have some work reason to use it then I just don't. It's not that I'm opposed, it just doesn't occur.
Thus it was needing to use PySpark that prompted me to think: maybe it's time I learned Python. And as that wasn't going to happen at work, maybe I should do that at home.
The background to the next phase is covered in the notes for the tool that I wrote as my first ever Python program.
By coincidence, at the same time as work was needing me to use PySpark, I was getting lost in the (sadly true) mess of backups that I'd made across multiple USB hard drives. I'd been using duplicate file finders for ages but they were not much help for comparing full copies of nearly-identical backups on terabytes of storage.
When my usual tool - fslint - got dropped by my Linux distro, I felt the time had come to write my own tool, partly to do some of the things that fslint did not do.
It seemed a no-brainer to combine this new project with a language I now had a desire to learn for work. I decided that I would teach myself Python at home, and that would then be a skill I could use at work.
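The core of such a tool can be sketched in a few lines of Python (a minimal illustration only, not the actual program): group files by size first, then confirm duplicates with a content hash, so that large backups are only fully read when sizes collide:

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root: str) -> list[list[str]]:
    """Return groups of paths under root whose file contents are identical."""
    # Pass 1: bucket by size - files of different sizes can't be duplicates,
    # so most of the tree is eliminated without reading any file contents.
    by_size: dict[int, list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Pass 2: within each size bucket, confirm with a content hash.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size can't have a duplicate
        by_hash: dict[str, list[str]] = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                by_hash[hashlib.sha256(f.read()).hexdigest()].append(path)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

A real tool over terabytes would hash in chunks (and probably hash only a leading block before committing to a full read), but the two-pass size-then-hash shape is the standard trick.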
It also gave me a reason to teach myself Git, which I had also been reading about for years.
This led to me setting up an account at GitLab which would also make it easy for me to see my code no matter what environment I was at.
- I am of course, glossing over all the other possible paths. Git vs Bazaar vs Fossil. GitLab vs GitHub vs BitBucket etc. PyCharm vs Geany vs Vim vs VS Codium - and so on and so on.
- I have also glossed over my at home migration from Windows to Linux. That's a whole other topic, the short version is that since 2006 I've been a 99% Ubuntu Linux user, with a licensed Windows XP running under VirtualBox and my last license of MS Access running on it.
I do now have a few projects under way, most of which are up on GitLab and speak for themselves (and are not specifically the topic here).
It has been .. odd .. to be coding at home again. As I wrote somewhere else, it has been a v-e-r-y long time since I last coded for anything other than work, which means without being paid and therefore subtracting from my non-work life.
Other aspects of my life are very different from when I was last coding at home - the details of which are not relevant, just the overall effect.
And, let's face it, I'm much older than when I last did this. This is a subtle thing as I've been coding sporadically at my work for all the years in between.
And, another difference is the idea of coding in the open. An irony of this part is that I have been aware of the Free Software (and later the Open Source) movements from the very beginning. Even when I was accessing files on bulletin boards in the 1980s I was seeing pieces of software - adapted to CP/M from the GNU project and then soon bearing the name of the Free Software Foundation.
I have therefore been around while the whole Free Open Source Software revolution has grown from something small, and based on quite manual distribution of compiled software (for which you could then seek the source code) into a world of people writing their code on open access web repositories and a whole new culture of collaborative coding.
And yet all that time, my own coding was hidden away in the internal world of my employment. So even though I knew all about it, putting up my very first "repo" was quite a momentous step (as it is for everyone I'm sure).
I am however, still mainly coding for myself. How I write up my notes is largely how I've learned to write for my later self to read, because I have multiple times been the person, ten years later, reading code I wrote ten years earlier and trying to understand it. It was a slow way to learn that kind of lesson but it's powerful when you only have yourself to blame.
Of course, everyone who writes code has a unique story. I don't think of mine as being especially unusual.
However, as I'm writing this with the intention to post it to dev.to I am quite aware that it's a very different story to the ones that many people there are writing themselves into as they go along their own coding histories.
Mainly I think this difference is not about the period that I've spanned, although I expect that will be the aspect most apparent to many.
Rather, my reason for writing this is to present the idea of coding as merely part of full time work mixed with a hobby - without ever worrying about the labels (oh, you know the ones: developer, programmer, software engineer, data scientist et al).
Instead there were only ever two things:
- the work I was tasked with doing
- the things I found interesting.
Sure, I was lucky to stumble into being reliably paid to do that - but that's really my point - that this is also a valid type of tech career.