Summarising my week in code.
- Access/VBA - making some GitLab REST API tooling
- Power BI - have been doing self-paced learning
- Python - more writing about coding than coding (this one is "home" stuff, not "work" stuff)
As an exercise, I thought I'd try writing out a description of those after logging off work on Friday afternoon. It's now late on Sunday and, having written only in the gaps of domestic life, I don't think this "coding diary" is going to be a good use of time. I only managed to cover the first point. Admittedly, that's because I felt like putting in edited code samples and explaining anything that caught my attention along the way. Oh well, maybe some of the following will still be of use to someone.
As my workplace uses an internal instance of GitLab, I've long intended to integrate it with my Access-based script tooling (that's a whole other topic). Setting aside some earlier failed attempts, I decided to make a fresh start.
One technical point is that this time around I just accepted that I would try to use the GitLab feature of a "personal access token" as the authentication method. I don't like it, but it would have to do until after I had things working.
I've chosen to focus on just three fetches via the REST API:
- getting a list of projects
- getting a list of all the files in the repository of a project
- getting the details and content of a file in the repository of a project
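For reference, those three fetches correspond to GitLab REST endpoints of roughly the following shape (the host is a placeholder for the internal instance; per the public GitLab API documentation, the tree fetch needs recursive=true to return the whole tree, and the file path must be URL-encoded - but do check the docs rather than trust my recall):

```text
GET https://gitlab.example.com/api/v4/projects
GET https://gitlab.example.com/api/v4/projects/:id/repository/tree?recursive=true
GET https://gitlab.example.com/api/v4/projects/:id/repository/files/:file_path
```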
I skipped the first one and just used a known "id" for a single project to try working on the other two.
As it happens, the "list of all files" fetch is available as a "tree" fetch, which despite the name just returns all the files in the file/folder tree of the repository. The files all just have their full location "path", with no embedded treeness at all.
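To illustrate the flatness, here is a hand-made sketch of the kind of JSON the tree fetch returns (field names are per the GitLab documentation; the values are invented, and note that folders also appear as their own "tree" entries):

```json
[
  { "id": "a1b2c3d4", "name": "scripts",  "type": "tree", "path": "scripts",          "mode": "040000" },
  { "id": "e5f6a7b8", "name": "main.bas", "type": "blob", "path": "scripts/main.bas", "mode": "100644" }
]
```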
p.s. for the curious, the relevant GitLab REST API documentation links are:
I was able to combine some code I already had for using the library in Access/VBA along with a clue found online (probably Stack Overflow but I don't recall) and soon had something that could issue a REST request and store what came back.
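I can't share the work code, but a minimal sketch of that request step - under the assumption of using MSXML2's XMLHTTP (which may or may not be the library I actually used), with GitLab's documented PRIVATE-TOKEN header and placeholder URL/token values - looks like this:

```vba
' Sketch: issue a GET against a GitLab REST endpoint and keep the response text.
' URL and token are placeholders, not real values.
Dim http As Object
Dim json_text As String

Set http = CreateObject("MSXML2.XMLHTTP.6.0")
http.Open "GET", "https://gitlab.example.com/api/v4/projects/123/repository/tree?recursive=true", False
http.setRequestHeader "PRIVATE-TOKEN", "glpat-xxxxxxxx"
http.send

If http.Status = 200 Then
    json_text = http.responseText   ' raw JSON, ready for parsing
Else
    Debug.Print "Request failed: " & http.Status
End If
```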
The REST API returns data as JSON, which is all nice and modern, but there's nothing built into Access/VBA that can handle it. Luckily, I'd long ago found a good module for converting a JSON string into a nested set of VBA Collection and Dictionary objects.
- the JSON parser came from this: VBA-JSON
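For anyone unfamiliar with VBA-JSON: once its JsonConverter module is imported (plus a reference to Microsoft Scripting Runtime), the conversion step is roughly this - variable names here are mine, not from my work code:

```vba
' Parse fetched JSON text into nested Collection/Dictionary objects
' using the VBA-JSON module (JsonConverter).
Dim json_text As String     ' assumed to already hold the fetched JSON (an array, here)
Dim parsed As Object        ' ParseJson gives a Collection for a JSON array
Dim item As Object          ' each array element becomes a Dictionary

Set parsed = JsonConverter.ParseJson(json_text)
For Each item In parsed
    Debug.Print item("path"), item("type")
Next item
```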
As the reason for fetching from GitLab is to store the information in Access tables, that required the (by now familiar) methods of populating tables via VBA. I still use DAO for this - in short it's simpler, faster and more reliable for in-Access tables.
Unusually, this time around I wrote my add-to-table functions so that they would replace data if the file/detail was already in the table. This was curious, not because it was hard to do, but because I've never had cause to do that from VBA before - i.e. all in the same function. (Cue: a topic for another time.)
The big problems with this kind of thing in general - getting data as JSON and populating (relational) tables from it - are:
- decoding the JSON - now done by the JSON-to-nested-Collection/Dictionary module
- coping with the fact that while you might "know" what the JSON will be like, in reality that JSON could be absolutely anything
- coping with things simply absent in the JSON
- JSON data is inherently always text.
In this case, as the JSON was coming from a well documented source, I decided to just assume that the expected JSON elements would be there, and to ignore things that I didn't want. To some that will seem obvious, but for much of my data work, that would not be .. wise. (Cue: a topic for another time.)
With that in mind, I soon had a function that would perform the fetch for a nominated (numeric) project id - first just displaying the text, and then converting it to a VBA object.
I duplicated that and modified it to prove that the file detail fetch would also work. As the returned JSON included the file content encoded as Base64, I pulled in a function to handle that, and proved that I could now fetch both file metadata and file content.
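The Base64 function I pulled in isn't mine to publish, but the commonly used MSXML technique goes along these lines (a sketch: the function name, and the string conversion at the end, are my assumptions):

```vba
' Decode a Base64 string to bytes via MSXML's nodeTypedValue trick,
' then convert the bytes to a VBA string.
Private Function DecodeBase64(ByVal b64 As String) As String
    Dim dom As Object
    Dim node As Object
    Dim raw() As Byte

    Set dom = CreateObject("MSXML2.DOMDocument")
    Set node = dom.createElement("b64")
    node.DataType = "bin.base64"
    node.Text = b64
    raw = node.nodeTypedValue          ' the decoded bytes

    DecodeBase64 = StrConv(raw, vbUnicode)   ' assumes single-byte text content
End Function
```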
The next thing to do was to construct tables to hold the fetched GitLab information. BTW the file content would not be stored - the tool this coding is aimed at essentially manages file metadata, and only pulls file content during live script runs or script analyses.
So, the main issue was bullet-proofing the code. In VBA that's not hard, it's just tedious. This time around, when I went to write the part where a dictionary is transposed into a table, out of interest I coded it to loop through the keys of the dictionary, with a Select Case to handle each key that had a known match to a (hard-coded) table column to write into.
In retrospect I may have done that as a way to isolate the bullet-proofing in one sense and avoid the need for it in another.
I decided in making up tables to hold the data, that it would stay as text except where I really needed/wanted it to be something else.
I made two tables, one for the list of files and one for the per-file meta data - this being a simple matter of storing what comes from the two API fetches. However, as this meant having some relational tables, it was necessary to be clear about what the primary keys were for them.
When fetching from GitLab, the project id has to be given in the URL, but that value is not present in the returned JSON. Therefore the project id needed to be an extra field in the table.
p.s. these REST API fetches can specify the Git branch per request. For now, I'm ignoring that and merely getting the default branch per project. What to do about branching begs a lot of questions about how my script tool will operate with GitLab. Questions that I have neither positions nor answers for at this stage.
So, each table has a structure of:
- Project Id
- the selection of JSON keys that I've chosen to extract and store values for
I created a VBA "Type" definition for each of those JSON key sets - one for the tree and one for the file details. Then I added code to the two fetching functions to transfer data from the decoded JSON into the fields of a Type variable, and then to pass that Type variable to a custom table writing function.
BTW it's likely that I will later make a variation of the file detail fetch function, in order to have one that just fetches the file content - but that can be dealt with later, particularly having already proven to be able to decode the content.
At that point, I chose to tackle adding a function to fetch and store the list of projects accessible to my account.
As the easiest thing to do would be to once again clone one of the two existing functions and adjust it for handling the project list fetch, I ... didn't do that.
While I'm always happy to clone a function and make a variant, my rule of thumb is to only do that once. If I find myself making yet another clone-and-change, then that should be my cue to think about what the more general solution would be.
And so .. I diverted to consider a more general approach to such fetches.
In retrospect, this might be a pointless diversion - I might never need more than the three fetches, and so doing yet another clone might in fact be the "good enough" solution. Nonetheless, I've learned over the years that if I feel a spurt of intention to construct a general solution, then I should run with it - while the enthusiasm is right there.
Clearly, for a more general solution, using a Type definition per fetch-and-store is not going to work. Instead, some form of dictionary use will be required, possibly also with a dictionary to "know" the mappings between JSON keys and table/column names into which to transfer the values.
Where should such mapping information be stored? Well, in a table of course - i.e. in the database. As a consequence, for this generalised approach to work:
- some tables in the database would be required - it seemed that just two were enough: one to define a "set" and another to list the mappings (from JSON keys to column names) for the set.
- this required some stock code to read from those two tables, loading into variables (plain and objects)
- rewriting the processing routines to be guided by the "set" information - rather than knowing it intrinsically by being written around a specific data structure.
Note that the "rewrite" included a subtle change of quite where the key-field mapping actually occurs. In the initially written function, this mapping happens as each data item is reached (as a Dictionary object) and converted to a Type object, which is then passed to the table-writing function.
In the rewrite, the data item as Dictionary object is passed to the table-writing function, but so is the remapping Dictionary - and the mapping thus occurs during the step of writing to the table.
p.s. it would of course, be useful for this write-up, if the relevant source code was visible. Alas, as the code was written in the course of my employment, I do not have the freedom to share it.
Perhaps I'm making the rewrite sound trivial - but it wasn't quite that. It proved necessary to add a few additional items into the definition of each "set". Such as: the names of which columns would act as part of the primary key - along with the Project id.
At the time of writing I'm yet to see what will be required to have this set+mapping method handle the fetch for the list of Projects, where it will be one of the JSON fields that is itself the primary key for the table rendition. As it happens, I've prepared a variant function for that specific situation. Note that unlike the TreeFiles and FileDetails fetches, I haven't yet written a specific fetch function for the project list.
A consequence of the code written for this experiment is that it contains a few handy examples of basic Access/VBA technique. And "technique" is something I'm always happy to share.
- Use of two-level constants for referencing Access table and column names in VBA
- Use of DAO to query, edit or append Access tables from VBA (inside Access)
- Use of error trapping code
- Use of Variant data types in VBA - for things that may have Nulls
- Use of Dictionaries in concert with static structures
- Use of Modules in VBA for clarity and separation
As it happens, this example doesn't use Class modules at all. I do find that some people leap straight into writing those for no apparent gain. It's something I do when the situation calls for it.
Here is what my top level constants look like:
Public Const Ldb_Rst_GitLab_ApiRest_Keys_Tbl = "GitLab_ApiRest_Keys"
Public Const Ldb_Rst_GitLab_RestApi_Keys_Col_In_SetId = "In_SetId"
Public Const Ldb_Rst_GitLab_RestApi_Keys_Col_Api_DctKey = "Api_DctKey"
Public Const Ldb_Rst_GitLab_RestApi_Keys_Col_Api_DctValTyp = "Api_DctValTyp"
Public Const Ldb_Rst_GitLab_RestApi_Keys_Col_Mdb_ColName = "Mdb_ColName"
Public Const Ldb_Rst_GitLab_RestApi_Keys_Col_Mdb_ColType = "Mdb_ColType"
For my main projects, this text gets generated by code (in yet another module) and then gets (manually) pasted into a module that holds nothing else. Being in a distinct single module is why those constants are all made as Public.
Then in any specific module that will use those constants, an extra round of constants re-quotes them - usually into simpler, shorter names that are local to the module (hence done as Private).
Private Const Tbl_Keys = Ldb_Rst_GitLab_ApiRest_Keys_Tbl
Private Const T_Keys_C_SetId = Ldb_Rst_GitLab_RestApi_Keys_Col_In_SetId
Private Const T_Keys_C_DctKey = Ldb_Rst_GitLab_RestApi_Keys_Col_Api_DctKey
Private Const T_Keys_C_DctValTyp = Ldb_Rst_GitLab_RestApi_Keys_Col_Api_DctValTyp
Private Const T_Keys_C_ColName = Ldb_Rst_GitLab_RestApi_Keys_Col_Mdb_ColName
Private Const T_Keys_C_ColType = Ldb_Rst_GitLab_RestApi_Keys_Col_Mdb_ColType
The general idea is for these local constants to only be done as needed, whereas the "public" is always comprehensive. When the table structures get changed, then a re-generation of the public constants module is done. A compile check will thus bring to light where the second layer references need to be corrected, or code rewritten.
- Without going into detail, the methodology is also applied to Access Queries, so that judicious use of queries as buffer layers can also be done (as is normal for those of us who use that as a general SQL technique)
Oh, after mentioning them I should probably show how these get used.
Here they get used to construct the SQL for opening a table:
sql_check = _
    "SELECT" & " " & _
    Tbl_Keys & ".*" & " " & _
    "FROM" & " " & _
    Tbl_Keys & " " & _
    "WHERE" & " " & _
    Tbl_Keys & "." & T_Keys_C_SetId & " = " & Str(GLAR_SetId) & " " & _
    ";"
Set rst_check = dbs.OpenRecordset(sql_check, dbOpenSnapshot)
And here are some column constants being used to pull values out of a row:
With rst_check
    .MoveFirst
    Do While didok And Not .EOF
        guide_key = .Fields(T_Keys_C_ColName)
        guide_val = .Fields(T_Keys_C_DctKey)
I still use DAO for internal-Access VBA coding. It is more direct, marginally faster, and kept some features that ADO dropped. Yes, there are some extra tricks that ADO can do, but for bread-and-butter procedural table work those rarely arise.
- Microsoft's history of messing around with (and up) its VB data-access layers is a long sad story. See elsewhere.
Microsoft - and some vendors - have deactivated aspects of DAO, enough that it can no longer be practical to use for data external to the Access database holding the VBA code. However, in a way, that's a useful feature - so that all Access-internal data handling can be DAO and all external be ADO.
This is a (surprisingly?) large topic in data programming. Consider that:
- relational data tables have columns which are typed but possibly Null
- a JSON feed, in which data is always text rather than specifically typed, might either:
- just not have a particular key
- have a value of Null stored for a key
In VBA, the only variable data type that can accept a Null value is a Variant. This is like neither of the above, in that it can hold typed content or a Null, but the type could be any type.
So a question then becomes: when is it wise to use Variants as interim holders of things in VBA?
The reason I'm writing about this is that, unusually for me, I wrote a bit of code that just took a value into a Variant and then straight back out again. In this case it was taking values from a data table and putting them into a Dictionary, which can happily store the Variant.
Here's an abbreviated extract of the code:
Dim rst_check As DAO.Recordset
Dim guide_key As Variant
Dim guide_val As Variant

If (rst_check.RecordCount > 0) Then
    Set dct_guide = New Dictionary
    With rst_check
        .MoveFirst
        Do While Not .EOF
            guide_key = .Fields(T_Keys_C_ColName)
            guide_val = .Fields(T_Keys_C_DctKey)
            dct_guide(guide_key) = guide_val
            .MoveNext
        Loop
        .Close
    End With
End If
p.s. to state something semi-obvious, manic optimisers would point out that I needn't use Variant variables at all, as one could directly do:
dct_guide( .Fields(T_Keys_C_ColName) ) = .Fields(T_Keys_C_DctKey)
.. but I always like to check things are working by inspecting the values, especially during development.
It would be a different story going the other way - as a Variant going into a table column would need to conform to the column data type to avoid an error.
This mainly occurs in the three situations where fundamentally different data types are intersecting.
- reading from a table column into a plain variable - this will fail if a data row's column value has a Null. Only a VBA variable of type Variant can reliably receive such a value - but really, that only defers the problem to the next step anyway.
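One mitigation worth noting here: Access VBA provides the Nz function, which substitutes a fallback value at the moment of reading, so a plain String variable can safely receive the result. For example (the column name is illustrative):

```vba
Dim v As Variant
Dim s As String

' A Variant can receive the raw field value, Null included.
v = rst.Fields("SomeColumn").Value

' Nz converts a Null into the supplied fallback, so a plain String is safe.
s = Nz(rst.Fields("SomeColumn").Value, "")
```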
The simplest way to use error trapping in VBA is:
- put line labels in two places, one as GotError, the other as Afterwards
- before the line that may crash, put a line of On Error Goto GotError
- after the line, do the things that will use the non-error outcome, followed by Goto Afterwards
Here is how the sequence looks:
Dim ok as Boolean

On Error Goto GotError
n = RiskyThingThatMightCrash
Call DoSomethingWithNonErrorOutcome(n)
ok = True
Goto Afterwards
GotError:
ok = False
Afterwards:
Note that one of the oddities about line labels in VBA is that they never indent - the IDE will always snap them back to the start of the line. Hence if the code above was itself at some deeper indent, it would look like this:
    Dim ok as Boolean

    On Error Goto GotError
    n = RiskyThingThatMightCrash
    Call DoSomethingWithNonErrorOutcome(n)
    ok = True
    Goto Afterwards
GotError:
    ok = False
Afterwards:
Dictionaries have a curious place in the VBA world. As VBA has long had a built-in Collection type, it is perhaps odd that the Dictionary type is still not inherent. Instead it has to come by adding a "Reference" to the Microsoft Scripting library. However, that library has now been present in editions of Windows for so long that its presence can be taken for granted.
This helps explain why my VBA coding, which frankly goes back a long way in time, for many years avoided using Dictionaries. In recent years when writing fresh code, I've occasionally used Dictionaries in places as ways to make code more flexible regarding data types that may vary in structure.
The following abbreviated extract is an example. Here, after accessing a DAO Recordset, a new row is created (by "AddNew") and two column values are set manually, followed by a number of columns for which both the choice of column and the value to apply are fetched via two dictionaries. One dictionary is a "guide", saying which key in the row dictionary should match which column name in the table.
With rst_write
    .AddNew
    ' key fields
    .Fields(tblkeycol) = pid
    .Fields(tblkeycol) = ""
    ' value fields
    For Each vk In dct_col_guide.Keys
        .Fields(vk) = dct_row(dct_col_guide(vk))
    Next vk
    .Update
End With
Obviously, this relies on code elsewhere having correctly setup the "guide" dictionary (just once) and data row dictionary (refreshed for each row).
This has already been alluded to in the section dealing with the table/column name constants. There, keeping local "Private" copies of the constants means that the code in the module using them is insulated from changes in other modules.
Generally though, in VBA, I tend to write functions as "Private" except where they need to be made "Public" - i.e. precisely so the latter can be used in other modules. VBA doesn't have any formal method of having modules declare interfaces, but this Private vs Public mostly has the same effect.
As it has been touched on in some of the above, I figured I may as well give some code examples of how DAO usually gets used (by me, at least).
This is the stock way of using DAO in Access/VBA to read some values from rows of a table. Here some SQL is being constructed to only fetch selected rows.
Dim dbs As DAO.Database
Dim rst As DAO.Recordset
Dim sql As String

Set dbs = CurrentDb

sql = _
    "SELECT" & " " & _
    Tbl_Keys & ".*" & " " & _
    "FROM" & " " & _
    Tbl_Keys & " " & _
    "WHERE" & " " & _
    Tbl_Keys & "." & T_Keys_C_SetId & " = " & Str(GLAR_SetId) & " " & _
    ";"
Set rst = dbs.OpenRecordset(sql, dbOpenSnapshot)
If (rst.RecordCount > 0) Then
    rst.MoveFirst
    Do While Not rst.EOF
        column_value = rst.Fields("ColName")
        rst.MoveNext
    Loop
    rst.Close
ElseIf (rst.RecordCount = 0) Then
    rst.Close
Else
    Debug.Print "Should never see this!"
    rst.Close
End If
dbs.Close
Set rst = Nothing
Set dbs = Nothing
The OpenRecordset call will also accept just the name of the desired table, which amounts to asking for:
SELECT * FROM DesiredTable
That example shows looping through the returned rows. If you know you should only get one row, you might:
- check explicitly for rst_check.RecordCount = 1 rather than for rst_check.RecordCount > 0
- remove the loop
Note that for simplicity I've removed my usual setting of an ok boolean, and the error trapping.
That was reading a table; what about appending new rows?
Dim dbs As DAO.Database
Dim rst As DAO.Recordset

Set dbs = CurrentDb

Set rst = dbs.OpenRecordset("TableName", dbOpenDynaset)
rst.AddNew
rst.Fields("ColName") = "new value"
rst.Update
rst.Close
Set rst = Nothing
dbs.Close
Set dbs = Nothing
And what about editing existing rows? Here we're assuming that we have parameters:
- to identify the one row to edit
- to provide a new value to apply to a column
Dim dbs As DAO.Database
Dim rst As DAO.Recordset
Dim sql As String

Set dbs = CurrentDb

sql = _
    "SELECT" & " " & _
    tblnam & ".*" & " " & _
    "FROM" & " " & _
    tblnam & " " & _
    "WHERE" & " " & _
    tblnam & "." & tblkeycol & " = " & Str(vk) & " " & _
    "AND" & " " & _
    tblnam & "." & tblkeycol & " = " & "'" & vk & "'" & " " & _
    ";"
Set rst = dbs.OpenRecordset(sql, dbOpenDynaset)
If (rst.RecordCount = 1) Then
    rst.Edit
    rst.Fields("ColName") = "new value"
    rst.Update
    rst.Close
End If
dbs.Close
Set rst = Nothing
Set dbs = Nothing
Note that I've given an absurd WHERE clause just to show the two main syntax build variations for the SQL string.
Finally, what if we want to replace values in an existing row, but add a new row when there is no existing row? Here are both actions covered in the one code block.
Dim dbs As DAO.Database
Dim rst_check As DAO.Recordset
Dim rst_write As DAO.Recordset
Dim sql_check As String
Dim rcrd_didok As Boolean
Dim rcrd_didnew As Boolean

rcrd_didok = False
rcrd_didnew = False

Set dbs = CurrentDb

' check table
sql_check = _
    "SELECT" & " " & _
    tblnam & ".*" & " " & _
    "FROM" & " " & _
    tblnam & " " & _
    "WHERE" & " " & _
    tblnam & "." & tblkeycol & " = " & Str(vk) & " " & _
    "AND" & " " & _
    tblnam & "." & tblkeycol & " = " & "'" & vk & "'" & " " & _
    ";"
Set rst_check = dbs.OpenRecordset(sql_check, dbOpenDynaset)
If (rst_check.RecordCount = 1) Then
    ' update record
    With rst_check
        .Edit
        .Fields("ColName") = "new value"
        .Update
        .Close
    End With
    rcrd_didok = True
ElseIf (rst_check.RecordCount > 1) Then
    ' can't resolve
    rst_check.Close
ElseIf (rst_check.RecordCount = 0) Then
    rst_check.Close
    ' Add new record
    Set rst_write = dbs.OpenRecordset(tblnam, dbOpenDynaset)
    DoEvents
    With rst_write
        .AddNew
        .Fields("ColName") = "new value"
        .Update
        .Close
    End With
    rcrd_didok = True
    rcrd_didnew = True
    Set rst_write = Nothing
Else
    Debug.Print "Should never see this!"
    rst_check.Close
End If
dbs.Close
Set rst_check = Nothing
Set dbs = Nothing