DEV Community

Michael Adsit

Do Not Base Your APIs on Legacy

This post explains why basing your APIs on a pre-existing system and an ingrained culture can cause problems for both the people creating them and the people consuming them, and proposes some ways to overcome those problems.

Working with a Legacy

Nearly every organization has some form of legacy code that has survived over time, and those working there have become used to certain patterns from seeing them for so many years. As such, the natural instinct when designing something for others to consume is to keep on using the same practices.

For instance, if one has been using a SOAP interface for internal applications for many years, the temptation when creating a REST API may be to simply take the existing SOAP WSDL and use that as the design document for the new REST API.

While this is certainly a way to start building something new, as a lot of time and effort has gone into it in the past, there are a few questions that must be asked before doing so.

  • Is my new API simply to fill a current buzz, or to create a better experience for everyone?
  • How easy is it for someone from the outside to navigate the existing documentation?
  • Does my old structure reveal information that I want to keep internal only?
  • Are there any bottlenecks to overcome in order to entice people to use my new API?
  • When exposing a way into my company from the outside world, am I keeping a secure mindset?
  • What were some issues that the newest people on the team had when learning our current design?
  • If I have a client requesting this API, what are their needs when compared to my desires?

In the past, I have seen an API require callers to send a unique identifier at the time of object creation. That was how the original system had worked, because its only consumer was written in a specific language and a centralized class was distributed to create and attach this information. When dealing with a browser or another form of decentralized input, it is important to consider the new situation. Things that seem like common sense inside the company are not always so outside it.

If documentation exists on how the current product backend came about, it can be very useful for comparing the requirements of the new API against those of the original backend.

Simple Legacy Example

Quick Clarification

While I have seen things similar to the given example, it is designed simply to be somewhat plausible though not very well designed. Though it is possible to use the code given, I would not recommend it. Remember that it is always important to promote team unity, and discuss things in a manner where everyone feels safe to share and work together towards a better overall product. Do not attack people's character - attack the problem together.

Actual Example

One day, a developer at the company was asked to create a standard task or todo list for their team. They made use of the company's main MySQL database, created a table, and linked it to a simple database editor user interface.

CREATE TABLE Task(
  Task_ID INT PRIMARY KEY AUTO_INCREMENT,
  Task TEXT,
  Complete BOOL
) ENGINE=INNODB;

Some time later, another team wanted to keep track of people's first and last names and created the following table.

CREATE TABLE Person(
  Person_ID INT PRIMARY KEY AUTO_INCREMENT,
  First_Name TEXT,
  Last_Name TEXT
) ENGINE=INNODB;

Down the line, someone else started using these tables - but linked them together.

CREATE TABLE People_Assigned_Task(
  Person_ID INT,
  Task_ID INT,
  FOREIGN KEY (Person_ID)
    REFERENCES Person(Person_ID)
    ON DELETE CASCADE,
  FOREIGN KEY (Task_ID)
    REFERENCES Task(Task_ID)
    ON DELETE CASCADE
) ENGINE=INNODB;

Inside the company, through hallway discussions, it became obvious how to use them. People started writing programs to link other people to tasks. Some teams started using it to report and it was informally assumed that null values in the database meant that the task was not complete. No one thought to unify the Person and People terms in the table names.

Eventually, a subset of the data looked like this.

Task Table

Task_ID | Task           | Complete
--------|----------------|---------
1       | My First Task  | null
2       | My Second Task | 0
3       | My Third Task  | 1
4       | My First Task  | 1
5       | My Second Task | 1
6       | My Third Task  | 1
7       | My Fourth Task | null

Person Table

Person_ID | First_Name | Last_Name
----------|------------|----------
1         | Bob        | Jones
2         | Larry      | Jones
3         | Moe        | Jones
4         | Moe        | Smith

And reports were being run against it with statements such as the following to figure out who had been assigned tasks.

SELECT
  Person.First_Name,
  Person.Last_Name
FROM People_Assigned_Task
INNER JOIN Person
  ON People_Assigned_Task.Person_ID = Person.Person_ID
INNER JOIN Task
  ON People_Assigned_Task.Task_ID = Task.Task_ID;

First_Name | Last_Name
-----------|----------
Bob        | Jones
Moe        | Jones
Moe        | Jones

This team was just curious about who had been assigned tasks, but another team was curious about who had incomplete tasks left.

SELECT
  Person.First_Name,
  Person.Last_Name
FROM People_Assigned_Task
INNER JOIN Person
  ON People_Assigned_Task.Person_ID = Person.Person_ID
INNER JOIN Task
  ON People_Assigned_Task.Task_ID = Task.Task_ID
WHERE Task.Complete = false;

First_Name | Last_Name
-----------|----------
Bob        | Jones
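
The informal rule described earlier, that a null Complete value means the task is not complete, is easy to lose in a comparison like the one above: in SQL, a NULL compared with false is neither true nor false, so such rows are not matched. Below is a minimal sketch of applying the rule explicitly in JavaScript, using rows copied from the sample Task table. The helper name isIncomplete is hypothetical, and it is an assumption that the database driver returns BOOL values as 0 and 1.

```javascript
// Sample rows copied from the Task table above; Complete may be null, 0, or 1.
const taskRows = [
  { Task_ID: 1, Task: 'My First Task', Complete: null },
  { Task_ID: 2, Task: 'My Second Task', Complete: 0 },
  { Task_ID: 3, Task: 'My Third Task', Complete: 1 },
  { Task_ID: 7, Task: 'My Fourth Task', Complete: null },
]

// Hypothetical helper: treat null the same as 0, following the informal rule.
function isIncomplete(row) {
  return row.Complete === null || row.Complete === 0
}

const incompleteIds = taskRows.filter(isIncomplete).map((row) => row.Task_ID)
console.log(incompleteIds) // [ 1, 2, 7 ]
```

Writing the rule down in one place like this makes it something an outside consumer can read, rather than something learned in a hallway.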

Over time, reports with different combinations of criteria came about, and some business users wanted to be able to add and review more data. Tables kept getting added until someone came up with the brilliant idea of arbitrary data types. Every team then started adding its own types and would simply filter to what was needed on that team.

CREATE TABLE Arbitrary_Data_Type(
  Arbitrary_Data_Type_ID VARCHAR(32) NOT NULL PRIMARY KEY,
  Extended_Description TEXT
) ENGINE INNODB;

CREATE TABLE Task_Arbitrary_Data(
  Task_ID INT,
  Arbitrary_Data_Type_ID VARCHAR(32),
  `Data` TEXT,
  PRIMARY KEY (Task_ID, Arbitrary_Data_Type_ID),
  FOREIGN KEY (Task_ID)
    REFERENCES Task(Task_ID)
    ON DELETE CASCADE,
  FOREIGN KEY (Arbitrary_Data_Type_ID)
    REFERENCES Arbitrary_Data_Type(Arbitrary_Data_Type_ID)
    ON DELETE CASCADE
) ENGINE INNODB;

For the internal applications, this has worked out fine so far. Every newcomer learns how their team makes use of it, and no one really thinks too hard before adding new types. Now the time has come to make this task list available for others to use.

Transitioning from Internal Legacy to Exposed API

Using the simple example above, suppose a client has requested the ability to see the tasks for the product you are building for them. Either a REST API or a JavaScript library would work for them; they just need the definitions so they can mock the API and start building their own user interface. Working with the security team, you come up with a solution that filters tasks and users down to what a logged-in external user is allowed to see, so that is not a concern. You do some research and see that an entity returned as part of a collection should have the same data as the same entity retrieved on its own, so you start writing the specification for the returned objects as JSON Schema so that the client can start planning for the future.

Describing the person is easy enough, as you were given guidance to give each object a generic id field instead of the specific one in the database and to use snake_case for the properties. It is also assumed that the $id will be a resolvable URL for validators later.

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "http://example.com/schemas/person.json",
    "type": "object",
    "title": "Person Schema",
    "description": "The root schema for a person.",
    "default": {},
    "additionalProperties": false,
    "examples": [
        {
            "id": 1,
            "first_name": "Bob",
            "last_name": "Jones"
        }
    ],
    "required": [
        "id",
        "first_name",
        "last_name"
    ],
    "properties": {
        "id": {
            "$id": "#/properties/id",
            "type": "integer",
            "title": "ID",
            "description": "The ID for this person",
            "default": 0,
            "examples": [
                1
            ]
        },
        "first_name": {
            "$id": "#/properties/first_name",
            "type": "string",
            "title": "First Name",
            "description": "This is the person's given first name.",
            "default": "",
            "examples": [
                "Bob",
                "Jill"
            ]
        },
        "last_name": {
            "$id": "#/properties/last_name",
            "type": "string",
            "title": "Last Name",
            "description": "This is the person's given last name.",
            "default": "",
            "examples": [
                "Jones",
                "Smith"
            ]
        }
    }
}

And now you move on to the task as the next base part.

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "http://example.com/schemas/task.json",
    "type": "object",
    "title": "Task Schema",
    "description": "The root schema for a task.",
    "default": {},
    "additionalProperties": false,
    "examples": [
        {
            "id": 1,
            "task": "My First Task",
            "complete": false
        }
    ],
    "required": [
        "id",
        "task",
        "complete"
    ],
    "properties": {
        "id": {
            "$id": "#/properties/id",
            "type": "integer",
            "title": "ID",
            "description": "The ID for this task",
            "default": 0,
            "examples": [
                1
            ]
        },
        "task": {
            "$id": "#/properties/task",
            "type": "string",
            "title": "Task",
            "description": "What must be done for this task to be considered complete, or a summary thereof.",
            "default": "",
            "examples": [
                "My First Task",
                "Read the employee handbook"
            ]
        },
        "complete": {
            "$id": "#/properties/complete",
            "type": "boolean",
            "title": "Complete",
            "description": "Has this task been completed.",
            "default": false,
            "examples": [
                false,
                true
            ]
        }
    }
}
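
The guidance above (a generic id field, snake_case properties) amounts to a small translation layer between database rows and API objects. Here is a minimal sketch of that layer, not the actual implementation. The function names toApiPerson and toApiTask are hypothetical, the row shapes are assumed to mirror the column names of the tables shown earlier, and it is an assumption that a null Complete value is exposed as false, following the informal internal rule.

```javascript
// Hypothetical mapper: Person row -> object matching the person schema.
function toApiPerson(row) {
  return {
    id: row.Person_ID,
    first_name: row.First_Name,
    last_name: row.Last_Name,
  }
}

// Hypothetical mapper: Task row -> object matching the task schema.
function toApiTask(row) {
  return {
    id: row.Task_ID,
    task: row.Task,
    // Assumption: only an explicit 1 (or true) counts as complete,
    // so null and 0 both become false.
    complete: row.Complete === 1 || row.Complete === true,
  }
}

console.log(toApiPerson({ Person_ID: 1, First_Name: 'Bob', Last_Name: 'Jones' }))
console.log(toApiTask({ Task_ID: 1, Task: 'My First Task', Complete: null }))
```

Keeping the mapping in one place means the database-specific names, such as Task_ID, never leak into what the client sees.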

Feeling happy, you are about to hand over these schemas when, re-reading the requirements, you realize that you must also be able to identify who is assigned to which task, as well as what arbitrary data is associated with each task. To help define this, you decide to use a reverse engineering tool, such as the one included in MySQL Workbench, to get a visual idea of what the relationships should be before writing the JSON.

[Figure: database diagram]

A quick look shows that this relationship could cause infinite recursion in the JSON, as tasks could belong to a person and people could belong to a task. In addition, you start looking through the Arbitrary_Data_Type_ID values and see things such as "Length", "Lgth", "Time Taken", "Time to Completion", and "Time To Completion B", all of which have an Extended_Description of "The time it took to complete this task."

After taking a moment to ~~bang your head against the wall~~ compose yourself, you see that you have two options. You could simply complete the requirement as quickly as possible to look good on the metrics, and let the outside world deal with the huge number of types and with looping through them all of the time. Or you could have some meetings to discuss things internally before meeting with the client.

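Before doing anything else, though, you decide that it is possible to do something to avoid the infinite recursion, and so you focus on that first. The approach is to define separate schemas whose nested items reference only the base schemas, so a task with people never contains people with tasks.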

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "http://example.com/schemas/task-with-people.json",
    "type": "object",
    "title": "Task Schema with people",
    "description": "The root schema for a task that has people.",
    "allOf": [{ "$ref": "http://example.com/schemas/task.json" }],
    "default": {},
    "examples": [
        {
            "id": 1,
            "task": "My First Task",
            "complete": false,
            "people": [{
              "id": 1,
              "first_name": "Bob",
              "last_name": "Jones"
            }]
        }
    ],
    "required": [
        "people"
    ],
    "properties": {
        "people": {
            "$id": "#/properties/people",
            "type": "array",
            "title": "People",
            "description": "The people associated with this task",
            "default": [],
            "examples": [[{
              "id": 1,
              "first_name": "Bob",
              "last_name": "Jones"
            }]],
            "items": { "$ref": "http://example.com/schemas/person.json" }
        }
    }
}

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "http://example.com/schemas/person-with-tasks.json",
    "allOf": [{ "$ref": "http://example.com/schemas/person.json" }],
    "type": "object",
    "title": "Person Schema with Tasks",
    "description": "The root schema for a person with tasks.",
    "default": {},
    "examples": [
        {
            "id": 1,
            "first_name": "Bob",
            "last_name": "Jones",
            "tasks": [{
              "id": 1,
              "task": "My First Task",
              "complete": false
            }]
        }
    ],
    "required": [
        "tasks"
    ],
    "properties": {
        "tasks": {
            "$id": "#/properties/tasks",
            "type": "array",
            "title": "Tasks",
            "description": "The tasks for this person",
            "default": [],
            "items": { "$ref": "http://example.com/schemas/task.json" },
            "examples": [[{
              "id": 1,
              "task": "My First Task",
              "complete": false
            }]]
        }
    }
}
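
On the server side, the same idea can be sketched as a function that nests only base person objects inside each task, so the output is always exactly one level deep. This is an illustration under stated assumptions, not the actual implementation: the function name tasksWithPeople and the in-memory arrays are hypothetical, and the assignment rows are assumed to have already been mapped to person_id and task_id.

```javascript
// Hypothetical in-memory data; in practice this would come from the database.
const people = [{ id: 1, first_name: 'Bob', last_name: 'Jones' }]
const tasks = [{ id: 1, task: 'My First Task', complete: false }]
const assignments = [{ person_id: 1, task_id: 1 }]

// Build "task with people" objects. Each nested person is a copy of the base
// person object, which has no tasks property, so recursion cannot occur.
function tasksWithPeople(taskList, personList, assignmentList) {
  const peopleById = new Map(personList.map((person) => [person.id, person]))
  return taskList.map((task) => ({
    ...task,
    people: assignmentList
      .filter((a) => a.task_id === task.id && peopleById.has(a.person_id))
      .map((a) => ({ ...peopleById.get(a.person_id) })),
  }))
}

console.log(JSON.stringify(tasksWithPeople(tasks, people, assignments), null, 2))
```

A person-with-tasks builder would mirror this, nesting base task objects inside each person.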

Having figured out a way to keep infinite recursion from becoming an attack vector against your API, you decide that the talk about references really is needed. Meetings happen internally, and you realize that you are the only one who has worked with or researched any external APIs for a long time. Those in the meetings want the API consumers to make a separate call for each piece of arbitrary data, thinking that this will be the easiest approach and will avoid concerns about returning too much data or about its security. It would look something like the following.

// Get with a pure JavaScript API (ignoring async concerns)
const company_api = new CompanyAPI(MY_API_KEY)
const tasks = company_api.tasks_with_people()
tasks.forEach(({ id }) => {
  // One call per task for every arbitrary data field
  company_api.tasks(id).arbitrary_data('FIELD_1')
  company_api.tasks(id).arbitrary_data('FIELD_2')
})

// Or with fetch (a template only: task_id and arbitrary_data_id are filled in per call)
fetch(`http://my-API/tasks/${task_id}/data/${arbitrary_data_id}`)

From your own experience, you know with certainty that you would rather have a way to get all of the required data in one fell swoop, and you plead to be able to offer that. Something more like the below.

// get with pure javascript API - ignoring async concerns
const company_api = new CompanyAPI(MY_API_KEY)
const tasks = company_api.tasks_with_people({extras: ['FIELD_1', 'FIELD_2']})

// or fetch
fetch(`http://my-API/tasks/?extras=FIELD_1&extras=FIELD_2`)
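
Building that query string by hand gets error-prone once field names contain spaces or punctuation, as several of the arbitrary data type IDs do. Here is a minimal sketch using the standard URLSearchParams class. The function name tasksUrl is hypothetical, and the repeated extras parameter is simply the convention used in the example above.

```javascript
// Hypothetical helper: build the single-request URL for tasks plus extras.
function tasksUrl(baseUrl, extras) {
  const params = new URLSearchParams()
  // Repeat the "extras" key once per requested field.
  extras.forEach((field) => params.append('extras', field))
  const query = params.toString()
  return query ? `${baseUrl}/tasks/?${query}` : `${baseUrl}/tasks/`
}

console.log(tasksUrl('http://my-API', ['FIELD_1', 'FIELD_2']))
// http://my-API/tasks/?extras=FIELD_1&extras=FIELD_2
```

With N tasks and M extra fields, the one-call-per-field approach costs N × M extra requests, while this form costs one.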

And so you talk about that, and are told that it goes against the standards that someone else in the company came up with. They do not want to use a query string to control what data is received, because each element should come complete as per its definition, and that is how it has always been done in the internal system. It is explained to you that each team has always written queries specific to how it is going to use the data, and that the company does not allow an arbitrary number of things to be given to each customer.

This is not the battle that you thought you were going to be fighting, ~~and so you finally just give up and give in, figuring that it is not your problem~~ and so you request some time to see if there is a solution that respects the company tradition and also means less work for those consuming the data. You do this because you realize that the way this data is currently used does not necessarily work well outside of the teams using it. Even if it is limited to only what a customer is allowed to see, the data is not unified for them, nor is it easy to give a full specification, as it could change per customer.

After some thought, you propose a configurable set of defaults that can be seen by all external users, as well as customer-specific defaults for those that request them. In addition, you recommend one more part of the API for retrieving what the extra keys mean. The higher-ups accept the proposed solution, and you are finally able to send some schemas over. The only difference in the base task schema is that the "additionalProperties": false line changes to "additionalProperties": { "type": "string" }. However, you have forced several behind-the-scenes changes, including the following tables.

CREATE TABLE Arbitrary_Data_External_Defaults(
    Arbitrary_Data_Type_ID VARCHAR(32) NOT NULL PRIMARY KEY,
    FOREIGN KEY (Arbitrary_Data_Type_ID)
        REFERENCES Arbitrary_Data_Type(Arbitrary_Data_Type_ID)
        ON DELETE CASCADE
) ENGINE INNODB;

CREATE TABLE Arbitrary_Data_External_Customer_Defaults(
    Arbitrary_Data_Type_ID VARCHAR(32) NOT NULL,
    Customer_Name TEXT,
    FOREIGN KEY (Arbitrary_Data_Type_ID)
        REFERENCES Arbitrary_Data_Type(Arbitrary_Data_Type_ID)
        ON DELETE CASCADE
) ENGINE INNODB;
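
To show how these two tables might be combined at request time, here is a minimal sketch that merges the defaults visible to every external user with the defaults for one customer. It is an illustration only: the function name extrasForCustomer, the customer name, and the sample rows are hypothetical, with row shapes assumed to mirror the column names above.

```javascript
// Hypothetical rows from Arbitrary_Data_External_Defaults (IDs only).
const externalDefaults = ['Time To Completion', 'Why This Task?']

// Hypothetical rows from Arbitrary_Data_External_Customer_Defaults.
const customerDefaults = [
  { Arbitrary_Data_Type_ID: 'Extra Field', Customer_Name: 'Example Customer' },
  { Arbitrary_Data_Type_ID: 'Length', Customer_Name: 'Another Customer' },
]

// Union of the global defaults and the defaults for the given customer.
function extrasForCustomer(globalIds, customerRows, customerName) {
  const allowed = new Set(globalIds)
  customerRows
    .filter((row) => row.Customer_Name === customerName)
    .forEach((row) => allowed.add(row.Arbitrary_Data_Type_ID))
  return [...allowed]
}

console.log(extrasForCustomer(externalDefaults, customerDefaults, 'Example Customer'))
// [ 'Time To Completion', 'Why This Task?', 'Extra Field' ]
```

A customer with no specific rows simply receives the global defaults, which keeps the specification the same for most consumers.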

And the API now has one additional feature added, so you must send over one more schema.

{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "http://example.com/schemas/task-extras.json",
    "type": "object",
    "title": "Task Extras Schema",
    "description": "The root schema for describing extra fields on a task.",
    "default": {},
    "additionalProperties": false,
    "examples": [
        {
            "property": "Time To Completion",
            "description": "The time it took to complete this task."
        }
    ],
    "required": [
        "property",
        "description"
    ],
    "properties": {
        "property": {
            "$id": "#/properties/property",
            "type": "string",
            "title": "Property",
            "description": "A property attached to a task",
            "default": "",
            "examples": [
                "Time To Completion",
                "Why This Task?"
            ]
        },
        "description": {
            "$id": "#/properties/description",
            "type": "string",
            "title": "Description",
            "description": "Why does this property exist, or what does it represent.",
            "default": "",
            "examples": [
                "The time it took to complete this task.",
                "Sometimes people just want to know why we made the task"
            ]
        }
    }
}

Finally, the first draft of the API is ready to send to a client. They can create a simulated person with tasks to start testing their items with, using the examples. A while later it is time for the live tests to begin and the customer gets back the data you expected.

{
  "id": 1,
  "first_name": "Bob",
  "last_name": "Jones",
  "tasks": [{
    "id": 1,
    "task": "My First Task",
    "Time To Completion": "",
    "Why This Task?": "Just to check out the system",
    "Extra Field": "Why do this?",
    "complete": false
  }]
}

However, it seems that more communication was needed, as the client was not expecting the Extra Field property. It did no harm, but it was a good reminder to make communicating changes to a customer's defaults part of your deployment pipeline.
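
A consumer could also catch this kind of surprise on their side by comparing the keys on each task against the base properties and the extras they have been told about. Here is a minimal sketch, assuming the extras descriptions arrive as a list of objects shaped like the task extras schema. The function name unexpectedExtras and the constant BASE_TASK_PROPERTIES are hypothetical.

```javascript
// Properties defined by the base task schema.
const BASE_TASK_PROPERTIES = ['id', 'task', 'complete']

// Return the keys on a task that are neither base properties nor known extras.
function unexpectedExtras(task, knownExtras) {
  const known = new Set(knownExtras.map((extra) => extra.property))
  return Object.keys(task).filter(
    (key) => !BASE_TASK_PROPERTIES.includes(key) && !known.has(key)
  )
}

// Extras the client was told about, shaped like the task extras schema.
const knownExtras = [
  { property: 'Time To Completion', description: 'The time it took to complete this task.' },
  { property: 'Why This Task?', description: 'Sometimes people just want to know why we made the task' },
]

// The task from the live response above.
const liveTask = {
  id: 1,
  task: 'My First Task',
  'Time To Completion': '',
  'Why This Task?': 'Just to check out the system',
  'Extra Field': 'Why do this?',
  complete: false,
}

console.log(unexpectedExtras(liveTask, knownExtras)) // [ 'Extra Field' ]
```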

Conclusion

Legacy systems come about over several years, and those who designed them or have used them for a while can find them hard to question, because their experience has been that the systems work. Challenging the pre-existing structure can be important when building on top of something that already exists, but it is just as important to do so in a manner that listens to the concerns of others if you are to change perceptions in a positive way. Some cultures are much more open to change than others, and one should work towards recognizing what is a fact and what is an opinion. So take some time to analyze the problem with a fresh perspective, and make sure to include the team and, if possible, the end user in trying to come up with the best solution possible.

Originally Uploaded

Originally uploaded at adsittech.com
