DEV Community

Cris Mihalache

Using ChatGPT To Generate Datasets

Recently I used ChatGPT to generate mock data in JSON format.

This is powerful because you can describe what real data looks like in your prompt, and ChatGPT will fill fields such as descriptions, titles, and tags with realistic content.

This is a sample of the data ChatGPT generated:

  {
    "id": 3,
    "description": "Implementing OAuth2 for user authentication",
    "tags": [
      "@feature",
      "@security"
    ],
    "start": "2024-02-22T03:11:03",
    "end": "2024-02-22T06:36:18"
  }

Notice how the description, the tags, and the duration (just under three and a half hours) are all realistic.

Example

To build the documentation website for track-time-cli, I needed a database with mock data to have realistic command output in the documentation content.

To get ChatGPT to generate the entries for me, I used the following prompt:

Generate 5 time sheet entries with realistic software development tasks as descriptions. Use the following schema:

        {
          "id": 1,
          "description": "writing documentation",
          "tags": [
            "@feature",
            "@docs"
          ],
          "start": "2024-02-21T18:26:45",
          "end": "2024-02-21T19:33:08"
        }

The entries should be sequential in time, of varying durations (between 5 minutes and 5 hours), and have realistic descriptions that fit the software development time sheet theme.

The output should be in JSON, an array of time sheet entries conforming to the provided schema example.

Note that I specified the number of entries (5) and the type of activity (realistic software development tasks) that should be mentioned in the description.

I defined the output schema by providing an example of a correctly formatted time sheet entry. I also specified that the entries should be sequential in time and have durations that are realistic for this type of activity.
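The same prompt structure can be reused for other datasets by treating the entry count, the theme, and the example entry as parameters. The following is a minimal sketch of that idea; `build_prompt` and its parameters are hypothetical names introduced here, not part of track-time-cli or any ChatGPT API, and the sketch only assembles the prompt text without sending it anywhere.

```python
import json


def build_prompt(count, theme, example, min_duration="5 minutes", max_duration="5 hours"):
    """Assemble a dataset-generation prompt around one example entry.

    Hypothetical helper: it mirrors the wording of the prompt shown above.
    """
    schema = json.dumps(example, indent=2)
    return (
        f"Generate {count} time sheet entries with realistic {theme} tasks "
        "as descriptions. Use the following schema:\n\n"
        f"{schema}\n\n"
        "The entries should be sequential in time, of varying durations "
        f"(between {min_duration} and {max_duration}), and have realistic "
        f"descriptions that fit the {theme} time sheet theme.\n\n"
        "The output should be in JSON, an array of time sheet entries "
        "conforming to the provided schema example."
    )


# The example entry from the prompt above.
example_entry = {
    "id": 1,
    "description": "writing documentation",
    "tags": ["@feature", "@docs"],
    "start": "2024-02-21T18:26:45",
    "end": "2024-02-21T19:33:08",
}

prompt = build_prompt(5, "software development", example_entry)
print(prompt)
```

Keeping the example entry as a real dictionary means the schema in the prompt is always valid JSON, which makes it less likely that a typo in the example leaks into the generated data.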

Result

This was the output:

[
  {
    "id": 1,
    "description": "Refactoring user interface code",
    "tags": [
      "@feature",
      "@frontend"
    ],
    "start": "2024-02-21T18:26:45",
    "end": "2024-02-21T22:39:32"
  },
  {
    "id": 2,
    "description": "Optimizing database queries for report generation",
    "tags": [
      "@performance",
      "@database"
    ],
    "start": "2024-02-21T22:39:32",
    "end": "2024-02-22T03:11:03"
  },
  {
    "id": 3,
    "description": "Implementing OAuth2 for user authentication",
    "tags": [
      "@feature",
      "@security"
    ],
    "start": "2024-02-22T03:11:03",
    "end": "2024-02-22T06:36:18"
  },
  {
    "id": 4,
    "description": "Weekly team sync-up meeting",
    "tags": [
      "@meeting",
      "@communication"
    ],
    "start": "2024-02-22T06:36:18",
    "end": "2024-02-22T10:16:24"
  },
  {
    "id": 5,
    "description": "Adding localization support for new languages",
    "tags": [
      "@feature",
      "@i18n"
    ],
    "start": "2024-02-22T10:16:24",
    "end": "2024-02-22T11:50:10"
  }
]
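Generated output does not always follow every constraint in the prompt, so it can be worth checking a batch before using it. The sketch below is one possible check, not something the original workflow included: `validate_entries` is a hypothetical helper that verifies each entry has the expected keys, that durations fall between 5 minutes and 5 hours, and that entries do not start before the previous one ends.

```python
import json
from datetime import datetime, timedelta

REQUIRED_KEYS = {"id", "description", "tags", "start", "end"}


def validate_entries(raw_json, min_duration=timedelta(minutes=5),
                     max_duration=timedelta(hours=5)):
    """Return a list of problems found in a generated batch (empty if none)."""
    entries = json.loads(raw_json)
    problems = []
    previous_end = None

    for entry in entries:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry is missing keys: {sorted(missing)}")
            continue

        start = datetime.fromisoformat(entry["start"])
        end = datetime.fromisoformat(entry["end"])
        duration = end - start

        if not (min_duration <= duration <= max_duration):
            problems.append(f"entry {entry['id']}: duration {duration} out of range")
        if previous_end is not None and start < previous_end:
            problems.append(f"entry {entry['id']}: starts before the previous entry ends")

        previous_end = end

    return problems


# The first two entries from the output above.
sample_output = """
[
  {"id": 1, "description": "Refactoring user interface code",
   "tags": ["@feature", "@frontend"],
   "start": "2024-02-21T18:26:45", "end": "2024-02-21T22:39:32"},
  {"id": 2, "description": "Optimizing database queries for report generation",
   "tags": ["@performance", "@database"],
   "start": "2024-02-21T22:39:32", "end": "2024-02-22T03:11:03"}
]
"""

print(validate_entries(sample_output))
```

An empty list means the batch satisfies the constraints stated in the prompt; any strings in the list describe the entries that need to be regenerated or fixed by hand.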

After running this prompt a few times to generate 16 entries, I placed them all in an array within a mock JSON database. The output of running track-time-cli's list command was the following:

$ tt l
* Showing sheets since 2/21/2024, 12:00:00 AM [yesterday]

- Sheet main (16 entries) [1:10:19:16]
    (1)   [9:28:26]  yesterday       ->  14 hours ago    Writing API documentation                           @feature @docs
    (2)   [16:51]    14 hours ago    ->  14 hours ago    Implementing user authentication module             @feature @backend
    (3)   [50:01]    14 hours ago    ->  13 hours ago    Designing database schema for new features          @design @database
    (4)   [2:04:11]  13 hours ago    ->  11 hours ago    Improving responsive design for mobile devices      @feature @frontend
    (5)   [2:29:48]  11 hours ago    ->  9 hours ago     Conducting user experience research                 @research @ux
    (6)   [1:42:30]  9 hours ago     ->  7 hours ago     Implementing caching for improved performance       @performance @backend
    (7)   [9:08]     7 hours ago     ->  7 hours ago     Weekly team retrospective meeting                   @meeting @team
    (8)   [39:52]    7 hours ago     ->  6 hours ago     Refining search algorithm                           @algorithm @backend
    (9)   [2:13:23]  6 hours ago     ->  4 hours ago     Reviewing and merging pull requests                 @review @version-control
    (10)  [1:08:12]  4 hours ago     ->  3 hours ago     Developing new feature for customer analytics       @feature @analytics
    (11)  [2:27:32]  3 hours ago     ->  21 minutes ago  Solving compatibility issues in cross-platform app  @bugfix @mobile
    (12)  [3:14:14]  21 minutes ago  ->  in 3 hours      Organizing backlog and prioritizing tasks           @planning @management
    (13)  [3:07:36]  in 3 hours      ->  in 6 hours      Upgrading server infrastructure                     @maintenance @devops
    (14)  [1:41:17]  in 6 hours      ->  in 8 hours      Creating automated tests for new API endpoints      @testing @automation
    (15)  [2:02:29]  in 8 hours      ->  in 10 hours     Troubleshooting server downtime issues              @maintenance @devops
    (16)  [43:46]    in 10 hours     ->  in 10 hours     Designing user interface for the upcoming module    @design @frontend

* 0 Sheets not shown. use --all to show
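Because each run of the prompt starts again at `id` 1, combining several batches into one array needs at least a renumbering step. Here is a minimal sketch of that merge; `merge_batches` is a hypothetical helper, and since the database file layout that track-time-cli expects is not shown in this post, the sketch stops at producing the merged array rather than writing a database file. It sorts by start time and renumbers, but it does not resolve time overlaps between batches.

```python
import json


def merge_batches(batches):
    """Concatenate generated batches, sort by start time, and renumber ids."""
    merged = [entry for batch in batches for entry in batch]

    # ISO 8601 timestamps in the same format sort chronologically as strings.
    merged.sort(key=lambda entry: entry["start"])

    # Copy each entry so the input batches are left unchanged.
    return [{**entry, "id": new_id} for new_id, entry in enumerate(merged, start=1)]


# Two small illustrative batches, each numbered from 1 as a fresh run would be.
batch_a = [
    {"id": 1, "description": "Refactoring user interface code",
     "tags": ["@feature", "@frontend"],
     "start": "2024-02-21T18:26:45", "end": "2024-02-21T22:39:32"},
]
batch_b = [
    {"id": 1, "description": "Optimizing database queries for report generation",
     "tags": ["@performance", "@database"],
     "start": "2024-02-21T22:39:32", "end": "2024-02-22T03:11:03"},
    {"id": 2, "description": "Implementing OAuth2 for user authentication",
     "tags": ["@feature", "@security"],
     "start": "2024-02-22T03:11:03", "end": "2024-02-22T06:36:18"},
]

merged = merge_batches([batch_b, batch_a])
print(json.dumps(merged, indent=2))
```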

Conclusions

ChatGPT is a great tool for generating small mock datasets with realistic text fields.

Try it out the next time you need some mock data with realistic textual content!

Top comments (1)

Seif Sekalala

Good job! I'm embarking on a huge ChatGPT-human (book-)writing collabo, and I plan to prompt Prof Chat to help me generate a wide variety of datasets! Here's a relevant link below, if you're interested.

But again, good job. Your chat-result (the data-set[s]) is an encouraging omen for my plan, so to speak!

Link:
drive.google.com/drive/folders/1-8...