Priscilla Parodi for Elastic

Elastic Data Frame - Inference Processor HandsOn

| Menu | Next Post: NLP and Elastic: Getting started |

Note: This HandsOn assumes that you have already followed the step-by-step Setup of your Elastic Cloud account and added the Samples available there to replicate the analysis described here. If not, please follow those steps first. If you haven't completed the Elastic Data Frame - Classification Analysis HandsOn, I suggest doing that before proceeding, because this tutorial uses the model trained there.

Let's create an ingest pipeline with an inference processor:

Kibana>Stack Management>Ingest>Ingest Node Pipelines

You can use any name you like; I'll use pipeline-delay-prediction.

Then add your inference processor:

Add a processor> Processor=Inference

Open another Kibana page and copy your ML Model ID:

Kibana>Machine Learning>Data Frame Analytics>Models

Go back to the processor configuration page and paste the Model ID into the Model ID field.

I also set the optional target field to delay, so the inference results are written to the delay field of each document instead of the default location under ml.inference. When done, click Add.
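For readers who prefer to see the configuration as data, here is a minimal Python sketch of a pipeline definition that should correspond to the form above. The model ID is the one that appears in the output later in this post; the description text and the helper name build_pipeline_body are illustrative assumptions, and the script only prints the request instead of sending it.

```python
import json


def build_pipeline_body(model_id: str, target_field: str = "delay") -> dict:
    """Build a request body for PUT _ingest/pipeline/<pipeline-name>.

    The helper name and the description text are illustrative assumptions.
    """
    return {
        "description": "Predict flight delays with a trained classification model",
        "processors": [
            {
                "inference": {
                    # ID copied from the trained models list in Kibana
                    "model_id": model_id,
                    # Field that will hold the inference results
                    "target_field": target_field,
                }
            }
        ],
    }


body = build_pipeline_body("delay-prediction-1626961317123")
print("PUT _ingest/pipeline/pipeline-delay-prediction")
print(json.dumps(body, indent=2))
```

The printed request can be pasted into Kibana Dev Tools as an alternative to the form, after replacing the model ID with your own.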

Before creating the pipeline and starting to use it, you can test it. Click Add documents.

You can copy a document from the original index as an example and change its data to simulate a new document.

Use the kibana_sample_data_flights index for this, so that the document does not already contain ML results.

Open a new window:

Kibana>Analytics>Discover

Select kibana_sample_data_flights, choose any document, and copy its _id and _index.

Go back to the Ingest Node Pipelines window, paste the index and ID there, and click Add document. You can then edit the field values to simulate the scenario you want.

I removed the fields FlightDelay, FlightDelayType and FlightDelayMin, for the same reason I removed them during training.
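If you prepare test documents in a script rather than by hand, the same field removal can be sketched in a few lines of Python. The helper name strip_label_fields and the shortened sample document are assumptions made for illustration; only the three field names come from this post.

```python
# Fields that reveal the outcome we want the model to predict.
LABEL_FIELDS = ("FlightDelay", "FlightDelayType", "FlightDelayMin")


def strip_label_fields(source: dict, fields=LABEL_FIELDS) -> dict:
    """Return a copy of `source` without the outcome-revealing fields."""
    return {key: value for key, value in source.items() if key not in fields}


# Shortened sample document (illustrative subset of the real fields).
original = {
    "FlightNum": "GDZWNB0",
    "Carrier": "Kibana Airlines",
    "FlightDelay": False,
    "FlightDelayType": "No Delay",
    "FlightDelayMin": 0,
}

cleaned = strip_label_fields(original)
print(cleaned)
```

The function returns a new dictionary, so the original document stays available for comparing the prediction against the real value later.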

This is our updated JSON document:

[
  {
    "_id": "MKx29nkBQAv3jO3lIeem",
    "_index": "kibana_sample_data_flights",
    "_source": {
      "FlightNum": "GDZWNB0",
      "DestCountry": "CN",
      "OriginWeather": "Clear",
      "OriginCityName": "London",
      "AvgTicketPrice": 952.4522444587226,
      "DistanceMiles": 5743.8378391883825,
      "DestWeather": "Rain",
      "Dest": "Shanghai Hongqiao International Airport",
      "OriginCountry": "GB",
      "dayOfWeek": 6,
      "DistanceKilometers": 9243.810963470789,
      "timestamp": "2021-07-11T23:50:12",
      "DestLocation": {
        "lat": "31.19790077",
        "lon": "121.3359985"
      },
      "DestAirportID": "SHA",
      "Carrier": "Kibana Airlines",
      "Cancelled": false,
      "FlightTimeMin": 770.3175802892324,
      "Origin": "London Gatwick Airport",
      "OriginLocation": {
        "lat": "51.14810181",
        "lon": "-0.190277994"
      },
      "DestRegion": "SE-BD",
      "OriginAirportID": "LGW",
      "OriginRegion": "GB-ENG",
      "DestCityName": "Shanghai",
      "FlightTimeHour": 12.838626338153874
    }
  }
]

Under "Documents", add the input data that makes sense for your training and testing, then click "Run the pipeline".

This is our output:

{
  "docs": [
    {
      "doc": {
        "_index": "kibana_sample_data_flights",
        "_type": "_doc",
        "_id": "MKx29nkBQAv3jO3lIeem",
        "_source": {
          "FlightNum": "GDZWNB0",
          "Origin": "London Gatwick Airport",
          "OriginLocation": {
            "lon": "-0.190277994",
            "lat": "51.14810181"
          },
          "DestLocation": {
            "lon": "121.3359985",
            "lat": "31.19790077"
          },
          "DistanceMiles": 5743.8378391883825,
          "FlightTimeMin": 770.3175802892324,
          "OriginWeather": "Clear",
          "dayOfWeek": 6,
          "AvgTicketPrice": 952.4522444587226,
          "Carrier": "Kibana Airlines",
          "OriginRegion": "GB-ENG",
          "DestAirportID": "SHA",
          "timestamp": "2021-07-11T23:50:12",
          "Dest": "Shanghai Hongqiao International Airport",
          "FlightTimeHour": 12.838626338153874,
          "Cancelled": false,
          "DistanceKilometers": 9243.810963470789,
          "OriginCityName": "London",
          "delay": {
            "prediction_score": 0.4013867640677467,
            "model_id": "delay-prediction-1626961317123",
            "FlightDelay_prediction": false,
            "top_classes": [
              {
                "class_name": false,
                "class_probability": 0.9983471228188069,
                "class_score": 0.4013867640677467
              },
              {
                "class_name": true,
                "class_probability": 0.0016528771811931647,
                "class_score": 0.0016528771811931647
              }
            ],
            "prediction_probability": 0.9983471228188069
          },
          "DestWeather": "Rain",
          "OriginCountry": "GB",
          "DestCountry": "CN",
          "DestRegion": "SE-BD",
          "OriginAirportID": "LGW",
          "DestCityName": "Shanghai"
        },
        "_ingest": {
          "timestamp": "2021-07-22T14:30:02.675515386Z"
        }
      }
    }
  ]
}

And this was our original document:

{
  "_index": "kibana_sample_data_flights",
  "_type": "_doc",
  "_id": "MKx29nkBQAv3jO3lIeem",
  "_version": 1,
  "_score": null,
  "fields": {
    "Origin": [
      "London Gatwick Airport"
    ],
    "OriginLocation": [
      {
        "coordinates": [
          -0.190277994,
          51.14810181
        ],
        "type": "Point"
      }
    ],
    "FlightNum": [
      "GDZWNB0"
    ],
    "DestLocation": [
      {
        "coordinates": [
          121.3359985,
          31.19790077
        ],
        "type": "Point"
      }
    ],
    "FlightDelay": [
      false
    ],
    "DistanceMiles": [
      5743.838
    ],
    "FlightTimeMin": [
      770.31757
    ],
    "OriginWeather": [
      "Clear"
    ],
    "dayOfWeek": [
      6
    ],
    "AvgTicketPrice": [
      952.4523
    ],
    "Carrier": [
      "Kibana Airlines"
    ],
    "FlightDelayMin": [
      0
    ],
    "OriginRegion": [
      "GB-ENG"
    ],
    "DestAirportID": [
      "SHA"
    ],
    "FlightDelayType": [
      "No Delay"
    ],
    "hour_of_day": [
      23
    ],
    "timestamp": [
      "2021-07-11T23:50:12.000Z"
    ],
    "Dest": [
      "Shanghai Hongqiao International Airport"
    ],
    "FlightTimeHour": [
      "12.838626338153874"
    ],
    "Cancelled": [
      false
    ],
    "DistanceKilometers": [
      9243.811
    ],
    "OriginCityName": [
      "London"
    ],
    "DestWeather": [
      "Rain"
    ],
    "OriginCountry": [
      "GB"
    ],
    "DestCountry": [
      "CN"
    ],
    "DestRegion": [
      "SE-BD"
    ],
    "OriginAirportID": [
      "LGW"
    ],
    "DestCityName": [
      "Shanghai"
    ]
  },
  "sort": [
    1626047412000
  ]
}

As you can see, I left the other fields and values of the input document unchanged. This lets us compare the original value of the variable we are classifying, in this case "FlightDelay": [false], with the output of the model, in this case "FlightDelay_prediction": false.

For this document, the prediction agrees with the original value.
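The comparison can also be scripted, which helps when you test more than one document. Below is a minimal Python sketch that reads the prediction from a response shaped like the output shown earlier and checks it against the original FlightDelay value. The function name summarize_predictions is an assumption, and the response is a trimmed copy of the output that keeps only the fields used here.

```python
def summarize_predictions(simulate_response: dict,
                          target_field: str = "delay",
                          result_field: str = "FlightDelay_prediction") -> list:
    """Collect the predicted class and its probability for each simulated doc."""
    summaries = []
    for item in simulate_response["docs"]:
        doc = item["doc"]
        inference = doc["_source"][target_field]
        summaries.append({
            "_id": doc["_id"],
            "prediction": inference[result_field],
            "probability": inference["prediction_probability"],
        })
    return summaries


# Trimmed copy of the pipeline output (only the fields used here).
response = {
    "docs": [
        {
            "doc": {
                "_id": "MKx29nkBQAv3jO3lIeem",
                "_source": {
                    "delay": {
                        "FlightDelay_prediction": False,
                        "prediction_probability": 0.9983471228188069,
                    }
                },
            }
        }
    ]
}

# Original FlightDelay values from the source index, keyed by document _id.
original_labels = {"MKx29nkBQAv3jO3lIeem": False}

summaries = summarize_predictions(response)
for summary in summaries:
    agrees = summary["prediction"] == original_labels[summary["_id"]]
    print(summary["_id"], summary["prediction"], summary["probability"], agrees)
```

A single matching document is a sanity check rather than an evaluation of the model, so it is worth repeating this with several documents, including delayed flights.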

Now you can close the Test Pipeline and click Create pipeline to start using this pipeline, or continue changing the value of the fields to test the model.

You can also test pipelines using the simulate pipeline API.

In this case it would be:

POST /_ingest/pipeline/pipeline-delay-prediction/_simulate
{
  "docs": [
    {
      "_id": "MKx29nkBQAv3jO3lIeem",
      "_index": "kibana_sample_data_flights",
      "_source": {
        "FlightNum": "GDZWNB0",
        "DestCountry": "CN",
        "OriginWeather": "Clear",
        "OriginCityName": "London",
        "AvgTicketPrice": 952.4522444587226,
        "DistanceMiles": 5743.8378391883825,
        "DestWeather": "Rain",
        "Dest": "Shanghai Hongqiao International Airport",
        "OriginCountry": "GB",
        "dayOfWeek": 6,
        "DistanceKilometers": 9243.810963470789,
        "timestamp": "2021-07-11T23:50:12",
        "DestLocation": {
          "lat": "31.19790077",
          "lon": "121.3359985"
        },
        "DestAirportID": "SHA",
        "Carrier": "Kibana Airlines",
        "Cancelled": false,
        "FlightTimeMin": 770.3175802892324,
        "Origin": "London Gatwick Airport",
        "OriginLocation": {
          "lat": "51.14810181",
          "lon": "-0.190277994"
        },
        "DestRegion": "SE-BD",
        "OriginAirportID": "LGW",
        "OriginRegion": "GB-ENG",
        "DestCityName": "Shanghai",
        "FlightTimeHour": 12.838626338153874
      }
    }
  ]
}
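If you want to call the simulate pipeline API from a script instead of Kibana Dev Tools, the request can be sketched with the Python standard library. The base URL and API key below are placeholders, the helper name build_simulate_request is an assumption, and the document is shortened to a few fields. The sketch only builds the request; the line that would send it is left commented out.

```python
import json
import urllib.request


def build_simulate_request(base_url: str, pipeline: str, docs: list,
                           api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the simulate pipeline API."""
    url = f"{base_url.rstrip('/')}/_ingest/pipeline/{pipeline}/_simulate"
    payload = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Elasticsearch API key authentication; the key is a placeholder.
            "Authorization": f"ApiKey {api_key}",
        },
    )


# Shortened test document (illustrative subset of the fields shown above).
docs = [
    {
        "_id": "MKx29nkBQAv3jO3lIeem",
        "_index": "kibana_sample_data_flights",
        "_source": {"FlightNum": "GDZWNB0", "Carrier": "Kibana Airlines"},
    }
]

request = build_simulate_request(
    "https://my-deployment.example.com:9243",  # placeholder endpoint
    "pipeline-delay-prediction",
    docs,
    api_key="YOUR_ENCODED_API_KEY",  # placeholder credential
)
print(request.get_method(), request.full_url)

# To actually send it, with a real endpoint and key:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

In practice you would send the full document, since the model can only use the fields that are present in the request.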

This post is part of a series on Artificial Intelligence with a focus on the Machine Learning solution from Elastic (the creators of Elasticsearch). The series aims to introduce and illustrate the possibilities and options available, and to address their context and usability.
