DEV Community

vishalmysore

Selenium AI Automation: Image Processing with Gemini

Tools4AI with Selenium automates UI validation: it verifies UI elements against design specifications, and it can validate an entire UI section at once instead of examining elements one by one. This shortens the testing process and lets you automate broader verification of web-based applications.
With this integration, you combine natural language and Java code to write Selenium test scripts in a more human-readable format. Test scenarios can be written in plain English, which simplifies UI testing and makes it accessible to non-programmers.

Selenium Integration with Tools4AI

Tools4AI's integration with Selenium introduces a flexible way to automate UI testing. Instead of traditional Java code for Selenium scripts, Tools4AI allows you to define test scenarios in plain English, offering a more accessible approach to testing web applications. These English-based commands can be converted into Selenium code to automate web-based interactions and streamline testing.

Example of Selenium Test with Tools4AI

WebDriver driver = new ChromeDriver();
SeleniumProcessor processor = new SeleniumGeminiProcessor(driver);

// Navigate to the website
processor.processWebAction("go to website https://the-internet.herokuapp.com");

// Check if a specific button is present
boolean buttonPresent = processor.trueFalseQuery("do you see Add/Remove Elements?");
if (buttonPresent) {
    // Perform a click action
    processor.processWebAction("click on Add/Remove Elements");
    // Further English-based instructions can be added
}

// Check if checkboxes are visible and interact with them
processor.processWebAction("go to website https://the-internet.herokuapp.com");
boolean isCheckboxPresent = processor.trueFalseQuery("do you see Checkboxes?");
if (isCheckboxPresent) {
    processor.processWebAction("click on Checkboxes");
    processor.processWebAction("select checkbox 1");
}

In this example, the SeleniumProcessor processes commands in plain English and converts them into Selenium actions.

Code for above example is here

This approach allows for complex interactions without manually writing Java code for each test. Tools4AI serves as a bridge between natural language and Selenium, making it easier to automate UI testing in a way that is both efficient and intuitive.
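
To make the idea of "sentence in, action out" concrete, here is a deliberately simplified sketch. It is not how Tools4AI works internally: the library relies on an AI model to interpret each command, whereas this toy only recognizes two fixed phrasings taken from the example above. The names PlainEnglishActionSketch, WebAction, and ActionType are hypothetical and exist only for illustration.

```java
import java.util.Optional;

// Toy illustration only: maps two fixed English phrasings to structured actions.
// All names here are hypothetical; none of them belong to Tools4AI or Selenium.
final class PlainEnglishActionSketch {

    enum ActionType { NAVIGATE, CLICK }

    // What to do, and what to do it to (a URL or a visible label).
    static final class WebAction {
        final ActionType type;
        final String target;

        WebAction(ActionType type, String target) {
            this.type = type;
            this.target = target;
        }

        @Override
        public String toString() {
            return type + " -> " + target;
        }
    }

    private static final String NAVIGATE_PREFIX = "go to website ";
    private static final String CLICK_PREFIX = "click on ";

    // Returns an action for a recognized phrasing, or empty when the command is unknown.
    static Optional<WebAction> interpret(String command) {
        String text = command.trim();
        if (startsWithIgnoreCase(text, NAVIGATE_PREFIX)) {
            String url = text.substring(NAVIGATE_PREFIX.length()).trim();
            return Optional.of(new WebAction(ActionType.NAVIGATE, url));
        }
        if (startsWithIgnoreCase(text, CLICK_PREFIX)) {
            String label = text.substring(CLICK_PREFIX.length()).trim();
            return Optional.of(new WebAction(ActionType.CLICK, label));
        }
        return Optional.empty();
    }

    private static boolean startsWithIgnoreCase(String text, String prefix) {
        return text.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(interpret("go to website https://the-internet.herokuapp.com"));
        System.out.println(interpret("click on Add/Remove Elements"));
        System.out.println(interpret("select checkbox 1")); // not recognized by this toy
    }
}
```

A rule-based mapper like this breaks as soon as the wording changes, which is the gap an AI-backed processor is meant to close.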

You can also use SeleniumOpenAIProcessor, but it converts the plain HTML to a POJO rather than using images, because OpenAI does not yet support an API for images.

Screen Validations
One of the most powerful features of Tools4AI is its ability to convert an image or an HTML page into a structured Java object. This is valuable whenever you need to extract data from an image, a screenshot, or an HTML page and then manipulate or analyze it in your Java code.
By transforming an image into a Java object, Tools4AI opens up several possibilities:

UI Testing and Validation: You can convert screenshots of a user interface into Java objects to validate UI elements and their attributes. This feature simplifies the testing process by allowing you to automate the verification of entire sections or components without manually interacting with individual elements.

Data Extraction: Tools4AI's image-to-POJO functionality can be used to extract data from images, such as scanned documents, infographics, or screenshots, and convert it into a structured format. This can be useful for creating data-driven applications, automating workflows, or extracting information for further processing.

Simplifying Automated Tests: By converting images into Java objects, you can write automated tests that operate at a higher level of abstraction. Instead of interacting with individual web elements, you can work with an entire data structure that represents the content of a web page or UI screen.

Examples

Example 1

[Image: graph (images/sales.PNG)]

The picture above can be converted to a Java object with a few lines of code:

GeminiImageActionProcessor processor = new GeminiImageActionProcessor();
Sales sales = (Sales) processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("images/sales.PNG"), Sales.class);

The Sales pojo looks like this

@Getter
@Setter
@ToString
@NoArgsConstructor
public class Sales {

    @MapKeyType(Integer.class)
    @MapValueType(Double.class)
    Map<Integer,Double> yearlySales;
} 
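
Once the image has become a Sales object, validation can be ordinary Java. The sketch below is an assumption about how you might check the result and is not part of Tools4AI: a hand-built map stands in for sales.getYearlySales(), and the expected values 58 and 67 are the 2013 and 2015 figures reported in the JSON output later in this article. SalesValidationSketch and matches are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compares extracted yearly sales against expected figures.
final class SalesValidationSketch {

    // True when every expected year is present and within the given tolerance.
    static boolean matches(Map<Integer, Double> actual,
                           Map<Integer, Double> expected,
                           double tolerance) {
        for (Map.Entry<Integer, Double> entry : expected.entrySet()) {
            Double value = actual.get(entry.getKey());
            if (value == null || Math.abs(value - entry.getValue()) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for sales.getYearlySales(); no image is processed here.
        Map<Integer, Double> extracted = new LinkedHashMap<>();
        extracted.put(2013, 58.0);
        extracted.put(2015, 67.0);

        Map<Integer, Double> expected = Map.of(2013, 58.0, 2015, 67.0);
        System.out.println("Sales match: " + matches(extracted, expected, 0.5));
    }
}
```

A tolerance is used instead of exact equality because values read off a chart by a model may differ slightly from the source data.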

If you don't want to create a POJO, you can get the data as a simple HashMap or a JSON string instead

 log.info(processor.imageToJson(GeminiImageExample.class.getClassLoader().getResource("images/sales.PNG"),"sales in 2013"));

and the response would be

INFO: {"fields":[{"fieldName":"sales in 2013","fieldType":"String","fieldValue":"58"}]}


Or you can get multiple values in this way

log.info(processor.imageToJson(GeminiImageExample.class.getClassLoader().getResource("images/sales.PNG"),"sales in 2013", "sales in 2015"));


and your output will be

INFO: {
  "fields": [
    {
      "fieldName": "sales in 2013",
      "fieldType": "String",
      "fieldValue": "58"
    },
    {
      "fieldName": "sales in 2015",
      "fieldType": "String",
      "fieldValue": "67"
    }
  ]
}

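
To use these values in a test you need to pull each fieldValue out of the JSON string. The sketch below does that with a regular expression so it stays dependency-free; this is an assumption made for brevity, and a real project would more likely use a JSON library. It does not handle escaped quotes inside values. FieldJsonSketch and fieldValues are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects fieldName -> fieldValue pairs from the JSON shape shown above.
final class FieldJsonSketch {

    // Matches one "fieldName" and the next "fieldValue" that follows it.
    private static final Pattern FIELD = Pattern.compile(
            "\"fieldName\"\\s*:\\s*\"([^\"]*)\".*?\"fieldValue\"\\s*:\\s*\"([^\"]*)\"",
            Pattern.DOTALL);

    static Map<String, String> fieldValues(String json) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(json);
        while (matcher.find()) {
            values.put(matcher.group(1), matcher.group(2));
        }
        return values;
    }

    public static void main(String[] args) {
        String json = "{\"fields\":["
                + "{\"fieldName\":\"sales in 2013\",\"fieldType\":\"String\",\"fieldValue\":\"58\"},"
                + "{\"fieldName\":\"sales in 2015\",\"fieldType\":\"String\",\"fieldValue\":\"67\"}]}";
        Map<String, String> values = fieldValues(json);
        System.out.println(values.get("sales in 2013"));
        System.out.println(values.get("sales in 2015"));
    }
}
```

Note that the values come back as strings ("fieldType": "String"), so numeric comparisons need an explicit conversion such as Double.parseDouble.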

Example 2

[Image: pie chart (images/PieChart.PNG)]
We will convert this image into a Java object (POJO)

FoodConsumption foodConsume = (FoodConsumption) processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("images/PieChart.PNG"), FoodConsumption.class);
log.info(foodConsume.toString());


FoodConsumption Pojo looks like this

@Getter
@Setter
@NoArgsConstructor
@ToString
public class FoodConsumption {
    @MapValueType(Double.class)
    @MapKeyType(String.class)
    private Map<String, Double> foodTypeToPercentage;
}

Output looks like this

INFO: FoodConsumption(foodTypeToPercentage={Rice Dishes=0.3, Leafy Greens=0.15, Soups=0.25, Root Vegetables=0.2, Hot Drinks=0.1})
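
Because the slices of a pie chart should add up to the whole, the extracted map lends itself to a quick sanity check. The following sketch is an assumption about how such a check could look and is not a Tools4AI feature; the map is built by hand from the output above instead of calling foodConsume.getFoodTypeToPercentage(). PieChartCheckSketch, sampleShares, and sumsToWhole are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: checks that extracted pie-chart shares add up to the whole.
final class PieChartCheckSketch {

    // Values copied from the FoodConsumption output shown above.
    static Map<String, Double> sampleShares() {
        Map<String, Double> shares = new LinkedHashMap<>();
        shares.put("Rice Dishes", 0.3);
        shares.put("Leafy Greens", 0.15);
        shares.put("Soups", 0.25);
        shares.put("Root Vegetables", 0.2);
        shares.put("Hot Drinks", 0.1);
        return shares;
    }

    // True when the shares sum to `whole`, allowing for floating-point rounding.
    static boolean sumsToWhole(Map<String, Double> shares, double whole, double tolerance) {
        double total = 0.0;
        for (double share : shares.values()) {
            total += share;
        }
        return Math.abs(total - whole) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println("Shares sum to 1.0: " + sumsToWhole(sampleShares(), 1.0, 1e-9));
    }
}
```

A failed check suggests that a slice was missed or misread during extraction, which is cheaper to detect here than further down the test.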

Example 3

[Image: images/FruitsSold.PNG]

log.info(processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("images/FruitsSold.PNG"), WeeklyFruitSales.class).toString());

and our pojo looks like this

@Getter
@Setter
@NoArgsConstructor
@AllArgsConstructor
@ToString
public class WeeklyFruitSales {
    // Lombok generates the constructors, getter, and setter for this field
    @ListType(DailyFruitSales.class)
    private List<DailyFruitSales> dailySales;
}

and output is

INFO: WeeklyFruitSales(dailySales=[DailyFruitSales(dayOfWeek=Monday, fruitSales={Mango=12, Orange=10, Banana=5}), DailyFruitSales(dayOfWeek=Tuesday, fruitSales={Mango=15, Orange=13, Banana=6}), DailyFruitSales(dayOfWeek=Wednesday, fruitSales={Mango=7, Orange=9, Banana=6}), DailyFruitSales(dayOfWeek=Thursday, fruitSales={Mango=6, Orange=14, Banana=5}), DailyFruitSales(dayOfWeek=Friday, fruitSales={Mango=19, Orange=17, Banana=8}), DailyFruitSales(dayOfWeek=Saturday, fruitSales={Mango=19, Orange=21, Banana=10}), DailyFruitSales(dayOfWeek=Sunday, fruitSales={Mango=15, Orange=21, Banana=9})])
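
With the week available as objects, aggregate checks become simple loops. The sketch below totals each fruit across the week. The DailyFruitSales class is not shown in this article, so the nested class here is a plain-Java stand-in inferred from the output above (dayOfWeek plus a fruitSales map), and the data is copied from that output by hand. FruitTotalsSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums units sold per fruit over the extracted week.
final class FruitTotalsSketch {

    // Stand-in inferred from the printed output; the real class may differ.
    static final class DailyFruitSales {
        final String dayOfWeek;
        final Map<String, Integer> fruitSales;

        DailyFruitSales(String dayOfWeek, Map<String, Integer> fruitSales) {
            this.dayOfWeek = dayOfWeek;
            this.fruitSales = fruitSales;
        }
    }

    private static DailyFruitSales day(String name, int mango, int orange, int banana) {
        Map<String, Integer> sales = new LinkedHashMap<>();
        sales.put("Mango", mango);
        sales.put("Orange", orange);
        sales.put("Banana", banana);
        return new DailyFruitSales(name, sales);
    }

    // Values copied from the WeeklyFruitSales output shown above.
    static List<DailyFruitSales> sampleWeek() {
        List<DailyFruitSales> week = new ArrayList<>();
        week.add(day("Monday", 12, 10, 5));
        week.add(day("Tuesday", 15, 13, 6));
        week.add(day("Wednesday", 7, 9, 6));
        week.add(day("Thursday", 6, 14, 5));
        week.add(day("Friday", 19, 17, 8));
        week.add(day("Saturday", 19, 21, 10));
        week.add(day("Sunday", 15, 21, 9));
        return week;
    }

    // Sorted by fruit name so the result prints in a stable order.
    static Map<String, Integer> totalsPerFruit(List<DailyFruitSales> week) {
        Map<String, Integer> totals = new TreeMap<>();
        for (DailyFruitSales daily : week) {
            for (Map.Entry<String, Integer> entry : daily.fruitSales.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(totalsPerFruit(sampleWeek())); // {Banana=49, Mango=93, Orange=105}
    }
}
```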

Example 4
Imagine you're testing an online library system, and you encounter a complex user interface with various elements representing different sections, like books, members, and other UI components. Traditionally, you would inspect each element individually to validate its content and functionality. This involves identifying and interacting with each UI element separately, which can be time-consuming and error-prone.
Tools4AI changes this process by converting an entire screen or webpage into a Plain Old Java Object (POJO). You extract the structure and content of a complex UI in one step, which significantly streamlines testing.

[Image: images/library.PNG]

The POJOs look like this

@Setter
@Getter
@NoArgsConstructor
@ToString
public class LibraryScreen {
    @ListType(Book.class)
    private List<Book> latestBooks;
    @ListType(Member.class)
    private List<Member> members;
}

and

@Setter
@Getter
@NoArgsConstructor
@ToString
public class Book {
    private String title;
    private String author;
    private String genre;
    private boolean isAvailable;
    // Constructors, getters, and setters
}

and

@Setter
@Getter
@NoArgsConstructor
@ToString
public class Member {
    private String id;
    private String name;
    @Prompt(dateFormat = "ddMMyyyy")
    private Date membershipStart;
    private int booksLoaned;
    // Constructors, getters, and setters
}

Pay special attention to @Prompt(dateFormat = "ddMMyyyy"): it converts the date to the specified format automatically.
Since this is a complex screen, we can use a transformer: first convert the screenshot to text, then transform that text into the POJO.

String text = processor.imageToText(GeminiImageExample.class.getClassLoader().getResource("images/library.PNG"),"convert the entire screen to text");

GeminiV2PromptTransformer transformer = new GeminiV2PromptTransformer();

log.info(transformer.transformIntoPojo(text, LibraryScreen.class).toString());

and the entire screen will be converted to POJO

INFO: LibraryScreen(latestBooks=[Book(title=The Great Gatsby, author=F. Scott Fitzgerald, genre=Fiction, isAvailable=true), Book(title=1984, author=George Orwell, genre=Dystopian, isAvailable=false)], members=[Member(id=001, name=Alice Smith, membershipStart=Thu Jan 12 00:00:00 EST 2023, booksLoaned=4), Member(id=002, name=Bob Johnson, membershipStart=Wed Feb 15 00:00:00 EST 2023, booksLoaned=2)])
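
The dateFormat value on the Member class uses the standard Java date-pattern letters: dd for day of month, MM for month, and yyyy for year. The sketch below only illustrates what that pattern means, using the JDK's SimpleDateFormat; whether Tools4AI uses this class internally is an assumption, and the input string "12012023" is a made-up value chosen to correspond to the first member's start date in the output above. DatePatternSketch and parseDdMmYyyy are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper: shows how the "ddMMyyyy" pattern maps digits to a java.util.Date.
final class DatePatternSketch {

    static Date parseDdMmYyyy(String raw) {
        SimpleDateFormat format = new SimpleDateFormat("ddMMyyyy");
        format.setLenient(false); // reject impossible dates such as 32012023
        try {
            return format.parse(raw);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a ddMMyyyy date: " + raw, e);
        }
    }

    public static void main(String[] args) {
        Date membershipStart = parseDdMmYyyy("12012023");
        // Print in ISO form to make the day/month order unambiguous.
        System.out.println(new SimpleDateFormat("yyyy-MM-dd").format(membershipStart)); // 2023-01-12
    }
}
```

Getting the day/month order wrong silently produces a valid but different date for many inputs, so it is worth asserting on at least one date whose day is greater than 12.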

Example 5

[Image: images/auto.PNG]

String jsonStr = processor.imageToJson(GeminiImageExample.class.getClassLoader().getResource("images/auto.PNG"),"Full Inspection");
log.info(jsonStr);

and the result is

INFO: {
  "fields": [
    {
      "fieldName": "Full Inspection",
      "fieldType": "String",
      "fieldValue": "Starting at $99.99"
    }
  ]
}

Or it can be converted into Pojo like this

@Getter
@Setter
@ToString
@NoArgsConstructor
@AllArgsConstructor
public class AutoRepairScreen {
    double fullInspectionValue;
    double tireRotationValue;
    double oilChangeValue;
    String phoneNumber; // kept as text: a 10-digit number can exceed the Integer range
    String email;
    String[] customerReviews;

}
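
The JSON result above returns the price as the text "Starting at $99.99", while the POJO declares numeric fields such as fullInspectionValue. This article does not show how that conversion happens, so the sketch below is only an assumption about how you could derive the number yourself when you start from the JSON string. PriceTextSketch and firstDollarAmount are hypothetical names.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the first dollar amount from a price label.
final class PriceTextSketch {

    // A dollar sign, optional spaces, then digits with an optional one- or two-digit fraction.
    private static final Pattern AMOUNT = Pattern.compile("\\$\\s*(\\d+(?:\\.\\d{1,2})?)");

    static OptionalDouble firstDollarAmount(String text) {
        Matcher matcher = AMOUNT.matcher(text);
        if (matcher.find()) {
            return OptionalDouble.of(Double.parseDouble(matcher.group(1)));
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstDollarAmount("Starting at $99.99"));
        System.out.println(firstDollarAmount("Call for pricing"));
    }
}
```

Returning an OptionalDouble forces the caller to decide what a missing price means, instead of treating it as 0.0.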

Example 6

Imagine encountering the project report screen shown in the image below. Tools4AI can transform this screen into a Java object, allowing you to interact with its data in a structured manner for further analysis or testing.

[Image: images/RAG.PNG]
We will convert this entire project report into a POJO so that we can act on its data

@NoArgsConstructor
@AllArgsConstructor
@Getter
@Setter
@EqualsAndHashCode
@ToString
public class ProjectDashboard {
    @MapKeyType(String.class)
    @MapValueType(Double.class)
    private Map<String, Double> featuresImplemented; // Map of quarter to percentage
    @MapKeyType(String.class)
    @MapValueType(Double.class)
    private Map<String, Double> expenses; // Map of quarter to percentage
    @ListType(ProjectStatus.class)
    private List<ProjectStatus> projectStatuses; // List of project status entries
    @ListType(Task.class)
    private List<Task> tasks; // List of tasks
    @ListType(String.class)
    private List<String> criticalItems; // List of critical items
    @ListType(String.class)
    private List<String> blockers;
}

and

@NoArgsConstructor
@AllArgsConstructor
@Getter
@Setter
@EqualsAndHashCode
@ToString
public class Task {
    private String assignedTo;
    private String priority;
    private String status;
    private int completion;
}

and

@NoArgsConstructor
@AllArgsConstructor
@Getter
@Setter
@EqualsAndHashCode
@ToString
public class ProjectStatus {
    private String projectName;
    @MapKeyType(String.class)
    @MapValueType(Integer.class)
    private Map<String, Integer> statusCounts;
}

and then we call our image processor

GeminiImageActionProcessor processor = new GeminiImageActionProcessor();
ProjectDashboard projectDashboard = (ProjectDashboard) processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("images/RAG.PNG"), ProjectDashboard.class);
log.info(projectDashboard.toString());

and here is the result

INFO: ProjectDashboard(featuresImplemented={1st Qtr=64.0, 3rd Qtr=11.0, 2nd Qtr=25.0}, expenses={1st Qtr=25.0, 3rd Qtr=40.0, 2nd Qtr=35.0}, projectStatuses=[ProjectStatus(projectName=Neo, statusCounts={Issues=3, Features=2, Backlog=4}), ProjectStatus(projectName=Wypal, statusCounts={Issues=3, Features=1, Backlog=2}), ProjectStatus(projectName=Dorake, statusCounts={Issues=1, Features=3, Backlog=2}), ProjectStatus(projectName=Symphony, statusCounts={Issues=2, Features=3, Backlog=1})], tasks=[Task(assignedTo=John, priority=High, status=Done, completion=100), Task(assignedTo=Smith, priority=Normal, status=In progress, completion=20), Task(assignedTo=Zoya, priority=Low, status=Not started, completion=0), Task(assignedTo=Ellie, priority=High, status=In progress, completion=40)], criticalItems=[Order more RAM], blockers=[Server Upgrades, Core Processors])
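
"Taking action" on the dashboard can be as simple as cross-checking fields against each other. The sketch below flags tasks whose status and completion percentage disagree. The rules (Done means 100, Not started means 0, In progress means strictly in between) are assumptions chosen for illustration, and the nested Task class is a plain-Java stand-in for the Lombok class above, filled by hand from the printed output. TaskConsistencySketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: finds tasks whose status contradicts their completion value.
final class TaskConsistencySketch {

    // Plain stand-in for the Task POJO; only the fields needed for the check.
    static final class Task {
        final String assignedTo;
        final String status;
        final int completion;

        Task(String assignedTo, String status, int completion) {
            this.assignedTo = assignedTo;
            this.status = status;
            this.completion = completion;
        }
    }

    // Assumed rules; adjust them to whatever your dashboard actually guarantees.
    static boolean isConsistent(Task task) {
        switch (task.status) {
            case "Done":
                return task.completion == 100;
            case "Not started":
                return task.completion == 0;
            case "In progress":
                return task.completion > 0 && task.completion < 100;
            default:
                return false; // unknown status values are treated as suspicious
        }
    }

    static List<String> inconsistentAssignees(List<Task> tasks) {
        List<String> names = new ArrayList<>();
        for (Task task : tasks) {
            if (!isConsistent(task)) {
                names.add(task.assignedTo);
            }
        }
        return names;
    }

    // Values copied from the ProjectDashboard output shown above.
    static List<Task> sampleTasks() {
        return Arrays.asList(
                new Task("John", "Done", 100),
                new Task("Smith", "In progress", 20),
                new Task("Zoya", "Not started", 0),
                new Task("Ellie", "In progress", 40));
    }

    public static void main(String[] args) {
        System.out.println("Inconsistent tasks: " + inconsistentAssignees(sampleTasks()));
    }
}
```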

Conclusion

Tools4AI's image-to-Java object conversion feature provides a bridge between raw visual data and structured information, allowing you to process, analyze, and validate data in a way that is both efficient and accessible. It can streamline many tasks, from automated UI testing to data-driven applications, by enabling complex data manipulations with minimal effort.

Code for this article is here

🔥 Revolutionize Your UI Testing with Tools4AI! 🔥

Tired of manually inspecting every element in a complex UI? Tools4AI is here to change the game. This groundbreaking technology converts entire screens or images into structured Java objects (POJOs), enabling you to streamline your automated UI testing, data extraction, and data analysis. No more tedious element-by-element validation: Tools4AI does it all in one step!

Key Highlights:

🚀 Comprehensive Data Extraction: Convert entire screens into Java objects for efficient data processing.
⚡ Simplified Automation: Automate UI testing with a structured approach, interacting with POJOs instead of individual elements.
🔥 Enhanced Efficiency: Quickly validate entire UI sections to ensure design consistency and accuracy.

Uses:

🏥 Healthcare: Convert images of medical records into structured data to streamline electronic health records (EHR) management and automate data entry.
🛒 Retail: Turn product images into Java objects to enhance inventory management and automate real-time stock tracking.
🏫 Education: Convert classroom whiteboard notes or scanned documents into Java objects for automated digitization and learning analytics.
🏭 Manufacturing: Automate quality control by converting product images into Java objects to detect defects and ensure compliance with manufacturing standards.
💰 Finance: Transform financial reports or stock data into Java objects to automate financial analysis and streamline data processing.

#artificialintelligence, #selenium, #Java, #automation, #Gemini, #OpenAI, #AI
