
Pivot Data Using Informatica Java Transformation

Java Transformation is a powerful feature in Informatica, though examples on its usage are scarce online. This scarcity is likely because Java Transformation isn't typically necessary for most mappings. However, in certain scenarios, using Java Transformation can simplify and streamline your mappings. In this blog post, I will provide a comprehensive example of how to use Java Transformation to pivot data.

Suppose you have source data like this:
Source:

STUDENTID  NAME   SUBJECT  SCORE
1001       Alice  Math       100
1001       Alice  English     85
1002       Bob    Math        98
1002       Bob    English     95
1002       Bob    Science     90
1003       Chris  Math        95
1003       Chris  Science     90
1003       Chris  English     80

You want your target like this:

STUDENTID  NAME   MATH  ENGLISH  SCIENCE
1001       Alice   100       85        0
1002       Bob      98       95       90
1003       Chris    95       80       90

Typically, achieving this requires using Expression, Sorter, and Aggregator Transformations. While there are many online discussions covering this method, it can be tedious and difficult to maintain, especially when dealing with numerous columns and various aggregations.

Using Java Transformation can simplify this process significantly. Here are the steps to pivot the source data using Java Transformation:

Step 1 – Create a Mapping
Create a mapping with the source "PIVOT_TEST_SRC" and the target "PIVOT_TEST_TGT".

Step 2 – Create the Java Transformation
In the "Create Transformation" window, select "Java" from the list of available transformations and enter a name for your transformation. Here’s how you can do it:

  1. Open the Mapping Designer: Navigate to the Mapping Designer tool in Informatica.
  2. Add a New Transformation: Click on the "Transformation" menu and select "Create."
  3. Select Java Transformation: From the list of available transformations, select "Java."
  4. Name Your Transformation: Enter a meaningful name for your Java Transformation, such as "Java_Pivot_Src".
  5. Click Create: Click the "Create" button, then click "Done." This sets up the Java Transformation, allowing you to customize it to pivot your data as needed.

Then choose the Active type. With the Active type, the number of input rows can differ from the number of output rows.

Step 3 – Drag the Source Qualifier Columns to the Java Transformation
Drag and drop each column from the Source Qualifier to the Java Transformation. This action creates input ports in the Java Transformation for each column.

Ensure that the names and data types of the input ports in the Java Transformation match those of the Source Qualifier columns.


Step 4 – Add Output Ports in the Java Transformation
To properly configure the Java Transformation to pivot your data, you'll need to define the necessary output ports. Here's how to do it:

  1. Open Java Transformation Properties: Double-click on the Java Transformation (e.g., Java_pivot_src) in your mapping to open its properties window.

  2. Navigate to the Ports Tab: Click on the "Ports" tab within the properties window.

  3. Add Output Ports: Add the following output ports, which will correspond to the pivoted data structure you want to achieve:

  • out_studentId
  • out_studentName
  • out_mathScore
  • out_englishScore
  • out_scienceScore
    Here are the steps to add each port:

  • Click on the Add Button: This is usually represented by a "+" icon or similar.

  • Name the Port: Enter the name for each port (e.g., out_studentId).

  • Set the Data Type: Ensure that the data type for each port matches the expected data type of the source data (e.g., String, Integer, etc.).

  • Configure Data Types and Precision: Make sure to configure the data types and precision for each output port appropriately. This ensures that the data will be correctly processed and transferred to the target.

After adding the output ports, the Ports tab should show both the input ports from the Source Qualifier and the five new output ports.

Save your work and check the mapping; the Source Qualifier columns should now feed the Java Transformation.

Step 5 – Implement the Java Code for Pivoting Data
To implement the Java code necessary for pivoting your data, follow these detailed steps:

Open Java Code Tab: Double-click on the Java Transformation (e.g., Java_pivot_src) in your mapping to open its properties window. Then, navigate to the "Java Code" tab at the top.

On Input Row Tab: By default, you will be in the "On Input Row" tab at the bottom of the window. This is where you will write the code that processes each input row.

Understand Other Tabs: Familiarize yourself with other tabs like “Import Packages”, “Helper Code”, and “On End of Data”. These tabs allow you to import necessary Java packages, write helper methods, and define code that executes after all data has been processed, respectively.

Example Using Java HashMap
For this example, you will use a Java HashMap to store student names, subjects, and their respective scores. The keys in the HashMap will be the target column names (Name, Math, English, Science), and the values will be the student names and scores.


For each student, the inner HashMap maps the target column names to the corresponding values:

    Name    -> student_name
    Math    -> math_score
    English -> english_score
    Science -> science_score

Each of these inner hashmaps is nested inside another hashmap (allRecords) whose key is the student ID, so the final structure looks like this:

                  |--> Name    -> student_name
    student_id -> |--> Math    -> math_score
                  |--> English -> english_score
                  |--> Science -> science_score
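
If it helps to see this idea outside of PowerCenter first, here is a small, self-contained Java sketch of the same nested-HashMap structure. The PivotSketch class and the hard-coded sample rows are purely illustrative; in the mapping, the values come from the input ports instead.

import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the nested-HashMap pivot idea (illustration only).
public class PivotSketch {
    public static void main(String[] args) {
        // STUDENTID, NAME, SUBJECT, SCORE
        String[][] rows = {
            {"1001", "Alice", "Math", "100"},
            {"1001", "Alice", "English", "85"},
            {"1002", "Bob", "Math", "98"}
        };

        // Outer key: student id; inner keys: Name, Math, English, Science
        HashMap<String, HashMap<String, Object>> allRecords =
            new HashMap<String, HashMap<String, Object>>();

        for (String[] row : rows) {
            HashMap<String, Object> subjectScore = allRecords.get(row[0]);
            if (subjectScore == null) { // first row for this student
                subjectScore = new HashMap<String, Object>();
                subjectScore.put("Name", row[1]);
                allRecords.put(row[0], subjectScore);
            }
            subjectScore.put(row[2], row[3]); // e.g. "Math" -> "100"
        }

        // One pivoted "row" per student
        for (Map.Entry<String, HashMap<String, Object>> entry : allRecords.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}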

Based on this idea, put the following two lines in the “Import Packages” tab:

import java.util.HashMap;
import java.util.Map;

Select the “Helper Code” tab and declare the variables required by the Java code:

// Outer map: student id -> inner map of target column name -> value
private HashMap<String, HashMap<String, Object>> allRecords = new HashMap<String, HashMap<String, Object>>();
// Counters for logging
static int inputRows = 0;
static int outputRows = 0;

Select the “On Input Row” tab and write the following code:

HashMap<String, Object> subjectScore;
if (!isNull("STUDENTID")) {
    inputRows += 1;
    if (allRecords.containsKey(STUDENTID)) {
        // existing student: add this subject's score to the inner map
        subjectScore = allRecords.get(STUDENTID);
        subjectScore.put(SUBJECT, SCORE);
    } else {
        // new student: build a new key/value pair
        subjectScore = new HashMap<String, Object>();
        subjectScore.put("Name", NAME);
        subjectScore.put("Math", null);
        subjectScore.put("English", null);
        subjectScore.put("Science", null);
        subjectScore.put(SUBJECT, SCORE);
        // add subjectScore to the outer hashmap
        allRecords.put(STUDENTID, subjectScore);
    }
}

Select the “On End of Data” tab and put the following code to output the results:

HashMap<String, Object> outSubjectScoreMap;
for (Map.Entry<String, HashMap<String, Object>> entry : allRecords.entrySet()) {
    outSubjectScoreMap = entry.getValue();
    out_studentId = entry.getKey();
    out_studentName = (String) outSubjectScoreMap.get("Name");

    if (outSubjectScoreMap.get("Math") == null) {
        out_mathScore = "";
    } else {
        out_mathScore = (String) outSubjectScoreMap.get("Math");
    }

    if (outSubjectScoreMap.get("English") == null) {
        out_englishScore = "";
    } else {
        out_englishScore = (String) outSubjectScoreMap.get("English");
    }

    if (outSubjectScoreMap.get("Science") == null) {
        out_scienceScore = "";
    } else {
        out_scienceScore = (String) outSubjectScoreMap.get("Science");
    }

    // emit one pivoted row from the current output port values
    generateRow();
    outputRows += 1;
} // end for loop: output records

allRecords.clear(); // release memory
logInfo("The total number of records generated is: " + outputRows);

inputRows = 0;
outputRows = 0;

generateRow() is a PowerCenter Java API call that generates an output row from the current values of the output ports.
If there are too many input records to hold in the hashmap, you can also call generateRow() in the “On Input Row” tab to flush output rows in batches and release memory. If you do this, you need to sort the input records by student ID (for example, with a Sorter transformation) before the Java Transformation; otherwise you may get duplicate student IDs in the output records.

HashMap<String, Object> subjectScore;
if (!isNull("STUDENTID")) {
    inputRows += 1;
    if (allRecords.containsKey(STUDENTID)) {
        subjectScore = allRecords.get(STUDENTID);
        subjectScore.put(SUBJECT, SCORE);
    } else { // build a new key/value pair
        if (inputRows >= 1000) { // flush the accumulated rows after every 1000 input rows to release memory
            HashMap<String, Object> outSubjectScoreMap;
            for (Map.Entry<String, HashMap<String, Object>> entry : allRecords.entrySet()) {
                outSubjectScoreMap = entry.getValue();
                out_studentId = entry.getKey();
                out_studentName = (String) outSubjectScoreMap.get("Name");

                if (outSubjectScoreMap.get("Math") == null) {
                    out_mathScore = "";
                } else {
                    out_mathScore = (String) outSubjectScoreMap.get("Math");
                }

                if (outSubjectScoreMap.get("English") == null) {
                    out_englishScore = "";
                } else {
                    out_englishScore = (String) outSubjectScoreMap.get("English");
                }

                if (outSubjectScoreMap.get("Science") == null) {
                    out_scienceScore = "";
                } else {
                    out_scienceScore = (String) outSubjectScoreMap.get("Science");
                }

                generateRow();
                outputRows += 1;
            } // end for loop: output records

            allRecords.clear(); // release memory
            inputRows = 0;      // reset the counter for the next batch
        } // end if inputRows >= 1000

        subjectScore = new HashMap<String, Object>();
        subjectScore.put("Name", NAME);
        subjectScore.put("Math", null);
        subjectScore.put("English", null);
        subjectScore.put("Science", null);
        subjectScore.put(SUBJECT, SCORE);
        // add subjectScore to the outer hashmap
        allRecords.put(STUDENTID, subjectScore);
    }
}

Inside the Java code, you can use System.out.println() or the PowerCenter Java API logInfo() to log debug information, which can be found in the session log.
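
For example, while debugging you might add a line like this to the “On Input Row” tab (the message text is just an illustration):

// log each input row to the session log while debugging
logInfo("Processing STUDENTID=" + STUDENTID + ", SUBJECT=" + SUBJECT + ", SCORE=" + SCORE);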

To match the data types in the target, I added an Expression transformation after the Java Transformation to convert the string scores to integers (for example, with TO_INTEGER()).

That’s it.

If your Java code runs into exceptions, open the session log and find the error messages. For example, here is the error I got:

JAVA PLUGIN_1762    [ERROR] java.lang.NullPointerException
JAVA PLUGIN_1762    [ERROR]     at com.informatica.powercenter.server.jtx.JTXPartitionDriverImplGen.execute(JTXPartitionDriverImplGen.java:382)

The “JTXPartitionDriverImplGen.java:382” part tells you exactly which line of the generated Java code threw the error. Open the Java Transformation and click the “Full Code” link; a window pops up with the full generated Java code. Copy that code into your favorite editor, go to the error line (in this example, line 382), and analyze why it threw the error.

Conclusion
As you can see from the example above, using Java Transformation to pivot data in Informatica is straightforward and powerful. This transformation allows you to perform complex data manipulations with ease. Beyond pivoting data, Java Transformation can be leveraged for various tasks such as compressing data, encrypting/decrypting data, concatenating fields, and aggregating special fields.

With minor modifications to the code used in this example, you can extend the functionality to:

  • Compress Data: Implement data compression algorithms to reduce the size of data (see the sketch below).
  • Encrypt/Decrypt Data: Use encryption libraries to secure sensitive data.
  • Concatenate Fields: Combine multiple fields into a single field for easier data management.
  • Aggregate Special Fields: Perform custom aggregations that might not be straightforward with standard transformations.

Moreover, Java Transformation is highly versatile, enabling you to build mappings for ETL processes that transfer data between non-SQL databases and SQL databases seamlessly.
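
As a rough illustration of the first item, here is a minimal sketch of a gzip helper. The method name gzipField() and the Base64 encoding are my own choices for this example; the imports would go in the “Import Packages” tab and the method in the “Helper Code” tab, and it assumes the PowerCenter JVM is Java 8 or later (for java.util.Base64).

// "Import Packages" tab:
import java.io.ByteArrayOutputStream;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

// "Helper Code" tab: compress a string field and return it Base64-encoded
// so it can be carried in a string output port.
private String gzipField(String value) {
    try {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(bytes);
        gzip.write(value.getBytes("UTF-8"));
        gzip.close(); // finish the gzip stream before reading the bytes
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    } catch (Exception e) {
        // in a real mapping you might call logError() here instead
        return null;
    }
}

// "On Input Row" tab (example, with hypothetical ports):
// out_compressedField = gzipField(IN_STRING_PORT);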

By integrating Java Transformation into your Informatica mappings, you can enhance your ETL workflows, making them more efficient and easier to maintain. The flexibility and power of Java code within the ETL process open up numerous possibilities for data transformation and management.
