Practical FATE-LLM Task with KubeFATE — A Hands-on Approach

Author: Fangchi Wang, Senior Member of Technical Staff, Office of the CTO at VMware. Chenlong Ma, Henry Zhang and Layne Peng from Office of the CTO also contributed to this blog.

As mentioned in the previous blog post, the FATE community has recently released FATE-LLM along with FATE v1.11. By integrating federated learning with large language models fine-tuning techniques, we now have the ability to build our own LLMs using distributed data without the need to centralize them. This approach ensures data privacy while enhancing model performance. In this blog, we will provide a detailed, step-by-step guide on executing an example FATE-LLM job. We use KubeFATE here to manage the deployment and workflow in a cloud-native fashion.

Task and Environment Configurations

In this example, we will simulate a two-party federated learning scenario in which we fine-tune a pre-trained GPT-2 model for a sentiment classification task. A classification head is added on top of the model for this particular downstream task.

For the experiment environment, two Kubernetes clusters were created, representing the two parties. Each Kubernetes cluster contains a node running as a vSphere virtual machine with an NVIDIA V100 GPU attached via PCI passthrough. We used Docker and cri-dockerd, together with nvidia-container-runtime and the NVIDIA k8s-device-plugin, to make the GPUs available within the Kubernetes clusters. The specific versions of all key dependencies are listed below:

[Image: versions of the key software dependencies]

The dataset for this sentiment classification task is the Large Movie Review Dataset (paper) from Stanford AI Lab. Since we are conducting a federated learning job involving two parties, we divide the dataset into two parts, with each party owning 12,500 entries. After some straightforward preprocessing, the resulting dataset to be used looks like:

[Image: preview of the preprocessed two-party dataset]
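For reference, below is a minimal sketch of how such a two-party split and preprocessing could be done with pandas. The file paths and column names (text, label) are illustrative assumptions, not the exact script used in this experiment.

```python
import pandas as pd

# Load the full Large Movie Review (IMDB) training set; the path and column
# names are assumptions for illustration -- adapt them to your local copy.
df = pd.read_csv("imdb_train.csv")
df = df.sample(frac=1.0, random_state=42)   # shuffle before splitting

# Simple preprocessing: keep the review text and a 0/1 sentiment label.
df = df[["text", "label"]].dropna()
df["label"] = df["label"].astype(int)

# Split evenly between the two simulated parties (12,500 rows each).
half = len(df) // 2
df.iloc[:half].to_csv("imdb_party_guest.csv", index=False)
df.iloc[half:].to_csv("imdb_party_host.csv", index=False)
```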

As introduced in the previous blog, FATE-LLM offers several PEFT (Parameter-Efficient Fine-Tuning) methods that can be applied when fine-tuning the LLM. By utilizing these PEFT methods, we train only a small number of newly added parameters while keeping the original weights frozen, thereby reducing the computation and communication costs of federated learning aggregation. In this practice, we will explore three different PEFT methods: Houlsby Adapter, LoRA, and IA3.

PEFT Methods Used Here

To fine-tune an LLM, the typical approach is to update all model parameters during training. In contrast, parameter-efficient fine-tuning (PEFT) methods have been proposed that offer performance comparable to full fine-tuning while significantly reducing the number of parameters to update. In this section, we briefly introduce the three PEFT methods used in this example.

Image description

· Houlsby Adapter: proposed by Houlsby et al. (link). This method introduces two adapter layers into each transformer block of the LLM, after the attention block and the feed-forward block, respectively. Each adapter layer is a small two-layer fully connected neural network. During training, only the weights of the newly added adapter layers are updated.

· LoRA: stands for Low-Rank Adaptation, presented by Hu et al. (link). The key idea behind LoRA is that the weight update needed to adapt to a new task does not have to be of the same rank as the original weight matrix. The update can therefore be represented as the product of two much smaller matrices, and only these two small matrices need to be trained.

· IA3: or (IA)³, an acronym for Infused Adapter by Inhibiting and Amplifying Inner Activations, proposed by Liu et al. (link). It takes an alternative approach, introducing additional learned vectors that rescale inner activations via element-wise multiplication in the attention block and the final feed-forward block. Only these newly added vectors are updated during fine-tuning.

While FATE-LLM supports a variety of other methods that can be easily configured through its programming interface, for the purpose of simplicity, we will focus on these three approaches as examples.
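To make the LoRA idea above concrete, below is a minimal, self-contained PyTorch sketch of a low-rank update applied to a frozen linear layer. This is an illustrative toy implementation rather than the code FATE-LLM uses internally; the rank r and scaling factor are arbitrary example values.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer W plus a trainable low-rank update B @ A (toy sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # original weights stay frozen
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # y = base(x) + (x A^T) B^T * scaling -- only A and B receive gradients
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Usage: wrap an existing projection layer and train only the LoRA parameters.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))
```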

Launch FATE-LLM Jobs

We can follow the guide in the KubeFATE repo to deploy KubeFATE and FATE clusters. To use FATE-LLM, we need to explicitly specify certain settings in the cluster.yaml file. Firstly, it should have the algorithm parameter set to “NN”, indicating we want to use the FATE container image containing FATE-LLM and related modules. To enable GPU usage, the device parameter should be set to “GPU.” Additionally, in the configuration of the Python component, the resources section should be adjusted to include the requests and limits for GPU resources, such as nvidia.com/gpu: 1.

Below is an example demonstrating these specific settings. This guide uses the KubeFATE example file to deploy the FATE clusters.

[Image: cluster.yaml settings for enabling FATE-LLM with GPU]
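For reference, the relevant fragment of cluster.yaml looks roughly like the sketch below. It is assembled from the parameters described above rather than copied from a specific release; the surrounding fields and exact layout depend on the KubeFATE chart version, so consult the example file in the KubeFATE repo for the authoritative version.

```yaml
# Sketch of the FATE-LLM-related settings in cluster.yaml (other required
# fields omitted; layout may vary across KubeFATE versions).
algorithm: NN        # use the FATE image that bundles FATE-LLM and related modules
device: GPU          # use the GPU-enabled image/runtime

python:
  resources:
    requests:
      nvidia.com/gpu: 1
    limits:
      nvidia.com/gpu: 1
```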

To verify the environment and settings after the FATE cluster is deployed, we can enter the fateflow pod using kubectl exec and use the nvidia-smi command to inspect the available GPU resources within it. We can further verify that both FATE parties can work together by running the flow test toy program in that container.

By following the official FATE GPT-2 example, we can now start FATE-LLM jobs. First, both parties upload their local data into the FATE system, which involves copying the preprocessed CSV files into each party's fateflow container. Then we use the following code in a Jupyter Notebook to bind the uploaded data into FATE. Note that this step should be performed by both parties.

[Image: code for binding the uploaded data into FATE]
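Roughly, the binding code looks like the sketch below, based on the FATE pipeline API used in the official GPT-2 example. The table name, namespace, and file path are illustrative assumptions; check the official example for the exact arguments expected by your FATE version.

```python
# Sketch of binding a local CSV into FATE via the pipeline API (run on each party).
# Table name, namespace, and path are illustrative; see the official GPT-2 example.
from pipeline.backend.pipeline import PipeLine

party_id = 10000  # this party's own ID; adjust role/party_id on the other party
pipeline = PipeLine().set_initiator(role="guest", party_id=party_id).set_roles(guest=party_id)

pipeline.bind_table(
    name="imdb_sentiment",                            # hypothetical table name
    namespace="experiment",                           # hypothetical namespace
    path="/data/projects/fate/imdb_party_guest.csv",  # CSV path inside the fateflow container
)
```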

Then, on the guest party (party 10000, for example), we can start FATE-LLM jobs by following the "Submit Federated Task" section of the official guide, with the following minor changes:

· We can skip the table-binding code, as we have already done that above.

· Change config=GPT2Config().to_dict() to pretrained_path='gpt2' in the model definition line.

· Add save_to_local_dir=True, pin_memory=False to the TrainerParam to reduce the job execution time.

We ran multiple jobs with different settings to evaluate their impact on the results (a combined sketch of these settings follows the list below):

· The cuda parameter in the TrainerParam can be toggled to compare the difference between using CPU and GPU.

· By specifying aggregate_every_n_epoch in the TrainerParam, we can control the number of local training epochs performed before each global aggregation.

· To use different PEFT methods, we can change the adapter_type parameter in the model definition to values such as "HoulsbyConfig", "LoRAConfig", etc.
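Putting these changes together, the model and trainer definitions end up looking roughly like the sketch below. It is assembled from the parameters discussed above and the official GPT-2 example; the module and class names (pellm.gpt2, GPT2), num_labels, batch size, and the adapter type shown are illustrative assumptions rather than the exact configuration we ran.

```python
# Rough sketch of the model definition and TrainerParam for the FATE-LLM job.
# Exact module paths and arguments follow the official GPT-2 example and may
# differ across FATE-LLM versions; values here are illustrative.
import torch as t
from pipeline import fate_torch_hook
from pipeline.component.homo_nn import TrainerParam

fate_torch_hook(t)

model = t.nn.Sequential(
    t.nn.CustModel(
        module_name="pellm.gpt2",     # FATE-LLM GPT-2 wrapper (per official example)
        class_name="GPT2",
        pretrained_path="gpt2",       # load pretrained weights instead of a fresh config
        adapter_type="LoRAConfig",    # or "HoulsbyConfig", etc.
        num_labels=1,                 # single-logit head for binary sentiment
    )
)

trainer_param = TrainerParam(
    trainer_name="fedavg_trainer",
    epochs=10,                     # total training epochs, as in our runs
    batch_size=16,                 # illustrative value
    cuda=True,                     # toggle to compare CPU vs. GPU
    aggregate_every_n_epoch=1,     # local epochs between global aggregations
    save_to_local_dir=True,        # save the model locally to shorten the job
    pin_memory=False,
)
# model and trainer_param are then passed to the HomoNN component as in the official guide.
```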

Result Analysis

As mentioned earlier, we conducted multiple FATE-LLM jobs with various settings, including different PEFT approaches, device types, aggregation frequencies, and more. Each FATE-LLM job ran for 10 epochs, after which we collected performance metrics of the trained model, the amount of data transmitted, as well as job-related metrics such as execution time and GPU usage. The results are presented in the table below:

[Image: table of results for FATE-LLM jobs with different settings]

Regarding the comparison of the different PEFT approaches, as expected, reducing the number of trainable parameters led to a slight degradation in model performance, but it also reduced the amount of data transmitted. Choosing the right approach is therefore a matter of striking a trade-off between these considerations.

Some other general findings are:

· All FATE-LLM jobs produced trained models capable of achieving an AUC (Area Under the Curve) of over 0.95, which is comparable to full fine-tuning.

· The ratio of trained parameters compared to full fine-tuning ranged from 0.04% to 1.4%, resulting in an acceptable network communication cost (see the rough estimate after this list).

· The cost can be further reduced by allowing more local training epochs before performing aggregation, without significant degradation in the resulting model’s performance.
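As a rough illustration of why this ratio keeps communication manageable, the back-of-the-envelope calculation below estimates the payload exchanged per aggregation, assuming the ~124M-parameter GPT-2 base model and 32-bit floats; actual traffic also depends on serialization and the aggregation protocol's overhead.

```python
# Back-of-the-envelope estimate of the per-aggregation payload size (illustrative).
gpt2_params = 124_000_000          # approx. parameter count of the GPT-2 base model
bytes_per_param = 4                # 32-bit floats

for ratio in (0.0004, 0.014):      # 0.04% .. 1.4% trainable parameters
    trainable = gpt2_params * ratio
    payload_mb = trainable * bytes_per_param / 1e6
    print(f"{ratio:.2%} trainable -> ~{trainable / 1e6:.2f}M params, ~{payload_mb:.1f} MB per aggregation")

# vs. full fine-tuning: ~124M params, ~496 MB exchanged per aggregation round
```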

Conclusion & Next Steps

This blog provides a comprehensive walkthrough of the steps involved in running a FATE-LLM job. We begin by detailing the FATE deployment process using KubeFATE, focusing on enabling GPU support within the corresponding Kubernetes pod. Subsequently, we demonstrate a simple FATE-LLM fine-tuning job for a sentiment classification task. The results illustrate the capability of distributed parties to collectively train an LLM using their local data without sharing it. The integration of PEFT methods in FATE-LLM significantly reduces communication costs to an acceptable level, further enhancing its effectiveness.

It is important to note that real-world use cases may require additional preprocessing, adjusted settings, and different deployment configurations, but this example can serve as a reference for running FATE-LLM experiments. As FATE-LLM continues to evolve and more features are released, we will conduct further experiments and tests to showcase its extensive capabilities.

FATE GitHub Repo: https://github.com/FederatedAI/FATE

KubeFATE GitHub Repo: https://github.com/FederatedAI/KubeFATE

FATE-LLM Introduction Document: https://github.com/FederatedAI/FATE/blob/v1.11.1/doc/federatedml_component/fate_llm.md
