{"667967":{"#nid":"667967","#data":{"type":"news","title":"Breakthrough Scaling Approach Cuts Cost, Improves Accuracy of Training DNN Models","body":[{"value":"\u003Cp\u003EA new machine-learning (ML) framework for clients with varied computing resources is the first of its kind to successfully scale deep neural network (DNN) models like those used to detect and recognize objects in still and video images.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe ability to uniformly scale the width (number of neurons) and depth (number of neural layers) of a DNN model means that remote clients can equitably participate in distributed, real-time training regardless of their computing resources. Resulting benefits include improved accuracy, increased efficiency, and reduced computational costs.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDeveloped by Georgia Tech researchers, the ScaleFL framework advances federated learning, which is an ML approach inspired by the personal data scandals of the past decade.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFederated learning (FL), a term coined by Google in 2016, enables a DNN model to be trained across decentralized devices or servers. Because data aren\u2019t centralized with this approach, threats to data privacy and security are minimized.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe FL process begins with sending the initial parameters of a global DNN model to smartphones, IoT devices, edge servers, or other participating devices. These edge clients train their local version of the model using their unique data. All local results are aggregated and used to update the global model.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe process is repeated until the new model is fully trained and meets its design specifications.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFederated learning works best when remote clients involved in training a new DNN model have comparable computational power and bandwidth. 
But training can bog down if some participating remote-client devices have limited or fluctuating computing resources.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cIn most real-life applications, computational resources tend to differ significantly across clients. This heterogeneity prevents clients with insufficient resources from participating in certain FL tasks that require large models,\u201d said School of Computer Science (CS) Ph.D. student Fatih Ilhan.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cFederated learning should promote equitable AI practice by supporting a resource-adaptive learning framework that can scale to heterogeneous clients with limited capacity,\u201d said Ilhan, who is advised by Professor Ling Liu.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIlhan is the lead author of\u0026nbsp;\u003Ca href=\u0022https:\/\/openaccess.thecvf.com\/content\/CVPR2023\/papers\/Ilhan_ScaleFL_Resource-Adaptive_Federated_Learning_With_Heterogeneous_Clients_CVPR_2023_paper.pdf\u0022\u003E\u003Cem\u003EScaleFL: Resource-Adaptive Federated Learning with Heterogeneous Clients\u003C\/em\u003E\u003C\/a\u003E, which he is presenting at the \u003Ca href=\u0022https:\/\/cvpr2023.thecvf.com\/\u0022\u003E2023 Conference on Computer Vision and Pattern Recognition\u003C\/a\u003E. CVPR 2023 is set for June 18-22 in Vancouver, Canada.\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECreating a framework that can adaptively scale the global DNN model based on a remote client\u2019s computing resources is no easy feat. 
Ilhan says the balance between a model\u2019s basic and complex feature extraction capabilities can be easily thrown out of whack when manipulating the number of neurons or the number of neuron layers of a DNN model.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cSince a deeper model is more capable of extracting higher-order, complex features while a wider model has access to a finer resolution of lower-order, basic features, performing model size reduction across one dimension causes an imbalance in the learning capabilities of the resulting model,\u201d said Ilhan.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe team overcomes these challenges in part by incorporating early exit classifiers into ScaleFL.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThese ML-based tools are designed to optimize accuracy and efficiency by introducing intermediate decision points in the classification process. This capability enables a model to complete an inference task as soon as it is confident in its prediction, without having to process the whole model.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cScaleFL injects these classifiers into the global model at certain layers based on the model architecture and computational constraints at each complexity level. This enables forming low-cost local models by keeping the layers up to the corresponding exit,\u201d said Ilhan.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cTwo-dimensional scaling by splitting the model along the depth and width dimensions yields uniformly scaled, efficient local models for resource-constrained clients. 
As a result, not only does the global model achieve better performance compared to baseline FL approaches and existing algorithms, but local models at different complexity levels also perform significantly better for clients that are resource-constrained at inference time.\u201d\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe exit classifiers that help balance a model\u2019s basic and complex features also play into the second part of ScaleFL\u2019s secret sauce, self-distillation.\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESelf-distillation is a form of knowledge distillation, which has been used to transfer knowledge from a \u2018teacher\u2019 model to a smaller \u2018student\u2019 model. ScaleFL applies this process within the same network by comparing early predictions made by the exit classifiers (students) and the final predictions of the last exit (teacher) of local models during optimization. This technique keeps the subnetworks at different levels of ScaleFL from learning in isolation and improves knowledge transfer among them.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIlhan and his collaborators extensively tested ScaleFL on three image classification datasets and two natural language processing datasets.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u201cOur experiments show that ScaleFL outperforms existing representative heterogeneous federated learning approaches. In local model evaluations, we were able to reduce latency by a factor of two and model size by a factor of four, all while keeping the performance loss below 2%,\u201d said Ilhan.\u003C\/p\u003E\r\n","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003ESchool of Computer Science researchers have developed a new framework that advances federated learning, a distributed, real-time approach for training deep neural network models. 
The new framework enables remote clients to equitably participate in training regardless of their computing resources.\u003C\/p\u003E\r\n","format":"limited_html"}],"field_summary_sentence":[{"value":"A new machine learning framework promotes equitable AI practice while advancing a popular distributed model training approach."}],"uid":"32045","created_gmt":"2023-06-02 02:05:38","changed_gmt":"2023-07-12 18:11:20","author":"Ben Snedeker","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2023-06-02T00:00:00-04:00","iso_date":"2023-06-02T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"670912":{"id":"670912","type":"image","title":"Georgia Tech CS Ph.D. student Fatih Ilhan","body":null,"created":"1685672138","gmt_created":"2023-06-02 02:15:38","changed":"1685672138","gmt_changed":"2023-06-02 02:15:38","alt":"An outdoor photo portrait of Georgia Tech CS Ph.D. student Fatih Ilhan","file":{"fid":"253878","name":"Screen Shot 2023-06-01 at 2.48.19 PM.png","image_path":"\/sites\/default\/files\/2023\/06\/01\/Screen%20Shot%202023-06-01%20at%202.48.19%20PM.png","image_full_path":"http:\/\/tlwarc.hg.gatech.edu\/\/sites\/default\/files\/2023\/06\/01\/Screen%20Shot%202023-06-01%20at%202.48.19%20PM.png","mime":"image\/png","size":740954,"path_740":"http:\/\/tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2023\/06\/01\/Screen%20Shot%202023-06-01%20at%202.48.19%20PM.png?itok=3dUsHc15"}}},"media_ids":["670912"],"groups":[{"id":"576481","name":"ML@GT"},{"id":"50875","name":"School of Computer Science"},{"id":"1188","name":"Research Horizons"}],"categories":[{"id":"153","name":"Computer Science\/Information Technology and Security"},{"id":"135","name":"Research"},{"id":"8862","name":"Student Research"}],"keywords":[{"id":"187915","name":"go-researchnews"}],"core_research_areas":[{"id":"39431","name":"Data Engineering and 
Science"}],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EBen Snedeker, Communications Manager II\u003Cbr \/\u003E\r\nGeorgia Tech\u003Cbr \/\u003E\r\nCollege of Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003Ealbert.snedeker@cc.gatech.edu\u003C\/p\u003E\r\n","format":"limited_html"}],"email":["albert.snedeker@cc.gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}