Learning to grow machine-learning models | MIT News | Massachusetts Institute of Technology – MIT News
It's no secret that OpenAI's ChatGPT has some incredible capabilities. For instance, the chatbot can write poetry that resembles Shakespearean sonnets or debug code for a computer program. These abilities are made possible by the massive machine-learning model that ChatGPT is built upon. Researchers have found that when these types of models become large enough, extraordinary capabilities emerge.
But bigger models also require more time and money to train. The training process involves showing hundreds of billions of examples to a model. Gathering so much data is an involved process in itself. Then come the monetary and environmental costs of running many powerful computers for days or weeks to train a model that may have billions of parameters.
"It's been estimated that training models at the scale of what ChatGPT is hypothesized to run on could take millions of dollars, just for a single training run. Can we improve the efficiency of these training methods, so we can still get good models in less time and for less money? We propose to do this by leveraging smaller language models that have previously been trained," says Yoon Kim, an assistant professor in MIT's Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Rather than discarding a previous version of a model, Kim and his collaborators use it as the building blocks for a new model. Using machine learning, their method learns to grow a larger model from a smaller model in a way that encodes knowledge the smaller model has already gained. This enables faster training of the larger model.
Their technique saves about 50 percent of the computational cost required to train a large model, compared to methods that train a new model from scratch. Plus, the models trained using the MIT method performed as well as, or better than, models trained with other techniques that also use smaller models to enable faster training of larger models.
Reducing the time it takes to train huge models could help researchers make advancements faster with less expense, while also reducing the carbon emissions generated during the training process. It could also enable smaller research groups to work with these massive models, potentially opening the door to many new advances.
"As we look to democratize these types of technologies, making training faster and less expensive will become more important," says Kim, senior author of a paper on this technique.
Kim and his graduate student Lucas Torroba Hennigen wrote the paper with lead author Peihao Wang, a graduate student at the University of Texas at Austin, as well as others at the MIT-IBM Watson AI Lab and Columbia University. The research will be presented at the International Conference on Learning Representations.
The bigger the better
Large language models like GPT-3, which is at the core of ChatGPT, are built using a neural network architecture called a transformer. A neural network, loosely based on the human brain, is composed of layers of interconnected nodes, or neurons. Each neuron contains parameters, which are variables learned during the training process that the neuron uses to process data.
Transformer architectures are unique because, as these types of neural network models get bigger, they achieve much better results.
"This has led to an arms race of companies trying to train larger and larger transformers on larger and larger datasets. More so than other architectures, it seems that transformer networks get much better with scaling. We're just not exactly sure why this is the case," Kim says.
These models often have hundreds of millions or billions of learnable parameters. Training all these parameters from scratch is expensive, so researchers seek to accelerate the process.
One effective technique is known as model growth. Using the model growth method, researchers can increase the size of a transformer by copying neurons, or even entire layers of a previous version of the network, then stacking them on top. They can make a network wider by adding new neurons to a layer or make it deeper by adding additional layers of neurons.
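The copy-based growth described above can be sketched in a few lines of plain Python. This is only an illustration: the list-of-lists weight layout and the helper names `grow_depth` and `grow_width` are assumptions made for this example, not code from the researchers.

```python
import copy

def grow_depth(layers):
    """Deepen a network by stacking a copy of its existing layers on top."""
    return layers + copy.deepcopy(layers)

def grow_width(weight, new_rows):
    """Widen one layer by copying existing neurons (rows) until it has new_rows."""
    old_rows = len(weight)
    return [list(weight[i % old_rows]) for i in range(new_rows)]

# A toy "network": each layer is a weight matrix with one row per neuron.
small = [
    [[1.0, 2.0], [3.0, 4.0]],  # layer 0: two neurons, two inputs each
    [[5.0, 6.0], [7.0, 8.0]],  # layer 1
]

deeper = grow_depth(small)       # four layers; the top two are copies
wider = grow_width(small[0], 3)  # three neurons; the third copies the first
```

A real widening step would also have to expand the input side of the following layer so the shapes still match. That bookkeeping is omitted here.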
In contrast to previous approaches for model growth, parameters associated with the new neurons in the expanded transformer are not just copies of the smaller network's parameters, Kim explains. Rather, they are learned combinations of the parameters of the smaller model.
Learning to grow
Kim and his collaborators use machine learning to learn a linear mapping of the parameters of the smaller model. This linear map is a mathematical operation that transforms a set of input values, in this case the smaller model's parameters, to a set of output values, in this case the parameters of the larger model.
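As a rough illustration of such a linear map, the sketch below treats the smaller model's parameters as a flat vector and multiplies it by a matrix to get the larger model's parameters. The matrix `M` is filled in by hand here. In the method described, its entries would be learned. All names are hypothetical.

```python
def apply_linear_map(M, theta_small):
    """Return M @ theta_small, computed with plain Python lists."""
    return [sum(m * t for m, t in zip(row, theta_small)) for row in M]

theta_small = [1.0, 2.0]  # parameters of the smaller model

# Each row of M defines one parameter of the larger model as a
# weighted combination of the smaller model's parameters.
M = [
    [1.0, 0.0],  # plain copy of parameter 0
    [0.0, 1.0],  # plain copy of parameter 1
    [0.5, 0.5],  # a blend of both (hand-picked here, learned in practice)
]

theta_large = apply_linear_map(M, theta_small)
print(theta_large)  # [1.0, 2.0, 1.5]
```

Rows that contain a single 1 reproduce plain copying, so copy-based growth is a special case of this kind of map.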
Their method, which they call a learned Linear Growth Operator (LiGO), learns to expand the width and depth of a larger network from the parameters of a smaller network in a data-driven way.
But the smaller model may actually be quite large (perhaps it has a hundred million parameters), and researchers might want to make a model with a billion parameters. So the LiGO technique breaks the linear map into smaller pieces that a machine-learning algorithm can handle.
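To see why the map has to be broken up, it helps to count entries. The sketch below compares one dense matrix over all parameters with one possible factoring, in which a small weight matrix W is expanded as A · W · Bᵀ using two much smaller matrices. The two-sided form, the layer sizes, and the function names are assumptions chosen for illustration and may not match the exact decomposition the researchers use.

```python
def dense_map_entries(n_small, n_large):
    """Entries in one dense matrix mapping n_small parameters to n_large."""
    return n_small * n_large

def factored_map_entries(d_small, d_large):
    """Entries in A and B when W_large = A @ W_small @ B^T,
    with A and B each of shape (d_large, d_small)."""
    return 2 * d_large * d_small

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def expand(A, W, B):
    """Grow W by multiplying with A on the left and B^T on the right."""
    B_t = [list(col) for col in zip(*B)]
    return matmul(matmul(A, W), B_t)

# Whole-model scale from the text: 100 million -> 1 billion parameters.
whole_model = dense_map_entries(10**8, 10**9)  # 10**17 entries

# One hypothetical square layer growing from width 512 to width 1024.
dense = dense_map_entries(512 * 512, 1024 * 1024)
factored = factored_map_entries(512, 1024)

# Tiny worked case: a 1x1 weight matrix grown to 2x2.
grown = expand([[1.0], [1.0]], [[2.0]], [[1.0], [0.5]])
```

In this toy layer the factored form needs about a million entries, against roughly 275 billion for the dense map.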
LiGO also expands width and depth simultaneously, which makes it more efficient than other methods. A user can tune how wide and deep they want the larger model to be when they input the smaller model and its parameters, Kim explains.
When they compared their technique to the process of training a new model from scratch, as well as to model-growth methods, it was faster than all the baselines. Their method saves about 50 percent of the computational costs required to train both vision and language models, while often improving performance.
The researchers also found they could use LiGO to accelerate transformer training even when they didn't have access to a smaller, pretrained model.
"I was surprised by how much better all the methods, including ours, did compared to the random initialization, train-from-scratch baselines," Kim says.
In the future, Kim and his collaborators are looking forward to applying LiGO to even larger models.
The work was funded, in part, by the MIT-IBM Watson AI Lab, Amazon, the IBM Research AI Hardware Center, Center for Computational Innovation at Rensselaer Polytechnic Institute, and the U.S. Army Research Office.