Resource-Efficient Engineering: Teaching Machines to Do More with Less Memory

By Abigail Tarekegn Assefa

Introduction

When AI/ML first started booming, most research and deployment happened in data centers, and a connection to the cloud was mandatory. Over the last decade, machine learning began moving onto the devices themselves. Focus shifted toward applying AI in small spaces with limited resources, giving birth to a field of study (some call it a community) called TinyML. Some microcontrollers are as tiny as a ladybug or even smaller, yet the goal of TinyML is to run machine learning workloads directly on them, at the source of the information. In smart agriculture, for example, integrating AI into soil moisture sensors could give farmers real-time crop recommendations without any internet connection. The problem? These devices often have less than 256 kilobytes of RAM, so a standard machine learning model may simply not fit. This article investigates two techniques, quantization and feature selection, that shrink a regression model's memory footprint while keeping accuracy largely intact.

What Is Regression and Why Does It Run on Everything?

Machine learning models learn from data, but not all of them learn the same way. Supervised learning is a type of machine learning in which a model is trained on labeled data, meaning that for every input there is a known correct output. Supervised learning covers two kinds of task: classification and regression. Classification predicts which category an input belongs to, while regression maps the input to a real number. Predicting a patient's disease progression, estimating crop yield, and forecasting energy consumption are all instances of regression. In its basic forms, regression is simple, interpretable, and computationally light, making it a natural candidate for deployment on constrained devices. Linear regression is the simplest form of regression. It works by assigning a weight to each input feature and adding up the weighted features, plus an intercept, to produce a prediction. The more features a model has, the more weights it must store, and the more memory it consumes. On a constrained device, the two natural optimizations are therefore to reduce the number of features the model relies on and to shrink the size of the numbers it stores. This is why linear regression, despite being one of the oldest machine learning algorithms, is still at the center of TinyML.
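To make the link between feature count and storage concrete, here is a minimal sketch of a linear model's prediction and of the space its weights take. The weights, intercept, and input values are made up for illustration and do not come from any trained model; the helper names `predict` and `weight_bytes` are hypothetical.

```python
from array import array

# Hypothetical weights for a 3-feature linear model (illustrative values only).
weights = array("d", [0.5, -1.25, 2.0])  # "d" = 64-bit float, 8 bytes each
intercept = 10.0


def predict(features, weights, intercept):
    """Linear regression prediction: intercept + sum(weight_i * feature_i)."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def weight_bytes(weights):
    """Bytes needed to store the raw weight values."""
    return len(weights) * weights.itemsize


prediction = predict([2.0, 4.0, 1.0], weights, intercept)
print(prediction)             # 10 + 1.0 - 5.0 + 2.0 = 8.0
print(weight_bytes(weights))  # 3 weights * 8 bytes = 24
```

Each extra feature adds one more stored weight, so a 10-feature model in this format needs 80 bytes for its weights alone.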

Squeezing Intelligence: Quantization and Feature Selection

Even after choosing a lightweight algorithm like linear regression, two problems remain: the model may still rely on too many input features, and the numbers it stores may take up more space than necessary. Two techniques directly address these problems: feature selection and quantization. Features are simply the information you feed into the model so it can make a prediction (e.g., in a diabetes dataset, you find age, sex, BMI, etc.). Feature selection is the art of keeping the most informative features and discarding the irrelevant ones. Quantization tackles a different part of the same problem. By default, Scikit-learn typically stores a model's weights as 64-bit floating-point (FP) numbers. This is a highly precise format, but it consumes significant memory. Quantization deliberately reduces precision. A 32-bit float uses half the memory, and an 8-bit integer uses one-eighth. The tradeoff is a potential loss in accuracy, since less precise numbers carry less information. An important question one may ask at this point is “how much can you compress before the model breaks?”
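The storage arithmetic can be checked with a short NumPy sketch. The weight values below are hypothetical. The int8 step uses one common scheme (symmetric scaling, where the largest absolute weight maps to 127); this is an assumption for illustration, since the article does not say which int8 scheme the experiment used.

```python
import numpy as np

# Hypothetical weights, chosen only to illustrate the storage formats.
weights_fp64 = np.array([152.1, -23.9, 519.8, -792.2, 67.6], dtype=np.float64)

# float32: a plain cast halves the storage per weight.
weights_fp32 = weights_fp64.astype(np.float32)

# int8 (assumed symmetric scheme): map the largest |weight| to 127.
scale = np.abs(weights_fp64).max() / 127.0
weights_int8 = np.round(weights_fp64 / scale).astype(np.int8)
restored = weights_int8.astype(np.float64) * scale  # dequantize for comparison

# Largest rounding error introduced by each format.
err_fp32 = np.abs(weights_fp32 - weights_fp64).max()
err_int8 = np.abs(restored - weights_fp64).max()

print(weights_fp64.nbytes, weights_fp32.nbytes, weights_int8.nbytes)  # 40 20 5
print(err_fp32, err_int8)
```

In this scheme the int8 version also has to keep `scale` alongside the weights, and its rounding error is bounded by half of `scale`, which is far coarser than the error from a float32 cast.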

An Experiment and Its Results

To investigate this question, I used the Scikit-learn Python library to train a linear regression model on the Diabetes dataset, which contains measurements from 442 patients across 10 features. The baseline used all 10 features and the default 64-bit floats. After feature selection, the model kept only the 5 most predictive features and discarded the rest. This, on its own, cuts memory usage by almost half. The accuracy cost was marginal: R² fell from 0.4526 to 0.4382, a drop of about 0.014 (1.4 percentage points). Float32 quantization brought a similar memory cut with no measurable accuracy drop. Combining feature selection with float32 quantization produced the most balanced result. Memory was reduced to 7,192 bytes, which is a 75% reduction from the baseline, and accuracy remained largely intact. This combination represents the sweet spot for memory-constrained deployment. The experiment also revealed a hard limit. int8 quantization produced the smallest model at 1,917 bytes, but the model collapsed entirely: it performed worse than simply guessing the average every time, which corresponds to an R² below zero. Compression, it turns out, has a breaking point. In this experiment, float32 was the boundary.
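The pipeline described above can be sketched roughly as follows. This is not the notebook's code: the train/test split (`test_size=0.2`, `random_state=42`) and the selection method (`SelectKBest` with `f_regression`) are assumptions, since the text names neither, so the printed scores may differ from the figures reported here.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)  # 442 patients, 10 features

# Assumed split; the article does not state the one it used.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: all 10 features, float64 weights.
baseline = LinearRegression().fit(X_train, y_train)
r2_baseline = r2_score(y_test, baseline.predict(X_test))

# Feature selection (assumed method): keep the 5 features with the
# highest univariate F-score against the target.
selector = SelectKBest(score_func=f_regression, k=5).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
reduced = LinearRegression().fit(X_train_sel, y_train)
r2_reduced = r2_score(y_test, reduced.predict(X_test_sel))

# float32 quantization: cast the stored parameters, then predict by hand
# so that the arithmetic itself runs in float32.
coef32 = reduced.coef_.astype(np.float32)
intercept32 = np.float32(reduced.intercept_)
pred32 = X_test_sel.astype(np.float32) @ coef32 + intercept32
r2_reduced_fp32 = r2_score(y_test, pred32)

print("weight bytes:", baseline.coef_.nbytes, reduced.coef_.nbytes, coef32.nbytes)
print("R2:", r2_baseline, r2_reduced, r2_reduced_fp32)
```

The byte counts printed by this sketch cover only the raw coefficient arrays (80, 40, and 20 bytes), so they are not comparable to the totals reported above, which presumably measure more than the weights alone.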

https://colab.research.google.com/drive/1Yc5W9GB1fDUX7fU-lRHzewccmK_f1DO1?usp=sharing

Conclusion

The world is moving toward TinyML. Researchers are focusing on bringing machine learning directly to devices instead of waiting for a cloud connection. Microcontrollers, sensors, and affordable smartphones are becoming the new frontier of artificial intelligence. The problem, however, is that these devices are severely resource-constrained, and standard machine learning models are too large to run on them. This study presents two techniques that directly address this: feature selection, which eliminates unnecessary input features, and quantization, which reduces the precision of the numbers a model stores. The significance of solving this problem cannot be overstated. AI that runs offline, locally, and without internet dependency can reach farmers in rural Ethiopia, students in under-connected schools, and clinics without reliable infrastructure. The experiment revealed that you can only squeeze so much juice from an orange. Compression has its breaking point. Float32 is safe: it cuts memory in half with no measurable accuracy loss. In contrast, int8 quantization broke the model entirely, producing results worse than simply predicting the average. A developer blindly applying maximum compression could unknowingly ship a useless model. I recommend adopting feature selection combined with float32 quantization as a standard first step in any TinyML deployment. In this experiment, the two together reduced memory by 75% while largely preserving accuracy. Future research should explore whether this boundary shifts for more complex models like neural networks, and whether these results hold when tested directly on real microcontroller hardware.

References

Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. “Least Angle Regression.” The Annals of Statistics, 2004. https://hastie.su.domains/Papers/LARS/LeastAngle_2002.pdf

Pedregosa, F., et al. “Scikit-learn: Machine Learning in Python.” Journal of Machine Learning Research, 2011. https://jmlr.org/papers/v12/pedregosa11a.html