Machine Learning & Research Projects


Producing models with better generalization using continued fractions

In data analysis, our goal is often to create models that predict outcomes based on a set of variables. A bank, for instance, may take the current salary, age and location to determine a maximum dollar amount for lending.

The ability to extrapolate is often crucial: a model should predict well for individuals whose data fall outside the ranges used to build it. This could mean determining a lending amount in uncommon market circumstances, or for an unusual combination of salary and age. Another real-world problem is noisy data: random fluctuations and errors that can obscure the underlying patterns. A robust model must separate signal from noise so that its predictions remain reliable even when the data contain inconsistencies.

We investigated using continued fractions to model such problems and demonstrated better generalization performance in these circumstances. On 21 noise-affected datasets, our algorithm achieved more first-place rankings than any of its state-of-the-art competitors. It never ranked worse than third on any dataset, while taking at most 4% of the training time of its best-performing competitor.
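To make the idea concrete, the following is a minimal sketch of how a truncated continued fraction can act as a regression model, assuming each term is an affine function of the inputs. The function names, the affine form of the terms and the example coefficients are illustrative assumptions and not the published algorithm, which also has to search for the coefficients.

```python
from typing import Sequence


def affine(coeffs: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate coeffs[0] + coeffs[1]*x[0] + coeffs[2]*x[1] + ..."""
    return coeffs[0] + sum(c * xi for c, xi in zip(coeffs[1:], x))


def continued_fraction(
    g_terms: Sequence[Sequence[float]],
    h_terms: Sequence[Sequence[float]],
    x: Sequence[float],
) -> float:
    """Evaluate g0(x) + h0(x) / (g1(x) + h1(x) / (... + g_d(x))).

    g_terms holds d + 1 coefficient vectors and h_terms holds d of them.
    Evaluation starts at the innermost term and works outwards.
    A zero denominator raises ZeroDivisionError; a practical model
    would need to guard against that (not handled in this sketch).
    """
    if len(g_terms) != len(h_terms) + 1:
        raise ValueError("need exactly one more g term than h terms")
    value = affine(g_terms[-1], x)
    for g, h in zip(reversed(g_terms[:-1]), reversed(h_terms)):
        value = affine(g, x) + affine(h, x) / value
    return value


# Hypothetical depth-1 model on one input: f(x) = 1 + 2 / x
example = continued_fraction([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]], [4.0])
```

Truncating the fraction at a small depth keeps the model compact, which is one plausible reason such models can resist fitting noise.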

This work was published in the Proceedings of the Genetic and Evolutionary Computation Conference, the leading conference in genetic and evolutionary computation, and presented to the academic community in Lisbon, Portugal, in 2023: https://dl.acm.org/doi/abs/10.1145/3583131.3590461


Beyond BMI: Machine learning approaches outperforming state-of-the-art body fat prediction


Quetelet's index, known today as BMI, has been a standard in health studies since Ancel Keys popularized it in the 1970s, and its relevance is underscored by its presence in millions of Google Scholar entries. However, BMI is often criticized for its limited accuracy in predicting the percentage of body fat (PBF), owing to variations across genders and ethnicities. Another criticism concerns its kg/m² units: the measure is neither three-dimensional like density nor an actual ‘index’ (a dimensionless quantity) like PBF.
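For reference, BMI is body mass divided by the square of height, which is where the kg/m² units come from. A minimal sketch follows; the function name and the example values are illustrative only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: mass divided by height squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


# Example: a 70 kg person who is 1.75 m tall.
value = bmi(70.0, 1.75)
```

Because the result carries units of kg/m², it changes with the choice of measurement units, unlike a dimensionless quantity such as PBF.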

In recent research, sex-specific, dimensionless indices were produced using over 12,000 samples from the National Health and Nutrition Examination Survey (NHANES) and assessed across Mexican-American, European-American and African-American populations. The method was assessed in terms of overall error and classification of obesity at different body fat thresholds for males and females.

Our method searched for forms of each measured variable (waist, height, neck, etc.) that were maximally correlated with the percentage of body fat, and then used those transformed variables to predict body fat. Constraints ensured that only practical models were produced, since they are intended for general practitioners without specialist training, who must take the measurements within a normal 15-minute appointment.
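One simple way to picture the search described above is a grid search over power transforms of a single measurement, keeping the exponent whose transformed values correlate most strongly with body fat. This is only a sketch under stated assumptions: the family of forms (powers), the exponent grid and the function names are hypothetical, and the source does not say which forms were actually searched.

```python
import math
from typing import Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_power(
    values: Sequence[float],
    target: Sequence[float],
    exponents: Sequence[float],
) -> Tuple[float, float]:
    """Return (exponent, correlation) maximising |correlation|.

    Assumes all values are positive so that every power is defined.
    """
    best = None
    for p in exponents:
        r = pearson([v ** p for v in values], target)
        if best is None or abs(r) > abs(best[1]):
            best = (p, r)
    return best


# Synthetic data only: the target is the square of the measurement.
measure = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [v ** 2 for v in measure]
exponent, corr = best_power(measure, target, [0.5, 1.0, 2.0, 3.0])
```

On this synthetic input the search recovers the exponent 2, since squaring the measurement reproduces the target exactly.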

Our results showed that our method outperformed all state-of-the-art approaches considered for body fat prediction, in both weighted root mean square error and obesity classification.
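As a reminder of what a weighted root mean square error measures, here is a minimal sketch. It assumes one non-negative weight per sample (for example, a survey sample weight); the source does not specify how the weights were defined, so treat the weighting scheme and the function name as assumptions.

```python
import math
from typing import Sequence


def weighted_rmse(
    actual: Sequence[float],
    predicted: Sequence[float],
    weights: Sequence[float],
) -> float:
    """sqrt( sum(w * (actual - predicted)^2) / sum(w) )."""
    if not (len(actual) == len(predicted) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    squared = sum(
        w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights)
    )
    return math.sqrt(squared / total_weight)


# Made-up numbers: one prediction is off by 2, the other is exact.
error = weighted_rmse([10.0, 20.0], [12.0, 20.0], [1.0, 3.0])
```

With equal weights this reduces to the ordinary root mean square error.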

As a result, adopting our models can lead to a more accurate prediction of a person’s body fat percentage without needing scans such as dual-energy X-ray absorptiometry. Earlier and more accurate body fat prediction can in turn lead to earlier and less expensive detection of fat-related health problems.
