Setting our heart attack prediction AI loose with “no code” tools

Ahhh, easy button!

Aurich Lawson | beautiful pictures

This is the second episode of our exploration of “no code” machine learning. In our first post, we laid out our problem set and discussed the data we would use to test whether a highly automated ML tool designed for business analysts could cost-effectively return results close to the quality of more code-intensive methods that involve a bit more human-driven data science.

If you haven’t read that article, you should go back and at least skim it. If you’re ready, let’s review what we would do with our heart attack data under “normal” machine learning conditions (that is, with more code), then throw it all away and hit the “easy” button.

As discussed previously, we are working with a set of cardiovascular health data taken from a study at the Cleveland Clinic and the Hungarian Heart Institute in Budapest (as well as other places whose data we removed for quality reasons). All that data is available in a repository we created on GitHub, but its original form is part of a data repository maintained by the University of California-Irvine for machine learning projects. We are using two versions of the dataset: a smaller, more complete version that includes 303 patient records from the Cleveland Clinic, and a larger database (597 patients) that adds the Hungarian Institute’s data but is missing two of the data types found in the smaller set.

The two fields missing from the Hungarian data seem likely to be consequential, but the Cleveland Clinic data on its own might be too small a set for some ML applications, so we’ll try both to cover our bases.


With multiple datasets in hand for training and testing, it’s time to start crunching. If we did this the way data scientists usually do (and the way we tried last year), we would do the following:

  1. Split the data into a training set and a test set
  2. Use the training data with an existing algorithm type to create a model
  3. Validate the model against the test set to check its accuracy
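
The three steps above can be sketched in a few lines of plain Python. This is only an illustration, not the code we used: the data is synthetic (one made-up numeric feature and a 0/1 label), the “model” is a single-threshold decision stump standing in for a real algorithm, and the helper names `train_test_split`, `fit_stump`, and `accuracy` are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Step 1: shuffle a copy of the rows and hold out a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_stump(train):
    """Step 2: pick the threshold that classifies the most training rows correctly.

    A stump predicts label 1 when the feature is >= the threshold.
    """
    best_threshold, best_correct = None, -1
    for candidate in sorted({x for x, _ in train}):
        correct = sum((x >= candidate) == bool(y) for x, y in train)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def accuracy(threshold, rows):
    """Step 3: fraction of rows the stump labels correctly."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)

# Synthetic toy data (NOT the heart study data): 200 rows of (feature, label).
rng = random.Random(42)
rows = [(rng.gauss(60 if label else 50, 8), label) for label in [0, 1] * 100]

train, test = train_test_split(rows)
threshold = fit_stump(train)
print(f"train rows: {len(train)}, test rows: {len(test)}")
print(f"held-out accuracy: {accuracy(threshold, test):.2f}")
```

The point of holding out the test rows is that the accuracy figure comes from records the model never saw during training, which is what makes it a fair check.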

We could do all of that by coding it in a Jupyter notebook and tweaking the model until we reached acceptable accuracy (as we did last year, in a perpetual cycle). But instead, we’ll try two different approaches first:

  • A “no code” approach using AWS SageMaker Canvas: Canvas takes the data as a whole, automatically splits it into training and test sets, and generates a predictive algorithm
  • Another “no code/low code” approach using SageMaker Jumpstart and AutoPilot: AutoPilot’s AutoML is a big part of what’s behind Canvas; it evaluates the data and tries several different kinds of algorithms to determine which is best
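
To make “tries several different kinds of algorithms to determine which is best” concrete, here is a rough sketch of the general idea in plain Python. It does not reflect how AutoPilot works internally and does not call any AWS API; the three candidate models, the synthetic two-feature data, and every function name are assumptions made up for illustration.

```python
import random

def majority_baseline(train):
    """Candidate 1: always predict the most common training label."""
    ones = sum(y for _, y in train)
    label = int(ones * 2 >= len(train))
    return lambda x: label

def first_feature_stump(train):
    """Candidate 2: threshold on feature 0, chosen by training accuracy."""
    def correct(t):
        return sum((x[0] >= t) == bool(y) for x, y in train)
    best = max({x[0] for x, _ in train}, key=correct)
    return lambda x: int(x[0] >= best)

def nearest_centroid(train):
    """Candidate 3: predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in train if y == label]
        centroids[label] = [sum(col) / len(points) for col in zip(*points)]
    def predict(x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(centroids, key=lambda lab: sq_dist(centroids[lab]))
    return predict

def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

# Synthetic toy data (NOT the heart study data): two made-up features per row.
rng = random.Random(7)
rows = [([rng.gauss(50 + 10 * label, 8), rng.gauss(120 + 15 * label, 20)], label)
        for label in [0, 1] * 100]
rng.shuffle(rows)
train, valid = rows[:150], rows[150:]

candidates = {
    "majority baseline": majority_baseline,
    "first-feature stump": first_feature_stump,
    "nearest centroid": nearest_centroid,
}

# Fit every candidate on the training rows, score it on held-out rows,
# and keep whichever scores highest.
scores = {name: accuracy(fit(train), valid) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("selected:", best_name)
```

A real AutoML service also searches over preprocessing steps and hyperparameters, which is part of why it can consume far more compute than a single hand-built model.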

Once that’s done, we’ll use one of the many battle-tested ML approaches data scientists have already tried with this dataset, some of which have proven more than 90 percent accurate.

The end product of these approaches should be an algorithm that we can use to run a predictive query based on the data points. But the real output will be a look at the trade-offs of each approach in terms of time to completion, accuracy, and cost of compute time. (In our last test, AutoPilot blew through our entire AWS compute credit budget.)
