Using ML To Predict Whether Medium Articles Will Go Viral

Is this article going to get a lot of claps?
It’s the age-old question that Medium writers have been asking for millennia (probably not). Trust me, we’ve tried everything: writing 6- to 7-minute reads, getting into big publications, and of course, even putting up those flashy header images to catch your attention. That’s right: we writers want views and claps, so that’s what we try to optimize for.
Despite all that effort, following every guideline for reaching Medium fame often doesn’t work out, since most people don’t know which principles to apply to what they’re writing. To top that off, success on Medium also takes a bit of luck. Two identical articles might end up with completely different stats just because one happened to be published at the right time.
In short, unless you’re loaded with cash to spend on running ads, there are a lot of factors that play into gaining fame on Medium, or any social platform in general. Especially if you’re a new writer, chances are that you’re going to be completely overwhelmed.
That includes me.
Even now, after writing for about six months (which isn’t really that long), I still find myself confused about how I should write and promote my content to attract more people. Medium felt like a black box when I started writing, and I can assure you, nothing’s changed since then.
Just when I started to think I was going to keep getting the short end of the Medium stick (haha), an idea struck me: Machine Learning. Using Machine Learning (ML), I could analyze datasets of articles to find the key to becoming Medium famous. I could already imagine where I would go on vacation after becoming a professional writer. If there was a holy grail, I knew I was going to find it.
Game on!
Cracking The Medium Code With ML: The Data
Of course, I’m not going to share the trade secrets of a multinational company like Medium with you without getting something in return. But I’m generous, so all I ask is that you read the step-by-step process I followed (or you could just scroll to the end and I wouldn’t know). Don’t worry though, you won’t need to write any code. So, let’s get right to it!
As you might know, ML is nothing without data, so we’re going to need to find a high-quality source of it before we even think of training a model. Don’t worry, though, I already found and cleaned one up for you to download:
The dataset I chose was a .CSV (Comma-Separated Values) file from Kaggle containing hundreds of articles that were web-scraped (automatically collected using a programmed application).
I’ve attached the end product of extensively cleaning that original dataset: formatting it, removing duplicate rows, and adding a new column that defines whether each article was viral or not. I made that column a Boolean type, which means it can take one of two values, true or false. For the purposes of this scenario, I categorized any article with over 1000 claps as viral (True), and those with fewer as not viral (False).
Now, the only thing standing between you and training a machine learning model is downloading the data from Google Sheets:
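If you’re curious what that cleaning step boils down to, here’s a minimal MATLAB sketch (entirely optional). The tiny table and the column name claps are stand-ins I made up for illustration, and treating an article with exactly 1000 claps as not viral is an assumption on my part.

```matlab
% Toy stand-in for the scraped data (values are made up for illustration).
claps = [2500; 40; 1000; 1001; 40];
T = table(claps);

% Remove duplicate rows while keeping the original row order.
T = unique(T, 'stable');

% Assumed rule: strictly more than 1000 claps counts as viral.
T.Virality = T.claps > 1000;

disp(T)
```

The comparison returns a logical (Boolean) column directly, so no extra conversion is needed.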
- Select The ‘File’ Section On The Top Navigation Bar
- Click The ‘Download’ Button In The Dropdown Menu
- Choose The .CSV Format To Save The File
If you’re here, great job! You’ve basically done half the work it takes to complete the process. Now, this is where the fun begins — training a model!
Cracking The Medium Code With ML: The Model
On we go! Time to create a model that can accurately analyze the dataset you’ve downloaded.
Remember how I said that you wouldn’t need to use a single line of code in this project? To build a working ML model like this, we’re going to be using an application called MATLAB. It’s a pretty popular tool used when prototyping models, and it’s probably the most versatile one.
What’s unique about MATLAB’s platform is that, even without any code, you’re not going to be compromising on the level of customization you would have gotten if you were coding it from scratch — you’ll see that as you go on.
If you don’t already have one, start by going to the MATLAB Online website and registering for an account. That seems pretty self-explanatory, so I’ll leave that one to you. Anyway, after you’re done, go ahead and open up the application. This is the workspace you’re going to be using for this project.
Pretty cool, right? Feel free to play around and come back when you’re done.
Don’t worry, I’ll wait…
So, now that you’re done exploring, start by clicking on the ‘Upload’ button on the top bar of your workspace, and importing the dataset you downloaded earlier. When complete, you should see a slot in your workspace folder with its file name, which should look something like this:

By doing that, you’ve uploaded your dataset to MATLAB. Double-click the file slot that was just created, and that should open up a preview of the .CSV along with a new tab called ‘Import’. This is where we make all our changes before training our models with the data.
Our dataset comes with header titles that we don’t need, so in the Import tab we can specify a selection range that covers all our data but not the titles. Type in the range A2:G321 to do that. Our last issue is that our ‘author’ field is a categorical feature instead of the text feature it’s supposed to be. Click on the dropdown menu and change the column to the type ‘Text’. Finally, choose the ‘Import Selection’ button to finalize the edits you made:
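For anyone who would rather script the import than click through it, the same idea can be sketched in a few lines (optional, of course). The file name medium_demo.csv and its two columns are placeholders I invented so the snippet runs on its own. Also note that this version keeps the header row as column names instead of cutting it off with a range.

```matlab
% Write a tiny demo CSV so the sketch runs on its own.
% The file name and columns are placeholders, not the real dataset.
csvFile = fullfile(tempdir, 'medium_demo.csv');
author = ["Ann"; "Bob"; "Ann"];
claps  = [2500; 40; 1000];
writetable(table(author, claps), csvFile);

% Detect the file layout; the header row becomes the column names.
opts = detectImportOptions(csvFile);

% Import 'author' as text rather than as a categorical variable.
opts = setvartype(opts, 'author', 'string');

T = readtable(csvFile, opts);
disp(class(T.author))
```

With the real file, you would point csvFile at the dataset you downloaded and skip the writetable part.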

After importing your edited dataset into MATLAB, you’re left with a table that goes by the same name as your dataset. To start browsing models and training them with your data, navigate to the ‘Apps’ tab in your menu, and then choose the ‘Classification Learner’ module.
The Classification Learner module is an extension to MATLAB that lets you create models that separate data into categories. Clicking on the app’s button should pop up a new window that automatically detects your saved data. That window should also prompt you to start a ‘New Session’, which takes you to yet another window. Don’t worry, there’s more where that came from:

This new window allows you to select the major settings that all your models are going to need in order to function. This includes things like the response variable (what you’re trying to predict), as well as how the model is validated (tested for accuracy and other metrics).
For this scenario, I’ve set the response variable to Virality, which is the Boolean variable that the models we train will try to predict from all our other features. I’ve also set the evaluation of our models to cross-validation with three folds (partitions). This works especially well for smaller datasets, but feel free to tweak things a little bit if that’s your thing. Finally, uncheck the ‘claps’ field, because Virality is derived directly from the clap count, so leaving it in would simply hand the model the answer.
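To make those two settings a little more concrete, here’s an optional sketch of what three-fold cross-validation and dropping the claps column look like in script form. The table is made-up toy data, and the column names readingTime and claps are my own placeholders.

```matlab
rng(0);  % make the random split repeatable

% Toy stand-in for the imported table (made-up values).
claps       = [2500; 40; 1000; 1800; 15; 3200; 700; 90; 1200];
readingTime = [7; 3; 6; 6; 2; 7; 5; 4; 6];
Virality    = claps > 1000;
T = table(claps, readingTime, Virality);

% Drop 'claps' so the label cannot be read straight off a predictor.
predictors = removevars(T, {'claps', 'Virality'});
disp(predictors.Properties.VariableNames)

% Three folds: each article is held out for testing in exactly one fold.
folds = cvpartition(T.Virality, 'KFold', 3);

for k = 1:folds.NumTestSets
    testIdx = test(folds, k);   % logical mask of the held-out rows
    fprintf('Fold %d: train on %d rows, test on %d rows\n', ...
        k, sum(~testIdx), sum(testIdx));
end
```

Each model is trained three times, once per fold, and the reported accuracy is based on the rows that were held out each time.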
Other than that, everything is at its default setting, so now, we’re ready to start training models. Press the ‘Start Session’ button to begin:

A session is an environment where you can create and test the wide variety of models available in MATLAB, and this is where I encourage you to start tinkering. Adjust some numbers, try different ML models, plot some graphs, and see what you get. The model that I’ll be teaching you to train was the best performer after I trained all of the available models.

Now that you’ve started your session (and hopefully played around a bit), we can start training. For this dataset, the best performing model turned out to be the ‘Gaussian Naive Bayes’ learning technique at the very bottom of your dropdown menu. Once you’ve selected that, go ahead and press the ‘Train Model’ button.
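If you’re wondering roughly what that button does behind the scenes, the sketch below trains a Gaussian Naive Bayes classifier with three-fold cross-validation using fitcnb from the Statistics and Machine Learning Toolbox. It is only an approximation of the app’s workflow, not its actual generated code, and it runs on made-up data with a single placeholder predictor (readingTime), so its accuracy says nothing about the real dataset.

```matlab
rng(0);  % repeatable toy data and folds

% Made-up data: viral articles get slightly longer reading times here.
n = 90;
Virality    = [true(n/3, 1); false(2*n/3, 1)];
readingTime = 5 + 2*Virality + randn(n, 1);
T = table(readingTime, Virality);

% Naive Bayes with a Gaussian (normal) distribution for the numeric
% predictor, evaluated with 3-fold cross-validation.
cvModel = fitcnb(T, 'Virality', ...
    'DistributionNames', 'normal', 'KFold', 3);

% kfoldLoss returns the misclassification rate on the held-out folds.
accuracy = 1 - kfoldLoss(cvModel);
fprintf('Cross-validated accuracy on toy data: %.1f%%\n', 100*accuracy);
```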
If everything went smoothly, I want to say congratulations! You’ve built the ideal model in MATLAB capable of launching you into Medium success! Oh, and to top it all off, you did all that without ever using a line of code! Isn’t that insane?! Now, you can see your model’s results too:

Who would’ve thought that a day would come where you wouldn’t need hundreds of lines of code to build a model? We achieved an accuracy of 72.5%, which is pretty good for something as unpredictable as Medium virality. In ML, now it’s about working smart, not hard.
Take me for example: even though I know how to code ML models, MATLAB’s still my preferred application to prototype them. You don’t need to put in nearly as much effort as you might first think if you want to build models and get into AI.
Anyways, we’re forgetting that we’ve both cracked the code to Medium. How incredible is that? Now we can use this model as a (pretty inaccurate) tool to estimate an article’s future popularity whenever we’re writing something new. I don’t know about you, but I can’t help myself: I’m trying it out right now with this article.
I just have to export the model through the Export Model button, and then call its pre-built predictFcn on a table, T, that holds all the features the model requires (name, link, reading time, and content) under the same column names used in training. With the app’s default export name, trainedModel, that looks like so:
yfit = trainedModel.predictFcn(T);
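As a rough illustration of how a prediction call like that works: the model exported from Classification Learner is a struct whose predictFcn field takes a table of new data with the same column names as the training table. Since the real export only exists after you’ve used the app, the sketch below builds a stand-in struct by hand from made-up data. The single readingTime predictor and the 6.5-minute article are placeholders of mine.

```matlab
rng(0);  % repeatable toy data

% Stand-in for the exported model: train on made-up data, then wrap the
% classifier in a struct with a predictFcn field, mimicking the export.
Virality    = [true(30, 1); false(60, 1)];
readingTime = 5 + 2*Virality + randn(90, 1);
mdl = fitcnb(table(readingTime, Virality), 'Virality');
trainedModel.predictFcn = @(newData) predict(mdl, newData);

% One new article, described with the same column name used in training.
newArticle = table(6.5, 'VariableNames', {'readingTime'});

yfit = trainedModel.predictFcn(newArticle);
fprintf('Predicted viral: %d\n', double(yfit));
```

With the real exported model you would skip the stand-in part and pass a one-row table containing every feature the model was trained on.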
Well then, let’s see what the model has in store for me. Oh man, I’m just so excited to see the results…

Well, it’s too late to restart.