By George Aine
Twitter is one of the fastest-growing social media sites in the world. In Uganda, many of us send tweets, some of them defaming people or leaders, believing that our identities and locations can never be traced.
However, Twitter users may have to think twice following a recent discovery by scientists. Researchers at University College London and the Alan Turing Institute have found that metadata on Twitter can be used to identify each user and their location.
Metadata is data that describes other data. It summarizes basic information about the data, which makes finding and working with particular instances of it easier. For a document, for example, very basic metadata includes the author, the date created, the date modified and the file size.
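To make the distinction concrete, here is a minimal sketch in Python of a tweet whose content is kept apart from its metadata. The record layout and field names are hypothetical, chosen to mirror the kinds of metadata mentioned in this article. They are not Twitter's actual data format.

```python
from datetime import datetime, timezone

# Hypothetical tweet record; the field names are illustrative assumptions,
# not Twitter's real API schema.
tweet = {
    "text": "Good morning Kampala!",  # the data itself
    "metadata": {                      # data that describes the data
        "account_created_at": datetime(2012, 3, 14, 9, 30, tzinfo=timezone.utc),
        "posted_at": datetime(2018, 7, 2, 6, 15, tzinfo=timezone.utc),
        "favorites_count": 3,
        "followers_count": 120,
        "following_count": 85,
    },
}

def metadata_features(record):
    """Turn the metadata (never the tweet text) into a flat list of numbers."""
    meta = record["metadata"]
    account_age_days = (meta["posted_at"] - meta["account_created_at"]).days
    return [
        account_age_days,
        meta["posted_at"].hour,
        meta["favorites_count"],
        meta["followers_count"],
        meta["following_count"],
    ]

features = metadata_features(tweet)
```

Note that nothing in `features` comes from what the tweet actually says. Only the descriptive fields are used.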
In a newly published paper, the scientists used tweets and the associated metadata to identify any user in a group of 10,000 Twitter users with 96.7 percent accuracy. The findings indicate that even when up to 60 percent of the metadata was muddled, the model could still pinpoint a single person with more than 95 percent accuracy.
According to Savvas Zannettou, a PhD student at the Cyprus University of Technology, people wrongly assume that they cannot be identified from the data they put online, yet the research showed that an individual can be identified with very high accuracy from just a handful of pieces of metadata.
The researchers took a corpus of five million Twitter users and ran 14 pieces of metadata from their tweets (including the time the account was created, the time a tweet was published, and the number of favorites, followers and following) through three different machine learning algorithms.
According to the researchers, the algorithm that identified individual accounts most accurately was also one of the most basic machine learning algorithms.
The approach works by training the model on a known dataset of users, so that it learns how each of them behaves on Twitter from the metadata of their tweets. When the model is then run “in the wild” on new tweets from the same users, it can pick out people’s behavior from the metadata and identify each one as a specific individual.
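The train-then-identify idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The article does not name the algorithm the researchers used, so a 1-nearest-neighbour classifier stands in here as an example of a very basic method. The three users, their numbers and the feature scaling are all invented, and the accuracy this toy produces says nothing about the study's real results.

```python
import math
import random

random.seed(0)  # make the synthetic data reproducible

# Invented "behaviour profiles": (typical posting hour, followers, following).
PROFILES = {
    "user_a": (7, 150, 90),
    "user_b": (13, 2000, 300),
    "user_c": (22, 40, 500),
}

def sample_tweet(profile):
    """Draw one noisy metadata vector around a user's typical behaviour."""
    hour, followers, following = profile
    return [
        random.gauss(hour, 1.0),
        random.gauss(followers, followers * 0.05),
        random.gauss(following, following * 0.05),
    ]

def scale(vec):
    """Rescale features so that no single one dominates the distance."""
    return [
        vec[0] / 24.0,
        math.log1p(max(vec[1], 0)),
        math.log1p(max(vec[2], 0)),
    ]

# "Known dataset": labelled metadata vectors used for training.
train = [
    (scale(sample_tweet(profile)), user)
    for user, profile in PROFILES.items()
    for _ in range(50)
]

def identify(vec):
    """1-nearest-neighbour: return the user of the closest training vector."""
    scaled = scale(vec)
    return min(train, key=lambda item: math.dist(item[0], scaled))[1]

# "In the wild": new tweets from the same users, not seen during training.
unseen = [
    (sample_tweet(profile), user)
    for user, profile in PROFILES.items()
    for _ in range(50)
]
accuracy = sum(identify(vec) == user for vec, user in unseen) / len(unseen)
print(f"Share of unseen tweets matched to the right user: {accuracy:.2%}")
```

The point of the sketch is that the classifier never sees a name or the text of a tweet. It matches a new tweet to a user purely from how similar its metadata is to metadata it has already seen.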