The Rise of Big Data Analytics in Cyber Defense  

26 February 2017

Once we try to understand what Big Data is, we start to see data everywhere around us. The data generated by our electronic devices, by companies, and by industries are pervasive and plentiful. The ongoing explosion of data has been spurred not only by the tremendous growth of the Internet and rapid digitization but also by ubiquitous data generation mechanisms such as the Internet of Things (IoT), which connects vast numbers of devices and sensors that create a tsunami of data. We are living in the era of Big Data, where data is generated every single minute. Although Big Data is extremely important to governments and organizations, its meteoric rise has come at a high cost: an astronomical increase in the frequency of cyber-attacks.

Several cyber security solutions exist to fight the different attacks that pop up daily. They rely on a host of firewalls and antivirus programs that prevent malware or potentially unwanted software from attacking the protected systems. Nevertheless, the generation of Big Data and the emergence of complex social engineering approaches that exploit human foibles are making traditional cyber security solutions obsolete. To overcome this issue, extensive research is being carried out, particularly on the application of Big Data analytics techniques to cybersecurity. In this paper, we investigate how Big Data analytics can improve cyber security.

Although Big Data is omnipresent today, the concept has remained elusive and comes in several varieties (Amir and Murtaza 2015). The hype surrounding it is certainly large enough to cause some confusion. In fact, “Big Data” is an evolving term used to refer to the ever-increasing volumes of data (structured and unstructured) that are hard to collect, manage, store, process, and analyze with traditional database technologies. While the term “Big Data” is relatively new, the problem of storing and managing huge amounts of data is age-old. But today’s data explosion has brought the problem into the spotlight and now, more than ever, an efficient solution is needed to handle the enormous volumes of data that are continuously springing up.

Big Data has made its way to the Middle East and North Africa (MENA) region due to continuous growth in Internet usage, which reached 73% in 2015, compared to 44% in 2010, according to research from Ipsos. Social media also constitute a huge source of data and have become almost synonymous with the Internet in MENA, where 88% of the Internet population uses social media daily (Wamda 2015). In addition, the Internet of Things (IoT) has significantly contributed to the data growth. Recent research from Gartner anticipated $11.5 billion in IT spending by governments across the MENA region this year, mainly on the Internet of Things and smart cities, in an attempt to diversify the economy away from oil. The data growth has also gained pace with the rapid digitization in MENA, where the digital market was expected to reach $35 billion in 2015, according to a study by U.S.-based Strategy&. These numbers continue to grow rapidly.

As a matter of fact, the Internet's rapid expansion, the wave of digitization, the emergence of IoT and the boom of social media are deeply transforming businesses and governments in the MENA region, but they are also giving rise to a massive data explosion that makes the region an attractive target for a larger range of cyber-threats than ever before. It is apparent that traditional security solutions, including security information and event management (SIEM) applications, are unable to keep up with real-time Big Data network streams. Huge, potentially malicious files that need to be urgently and meticulously analyzed are often reported, as the volume of received data is beyond the capacity of current cyber security tools.

However, while "Big Data" seems to lie in the problem space, it can also be a key part of the solution. Indeed, using Big Data analytics to counteract cyber-crime is becoming an efficient strategy for organizations that are keen to protect their systems. IDC's research found that cyber security threats are evolving at jet speed and that governments and companies have to shift from a reactive approach to a proactive approach (Robert 2015), ensuring that malicious behavior is detected and preventive actions are taken before an attacker can do any damage. Shifting to a proactive mode implies looking into all available information and relying on predictive and analytic techniques to determine the likelihood of a threat, detect the abnormal behavior, follow it, and respond well before the threat becomes an incident. This is where Big Data analytics comes into play.

Big Data tools allow collecting, organizing and analyzing massive amounts of data and mining useful information from it in an automated way. This enables analysts to get a broad view of risks, classify them without long delays, and then react in time. Big Data also allows visualizing cyber attacks by transforming network traffic reports into dynamic patterns in real time, which helps to immediately pinpoint anomalies. Furthermore, using Big Data in its raw format makes it possible to exploit disparate data not only from a present perspective but also alongside historical data. Historical data is significant and useful in building normal baselines in order to easily recognize deviations from the norm (Brandon 2016). In fact, common indicators can sometimes be missed when presented in real time, but they may take on a new meaning if viewed over a period of time. With enough historical data, we can predict the likelihood of certain events occurring in the future.
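
The idea of building a baseline from historical data and flagging deviations from it can be illustrated with a minimal sketch. It is only an illustration under simple assumptions: the hourly counts of failed logins are invented, the function names are hypothetical, and a basic z-score test stands in for the far richer models a real security analytics platform would use.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Summarize historical event counts as (mean, sample standard deviation)."""
    return mean(history), stdev(history)

def find_anomalies(observations, baseline, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    standard deviations away from the historical mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any different value is a deviation.
        return [(i, v) for i, v in enumerate(observations) if v != mu]
    return [(i, v) for i, v in enumerate(observations)
            if abs(v - mu) / sigma > threshold]

# Hypothetical data: failed logins per hour over a past period.
history = [12, 15, 11, 14, 13, 12, 16, 14, 13, 15]
baseline = build_baseline(history)

# Hypothetical new observations; the third hour shows a sudden spike.
today = [13, 14, 95, 12]
anomalies = find_anomalies(today, baseline)
print(anomalies)
```

With these invented numbers, only the spike of 95 failed logins is reported. The threshold of three standard deviations is an arbitrary choice for the sketch and would need tuning against real traffic.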

Owing to its power, Big Data analytics has attracted considerable attention from security researchers who attempt to enhance cyber security. Hadoop, an Apache Software Foundation project and open source software platform for scalable, distributed storage and processing of sizeable data sets, is likely one of the most popular tools for Big Data analytics. Hadoop offers rapid processing and reliable analysis of large and heterogeneous data in a distributed computing environment, using a core programming model named MapReduce, which enables large-scale distributed computing (Sravanthi 2013).
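
The MapReduce model can be illustrated with a small single-process sketch. This is not Hadoop's API: the functions below are hypothetical and simply mimic the map, shuffle and reduce phases in plain Python, using invented flow records of the form (source IP, destination IP, bytes). In a real cluster, the map and reduce steps would run in parallel across many machines.

```python
from collections import defaultdict

def map_phase(record):
    """Map step: emit one (source_ip, bytes) pair per flow record."""
    src_ip, _dst_ip, n_bytes = record
    yield src_ip, n_bytes

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: total the bytes sent by one source address."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle and reduce in sequence within one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical flow records: (source IP, destination IP, bytes transferred).
flows = [
    ("10.0.0.1", "10.0.0.9", 500),
    ("10.0.0.2", "10.0.0.9", 120),
    ("10.0.0.1", "10.0.0.7", 300),
]
totals = run_job(flows)
print(totals)
```

Because each record is mapped independently and each key is reduced independently, the same job can be split across many nodes, which is what makes the model suitable for very large data sets.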

Recently, numerous cyber security projects based on Big Data analytics, mainly on Hadoop, have shown promise. One example is the BotCloud project, which leverages the potential of MapReduce for analyzing massive amounts of NetFlow data to pinpoint malware-infected hosts participating in a botnet (François 2011). Niara has also leveraged the compute and storage scale of Hadoop and has relied on new machine learning methods to build an efficient security solution that allows companies to detect cyber threats proactively.

To date, however, Big Data analytics is still in its infancy in the MENA region. Despite the growing level of awareness and certain related events, the region is a relatively slow adopter of Big Data and is still lagging behind developed countries. In fact, there is a shortage of trained Big Data practitioners with the deep technical expertise required to properly configure Big Data analysis tools. The adoption of new technologies is often inhibited by slow lawmaking processes and outdated laws. On the brighter side, there is a growing concern about cyber security in MENA, which has translated into a significant increase in IT security spending. In order to strengthen cybersecurity, all companies and governments in the region should be fully aware of the dramatic explosion of data, which leads to heightened exposure to cyber-attacks and makes current security solutions obsolete and unreliable.

They need to make security their top priority and always expect the unexpected. Likewise, they should realize the great potential of Big Data analytics to proactively mitigate cyber risks. Therefore, security specialists should collaborate with data scientists in order to build an efficient security analytics solution. Attending Big Data training and workshops is also highly recommended, both to experiment with new techniques and to deepen knowledge of Big Data.

In conclusion, I would like to reiterate that we are all being overwhelmed by Big Data, which is growing dramatically and has brought with it an exponential growth in cyber threats. The problem is alarming and universal. Advanced research finds that the future of cybersecurity lies in the massive amounts of data at our disposal as well as in the strength of new technologies to react immediately. At present, Big Data analytics seems to be a great weapon in the fight against cyber criminals.

References and further readings:

Amir G and Murtaza H (2015). Beyond the hype: Big Data concepts, methods, and analytics. International Journal of Information Management, vol. 35, no. 2, pp. 137-144.

Brandon L (2016). How Big Data is Advancing Cybersecurity. Security Insight. 18 February. Available at [accessed 03 February 2017].

François J, et al (2011). BotCloud: Detecting Botnets Using MapReduce. In IEEE International Workshop on Information Forensics and Security, pages 1–6.

Mostafa E (2015). How big is social media in MENA? Wamda, 14 May. Available at [accessed 03 February 2017].

Robert E (2015). Big Data and Predictive Analytics: On the Cybersecurity Front Line. IDC Whitepaper, February.

Sravanthi A, et al (2013). Building machine learning algorithms on Hadoop for Big Data. International Journal of Engineering and Technology. February, 3(2):143–7.

Ms. Nouha Othman is a Ph.D. student in computer science at ISG Tunis. She completed the iGmena-DiploFoundation Internet governance capacity building course in August 2016.