Forensic Analysis of WhatsApp on Android

Abstract

WhatsApp is the most popular messaging application available for smartphones in the world. It boasts 1.3 billion monthly active users, 1 billion of whom are active every single day. While WhatsApp began life as a social-messaging platform, it is also used by criminals to communicate with each other, as evidenced by the arrest of 39 people in April 2017 as part of a Europol investigation into the sharing of child pornography via WhatsApp. Because WhatsApp uses end-to-end encryption, messages cannot be read in transit, so the records within the WhatsApp databases extracted from the smartphones of suspects provide vital digital evidence in the prosecution of criminal cases. This paper describes several forensically significant artefacts in the WhatsApp message database on Android and provides a reference for digital investigators examining WhatsApp.

Keywords

WhatsApp, Android Forensics, Smartphone Forensics, SQLite

1.    Introduction

WhatsApp is the most popular messaging application available for smartphones in the world. It boasts 1.3 billion monthly active users, 1 billion of whom are active every single day.[1] WhatsApp users send 55 billion messages per day, including 4.5 billion photos and 1 billion videos. Facebook acquired WhatsApp in February 2014 for US$19.2 billion, and it now stands alongside Facebook's own messaging application, Facebook Messenger. It is a cross-platform application, available on the Android, iPhone, Windows Phone, BlackBerry and Symbian platforms. While WhatsApp began life as a social-messaging platform, it is also used by criminals to communicate with each other, as evidenced by the arrest of 39 people in April 2017[2] as part of a Europol investigation into the sharing of child pornography via WhatsApp.

The purpose of this paper is to forensically examine significant artefacts present in the message database used by the WhatsApp messaging application on Android 7, codenamed Nougat. WhatsApp advertises itself as “Security by Default”[3] as, since April 2016,[4] it uses end-to-end encryption by default for all its message transfers. However, this does not extend to the data at rest, when it is stored on the Android handset, where it is stored in an unencrypted SQLite database. As this paper shows, forensic examination of this database allows message conversations to be reconstructed easily, and thumbnails of shared images can be reassembled even when the actual image has been deleted from the file system of the Android smartphone in question.

As demonstrated in the following images, two of the basic functions of the WhatsApp application are instant messaging (Fig. 1[5]) and two-way video or voice calling (Fig. 2[6]). Messages can hold various types of content, including text, audio, video, locations, and documents shared as attachments. This paper explores how to acquire the WhatsApp message database from a smartphone and how to decrypt the encrypted database backup. It then goes through the analysis of this database and provides proof of concept code to replay message conversations in a human-readable format. Finally, to validate the acquisition and analysis, experimental tests were carried out on various Android smartphones and WhatsApp versions.

Figure 1 – WhatsApp message interface

Figure 2 – WhatsApp two-way call

2.    Related Works

This research required an understanding of the current trends in the forensic analysis of Android devices. Older versions of Android (up to version 2.3, codename Gingerbread) used a file system called YAFFS2. Since then, Android has switched to the ext4 file system, primarily because of its multithreading support. Lessard & Kessler (2010) wrote one of the first papers to deal with the issues involved in forensically examining a mobile device running the Android operating system. However, their paper was written at a time when YAFFS2 was the file system of choice on Android systems. The tools that they document, for example Android Debug Bridge (ADB) and dd, are still in use today, albeit in updated forms. Their paper first describes the physical forensic examination of SD cards and the challenges involved in accessing the root partition of the physical device and extracting the relevant data from it. It then goes on to discuss the analysis of the extracted data, introducing tools such as AccessData's Forensic Toolkit (FTK), and compares this with logical analysis of the database files used by apps as well as with Cellebrite UFED (Universal Forensic Extraction Device). However, these tools were not available for this research.

Hoog (2011), in his book, presents background knowledge on how data is stored by Android applications on the device file system, for example in SQLite databases. He also provides information on the strategies and specific utilities that can be used to forensically examine an Android smartphone. Mahajan et al. (2013) focus their research on the forensic analysis of instant messaging applications on Android, including WhatsApp. However, they limit their research to acquisition and analysis with Cellebrite UFED.

Thakur (2013) wrote her thesis on the forensic analysis of WhatsApp on Android smartphones, focussing primarily on memory analysis. Such analysis was out of scope for this paper, as it focusses on artefacts from the file system. Sahu (2014) and Anglano (2014) further the work on the forensic analysis of WhatsApp on Android smartphones. They discuss the forensic artefacts in the message store database, and how the database can be reconstructed into a human readable format. Anglano especially analyses the msgstore.db database file, examining each table, and determining what forensic evidence, if any, can be construed from the database.

3.    Material and Methods

The primary focus of this paper is to analyse the database used by WhatsApp to store its data. This database file was found to be stored in the /data/data/ partition; however, this partition is inaccessible to the end user unless they are using a rooted[7] device. Forensic examination tools such as Cellebrite’s UFED can access this partition, but that is outside the scope of this paper.

3.1  Data Acquisition

The current version of Android Studio (version 2.3) was installed on a Windows 10 PC, and a virtual Google Pixel XL smartphone device was created using Android Virtual Device (AVD), the emulator that ships with Android Studio. As the AVD emulator does not include the Google Play Store, the WhatsApp APK[8] (version 2.17.130) was downloaded from WhatsApp’s website[9] and installed onto the emulated device using drag and drop.

The first step in activating WhatsApp is to associate it with a phone number (Fig. 3). As the AVD emulator does not have a SIM card and thus cannot send or receive SMS messages, WhatsApp on the emulator was activated using the phone number of an actual SIM card, and the security code received via SMS on that SIM card was used to complete the activation. Once that was complete, the installation was verified by exchanging several messages of various types between WhatsApp on a physical smartphone and the emulated smartphone (Fig. 4).

Figure 3 – WhatsApp activation screen

Figure 4 – Activated WhatsApp

Prior research confirms that WhatsApp stores its message data in an SQLite database located in the file /data/data/com.whatsapp/databases/msgstore.db. However, it appears that the Android Device Monitor that ships with Android Studio 2.3 does not allow access to the /data/data partition when the emulator is running Android 7.x, but it does allow access when emulating earlier versions of Android. A fix is reportedly coming in version 2.4 of Android Studio.[10]

It was noted that, by default, WhatsApp creates a backup of msgstore.db every night at 02:00 and stores it, encrypted, in the user-accessible folder /sdcard/WhatsApp/Databases/ under the name msgstore.db.crypt12 (Fig. 5). WhatsApp also retains the previous 8 backups using the naming convention msgstore-YYYY-MM-DD.1.db.crypt12, giving 9 encrypted backups in total. The most recent backup file was copied to a Windows 10 PC for analysis and possible decryption.

Figure 5 – WhatsApp encrypted backups in user accessible storage
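
The naming convention above lends itself to a small helper for sorting the backups found in a copied Databases folder. The following is a minimal sketch only: the function names classify_backup and sort_backups are hypothetical, and the sketch assumes that the file names follow exactly the two patterns described in this section.

```python
import re
from datetime import date

# Dated backups, following the convention msgstore-YYYY-MM-DD.1.db.crypt12.
DATED_BACKUP = re.compile(r"^msgstore-(\d{4})-(\d{2})-(\d{2})\.1\.db\.crypt12$")
CURRENT_BACKUP = "msgstore.db.crypt12"


def classify_backup(filename):
    """Return ("current", None), ("dated", date) or (None, None)."""
    if filename == CURRENT_BACKUP:
        return ("current", None)
    match = DATED_BACKUP.match(filename)
    if match:
        year, month, day = (int(part) for part in match.groups())
        try:
            return ("dated", date(year, month, day))
        except ValueError:
            # Digits matched the pattern but do not form a real calendar date.
            return (None, None)
    return (None, None)


def sort_backups(filenames):
    """Order backups newest first: the current backup, then the dated ones."""
    current = [f for f in filenames if classify_backup(f)[0] == "current"]
    dated = [f for f in filenames if classify_backup(f)[0] == "dated"]
    dated.sort(key=lambda f: classify_backup(f)[1], reverse=True)
    return current + dated
```

Files that match neither pattern are ignored rather than guessed at, so unrelated files in the folder do not affect the ordering.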

3.2  Decryption

Research indicates that the crypt12 format is based on a modified version of Spongy Castle, a cryptography API library for Android.[11] A Python script[12] was located which purported to decrypt WhatsApp databases when given the decryption key. Previous research by Thakur (2013) indicated that all WhatsApp backups were encrypted with the same key (346a23652a46392b4d73257c67317e352e3372482177652c); however, this no longer seems to be the case, as decryption failed when this key was used on a backup.

Research on the internet gave the location of the decryption key as /data/data/com.whatsapp/files/key but, as before, this location is inaccessible on non-rooted devices without specialist forensic tools. WhatsApp’s own FAQ[13] gives steps to copy WhatsApp Messenger from one phone to another, as follows: create a manual backup; copy /sdcard/WhatsApp from the original smartphone to a new smartphone; install WhatsApp on the new smartphone. This procedure suggests that the encryption/decryption key is somehow tied to the phone number.

Given that the Android Device Monitor in Android Studio 2.3 does not allow access to the /data/data partition on Android 7, it was decided to test the advice given by WhatsApp on moving databases between smartphones. An emulated smartphone running Android 6 was created, and WhatsApp was installed and activated on it using the same phone number that had been used to activate WhatsApp on the Android 7 smartphone, without copying /sdcard/WhatsApp from the original smartphone. The Android Device Monitor was able to access the /data/data partition on this emulated smartphone, so the key file /data/data/com.whatsapp/files/key was extracted to the local computer. The Python script mentioned earlier was then used to decrypt the backup taken from the Android 7 smartphone, using the key from the Android 6 emulated smartphone. This appeared to be successful, as what seemed to be a decrypted database file, msgstore.db, was generated.
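
Whether the output of a decryption attempt really is a database can be checked before any further analysis. The sketch below is illustrative and is not part of the decryption script cited above; the function names are hypothetical. It relies on two properties of SQLite itself: every SQLite 3 file begins with the 16-byte string "SQLite format 3" followed by a NUL byte, and PRAGMA integrity_check returns the single value "ok" for a sound database. The file is opened read-only so that the evidence file is not modified.

```python
import sqlite3
from pathlib import Path

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def looks_like_sqlite(path):
    """Cheap check: does the file start with the SQLite 3 header?"""
    with open(path, "rb") as handle:
        return handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def integrity_ok(path):
    """Open the file read-only and ask SQLite to verify its structure."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return False
    try:
        result = conn.execute("PRAGMA integrity_check").fetchone()
        return result is not None and result[0] == "ok"
    except sqlite3.DatabaseError:
        # Raised, for example, when the file is not a database at all.
        return False
    finally:
        conn.close()
```

A failed header check most likely points to a wrong key or a truncated file, whereas a passing header with a failing integrity check would suggest partial corruption.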

3.3  Analysis

The msgstore.db file was examined using DB Browser for SQLite[14], a browser for SQLite database files. It was immediately obvious that the database schema had changed significantly since Anglano (2014), as the database now contained 21 tables, whereas it had only 3 in 2014. Further analysis shows that the three tables that existed when Anglano (2014) carried out his research still exist, but they now contain many more fields. A listing and a description of the tables in the database follows (Table 1).

Table Name Description
chat_list List of conversations
frequents List of frequently contacted users, some users appear twice
group_participants List of WhatsApp groups, along with their participants
group_participants_history List of WhatsApp groups the user was previously a member of
media_refs References to media stored in the user’s filesystem
media_streaming_sidecar Unknown as to evidential data
message_thumbnails Storage location of media thumbnails since April 2017
messages List of messages sent and received
messages_edits Empty table, possibly for future use
messages_fts Unknown as to evidential data
messages_fts_contents Unknown as to evidential data
messages_fts_segdir Unknown as to evidential data
messages_fts_segments Unknown as to evidential data
messages_links List of messages containing URL links
messages_quotes List of messages containing quotes of other messages
messages_vcards Empty table, possibly for future use
messages_vcards_jids Empty table, possibly for future use
props Internal WhatsApp table
receipts Listing of message receipts for messages sent to groups
sqlite_sequence Internal SQLite table, tracks AUTOINCREMENT counters
status_list Relating to WhatsApp Status, where broadcast messages may be sent to all contacts, and disappear after 24 hours

Table 1 – Table names and descriptions from msgstore.db
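
A table listing such as Table 1 can also be produced programmatically, which makes it easier to compare schemas between WhatsApp versions. The following is a minimal sketch using Python's standard sqlite3 module; the function names are hypothetical, and conn is assumed to be a connection to a decrypted copy of msgstore.db.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def row_counts(conn):
    """Map each table name to its row count (None if it cannot be read)."""
    counts = {}
    for name in list_tables(conn):
        try:
            # Names come from sqlite_master, so double-quoting is sufficient.
            query = 'SELECT COUNT(*) FROM "{}"'.format(name)
            counts[name] = conn.execute(query).fetchone()[0]
        except sqlite3.Error:
            # E.g. a virtual table whose module is not available locally.
            counts[name] = None
    return counts
```

Tables with a row count of zero correspond to the "empty table" entries above, so the output gives a quick indication of which tables are worth examining by hand.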

The tables that contain the most evidential data, and those that the rest of this paper concentrates on, are chat_list, messages, and message_thumbnails. Also of note are the group_participants and group_participants_history tables, but the contents of those can also be garnered from the messages table. A complete description of the three pertinent tables follows.

3.3.1       The chat list table

The chat_list table contains the list of all the conversations the user was a member of. A conversation is defined as a series of messages, of various types, exchanged between the user and either one other user or a group of users. Each row in the table records one conversation, from the creation time and the subject in the case of group conversations to the number of unread messages. Table 2 below describes each field of the chat_list table, and Figure 6 contains an example screenshot of the table in DB Browser for SQLite.

Field Name Description
_id Key
key_remote_jid Conversation ID
message_table_jid References _id column in messages table – points to last message received in conversation
subject Group Only – Subject of the Group
creation Group Only – Creation Time (milliseconds since epoch) of the Group
last_read_message_table_id Same as message_table_jid – points to last read message in conversation
last_read_receipt_sent_message_table_id Same as message_table_jid – points to last read message in conversation where a read receipt has been sent
archived Archived bit – https://www.whatsapp.com/faq/en/bb/20888029 – 1 if archived, NULL otherwise
sort_timestamp An epoch timestamp in milliseconds, used for sorting
mod_tag Unknown
gen Unknown – all NULL
my_messages Signifies whether you have sent a message in this conversation -1 if so, NULL if not
plaintext_disabled Unknown – all 1
last_message_table_id Appears to be the same as message_table_jid
unseen_message_count Count of unseen messages in conversation
unseen_missed_calls_count Count of unseen voice calls in conversation
unseen_row_count Unknown, not the same as unseen_message_count
vcard_ui_dismissed Possibly whether the vCard UI has been dismissed or not.

Table 2 – Description of the chat_list table

Figure 6 – The chat_list table, contact information obfuscated

The key_remote_jid field holds the conversation ID. For a one-to-one conversation it takes the form PHONENUMBER@s.whatsapp.net; for a group conversation it takes the form PHONENUMBER-TIMESTAMP@g.us. The phone number is in international dialling format without the leading zeros, e.g. 353881234567, and in a group conversation it is the number of the group creator. The timestamp is the number of seconds since epoch[15] and corresponds to the time that the group was created.
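
The two key_remote_jid formats can be split into their parts with a few lines of Python. This is a minimal sketch under the assumption that the two formats described above are the only ones of interest; the function name parse_jid and the dictionary keys are hypothetical, and any other value is reported as unknown rather than interpreted.

```python
from datetime import datetime, timezone


def parse_jid(jid):
    """Split a key_remote_jid value into kind, phone number and creation time."""
    local, _, domain = jid.partition("@")
    if domain == "s.whatsapp.net":
        # One-to-one conversation: the local part is the contact's number.
        return {"kind": "individual", "phone": local, "created": None}
    if domain == "g.us":
        # Group conversation: creator's number, then creation time in seconds.
        phone, _, seconds = local.partition("-")
        created = None
        if seconds.isdigit():
            created = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
        return {"kind": "group", "phone": phone, "created": created}
    return {"kind": "unknown", "phone": None, "created": None}
```

The creation time is returned in UTC; converting it to the local time zone of the device is left to the examiner.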

3.3.2       The messages table

The messages table is the primary table used by WhatsApp to store the actual content of the conversations referenced by the chat_list table. Each record in the table contains a message sent or received by the user. A description of each field in the messages table follows in Table 3. It should be noted that some of the fields contain only NULL values despite having descriptive field names. It is assumed that these fields are reserved for future use by the WhatsApp messenger application.

Field Name Description
_id Key ID
key_remote_jid Individual – NUMBER@s.whatsapp.net | Group – CreatorNUMBER-GID@g.us – GID is timestamp in seconds
key_from_me Message direction – 0 is to me, 1 is from me
key_id Message ID
status Meaning unknown, possibly delivery status.
needs_push 1 when the message has not yet been sent, otherwise 0
data Message content, when no attachment, except when attachment type is a vCard, when it contains the actual vCard
timestamp Epoch time (milliseconds) of the time the message was sent
media_url URL of attachment – Encoded for sent messages after Jan 2016, and for received after March 2016
media_mime_type MIME type of attachment, when attachment type is image, audio or video
media_wa_type Type of media in attachment: 1=image, 2=audio, 3=video, 4=vCard, 5=location, 8=call, 9=document, 13=video also? (0 = no attachment)
media_size Size of the attachment in bytes
media_name The name of the attachment when it’s an image. Preview of webpage when data contains a URL
media_caption Text caption of the attachment. The “title” of the web page when data contains a URL.
media_hash Appears to be base 64 hash of the attachment when the attachment is an image, audio or a video
media_duration Duration of the attachment when the attachment is an audio, video or a call
origin Unknown, vast majority is 0, ~ 10% is 1 and a very small number is 2
latitude Latitude when the message attachment is a location
longitude Longitude when the message attachment is a location
thumb_image Possible thumbnail of attached image – untried but this may decode – http://stackoverflow.com/questions/24828383/how-to-read-the-thumb-image-column-in-the-whatsapp-messengers-sqlite3-database
remote_resource Message sender (except self) in the case of groups (see key_remote_jid above)
received_timestamp Epoch time (milliseconds) of the time the message was received
send_timestamp Unknown as yet, everything appears to be -1
receipt_server_timestamp Epoch time (milliseconds) of the time the message was received by WhatsApp servers (single grey tick) (own messages only)
receipt_device_timestamp Epoch time (milliseconds) of the time the message was received by the recipient (or all recipients in the case of a group) (double grey tick) (own messages only)
read_device_timestamp Epoch time (milliseconds) of the time the message was read by the recipient (or all recipients in the case of a group) (double blue tick) (own messages only)
played_device_timestamp NULL, of unknown evidential value
raw_data NULL, of unknown evidential value
recipient_count Number of members in group when message was sent
participant_hash Unknown
starred NULL, of unknown evidential value
quoted_row_id Row ID (messages_quotes table) of the message quoted
mentioned_jids NULL, of unknown evidential value
multicast_id NULL, of unknown evidential value
edit_version NULL – possibly reserved for the future when WhatsApp will allow you to edit sent messages (c.f. messages_edits table) – http://www.telegraph.co.uk/technology/2017/01/30/whatsapp-let-users-edit-recall-sent-messages/
media_enc_hash NULL

Table 3 – Description of the messages table

Of primary note here is the media_wa_type field. WhatsApp allows the user to send and receive different types of messages, for example images or videos. The media_wa_type field defines the type of message that the record contains, using the following coding: 0 = text; 1 = image; 2 = audio; 3 = video; 4 = contact / vCard; 5 = location; 8 = audio / video call; 9 = document. Where media_wa_type is 1, 3 or 5, there is also a thumbnail in binary format stored in the database. The storage location of this thumbnail changed during the research for this paper: prior to the 21st of April 2017, the thumbnail was stored in the data field of the messages table, but since then it has moved to the message_thumbnails table, as described in the next section. The actual media exchanged in the case of an image or video is stored in the user-accessible directory /sdcard/WhatsApp/Media once downloaded, either automatically or by the user.
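
The coding of media_wa_type and the millisecond timestamps used throughout the messages table can be decoded as follows. This is a minimal sketch based only on the values observed in this paper; the function names are hypothetical, values outside the list (such as the 13 noted in Table 3) are reported as unknown, and non-positive timestamps such as the -1 seen in send_timestamp are treated as unset.

```python
from datetime import datetime, timezone

# Coding of media_wa_type as observed in this research.
MEDIA_TYPES = {
    0: "text",
    1: "image",
    2: "audio",
    3: "video",
    4: "contact / vCard",
    5: "location",
    8: "audio / video call",
    9: "document",
}

# Message types for which a thumbnail is stored in the database.
THUMBNAIL_TYPES = {1, 3, 5}


def describe_media_type(code):
    """Translate a media_wa_type value (integer or numeric text) to a label."""
    code = int(code)
    return MEDIA_TYPES.get(code, "unknown ({})".format(code))


def has_thumbnail(code):
    """True when a message of this type is expected to carry a thumbnail."""
    return int(code) in THUMBNAIL_TYPES


def from_epoch_ms(milliseconds):
    """Convert a millisecond epoch timestamp to a UTC datetime, or None."""
    if milliseconds is None or milliseconds <= 0:
        return None
    return datetime.fromtimestamp(milliseconds / 1000, tz=timezone.utc)
```

The int() conversion is there because the column may be returned as text rather than as an integer, depending on how it was declared in the schema.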

3.3.3       The message_thumbnails table

Prior to April 2017, the message_thumbnails table existed but was empty. After the 21st of April, this table became the storage location of thumbnails of images, videos or Google map locations that have been sent or received by the WhatsApp application. Each thumbnail is a representation of the actual image, video or Google map location with a maximum size of 100×100 pixels. A description of the message_thumbnails table follows in Table 4.

Field Name Description
thumbnail Blob – the first 4 bytes are “ff d8 ff e0”, i.e. the file signature of a JPEG
timestamp Date time in epoch
key_remote_jid Sender / Receiver ID
key_from_me Same as key_from_me in messages table – 0 is to me, 1 is from me
key_id 30 char hex string

Table 4 – Description of the message_thumbnails table

The thumbnail field here is a blob type, containing a binary image. During examination of the data contained in this field, it was noted that the first four bytes of each record were ff d8 ff e0 which is the file signature of the JPEG file format.
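
Because each thumbnail is a complete JPEG, the blobs can be written straight to disk for viewing. The sketch below is illustrative only; the function names are hypothetical, and it assumes the message_thumbnails columns listed in Table 4 and that key_id is safe to use as a file name (a hex string, as observed). Blobs that do not start with the JPEG signature noted above are skipped rather than exported.

```python
import sqlite3
from pathlib import Path

# File signature observed at the start of every thumbnail blob.
JPEG_SIGNATURE = bytes.fromhex("ffd8ffe0")


def is_jpeg(blob):
    """True when the blob starts with the JPEG signature."""
    return isinstance(blob, (bytes, bytearray)) and blob[:4] == JPEG_SIGNATURE


def export_thumbnails(conn, out_dir):
    """Write each JPEG thumbnail to out_dir as <key_id>.jpg; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    rows = conn.execute("SELECT key_id, thumbnail FROM message_thumbnails")
    for key_id, blob in rows:
        if not is_jpeg(blob):
            continue  # Not a JPEG, or an empty field: leave for manual review.
        target = out_dir / "{}.jpg".format(key_id)
        target.write_bytes(blob)
        written.append(target)
    return written
```

Naming each file after key_id keeps the link back to the corresponding record in the messages table.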

3.4  Conversation Reconstruction

While there are existing scripts[16] that convert the content of a WhatsApp msgstore.db database to HTML in order to reconstruct the chat history, it was noted that these are several years old and do not take account of the updates made to the WhatsApp application since, for example the change in the storage location used by thumbnails mentioned in section 3.3.2. It was therefore decided to create some proof of concept code in Python to take account of these changes. This proof of concept code is contained in the attachment examine_msgstore.py, and a screenshot of the output of this script is contained in Figure 7.

Figure 7 – Example output of the examine_msgstore.py script, some contact details obfuscated
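
The core of such a replay can be sketched in a few lines. The code below is not the examine_msgstore.py attachment but an illustrative reduction of the idea: the function name replay_conversation and the output format are hypothetical, and only the messages fields described in Table 3 are used. Messages are ordered by timestamp, text messages are printed as they are, and any other message type is shown as a placeholder.

```python
import sqlite3
from datetime import datetime, timezone


def replay_conversation(conn, jid):
    """Return one printable line per message in a conversation, oldest first."""
    query = """
        SELECT key_from_me, remote_resource, data, media_wa_type, timestamp
        FROM messages
        WHERE key_remote_jid = ?
        ORDER BY timestamp, _id
    """
    lines = []
    for from_me, sender, data, media_type, stamp in conn.execute(query, (jid,)):
        if stamp is None:
            continue  # No usable timestamp: skip rather than guess.
        when = datetime.fromtimestamp(stamp / 1000, tz=timezone.utc)
        # remote_resource names the sender in groups; otherwise use the JID.
        who = "me" if from_me == 1 else (sender or jid)
        if str(media_type) == "0" and data is not None:
            body = data
        else:
            body = "[attachment type {}]".format(media_type)
        lines.append("{:%Y-%m-%d %H:%M:%S} {}: {}".format(when, who, body))
    return lines
```

Times are shown in UTC. A fuller implementation would also resolve attachment types to labels and link thumbnails from the message_thumbnails table.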

4.    Experiments and Results

As part of the research for this paper, it was necessary to validate the findings across multiple devices. Five different test devices were procured, each running Android 7, and the WhatsApp application was installed on each. These five devices were given to five different users, and they were instructed to communicate via WhatsApp messenger, with each other, and with third parties from their own contacts. The users were instructed to test with all the message types as described in Section 3.3.2. After a period of two weeks, the five devices were examined and the WhatsApp database backup was extracted from each device. Once a copy of each database was acquired, WhatsApp was installed and activated using the phone numbers of each of the five devices on five different emulators. The key file was then copied from the emulator to a local PC, and an attempt was made to decrypt the database backups, using the method outlined in Section 3.2.

Each of the five databases was opened in DB Browser for SQLite and the database schema was examined to see if they matched the earlier database acquired from the Android emulator. The databases were then parsed using the examine_msgstore.py proof of concept script to see if the conversations matched those in the WhatsApp messenger applications. The results of the experiments of data acquisition, decryption, schema examination and conversation reconstruction are displayed in Table 5.

Device Android ver. Acquisition Decryption Schema Conversation
Samsung Galaxy S7 7.0
Samsung Galaxy S8 7.0
HTC 10 7.0
OnePlus 3 7.1.1
Google Nexus 6P 7.1.1
Emulated Google Pixel XL 7.1.1

Table 5 – Experimental results

5.    Conclusions and Further Research

In this paper, several functions relating to the forensic analysis of WhatsApp on Android were examined: data acquisition, decryption, analysis, and conversation replay. It was found that the research previously carried out by Anglano (2014) is still valid, with some slight changes, namely that the storage of message thumbnails has changed since his research was carried out. The empty tables in the WhatsApp database are indicative of upcoming changes within the application, especially with reference to the ability to edit messages, the tracking of such edits, and the possible movement of vCards out of the messages table into their own table. As both WhatsApp and, less frequently, Android are constantly updated, it would be worthwhile to repeat these experiments on a regular basis to ensure that the findings are still relevant.

6.    Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

7.    References

  • Anglano, C. Forensic Analysis of WhatsApp Messenger on Android Smartphones. Digital Investigation Journal, 11(3) pp. 201-213. 2014
  • Hoog, A. Android Forensics: Investigation, Analysis, and Mobile Security for Google Android. Syngress Publishing. 2011
  • Lessard, J., Kessler, G.C. Android Forensics: Simplifying Cell Phone Examinations. Small Scale Digital Device Forensics Journal, 4(1), pp. 1-12. 2010
  • Mahajan, A., Dahiya, D.S., Sanghvi, H.P. Forensic Analysis of Instant Messenger Applications on Android Devices. International Journal of Computer Applications, 68(8), pp. 38-44. 2013
  • Sahu, S. An Analysis of WhatsApp Forensics in Android Smartphones. International Journal of Engineering Research. 3 (5), 349-350. 2014
  • Thakur, NS. Forensic Analysis of WhatsApp on Android Smartphones. University of New Orleans Theses and Dissertations. Paper 1706. 2013

[1] https://blog.whatsapp.com/10000631/Connecting-One-Billion-Users-Every-Day

[2] https://www.europol.europa.eu/newsroom/news/global-action-tackles-distribution-of-child-sexual-exploitation-images-whatsapp-39-arrested-so-far

[3] https://www.whatsapp.com/security/

[4] http://www.bbc.com/news/technology-35969739

[5] http://www.whatsapp.com

[6] http://www.whatsapp.com

[7] Rooting is the process of allowing users of smartphones, tablets and other devices running the Android mobile operating system to attain privileged control (known as root access) over various Android subsystems. (Wikipedia)

[8] Android Package Kit (APK) is the package file format used by the Android operating system for distribution and installation of mobile apps and middleware. (Wikipedia)

[9] https://www.cdn.whatsapp.net/android/

[10] https://issuetracker.google.com/issues/37123176

[11] http://www.digitalinternals.com/security/decrypt-whatsapp-crypt12-database-messages/559/

[12] https://github.com/EliteAndroidApps/WhatsApp-Crypt12-Decrypter

[13] https://faq.whatsapp.com/android/20902622

[14] http://sqlitebrowser.org/

[15] January 1, 1970

[16] https://forum.xda-developers.com/showthread.php?t=1583021

 
