Introduction
Methodology
Findings
——–Findings within the /data/data/com.google.android.apps.googlevoice Folder
—————-/files/accounts/<account_number>/LegacyMsgDbInstance.db
————————Additional Observations
—————-/files/accounts/<account_number>/AccountDataForVoip.pb
————————Additional Observations
—————-/files/AccountData.pb
—————-/files/accounts/<account_number>/SqliteKeyValueCache:VoiceAccountCache.db
—————-/cache/Photo MMS images
—————-/cache/audio
—————-Additional Observations and Findings for the /data/data/com.google.android.apps.googlevoice Folder
——–Other Findings
Conclusion
Introduction
Google Voice is a popular phone and messaging service that gives Google account users US phone numbers. These are numbers that use the Voice over Internet Protocol (VoIP) to communicate over the Internet rather than a traditional phone line. It has become increasingly popular among online scammers, as the phone numbers are not directly linked to a user’s physical location and can be changed with ease. Surprisingly, the application’s data is not parsed by common digital forensic analysis tools. In July, the DigForCE Lab set out to conduct research on the artifacts created by Google Voice to further understand the data collected, its meaning and associations, and its applicability to digital forensics.
In this post, we focus on the artifacts and findings associated with the use of Google Voice on an Android phone. Google Voice mostly uses Short Message Service (SMS) and Multimedia Messaging Service (MMS) for message communications. It also stores an extensive amount of data within Protocol Buffers. A Protocol Buffer (commonly referred to as a “Protobuf”) is a data format for serializing structured data. One of the unique qualities of Protobufs is that they are language and platform neutral, which makes them popular for cross-platform information sharing. A Protobuf message is generally structured as a series of key-value pairs, where each key is a numeric field number. The way a value is encoded is called its wire type. The Protobuf wire types are varint (used for integers, booleans, and enumerations), 64-bit (used for fixed-length 64-bit integers and doubles), length-delimited (used for strings, byte arrays, embedded messages, and packed repeated fields), groups (now deprecated), and 32-bit (used for fixed-length 32-bit integers and floats). Protobufs can also be stored within other Protobufs as values (referred to as sub-keys and denoted with [] in this report). However, to the forensic examiner these keys are often arbitrary numbers, because the schema that names them is not stored with the data. As a result, much of our research focused on identifying the meaning of these keys in combination with their values.
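To make the key and wire type description concrete, here is a minimal schema-less reader, written as a sketch rather than a replacement for a forensic tool. It assumes a well-formed buffer, does not handle the deprecated group wire types, and leaves 32-bit and 64-bit values as raw bytes because the wire format alone does not say whether they are integers or floats. The function names and the sample bytes are illustrative and were built by hand; they are not taken from the test device.

```python
from typing import Iterator, Tuple, Union

# Wire type numbers as defined by the Protobuf encoding.
WIRE_TYPE_NAMES = {
    0: "varint",
    1: "64-bit",
    2: "length-delimited",
    3: "start group (deprecated)",
    4: "end group (deprecated)",
    5: "32-bit",
}


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


# Hand-built sample: key 5 as varint 11, key 10 as the string "hi",
# and key 9 as four raw 32-bit bytes.
sample = bytes([0x28, 0x0B, 0x52, 0x02]) + b"hi" + bytes([0x4D, 0x00, 0x00, 0x48, 0x41])
for number, wire_type, value in iter_fields(sample):
    print(number, WIRE_TYPE_NAMES[wire_type], value)
```

A length-delimited value returned here may itself be another Protobuf, a string, or arbitrary bytes; deciding which one applies is exactly the interpretation work described above.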
Methodology
The Android phone used in our research was a Samsung Galaxy A32 5G. Before collecting any data, we factory reset the phone. Afterwards, we installed Google Voice on the phone and recorded the time of every action that generated data using NIST Official US Time, which can be accessed here: https://www.time.gov/. After a set of tests, an extraction of the phone was taken using validated digital forensic tools. The tests included sending and receiving messages, placing and receiving calls, and changing settings in the app. Additionally, we tested how adding and removing Google accounts in the app affected the data generated.
Findings
Some of the most important findings within the Google Voice database were within tables and files that used Protobufs. Most of the artifacts associated with Google Voice can be found in the /data/data/com.google.android.apps.googlevoice folder.
Findings within the /data/data/com.google.android.apps.googlevoice Folder
/files/accounts/<account_number>/LegacyMsgDbInstance.db

The LegacyMsgDbInstance database holds an array of interesting tables, arguably the most important of which is the message_t table. Each entry in this table represents a call, voicemail, or text message. For each entry, the table lists the message’s ID, its timestamp, its conversation ID, and a “message_blob” column containing a Protobuf of additional information. By performing a series of tests and comparing the data found in this Protobuf after each test, we identified information regarding the timestamp, phone numbers involved, type of entry, read status, voicemail transcripts, duration, text data, MMS data, and recorded incoming calls. Below is a simple Protobuf example from the message_t table and a table listing the key findings within the message_t Protobuf structure.



| Key Value [Sub-Key] | Key Representation | Value |
| 1 | Message ID for calls, voicemails, or texts | Message ID (equal to message_id column in message_t) |
| 2 | Timestamp for entry | Epoch time in milliseconds |
| 3 | The phone number assigned by Google Voice | Phone number in +1xxxxxxxxxx format |
| 4 | Other phone number(s) associated with the entry; also indicates a blocked number if the entry represents an individual message/call and does not have an unknown caller ID | [1] & [2]: Phone number = one sender or recipient; Group message ID = group chat message ID; “Unknown” = anonymous caller ID. [7]: 0 = Unblocked number; 1 = Blocked number |
| 5 | Type of entry | 0 = Missed call; 1 = Answered call; 2 = Received a voicemail; 10 = Received a message; 11 = Sent a message; 14 = Outgoing call |
| 6 | Chat status | 0 = Unread; 1 = Read |
| 7 | Voicemail transcript (if transcripts enabled) | [2]: Contains a data item where each element holds a single word from the voicemail in the sub-key 1 value |
| 9 | Duration | The tool used in the image above parses the Protobuf value as a fixed32 integer; however, it should be a float. After converting to a float, the value represents the duration in seconds. |
| 10 | Chat text | Text for sent and received messages; “MMS Sent” = sent an image or GIF; “MMS Received” = received an image or GIF |
| 12 | URL for voicemails | URL – when entered in a browser, the link opens a window to download an mp3 file with the message ID of the voicemail |
| 13 | Type of entry | 0 = Missed call; 1 = Answered call; 2 = Outgoing call; 3 = Received a voicemail; 5 = Sent a message; 6 = Received a message |
| 15 | Additional group chat or MMS info | [1]: Text for sent and received group chat messages; [3][1]: Type of MMS data; [3][2]: “message_id-1”; [4]: Data items holding phone number(s) and blocked number indication(s); [5]: The phone number of the sender of a received message in a group chat (not the user) |
| 19[2] | Whether a missed incoming call received a voicemail with a transcript. For the value to be 2, the voicemail had to result from a missed incoming call (the welcome voicemails did not qualify) and the account had to have transcripts enabled (voicemails with no transcript received a 1) | 1 = Default value for most entries; 2 = Missed call that received a voicemail with a transcript |
| 22[3] | Missed an incoming call | 1 = Missed an incoming call (present regardless of whether a voicemail was left); 2 = Missed an incoming call because Do Not Disturb was enabled |
| 23 | More information for recorded calls | [1]: “cra:message id”; [3][1][1]: Duration of recording in seconds |
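Two of the values in the table need a conversion before they are readable: the key 2 timestamp (epoch time in milliseconds) and the key 9 duration (a float that some tools display as a fixed32 integer). The sketch below shows one way to perform both conversions and to label the key 5 and key 13 entry types. The helper names are our own for illustration, and the sketch assumes the 32-bit value is stored little-endian, as the Protobuf wire format specifies.

```python
import struct
from datetime import datetime, timedelta, timezone

# Entry types observed in this research for key 5 and key 13.
KEY5_ENTRY_TYPES = {
    0: "Missed call", 1: "Answered call", 2: "Received a voicemail",
    10: "Received a message", 11: "Sent a message", 14: "Outgoing call",
}
KEY13_ENTRY_TYPES = {
    0: "Missed call", 1: "Answered call", 2: "Outgoing call",
    3: "Received a voicemail", 5: "Sent a message", 6: "Received a message",
}


def fixed32_to_float(raw: int) -> float:
    """Reinterpret the bits of an unsigned 32-bit integer as an IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<I", raw))[0]


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds (key 2) to a timezone-aware UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms)


# Illustrative values only; they are not taken from the test device.
print(fixed32_to_float(0x41480000))              # duration in seconds
print(epoch_ms_to_utc(1690000000000).isoformat())
print(KEY5_ENTRY_TYPES.get(11, "Unknown"))
```

If a tool reports the key 9 value as an integer such as 1095237632 (0x41480000), reinterpreting the same four bytes as a float is what recovers the duration in seconds.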
Additional Observations
| Key Value | Key Representation | Value |
| 7 | Voicemail transcript (if transcripts enabled) | [1]: Holds an integer associated with the voicemail [2]: Contains a data item where each element also holds the integer above in [1] in the sub-key 4 value(s). This integer is the same for identical voicemails, such as the automatic welcome voicemail Google Voice sends to new users. |
| 18 | Unknown – only appears when the user sends messages (SMS or MMS) | Large integer |
| 14 | Seems to indicate the presence of additional text data within the Protobuf | 0 = No additional text data; 1 = Additional text data |
One test produced a 3-way call, starting with an incoming call to the Google Voice number (from the contact “Irene Baker”), followed by that phone number (Irene) connecting to another Google Voice number (“Gary”). However, within the Protobuf, no data hinting at a 3-way call could be found. The call simply appeared as an incoming call from Irene’s phone number.
When a text message conversation is archived, its label_id in the conversation_labels_t table is changed to 13 (meaning archived – translation in label_t table). However, in the message_t table, the messages appear without any identification as an archived message.
Similarly, when a text message conversation is marked as spam, its label_id is changed to 4 (meaning spam – translation also in the label_t table) and no changes appear to be made to the messages in the message_t table.
In the conversation_id column in message_t, the ID starts with a “.c”, “.t”, or “.g” if the entry is a call/voicemail, text message, or group chat message, respectively.
In the conversation_t and conversation_labels_t tables, some entries have a conversation ID that starts with “.s”. However, these are temporary entries, as they do not appear with a “.s” in later extractions.
Entries for missed calls that receive a voicemail will record the timestamp as the time when the call was first received. Additionally, the duration given for missed calls with voicemails is the duration of the voicemail recording itself.
No data was found in keys 8, 11, 16, 17, 20, and 21 in the Protobufs examined.
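The conversation ID prefixes described above lend themselves to a quick triage of message_t. The following sketch counts entries by prefix using Python's built-in sqlite3 module. It relies only on the conversation_id column named in this post; the function names are hypothetical, and an examiner would normally run this against a working copy of the database rather than the original evidence file.

```python
import sqlite3
from collections import Counter

# Prefix meanings as observed in this research. ".s" was only seen in
# conversation_t and conversation_labels_t, and only temporarily.
PREFIX_MEANINGS = {
    ".c": "call/voicemail",
    ".t": "text message",
    ".g": "group chat message",
    ".s": "temporary entry",
}


def classify_conversation_id(conversation_id: str) -> str:
    """Map a conversation ID to the kind of entry its prefix indicates."""
    return PREFIX_MEANINGS.get(conversation_id[:2], "unknown")


def count_entry_kinds(conn: sqlite3.Connection) -> Counter:
    """Count message_t rows by the kind implied by their conversation ID."""
    rows = conn.execute("SELECT conversation_id FROM message_t")
    return Counter(classify_conversation_id(row[0]) for row in rows)
```

For example, `count_entry_kinds(sqlite3.connect("LegacyMsgDbInstance.db"))` would return a Counter keyed by the labels above, assuming the file is a copy of the database in the current directory.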
/files/accounts/<account_number>/AccountDataForVoip.pb
The AccountDataForVoip Protobuf contains a variety of Google Voice account data, including device information and the phone number linked to the Google Voice account.

| Key Value [Sub-Key] | Key Representation | Value |
| 3[3] | Type of device using Google Voice | Text data of device type (e.g. “Android Device” or “SM A326U”) |
| 3[4][1] | Linked phone number | Phone number in +1xxxxxxxxxx format |
| 4 | Linked phone number | Phone number in +1xxxxxxxxxx format |
Additional Observations
Key 1 could potentially represent, as epoch time in microseconds, the last time an account was loaded on the device or the time of another background process. However, we have not tested this enough to confirm it. The integer stored here is generally consistent but occasionally changes.
Key 3[1] is a large hexadecimal value which may represent a user ID of some sort. For example, the value in key 3[1] for user account 1 was also referenced in several other databases and log files.
/files/AccountData.pb
The AccountData Protobuf contains additional user account information.

| Key Value | Key Representation | Value |
| 1 | Number of accounts that have been on the device (including removed accounts and re-added accounts) | The total number of accounts + 1 |
| 2 | A data item with a variable number of embedded Protobuf(s), each representing a Google account currently logged into Google Voice on the device | [1]: Account number, which identifies both the account this Protobuf refers to and the folder that holds its information; [2][1]: Account number, which identifies the folder that holds this account’s information; [2][2][2]: Full name on the Google account; [2][2][3]: Email address on the Google account; [2][2][8]: First name on the Google account; [2][2][9]: Last name on the Google account |
/files/accounts/<account_number>/SqliteKeyValueCache:VoiceAccountCache.db
In this database, within the cache_table table, the response_data field holds a Protobuf that links important information for the Google Voice accounts. This file directly connects the current Google Voice number to the linked phone number for each account. Below is a screenshot of a test Protobuf generated and a table of the noteworthy information.

| Key Value [Sub-Key] | Key Representation | Value |
| 1 | The Google Voice assigned phone number | Phone number in +1xxxxxxxxxx format |
| 3[1][1][1] | The Google Voice assigned phone number | Phone number in +1xxxxxxxxxx format |
| 3[2][1][1] | Linked phone number | Phone number in +1xxxxxxxxxx format |
| 3[5][1] | Same value as the potential user ID value in key 3[1] in the AccountDataForVoip Protobuf | A large hexadecimal value |
| 3[5][4] | Linked phone number | Phone number in +1xxxxxxxxxx format |
| 3[7][1][1] | The Google Voice assigned phone number | Phone number in +1xxxxxxxxxx format |
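Paths such as 3[2][1][1] describe a walk through nested Protobufs: take key 3, treat its value as another Protobuf, take key 2 inside it, and so on. The sketch below implements that walk with a small schema-less decoder. It assumes every intermediate value on the path is a length-delimited embedded message, and it silently skips values that fail to decode. The names `decode`, `get_path`, and `ld` are illustrative, and the sample uses fictional phone numbers rather than data from the test device.

```python
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[int, bytes]


def _varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one varint at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes) -> Dict[int, List[Value]]:
    """Decode top-level fields into {key: [values, ...]}."""
    fields: Dict[int, List[Value]] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = _varint(buf, pos)
        number, wire = tag >> 3, tag & 0x07
        if wire == 0:
            value, pos = _varint(buf, pos)
        elif wire == 1:
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:
            length, pos = _varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire == 5:
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        fields.setdefault(number, []).append(value)
    return fields


def get_path(buf: bytes, path: Sequence[int]) -> List[Value]:
    """Return every value found at a key path, e.g. (3, 2, 1, 1) for 3[2][1][1]."""
    current: List[Value] = [buf]
    for number in path:
        found: List[Value] = []
        for blob in current:
            if not isinstance(blob, bytes):
                continue  # a varint cannot contain sub-keys
            try:
                found.extend(decode(blob).get(number, []))
            except (IndexError, ValueError):
                continue  # not a decodable embedded message
        current = found
    return current


def ld(number: int, payload: bytes) -> bytes:
    """Demo helper: build one length-delimited field (keys < 16, payloads < 128 bytes)."""
    return bytes([(number << 3) | 2, len(payload)]) + payload


# Hand-built sample with fictional numbers, mimicking keys 1 and 3[2][1][1].
sample = ld(1, b"+12025550101") + ld(3, ld(2, ld(1, ld(1, b"+12025550102"))))
print(get_path(sample, (1,)))
print(get_path(sample, (3, 2, 1, 1)))
```

Because a key can repeat, `get_path` returns a list; an empty list means the path was not present in the decoded data.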
/cache/Photo MMS images
Images, GIFs, and other MMS data sent via Google Voice are stored in the Photo MMS Images table. An image is recorded as a single message in message_t and it is stored in Photo MMS Images as the message_id + “-14” + extension (e.g. bbcb0c40527ee100e98b390135d2f390576d4984-14.jpg). This table also contains information on the file extension, type of file, size in bytes, last modified time in the device’s time zone, last accessed time also in the device’s time zone, and an MD5 hash for each file. Below is a screenshot of the Photo MMS images table from the test device.

/cache/audio
The audio table contains audio recordings of voicemails and recorded incoming calls on Google Voice. These are stored with a file name in the format message_id + “.mp3” (e.g. e2f9c067-2335-4e85-8786-0f15cdb42de0_welcome_voicemail_.mp3). This table also has information about the file extension, type of file, size in bytes, last modified time in the device’s time zone, last accessed time also in the device’s time zone, and an MD5 hash for each file. Below is a screenshot of the audio table on the test device.

When an image or voicemail is received and then promptly deleted, the associated files do not remain in their respective folders in the /cache folder. However, when we removed all the accounts from Google Voice on the phone, the image and audio files associated with them remained as of the time of the extraction.
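Since the cache file names embed the message ID, a file found in /cache can be tied back to its message_t entry. The sketch below recovers the message ID from a file name using the two naming patterns described above and computes an MD5 hash for comparison with a tool's output. It assumes the “-14” suffix and the “.mp3” extension are the only patterns in use, which is all this research observed; the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def message_id_from_cache_name(file_name: str) -> Optional[str]:
    """Recover the message ID from a /cache file name, or None if the pattern is unknown."""
    path = Path(file_name)
    if path.suffix.lower() == ".mp3":
        # Audio: message_id + ".mp3"
        return path.stem
    if path.stem.endswith("-14"):
        # Photo MMS: message_id + "-14" + extension
        return path.stem[:-3]
    return None


def md5_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large media files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(message_id_from_cache_name("bbcb0c40527ee100e98b390135d2f390576d4984-14.jpg"))
print(message_id_from_cache_name("e2f9c067-2335-4e85-8786-0f15cdb42de0_welcome_voicemail_.mp3"))
```

The recovered ID can then be compared against the message_id column in message_t to confirm which entry a media file belongs to.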
Additional Observations and Findings for the /data/data/com.google.android.apps.googlevoice Folder

Above is a screenshot of the voicemail page in the Google Voice app, showing the transcripts and the search bar. When the user searches in Google Voice for a contact or phone number, the search is stored in the search_metadata_t table, and the results are stored in the search_result_message_t table. The data in the search_result_message_t table references conversations with the searched phone number and is only temporarily stored. However, the searches themselves remain stored in the search_metadata_t table.
When a Google account is removed from Google Voice, all the data within the removed user’s account folder is deleted. However, the folder itself remains, and when a new user is added, a new folder is created. For example, if the user with the accounts “2” folder is removed, the accounts “2” folder will be empty, but the next user will have the accounts “3” folder. This can make counting the number of active users trickier, but it leaves an additional form of identification for removed users.
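The folder behavior described above can be checked quickly on an extracted copy of the files/accounts directory. This sketch lists each account folder and reports whether it still holds data. The function name and status labels are our own, and an empty folder is only an indication of a removed account, not proof.

```python
from pathlib import Path
from typing import Dict


def account_folder_status(accounts_dir: Path) -> Dict[str, str]:
    """Report, for each account folder, whether it contains any files or subfolders."""
    status: Dict[str, str] = {}
    folders = sorted(
        (p for p in accounts_dir.iterdir() if p.is_dir()),
        key=lambda p: p.name,
    )
    for folder in folders:
        # An empty folder is consistent with an account that was removed.
        has_content = any(folder.iterdir())
        status[folder.name] = "has data" if has_content else "empty"
    return status
```

Called on an extracted accounts directory, a result such as `{"1": "has data", "2": "empty", "3": "has data"}` would match the example above, where the second account was removed before a third was added.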
The folder at /data/data/com.google.android.apps.googlevoice/files/accounts/1/birdsong contains log files with network connectivity information.
Other Findings
Contacts made in Google Voice are also stored in the com.samsung.android.providers.contacts folder and are therefore already parsed by some digital forensic analysis programs. Deleted contacts are also still visible in tools like Cellebrite.
Some tools parse the Protobuf data differently. For example, Magnet AXIOM would give the duration in key 9 from the message_blob Protobuf in the LegacyMsgDbInstance database in seconds, whereas Cellebrite would only give the fixed32 integer. Another interesting example involved the AccountDataForVoip Protobuf: Cellebrite would produce the model of the device, but the same data in CyberChef would give two keys with unique integers. This difference is shown in the screenshots below. This is likely because the tools interpret the wire types of these values differently, which leads to different interpretations of the bytes within the Protobufs.


Figure 11: Comparison between the data parsed by Cellebrite and CyberChef, shown respectively, from AccountDataForVoip.pb.
Lastly, the audio files of call recordings and voicemails are only stored locally on a device if the file has been played by the user on that device. Otherwise, the files are stored in the cloud. The same applies to additional MMS data, such as images, which is only stored locally if it has been viewed on the device.
Conclusion
Google Voice offers a lot of valuable information accessible within the filesystem of a phone. By running a variety of tests using Google Voice, we were able to parse through the data produced to draw conclusions about the information stored in several Protobufs, tables, and databases. These conclusions can aid in examining the extensiveness of the Google Voice data found on a device, what phone numbers were used on an account, what phone numbers are linked to an account, and the contents of the data sent and received.
This project still has plenty of room for further research. Information on how Google Voice handles call forwarding, 3-way calls, sending an undeliverable message/being a blocked number, sessions on other devices, adding a voicemail greeting, and user IDs could be investigated. Researching the message_t Protobuf to obtain a better understanding of the large integers used, the values stored within keys 14 and 19, and how the Protobuf connects to other tables in the database would help build upon this research. Attempting to generate values for keys 8, 11, 16, 17, 20, and 21 in the message_t Protobufs could also be interesting.
Additionally, more in-depth testing on the persistence of deleted image and audio files would provide more clarity on this matter. This testing could involve waiting an extended period of time to take an extraction after removing an account, deleting a message that has existed in the database for a longer period, or deleting a conversation containing this media rather than just a message.
Furthermore, a logical next step in this research would be to identify how Google Voice stores data on iOS devices. For the most part, the artifacts on iOS will likely be similar, as Google Voice makes use of the Protobuf data structure, which remains platform independent.
The methodology used in this research can be applied to similar applications across various mobile devices to aid in understanding and validation during digital forensic investigations.