Forensic Analysis and Security Implications of DeepSeek

Overview

Starting on February 24th, 2025, William Campbell and Reina Girouard of the DigForCE Lab began an analysis and investigation into the key areas surrounding DeepSeek’s application and security concerns. The following article details the methodology used, findings observed, and further conclusions from external sources.

Research Question:

What forensic artifacts can be recovered from a smartphone and a computer after interactions with DeepSeek, and what are the cybersecurity, data privacy, and national security implications of DeepSeek?

Project Background and Description:

This research study aims to investigate the forensic artifacts associated with DeepSeek’s recently popular generative AI applications, available on iOS, Android, and web platforms. This study attempts to provide an overview of the newly discovered and well-known cybersecurity, data privacy, and national security implications of these applications. These findings are based on the following research conducted, as well as other relevant studies.

Project Scope:

The primary objective of this study is to explore the data acquisition capabilities of DeepSeek’s services, understand their potential impact on user data, and examine the broader implications of these effects. To provide comprehensive analysis, the study will focus on the following areas:

Forensic Analysis of DeepSeek – Collect data from iOS, Android, and Windows devices regarding their respective applications. Then, analyze the files used or modified by these applications in order to provide insights into the applications and their services.
Review of DeepSeek’s Public Documentation – Review DeepSeek’s Privacy Policy and Terms of Use to gain insights into DeepSeek’s data collection practices, and identify potential privacy concerns, data-sharing policies, and any security vulnerabilities related to the services.
Review of Known Issues Related to DeepSeek’s Services – Survey existing reports and articles to identify any known issues, vulnerabilities, or security concerns regarding the functionality and security of DeepSeek’s services.

Based on the findings of these analyses, this study will evaluate the potential risks associated with DeepSeek’s data acquisition methods, assess their compliance with privacy regulations, and identify the security implications of these findings.

Methodology:

Device	Operating System Version	Application	Application Version
Apple iPhone 14 Pro Max	iOS 18.3 (22D63)	DeepSeek (iOS Application)	Version 1.0.7
Samsung Galaxy A32 5G	Android 13 (January 1, 2025)	DeepSeek (Android Application)	Version 1.0.8
Lenovo ThinkPad X1 Yoga Gen 6	Windows 11 Home, 24H2 (26100.3037)	https://chat.deepseek.com/ (Google Chrome)	Version 133.0.6943.54

Figure 1: Devices and versions analyzed.

The following is the proposed methodology, based on the project’s scope:

Forensic Analysis of DeepSeek
- Device Set Up: Prepare devices for data acquisition for their respective applications. Ensure devices and applications are fully up to date prior to project execution. Above is a table including the devices and applications that will be utilized during this project (Figure 1).
- Controlled Data Collection: Data acquisitions of the devices will be performed using validated digital forensics tools to establish a baseline for data comparison. This data will include the initial state of the devices before any interaction with DeepSeek’s services.
- Data Generation: The applications are installed/accessed on each device and logged into using a newly created DeepSeek account. Logs of user interaction will be recorded for each device. The following content is generated on each device:
  - A “Saved” Conversation: A conversation that is generated and not deleted from the application.
  - A “Deleted” Conversation: A conversation that is generated and is deleted from the application.
- Data Analysis and Interpretation: Once data is generated on each device, data acquisitions will be performed once again to capture the state of the devices, including the files modified as a result of platform usage. The acquired data will be examined to understand the potential impacts of DeepSeek on user data. This analysis will also include identifying digital artifacts left by the application, such as chat logs, queries, metadata, or any residual AI-generated content. Technologies used by the application will also be reported if they impact the security or privacy of the platform.
Review of DeepSeek’s Public Documentation
- Review DeepSeek’s Terms of Use and Privacy Policy to identify how the platform handles user data.
- Report on how findings align with data privacy regulations, such as the General Data Protection Regulation (GDPR), California Consumer Privacy Act (CCPA), or other national security directives.
Review of Known Issues Related to DeepSeek’s Services
- Identify and synthesize commonly known issues related to the DeepSeek company and its services that may impact user data privacy and/or security.
  - Reference articles and reports discussing DeepSeek’s data privacy and security.
  - Investigate the implications of bias originating from DeepSeek models, compare them with industry standards, and assess their potential political, social, and ethical implications.
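
The baseline-versus-post-usage comparison described under Forensic Analysis of DeepSeek can be illustrated with a file-hash comparison. The following is a minimal Python sketch, not the tooling used in this study: it assumes each acquisition has been exported to a local directory, and the function names (hash_tree, diff_acquisitions) are hypothetical. Validated forensic tools perform this comparison with far more rigor.

```python
import hashlib
from pathlib import Path


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads whole files into memory; acceptable for a sketch only.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            digests[path.relative_to(root).as_posix()] = digest
    return digests


def diff_acquisitions(baseline: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, removed, or modified between two acquisitions."""
    shared = baseline.keys() & after.keys()
    return {
        "added": sorted(after.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - after.keys()),
        "modified": sorted(name for name in shared if baseline[name] != after[name]),
    }
```

Calling diff_acquisitions(hash_tree(baseline_dir), hash_tree(post_usage_dir)) on two exported directory trees yields the files that an examiner would review first.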

Analysis

I. Forensic Analysis of DeepSeek

As discussed in the Methodology section of this article, the forensic analysis of DeepSeek was performed on an up-to-date iOS, Android, and Windows device. The analysis was performed after user-generated content was created within the DeepSeek applications on each device. All user interaction with the devices was logged and recorded. For each device, a “Saved” and “Deleted” chat conversation was created within each application under the same DeepSeek Account. The analysis aims to identify digital artifacts that can be recovered from each device, as well as technologies used by the app that may pose security or data privacy concerns. Notable digital artifacts will be recorded for each device, along with any additional relevant analysis.

Note: A User ID (User_ID) is a unique identifier used by DeepSeek’s services to specify a user. This identifier is the same on all devices with the same account signed in. It is a Globally Unique Identifier (GUID) consisting of 32 hexadecimal digits, written as 36 characters once hyphens are included (e.g. 92d30d42-e1a0-4a6e-8de3-c71b78f54a2e).
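
As a quick check of the identifier format, a GUID written this way has 32 hexadecimal digits and 36 characters once the hyphens are counted. The short Python sketch below uses the standard uuid module to parse the example value from the note; the helper name describe_user_id is hypothetical.

```python
import uuid


def describe_user_id(value: str) -> dict:
    """Parse a GUID string and report its length in both common notations."""
    parsed = uuid.UUID(value)  # raises ValueError for malformed input
    return {
        "canonical": str(parsed),
        "hex_digits": len(parsed.hex),
        "characters_with_hyphens": len(str(parsed)),
    }


print(describe_user_id("92d30d42-e1a0-4a6e-8de3-c71b78f54a2e"))
```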

iOS Analysis

The following section outlines various digital artifacts associated with the DeepSeek iOS application, extracted from an iOS device. Each artifact of interest is listed below with its file path, name, and a description of its contents. These artifacts provide insights into the application’s data structure, user information, and operational behavior.

/root/private/var/mobile/Containers/Data/Application/<GUID>/Documents/<User_ID>.db
This database stores information about chat conversations saved to the device, all linked to the user’s DeepSeek account. Each conversation is associated with a unique GUID labelled “id”. Notably, chat conversations initiated on Android and Windows devices also appear in this database, indicating that conversations are synced across platforms. However, deleted conversations could not be identified in this database. All chat conversations saved to this database are stored unencrypted. Below is a screenshot of the “chat_session_list” table found within the database, along with the table associated with the “Saved” iOS chat conversation identified by ID 62781124-c91f-4313-b15c-55504f4a02e4.
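
Because the database is unencrypted, its contents can be read with any SQLite client. The sketch below is one way to do so in Python, under stated assumptions: it operates on a working copy of the extracted file, opens it read-only, and relies only on the table name “chat_session_list” reported above. The helper names are hypothetical, and the column layout is whatever the acquired file actually contains.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a working copy of the database without allowing writes."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    conn.row_factory = sqlite3.Row  # lets rows be converted to dicts
    return conn


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in conn.execute(query)]


def read_table(conn: sqlite3.Connection, table: str) -> list[dict]:
    """Return every row of a table as a dict keyed by column name."""
    if table not in list_tables(conn):
        raise ValueError(f"table not found: {table}")
    # Table names cannot be bound as parameters, so the name is
    # validated against sqlite_master above before being quoted here.
    return [dict(row) for row in conn.execute(f'SELECT * FROM "{table}"')]
```

For example, read_table(open_read_only(path_to_copy), "chat_session_list") would return the saved conversation entries, where path_to_copy is the examiner's own copy of the file.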

/root/private/var/mobile/Containers/Data/Application/<GUID>/Documents/deepseek_chat.db
This database contains application information, such as the user’s unique User ID (User_ID), email address, and phone number. Below is a screenshot of the “app_user_info” table found within the database.

/root/private/var/mobile/Containers/Data/Application/<GUID>/Library/Caches/com.deepseek.chat/fsCachedData/*
Cache files used by the DeepSeek iOS application. During testing, one cached file (labeled with a unique GUID) was found to contain data entries corresponding to the last response in the most recent active conversation, which happened to be from the “Deleted” iOS conversation. Therefore, the final response of the conversation that was deleted from the iOS device could be recovered. Below is a screenshot of the cached file’s contents.

The strings derived from the ‘contents’ field of each data entry reconstruct the final response from the most recent conversation, which was the “Deleted” iOS conversation. The response reads: “Understood! If you have any questions or need help before this chat is deleted, feel free to ask, I’m here to assist you! 😊”
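
The reconstruction described above amounts to concatenating one string field across consecutive data entries. The Python sketch below illustrates the idea under a loud assumption: the exact on-disk layout of the cached file is not documented in this article, so the sketch assumes line-oriented entries of the form `data: {json}` in which each JSON object carries a “contents” string. Only the field name comes from the analysis above; the function name and entry layout are hypothetical and should be adapted to the actual file.

```python
import json


def rebuild_response(cache_text: str, field: str = "contents") -> str:
    """Concatenate a string field across 'data:' entries, in file order."""
    parts = []
    for line in cache_text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip anything that is not a data entry
        payload = line[len("data:"):].strip()
        try:
            entry = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip entries that are not valid JSON
        if isinstance(entry, dict) and isinstance(entry.get(field), str):
            parts.append(entry[field])
    return "".join(parts)
```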

/root/private/var/mobile/Containers/Data/Application/<GUID>/Library/Caches/com.deepseek.chat/Cache.db
This database contains CFURL endpoints used to access resources. CFURL is a data type in Apple’s API that is used to dereference URL strings to access resources and files [1]. The endpoints recorded in the database belong to the following domains: deepseek.com, intercom.com, volces.com, and fengkongcloud.com. Below is a list of API endpoints that were identified within the Cache.db file:

Of these API endpoints, the following are most notable:

The API endpoints of volces.com are associated with APMPlus, an Application Performance Monitoring (APM) solution developed by Volcengine [2], a subsidiary of ByteDance, the parent company of TikTok. APM solutions are used to monitor application performance through various software tools and system telemetry data. Some of the URLs found within the Cache.db that are associated with volces.com contain parameters for unique identifiers and device identifiers.

The API endpoints of intercom.com belong to Intercom, a software company that provides customer service functionality to various companies through chatbots, like the one present within the iOS application. The API endpoints of fengkongcloud.com belong to Shumei Inc. and are allegedly used for device fingerprinting. More information about how the domain is accessed can be found within Shumei’s documentation [3].
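
A tally of the hosts recorded in Cache.db can be produced with a short query. The Python sketch below assumes the commonly documented layout of Apple's URL cache database, in which a table named cfurl_cache_response stores each request URL in a request_key column; these names are an assumption here and should be verified against the acquired file. The function name count_hosts is hypothetical.

```python
import sqlite3
from collections import Counter
from urllib.parse import urlsplit


def count_hosts(conn: sqlite3.Connection) -> Counter:
    """Count cached requests per hostname.

    Assumes a cfurl_cache_response table with a request_key column
    holding the request URL (verify against the actual database).
    """
    hosts = Counter()
    for (url,) in conn.execute("SELECT request_key FROM cfurl_cache_response"):
        if not url:
            continue
        host = urlsplit(url).hostname
        if host:
            hosts[host] += 1
    return hosts
```

Sorting the result with most_common() gives a quick view of which domains the application contacted most often.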

/root/private/var/mobile/Containers/Data/Application/<GUID>/Library/Cookies/Cookies.binarycookies
Cookies are used to manage session persistence, track user activity, and facilitate smoother interactions within the app. These cookies can include authentication tokens, user preferences, and other session-related data to ensure that users maintain their logged-in state and personalized settings across app sessions. The DeepSeek iOS application stores cookies on the device originating from volces.com and deepseek.com. As mentioned previously, volces.com is associated with Volcengine, a subsidiary of ByteDance, the parent company of TikTok.

iOS Third-Party Services

The following section outlines various software components, libraries, and services used by the DeepSeek iOS application. These third-party services are integrated within the app to provide additional functionality, enhance performance, and manage different services. Each artifact of interest is listed below with its file path, name, and a description of its contents. These artifacts provide insights into how the DeepSeek iOS application manages data, facilitates communication, and integrates third-party services.

/root/private/var/mobile/Containers/Data/Application/<GUID>/Library/Caches/com.pinterest.*
The DeepSeek iOS application uses PINCache, which is an open-source object cache framework created by Pinterest. This framework appears to be caching assets used by Intercom, which integrates customer service functionality within the application.

/root/private/var/mobile/Containers/Data/Application/<GUID>/Library/Heimdallr
The DeepSeek iOS application uses Heimdallr, an open-source OAuth 2.0 client created by Trivago, to handle secure authentication and authorization within the app.

/root/private/var/containers/Bundle/Application/<GUID>/DeepSeek Chat.app/*
The application bundle for the DeepSeek iOS app contains various resources, including third-party bundles and frameworks. The following services have been identified as integrated into the DeepSeek iOS application:

Alamofire
APMInsight
AppAuth
AppCheckCore
FBLPromises
GoogleSignIn
GoogleSignInSwift
GoogleUtilities
GTMAppAuth
GTMSessionFetcher
Intercom
IosMath
MMKV
MMKVCore
PinLayout
Pods_DeepSeek_Chat
SmCaptcha
Highlightr
Kingfisher
RangersAPMPrivacyInfo
RangersAppLog
RangersAppLogDevTools
Snapkit

Of these third-party libraries and frameworks, the following are most notable:

The DeepSeek iOS application uses the following services that are related to APMPlus: APMInsight, RangersAPMPrivacyInfo, RangersAppLog, RangersAppLogDevTools. As previously mentioned, APMPlus was developed by Volcengine, a subsidiary of ByteDance, the parent company of TikTok.

The DeepSeek iOS application uses MMKV, a mobile key-value storage framework developed and employed by WeChat, which is owned by Tencent Holdings. This framework runs on-device and should not impact data privacy and security, as all data stored using MMKV remains local to the device.

The DeepSeek iOS application uses services from Intercom. As previously mentioned, Intercom is a software company that provides customer service functionality to various companies through chatbots.
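
An inventory like the one above can be reproduced from an extracted copy of the application bundle. The following Python sketch walks a bundle directory and collects the names of embedded .framework and .bundle directories, then flags the APMPlus-related names identified in this analysis. It assumes the bundle has been exported to the local file system; the function names are hypothetical.

```python
from pathlib import Path

# Names reported in this analysis as related to APMPlus.
APM_PLUS_RELATED = {
    "APMInsight",
    "RangersAPMPrivacyInfo",
    "RangersAppLog",
    "RangersAppLogDevTools",
}


def embedded_components(app_bundle: Path) -> list[str]:
    """List the names of .framework and .bundle directories in a bundle."""
    names = {
        path.stem
        for path in app_bundle.rglob("*")
        if path.is_dir() and path.suffix in {".framework", ".bundle"}
    }
    return sorted(names)


def flag_apm_plus(names: list[str]) -> list[str]:
    """Return the component names that appear in the APMPlus-related set."""
    return sorted(set(names) & APM_PLUS_RELATED)
```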

Android Analysis

The following section outlines various digital artifacts associated with the DeepSeek Android application, extracted from an Android device. Each artifact of interest is listed below with its file path, name, and a description of its contents. These artifacts provide insights into the application’s data structure, user information, and operational behavior.

/data/data/com.deepseek.chat/databases/deepseek_chat_<User_ID>.db
Similar to the iOS application, this database stores information about chat conversations saved to the device, all linked to the user’s DeepSeek account. Each conversation is associated with a unique GUID labelled “id”. Notably, chat conversations initiated on iOS and Windows devices also appear in this database, indicating that conversations are synced across platforms. However, deleted conversations could not be identified in this database. All chat conversations saved to this database are stored unencrypted. Below is a screenshot of the “chat_session_list” table found within the database, along with the table associated with the “Saved” Android chat conversation identified by ID bfcb7ffe-8c5c-4ea0-bc2b-d9c47e9e9ec6.
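
The cross-platform syncing noted above can be checked by comparing conversation identifiers between the iOS and Android databases. The Python sketch below assumes that the “chat_session_list” table in each database exposes the conversation GUID in a column named “id”, as the description above suggests; this should be confirmed against the acquired files. The function names are hypothetical.

```python
import sqlite3


def session_ids(conn: sqlite3.Connection) -> set[str]:
    """Return the set of conversation IDs in chat_session_list.

    Assumes the GUID is stored in a column named "id".
    """
    return {row[0] for row in conn.execute("SELECT id FROM chat_session_list")}


def compare_devices(ios: sqlite3.Connection, android: sqlite3.Connection) -> dict[str, list[str]]:
    """Split conversation IDs into shared and device-specific groups."""
    ios_ids, android_ids = session_ids(ios), session_ids(android)
    return {
        "both": sorted(ios_ids & android_ids),
        "ios_only": sorted(ios_ids - android_ids),
        "android_only": sorted(android_ids - ios_ids),
    }
```

If syncing is complete, every conversation that still exists on the account should fall into the "both" group.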

/data/data/com.deepseek.chat/databases/deepseek_chat.db
Similar to the iOS application, this database contains application user information, such as the user’s unique User ID (User_ID), email address, and phone number. Below is a screenshot of the “app_user_info” table found within the database.

Android Third-Party Services

The following section outlines various software components, libraries, and services used by the DeepSeek Android application. These third-party services are integrated within the app to provide additional functionality, enhance performance, and manage different services. Each artifact of interest is listed below with its file path, name, and a description of its contents. These artifacts provide insights into how the DeepSeek Android application manages data, facilitates communication, and integrates third-party services.

/data/data/com.deepseek.chat/files/Vlog/APMPlus/*
/data/data/com.deepseek.chat/files/apminsight/*
The DeepSeek Android application uses services that are related to APMPlus, such as APMInsight. As previously mentioned, APMPlus was developed by Volcengine, a subsidiary of ByteDance, the parent company of TikTok.

/data/data/com.deepseek.chat/files/mmkv/*
The DeepSeek Android application uses MMKV, a mobile key-value storage framework developed and employed by WeChat, which is owned by Tencent Holdings. This framework runs on-device and should not impact data privacy and security, as all data stored using MMKV remains local to the device.

Web (Windows) Analysis

DeepSeek does not have a dedicated application for use on Windows, macOS, or Linux; instead, DeepSeek hosts a web application, which can be accessed at https://chat.deepseek.com/. The following section outlines various digital artifacts associated with the DeepSeek web application, as viewed from a Windows device running the Google Chrome web browser. These artifacts provide insights into the application’s data structure, user information, and operational behavior.

Web Network Activity

The following section outlines digital artifacts associated with the DeepSeek web application, collected from network traffic generated by the application. This traffic was captured using the network feature of Google Chrome’s Inspect Element. For this section, a conversation was initiated within the DeepSeek web application, which generated network traffic to and from the Windows device. Each artifact of interest is listed below with the name of the network request and a description of its contents.

https://gator.volces.com/list
https://apmplus.volces.com/monitor_web/collect
The DeepSeek web application sends requests to volces.com. As previously mentioned, these API endpoints are associated with APMPlus, which was developed by Volcengine, a subsidiary of ByteDance, the parent company of TikTok. APM solutions are used to monitor application performance through various software tools and system telemetry data. These network requests include unique identifiers and device identifiers, such as the user’s browser, browser version, device model, operating system name, and operating system version.
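
Network traffic captured in Chrome's developer tools can be exported as an HTTP Archive (HAR) file, which is plain JSON. The Python sketch below filters a parsed HAR document for requests to a given domain and lists the names of any query parameters on each request. It is an illustration rather than the procedure used in this study; the function name is hypothetical, and identifiers sent in a request body rather than the URL would need separate handling.

```python
from urllib.parse import parse_qsl, urlsplit


def requests_to_domain(har: dict, domain: str) -> list[dict]:
    """Summarize HAR entries whose host is the domain or a subdomain of it."""
    matches = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        parts = urlsplit(url)
        host = parts.hostname or ""
        if host == domain or host.endswith("." + domain):
            names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
            matches.append({
                "host": host,
                "path": parts.path,
                "query_parameters": sorted(names),
            })
    return matches
```

A HAR file can be loaded with json.load() and passed to requests_to_domain(har, "volces.com") to isolate the requests discussed above.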

Cached Web Browser Data

The following section outlines digital artifacts associated with the DeepSeek web application, collected from cached data found within the Google Chrome application on a Windows device. Each artifact of interest is listed with its file path, name, and a description of its contents.

\Users\<User>\AppData\Local\Google\Chrome\User Data\Default\Cache\Cache_Data\*
The DeepSeek web application stores temporary data locally on the Windows device when in use. This data includes identifiers that were previously mentioned during the analysis of the DeepSeek iOS and Android applications. Below is a list of domain names of interest from the URLs cached on the Windows computer. These domains may provide insight into web activity and data exchanges related to the DeepSeek application.

intercomcdn.com
intercomassets.com
intercom.io
deepseek.com
cloudflare.com
googleapis.com
gstatic.com
recaptcha.com
cloudfront.net
baidu.com
volces.com
portal101.cn

Of these domain names, the following are most notable:

The web application accesses content from baidu.com. Baidu.com is a search engine primarily used in China that belongs to Baidu, a Chinese multinational company.

The web application accesses content from volces.com. As previously mentioned, this domain is associated with APMPlus, which was developed by Volcengine, a subsidiary of ByteDance, the parent company of TikTok.

The web application accesses content from portal101.cn. This domain belongs to Shumei Inc. and is allegedly used for device fingerprinting. More information about how the domain is accessed can be found within Shumei Inc’s documentation [4].

II. Review of DeepSeek’s Public Documentation

An official DeepSeek account is required to use any of DeepSeek’s services. These services include their iOS, Android, and web applications, as well as their official Application Programming Interface (API). When creating a DeepSeek account, users can authenticate with either an email address or a phone number. By creating an account, the user agrees to DeepSeek’s Terms of Use and Privacy Policy. Once a DeepSeek account is created, the setting “Improve model for everyone” is enabled by default. This setting claims the following: “Allow your content to be used to train our models and improve our services. We secure your data privacy”. There are no other settings associated with DeepSeek that impact the privacy or security of a user’s account.

Privacy Policy

The following section outlines information found within DeepSeek’s Privacy Policy as of February 14th, 2025. DeepSeek’s official Privacy Policy can be accessed here: https://cdn.deepseek.com/policies/en-US/deepseek-privacy-policy.html.

DeepSeek’s Privacy Policy indicates that data is collected in three main ways: information that the user provides, information that is automatically collected, and information that is collected from other sources. Below is a list of all the information that DeepSeek’s services collect from the user:

Information Provided by the User
- Account Information – Information provided during account creation (date of birth, username, email address, telephone number, and password).
- User Input – Information provided while interacting with DeepSeek’s services (text input, text prompts, uploaded files, feedback, and chat history).
- Information Collected When You Contact DeepSeek – Any information given to DeepSeek upon contacting them (proof of identity or age, contact information, feedback, and inquiries).
Automatically Collected Information
- Device and Network Information – Certain device and network connection information, as well as service-related, diagnostic, and performance information (device model, operating system, IP address, device identifiers, and system language). In the Forensic Analysis section of this article, this information was found to be collected by various Chinese entities, including ByteDance (the owners of TikTok).
- Log Information – Information about the user’s activities (application features and user actions).
- Location Information – Approximate location based on IP address, for security reasons, such as detecting unusual login activity.
- Payment Information – Payment information used to provide services such as order placement and payment (transaction information and payment orders).
Information from other Sources
- Linked Services – Access tokens from third-party services such as Google or Apple.
- Security Information – Information received from “trusted partners”, to protect against fraud, abuse, or other security threats.
- Public Information – Publicly available information used to train DeepSeek’s models and provide various services.

It should be noted that upon DeepSeek’s initial release in January 2025, the privacy policy stated that DeepSeek collected “keystroke patterns or rhythms”, similar to the TikTok application [5]. This language was removed in the most recent update to DeepSeek’s Privacy Policy, which occurred on February 14th, 2025. Due to this sudden change in the stated data collection, DeepSeek’s data collection practices have been met with uncertainty and skepticism.

Compared to other companies offering similar services, such as OpenAI’s ChatGPT or Google’s Gemini, DeepSeek collects similar types of data. However, one key difference is that DeepSeek explicitly states where user data is stored, which is in “secure servers” located in the People’s Republic of China (PRC).

Terms of Use

The following section outlines information found within DeepSeek’s Terms of Use as of January 20th, 2025. DeepSeek’s official Terms of Use can be accessed here: https://cdn.deepseek.com/policies/en-US/deepseek-terms-of-use.html.

According to DeepSeek’s Terms of Use, if a user discontinues the use of DeepSeek’s services, terminates their contract with them, or deletes their DeepSeek account, DeepSeek has the right to retain “certain data of the user as required by laws and regulations”.

DeepSeek’s Terms of Use also states that the company is “governed by the laws of the People’s Republic of China in the mainland”. Since user data that is collected by DeepSeek’s services is stored in the PRC, the data is also governed by PRC law. According to the Center for Strategic & International Studies (CSIS), PRC law allows the PRC to access DeepSeek user data without the legal procedures that would be required in any other rule-of-law country [6]. Therefore, data privacy concerns are relevant as user information could be accessed without typical safeguards.

Data Privacy Compliance

Unlike other companies offering similar services, such as OpenAI’s ChatGPT or Google’s Gemini, DeepSeek does not mention compliance with key regulations like the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), or other national security directives in any of its public documentation. Companies like OpenAI and Google openly lay out their approach to data privacy and their compliance with these directives. DeepSeek’s Privacy Policy is also not transparent about which service providers user data is shared with, which raises further potential compliance issues.

The findings documented in the Forensic Analysis section of this article indicate that DeepSeek shares user data with multiple third-party entities, including ByteDance, the parent company of TikTok. These findings are consistent with claims made in various reports on the subject [7]. The transmission of user behavior data and device fingerprinting to a Chinese-owned company with a history of data privacy concerns raises significant compliance issues under GDPR, CCPA, and other national security regulations.

III. Review of Known Issues Related to DeepSeek’s Services

The following section synthesizes commonly known issues related to DeepSeek, particularly in relation to the release of its latest R1 model. A wide range of media organizations and researchers have investigated DeepSeek’s products and partnerships, placing a large focus on its security and data privacy. Since January, DeepSeek has suffered a variety of cyberattacks and vulnerabilities. Researchers and analysts have also connected it to several Chinese firms that have raised data privacy concerns in the past, including China Mobile and ByteDance. This has led several countries to place restrictions on the use of and access to DeepSeek. Additionally, the model shows significant bias towards the PRC’s ideals, often censoring answers or pushing propaganda.

Security Issues

January 29th Data Breach

On January 29th, Wiz released an article documenting a significant vulnerability in one of DeepSeek’s databases [8]. DeepSeek has since patched the vulnerability; however, it posed a serious risk of exposing sensitive user data. The Wiz researchers found a ClickHouse database that could be accessed via subdomains of DeepSeek’s website, with no authentication required. SQL queries could be executed on the database, providing clear access to sensitive internal data. This data included over one million lines of log streams, including chat history, API keys, backend information, operational data, and other sensitive information.

Known Vulnerabilities

Since the release of DeepSeek’s R1 model in January, DeepSeek has struggled continuously with a variety of cyberattacks and vulnerabilities. In late January, the company limited registration only to individuals with Chinese phone numbers. It claimed that limiting registration would allow current users to still access its services amid a significant rise in cyberattacks against the company. According to The Hacker News [9], large-scale cyberattacks like these are not uncommon for companies that have faced such a rapid rise in popularity as DeepSeek has. In late January, analysts from Wallarm were also able to jailbreak DeepSeek’s model [10]. Their jailbreak attack resulted in DeepSeek returning its entire system prompt. DeepSeek has now patched this vulnerability.

On January 31st, researchers from Cisco and the University of Pennsylvania published a report evaluating the security of DeepSeek in comparison to other AI models [11]. They tested DeepSeek by running it locally, rather than using the web or mobile application interfaces [12]. The researchers used jailbreaking techniques from the HarmBench dataset, which is a well-known benchmark for LLMs. DeepSeek failed to block any of the automated attacks run against it. It was the only model evaluated with a 100% attack success rate, which means it gave a response to every harmful prompt it was given. These include prompts focusing on cybercrime, misinformation, illegal activities, and general harm. The researchers suggested that these vulnerabilities are likely the price of DeepSeek’s cheaper R1 model. As of now, there is no indication that DeepSeek has fixed these vulnerabilities. Below is a graph from Cisco of their findings.

Experts used jailbreaking techniques on DeepSeek that caused it to provide C++ code for various forms of malware [13]. The code was not advanced enough to work on its own and had a variety of errors. However, the experts were able to manually develop the code into functioning malware programs. While DeepSeek has the ability to provide starter code for malware, it clearly is not developed enough and has not received the training to do this effectively.

According to Lawfare [14], DeepSeek itself has not posed a significantly greater risk than other LLMs on the market, particularly with regard to its privacy policies. However, the nationality of DeepSeek overrides the normalcy of these policies. This is because such policies and regulations enacted by the company can be overruled by the Chinese government. For example, Article 7 in the Chinese Intelligence Law states, “All organizations and citizens shall support, assist, and cooperate with national intelligence efforts in accordance with law, and shall protect national intelligence work secrets they are aware of.” Lawfare also notes that users can work around these data privacy concerns by running the model locally on their machines. They also discuss how DeepSeek will likely become another LLM that malicious threat actors add to their toolkit to automate reconnaissance and scripting within their attacks.

Data Privacy

China Mobile

Researchers found code on DeepSeek’s web app likely related to China Mobile authentication and identity management systems [15]. China Mobile is a Chinese state-owned telecommunications company currently banned from operating in the US. The code was heavily obfuscated, but the researchers said that it could transmit fingerprinting data when a user logs in to DeepSeek. The same report [15] also discusses concerns surrounding the type of data being used with DeepSeek, in comparison to other entertainment and social media platforms. Data included in prompts given to AI models like DeepSeek often includes confidential business information, research, and sensitive personal information. A former official of the Department of Homeland Security and the National Security Agency, Stewart Baker, stated, “[DeepSeek] raises all of the TikTok concerns plus you’re talking about information that is highly likely to be of more national security and personal significance than anything people do on TikTok.”

ByteDance and Encryption

According to NowSecure [16] and SecurityScorecard [17], DeepSeek stores application usage and device telemetry data on servers managed by ByteDance. This was verified above in the Forensic Analysis section of this article. While researchers have not found conclusive evidence that DeepSeek shares user data with the Chinese state, its connection to ByteDance has raised many concerns around data analytics, control, and privacy.

These articles also discuss significant vulnerabilities found in the DeepSeek mobile apps due to insecure encryption implementations. These include the use of hardcoded encryption keys, weak encryption algorithms, and reused initialization vectors. For example, DeepSeek uses 3DES, a well-known outdated encryption algorithm, to encrypt transmitted data. 3DES has been officially deprecated by NIST since 2019 due to its weak algorithm and susceptibility to attacks [18]. Furthermore, NowSecure found that some data transmitted over the Internet by the DeepSeek iOS mobile app is not encrypted at all.
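
As general background rather than a finding of the cited reports, one commonly cited weakness of 3DES is its 64-bit block size. With an n-bit block cipher, repeated ciphertext blocks become likely after roughly 2^(n/2) blocks have been encrypted under one key (the birthday bound), which limits how much data can safely be protected. The short Python sketch below works through that arithmetic for a 64-bit block and, for comparison, a 128-bit block such as that used by AES; the function name is hypothetical.

```python
def birthday_bound_bytes(block_bits: int) -> int:
    """Approximate data volume at which block collisions become likely."""
    blocks = 2 ** (block_bits // 2)   # about 2^(n/2) blocks
    return blocks * (block_bits // 8)  # bytes per block


GIB = 2 ** 30
print(birthday_bound_bytes(64) // GIB)   # 64-bit block (3DES), in GiB
print(birthday_bound_bytes(128) // GIB)  # 128-bit block (AES), in GiB
```

For a 64-bit block this works out to 2^35 bytes, or 32 GiB, a volume that a long-lived connection can plausibly reach; for a 128-bit block the figure is 2^68 bytes.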

Data Stored Elsewhere

According to National Public Radio (NPR) [19], DeepSeek stores all the data it collects on Americans in China. This has led to widespread concern about the access the Chinese government will have to this data. NPR noted that the Chinese government has already gained access to copious amounts of data on Americans without needing an app such as DeepSeek. Regardless, the fact that neither the US nor the EU can effectively control or influence the use of the data stored by DeepSeek has likely contributed to the influx of concerns surrounding the company.

Current Restrictions

Several countries have placed restrictions or bans on the use of DeepSeek products. According to Exterro [20], seven countries have publicly placed or recommended restrictions on the use of DeepSeek. Taiwan has advised its citizens against using DeepSeek. Australia, Canada, and the Netherlands have restricted or banned its use on government devices. The US has pushed similar legislation, and DeepSeek is already banned on NASA and US Navy devices. Additionally, Italy and South Korea have restricted access to DeepSeek nationwide.

Model Bias

Chinese national cybersecurity standards require generative AI to align with the country’s “core socialist values” and protect its national image [21]. The Guardian probed DeepSeek with potentially sensitive prompts and found several that the AI deemed out of its scope [21]. These included details about what happened at Tiananmen Square on June 4th, 1989, Hu Jintao’s removal in 2022, the Umbrella Revolution, Covid-19 lockdown protests, and comparisons between Xi Jinping and Winnie the Pooh. However, the censorship could be overcome by asking it to answer questions in an unconventional way (e.g. leetspeak). Beyond direct censorship, DeepSeek also clearly showed bias on sensitive Chinese political topics. When asked about the independence of Taiwan, it stated that Taiwan was part of China’s territory, that any attempts to split the country would fail, and that reunification under the One-China Principle is key. When asked the same question, ChatGPT stated that Taiwan is an independent country, and Gemini stated that it was a complex topic and gave several perspectives on the issue. Other topics that showed clear bias toward China included the Spratly Islands in the South China Sea and the Dalai Lama.

David Baek published an article on LinkedIn [22] documenting DeepSeek’s and ChatGPT’s responses to an array of sensitive questions. These included questions on human rights violations in Hong Kong, the Korean War, the Russo-Ukrainian War, US sanctions against China, and the suppression of free speech. For each question, ChatGPT gave a very different answer from DeepSeek’s, often incorporating a range of facts or various perspectives on an issue. DeepSeek generally appeared to give heavily pre-scripted, vague answers that aligned with the views of the Chinese state. Given this censorship, it is worth considering how limitations like these might affect DeepSeek’s training and future development.

Our Findings

An article from WIRED found that DeepSeek’s model is prone to censoring its responses within DeepSeek’s website, app, or API [23]. According to the article, DeepSeek does this to comply with Chinese regulations. Because DeepSeek’s output is displayed in real time as it is generated, the censoring can be observed when discussing topics that conflict with Chinese national interests or values. For example, asking the model “what is the political situation in taiwan” causes it to begin generating a response on the topic. After the response is fully generated, the DeepSeek application censors its answer, replacing the previously generated output with “Sorry, that’s beyond my current scope. Let’s talk about something else.” Below are the screenshots of the DeepSeek response before and after it was fully generated:

Although other models, such as ChatGPT and Gemini, have similar features for explicit content or sensitive topics, they do not employ real-time censorship like DeepSeek. This type of censorship is only present within the models hosted by DeepSeek’s applications. Asking a DeepSeek model that is locally hosted on a platform like Ollama produces a long, descriptive output that is not censored. Below is a screenshot of a locally hosted version of DeepSeek being asked “what is the political situation in taiwan”.

While real-time censoring is not present in the locally hosted version of DeepSeek, the model still tends to produce biased results. A locally hosted version of the DeepSeek model was tested to compare its bias with that of the model hosted by DeepSeek’s applications. We found that the locally hosted version produced biased results less frequently than the model hosted by DeepSeek’s applications.
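A comparison like this can be made repeatable by logging each trial and tallying the results. The following is a minimal sketch, not the procedure used in this study: it assumes each response was saved as text and hand-labeled by a reviewer as biased or not, and the function names and placeholder inputs are hypothetical. It computes how often a response was labeled biased and how often repeated trials returned the same text, which is one rough signal of a pre-scripted answer.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def repeat_rate(responses: list[str]) -> float:
    """Fraction of trials that returned the single most common response."""
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(normalize(r) for r in responses)
    return counts.most_common(1)[0][1] / len(responses)


def biased_rate(labels: list[bool]) -> float:
    """Fraction of trials a reviewer marked as biased (True)."""
    if not labels:
        raise ValueError("need at least one label")
    return sum(labels) / len(labels)


# Placeholder inputs only; these are not recorded model outputs.
placeholder_trials = ["Answer A", "answer  a", "Answer B", "Answer A"]
placeholder_labels = [True, True, False, True]

print(repeat_rate(placeholder_trials))  # 0.75
print(biased_rate(placeholder_labels))  # 0.75
```

Running both functions on the hosted and the locally hosted trial logs would give two pairs of rates that can be compared directly.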

For example, when prompting the model with “is taiwan country”, the models hosted by DeepSeek consistently output the same biased result, which is pictured below.

This biased output was consistently generated by DeepSeek’s applications, whether or not the question was asked at the beginning of a new conversation, almost as if the response was pre-scripted. Prompting the locally hosted version of the DeepSeek model with the same question generated a less biased result, which considered different perspectives on the topic. The output generated by the locally hosted model is pictured below.

While the locally hosted model exhibits a reduction in biased responses, it is not entirely free from bias and still occasionally produces responses like that of the model hosted by DeepSeek’s applications. Below is a screenshot of a similarly biased response generated by the locally hosted model.

Conclusion

The digital forensic analysis of the DeepSeek application, conducted by William Campbell and Reina Girouard of the DigForCE Lab, revealed that the iOS, Android, and web versions of DeepSeek integrate third-party services for device fingerprinting and telemetry data collection to support application program management. These services involve API requests and development libraries used within the applications. Notably, many of the telemetry-related services (handling device and application data rather than user information) are operated by Chinese entities, including ByteDance (the owner of TikTok), Shumei Inc, and Tencent Holdings. DeepSeek’s interactions with these companies align with numerous reports highlighting potential privacy concerns associated with the application.

From an investigative perspective, chat conversations and account information can be recovered from devices running the iOS and Android DeepSeek applications. Cross-platform syncing was apparent, enabling the retrieval of user-generated content across different devices. Notably, one significant artifact was residual AI-generated content from a deleted chat conversation on an iOS device, which included the last generated response of the conversation before deletion.

A review of DeepSeek’s public documentation reveals several aspects of its data collection and user privacy practices. To use DeepSeek’s services, users must create an account and agree to its Terms of Use and Privacy Policy. The Privacy Policy details the collection of user-provided information, device data, location data, and third-party data. Notably, the company collects data that is transmitted to Chinese entities, including ByteDance, which raises privacy concerns. Additionally, DeepSeek’s Terms of Use indicate that user data is subject to the laws of the People’s Republic of China, where it could be accessed by local authorities without the typical legal safeguards. Unlike competitors such as OpenAI and Google, DeepSeek does not explicitly state its compliance with major privacy regulations like GDPR or CCPA, which raises concerns about the transparency and security of user data.

Reviewing known issues related to DeepSeek has shown a significant lack of cybersecurity practices at the company. This includes leaving vulnerabilities in DeepSeek’s applications, exposing them to large-scale data breaches, service-limiting cyberattacks, and jailbreak attacks. Widespread concern has also arisen regarding DeepSeek’s association with the Chinese state-linked China Mobile and ByteDance. Lastly, DeepSeek is also biased towards Chinese interests in order to comply with Chinese regulations. This includes censoring content generated by the models to align with the Chinese government’s policies, limiting discussions on sensitive topics, and restricting access to information that may be critical of China. These actions have raised concerns about the potential for state influence over DeepSeek’s models, affecting the neutrality and reliability of the information they provide.

To address privacy and potential security concerns within organizations utilizing DeepSeek’s models, it is recommended that users interact with a locally hosted version of the model. Users can do this with third-party tools such as Ollama or similar platforms that run the model on the organization’s own hardware. This approach keeps all user data within the organization’s infrastructure, mitigating data security and privacy risks. While biases may still be present in the locally hosted model, they are less consistent than in the online version. Also, unlike the DeepSeek-hosted version of the model, responses generated by the locally hosted version are not subject to the application’s real-time censorship, giving users full control over the output.
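As an illustration of the recommendation above, a locally hosted model can be queried over Ollama’s local HTTP API so that prompts never leave the machine. The following is a minimal sketch, assuming Ollama is running on its default port (11434) and that a DeepSeek model has already been pulled. The model tag "deepseek-r1:7b" is an assumption and not necessarily the model tested in this study; the function names are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumed model tag; replace with whichever DeepSeek model was pulled locally.
MODEL = "deepseek-r1:7b"


def build_request(prompt: str, model: str = MODEL, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request asking Ollama for one complete (non-streamed) reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]


# Example (requires a running Ollama server with the model available):
# print(ask_local_model("what is the political situation in taiwan"))
```

Because the request targets localhost, the prompt and the response stay on the local machine, which is the property the recommendation relies on.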