Design of Integrated Digital and Intelligent Audit System: A Proposed Framework
Lian Yun Gang City Audit Bureau Liu Jianbo
Abstract: This paper outlines the design considerations for a new integrated digital and intelligent audit system built on data analysis and artificial intelligence tools. The system is intended to handle sophisticated audit work flows, using digital technologies to improve efficiency, accuracy, and data-driven analysis capability. The core goal is to integrate the audit work flow seamlessly with the information flow, creating a dynamic and responsive intelligent audit system. The audit work flow comprises six stages: intelligent data acquisition, data cleaning, vectorized storage, adaptive fast intelligent analysis, assistance in audit analysis, and assistance in formulating audit conclusions and recommendations. The information flow sustaining system integrates the supporting tool chain that provides the work flow functions. It follows the stages of the audit work flow, processing data in sequence while also allowing data to circulate interactively across stages. This dual-process framework is named the digital and intelligent audit (DIA) system.
Keywords: Artificial Intelligence, Audit, Design of Digital and Intelligent Audit System, Audit Integration, Digital and Intelligent Audit (DIA)
With the rapid development of artificial intelligence (AI) technology, digitally-driven technologies, spearheaded by AI and big data, are rapidly being integrated into various industries. This presents a pressing challenge for auditors: how to effectively integrate and leverage these technologies for enhanced audit processes. This paper explores a proposed framework for integrating intelligent auditing systems.
1. The background of the design of integrated intelligent audit system
Based on the core function and audit work logic, considering the resource characteristics of some audit teams (grassroots audit agencies, numerous small general accounting audit firms), and the flexible integration features of artificial intelligence technology, it is advisable to first utilize open-source, mature AI applications, middleware, components, and databases that have good compatibility with domestic alternative hardware environments. Then, gradually improve functionality and replace controllable components. This practical deployment strategy is suitable for frontline audit teams. It can be applied to non-production environments such as on-site audit analysis environments. For fully controllable production environments, software registration should be conducted according to relevant regulations, or physical isolation of software and hardware may be required based on specific circumstances.
In terms of data acquisition middleware, message middleware that supports the CDC (Change Data Capture) function is adopted. Two technical routes are widely used in China: the Kafka series and the MQTT series, both of which have a wide range of application scenarios across industries.
In the selection of large language models (LLMs), solutions with an open-source foundation are primarily considered. The design phase mainly considers local deployment; combining local deployment with network applications is considered in the practical section on embedding intelligent tools into auditing processes. This chapter focuses on developing and deploying AI applications locally, from lightweight to heavyweight, against a background of open-source models and domestically restricted model environments. Currently, most AI systems used in finance and economics are LLMs built on the Transformer architecture. To raise their level of intelligence, most of them are trained with reference to, or directly on, mature datasets and models. The resulting large language models typically behave as black boxes. Emergent intelligence comes at the cost of tolerating probabilistic errors, which is the root cause of many common AI issues today. A major pain point in audit analysis is the analysis of unstructured files and data. To address these issues, and drawing on test results from specialized fields, it is proposed to use the Llama3 series, the Deepseek V3 series, and some dedicated Coder models that perform well and are compatible in this field, applying each according to its strengths in different application scenarios.
In terms of AI application middleware and references, an open-source-first approach is adopted. To facilitate development and deployment and to lower the barrier for frontline audit teams, we refer to the ChatGPT model reference documentation (the OpenAI API and its invocation methods are widely used and referenced in the field of artificial intelligence, and some of its interfaces and methods are treated as de facto industry standards), as well as the documentation of the Llama family, Qianwen, Deepseek, Zhipu Qingyan, and other open-source model series. We also use configuration and interface reference files from general-purpose model middleware such as LangChain and LlamaIndex.
In text-oriented audit analysis, RAG (retrieval-augmented generation) technology is adopted. According to IDC reports, RAG applications and vector databases are currently key development directions for AI developers, with unstructured data accounting for as much as 92.9% of global data in 2023. RAG applications are therefore crucial tools for connecting non-standardized data with relational and vector databases for analysis. Users need to manage and maintain this unstructured data well to enable more precise analysis and AI content generation.
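The retrieve-then-generate idea behind RAG can be illustrated with a minimal sketch. It assumes a toy knowledge base and ranks passages by shared words, which stands in for the embedding similarity a real deployment would use; the function names and sample passages are hypothetical and are not part of the proposed system.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Rank documents by the number of words shared with the question.

    Shared-word counting is a stand-in for embedding similarity.
    """
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Place the retrieved passages before the question so the answer is grounded."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the numbered passages below and cite them.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Hypothetical knowledge-base snippets, for illustration only.
DOCS = [
    "Travel expenses must be supported by original invoices.",
    "Procurement above the threshold requires public bidding.",
    "Fixed assets are depreciated using the straight-line method.",
]

question = "Which rule covers invoices for travel expenses?"
top = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting prompt would then be sent to a locally deployed model; keeping retrieval separate from generation lets the knowledge base be updated without retraining the model.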
In database analysis, functions are currently developed for three areas: relational database analysis, intelligent database analysis, and time series database analysis. For relational database management software, it is essential to use software that is open source or compatible with domestically produced hardware. Commonly used open-source database management software includes phpPgAdmin, phpMyAdmin, and DBeaver Cloud, while common commercial general-purpose database management software includes VS Code, Navicat, and DBeaver. For intelligent database analysis, a text-to-SQL solution is adopted. This approach primarily addresses the high precision requirements of financial and banking data auditing, with reference to the BloombergGPT and LightGPT models described in published papers. LLMs are applied flexibly to the intelligent text analysis of accounting transaction summaries and business management records, and to text-to-SQL query analysis of amount figures; both are technology routes that are achievable today. For time series database analysis, the inherent temporal attributes and built-in time functions of these databases reduce the programming effort that ordinary relational databases would require, and they can process realtime transaction data directly. Common time series databases such as TimescaleDB and InfluxDB, which integrate analysis and management steps that previously had to be programmed, are widely used in digital asset management in finance, medical monitoring, factory automation, and logistics networks.
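A text-to-SQL step could be wrapped roughly as follows. This is a sketch under stated assumptions: the model call is replaced by a placeholder function that returns a canned query, the `ledger` table and its columns are hypothetical, and SQLite stands in for the audited database. The point it illustrates is that generated SQL should be checked as read-only before it is run against audit data.

```python
import sqlite3


def fake_llm_text_to_sql(question):
    """Placeholder for a real model call; returns a canned query for this demo."""
    return (
        "SELECT payee, SUM(amount) FROM ledger "
        "GROUP BY payee HAVING SUM(amount) > 10000"
    )


def is_read_only(sql):
    """Accept a single SELECT statement only; reject everything else."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def run_audit_query(conn, question):
    """Translate the question to SQL, validate it, and execute it."""
    sql = fake_llm_text_to_sql(question)
    if not is_read_only(sql):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


# Hypothetical ledger data held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ledger (payee TEXT, amount REAL, summary TEXT)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [
        ("Vendor A", 8000.0, "office supplies"),
        ("Vendor A", 7000.0, "office supplies"),
        ("Vendor B", 3000.0, "maintenance"),
    ],
)
rows = run_audit_query(conn, "Which payees received more than 10,000 in total?")
print(rows)
```

The prefix check is deliberately conservative; a production setup would more likely also connect with a database account that has read-only permissions.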
In terms of prompt engineering, we must recognize that large language models are currently used mainly through question answering (Q&A). Most technical and legal application solutions rely on prompt engineering to optimize Q&A performance, and the functional differences between some applications stem mainly from how their prompts are organized. Well-organized prompts are an essential condition for raising the productivity of artificial intelligence.
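One way to keep prompts well organized is a fixed template with named slots. The sketch below is illustrative only: the slot names, default role, and sample vouchers are assumptions, not a format prescribed by this paper.

```python
from string import Template

# Hypothetical prompt layout: role, task, evidence, and expected output format.
AUDIT_PROMPT = Template(
    "Role: $role\n"
    "Task: $task\n"
    "Evidence:\n$evidence\n"
    "Output format: $output_format\n"
    "If the evidence is insufficient, say so instead of guessing."
)


def build_audit_prompt(task, evidence_items, role="audit assistant",
                       output_format="numbered findings"):
    """Fill the template; each evidence item becomes one bullet line."""
    evidence = "\n".join(f"- {item}" for item in evidence_items)
    return AUDIT_PROMPT.substitute(
        role=role, task=task, evidence=evidence, output_format=output_format
    )


prompt = build_audit_prompt(
    task="Identify payments that lack supporting invoices.",
    evidence_items=[
        "Voucher 12: payment without invoice",
        "Voucher 13: payment with invoice",
    ],
)
print(prompt)
```

Because the structure is fixed, two applications that differ only in their templates can be compared directly, which matches the observation that functional differences often come from prompt organization.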
2. The process of digital and intelligent audit (main functional framework)
Drawing on earlier audit processes that used digital tools and on their actual functions, we should pay attention to the realtime and intelligent characteristics of applications in the era of intelligence, sort out the whole software chain of data analysis and artificial intelligence technology, and rebuild the digital and intelligent audit work process and information process as follows.
2.1. Work flow of intelligent audit
To take full advantage of AI-enabled intelligent auditing, digital and intelligent technology should be embedded into the audit work flow, focusing on the following stages: intelligent data acquisition; data cleaning; vectorized storage; adaptive fast intelligent analysis; assistance in audit analysis; and assistance in formulating audit conclusions and recommendations.
(1) Intelligent data acquisition
In some industry sectors, data can change at any time, whether it resides in relational databases or in internet applications and web pages. The intelligent data acquisition capability should therefore cover realtime data acquisition. Traditional digital auditing acquires financial software data by backing up the database, with Excel sheets, SQL Server, and Oracle databases as typical sources. In the era of intelligence, message middleware should be used more widely to build realtime automated acquisition functions. In recent years, database platform technology has developed rapidly, and mainstream database platforms support CDC (Change Data Capture) technology, which facilitates realtime acquisition through message queue middleware. This will bring significant progress to audit work, greatly improving the efficiency of information acquisition and transmission.
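The consumer side of CDC-based acquisition can be sketched as follows. The event layout (an `op` code with `before` and `after` row images) is an assumption loosely modeled on the envelopes that CDC tools commonly emit, and a local in-process queue stands in for the message middleware; neither is a specification of the proposed system.

```python
import json
import queue


def apply_change(replica, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "u"):      # create or update: store the new row image
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":           # delete: drop the old row if present
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def drain(messages, replica):
    """Consume every queued message and return how many were applied."""
    applied = 0
    while True:
        try:
            raw = messages.get_nowait()
        except queue.Empty:
            return applied
        apply_change(replica, json.loads(raw))
        applied += 1


# Hypothetical change events for a vouchers table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "amount": 100}},
    {"op": "c", "before": None, "after": {"id": 2, "amount": 250}},
    {"op": "u", "before": {"id": 1, "amount": 100}, "after": {"id": 1, "amount": 120}},
    {"op": "d", "before": {"id": 2, "amount": 250}, "after": None},
]

messages = queue.Queue()  # stands in for the message middleware
for event in events:
    messages.put(json.dumps(event))

replica = {}
count = drain(messages, replica)
print(count, replica)
```

Compared with periodic database backups, the audit-side copy here is updated change by change, which is what makes realtime acquisition possible.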
(2) Data cleaning
There are two hard requirements for cleaning raw data. First, after unstructured data is converted into text or structured data through OCR, conversion errors usually remain and need to be corrected. Second, information accumulated by artificial intelligence may leak personal information, and relevant Chinese regulations restrict the use of personal information. It should also be noted that work in government departments often involves private information, and improper handling of labeled information could even introduce discrimination into algorithms and models. Therefore, any data generated from administrative actions must be desensitized and cleaned promptly, whether it is used to train large language models or to build local knowledge bases. This is also a mandatory AI ethics requirement in some countries and regions. The General Office of the State Council and the General Office of the Central Committee have issued the “Opinions on Strengthening the Governance of Scientific and Technological Ethics”, which proposes an agile governance philosophy and requires stronger early warning of technological ethical risks, timely tracking and analysis, and dynamic adjustment of scientific and technological ethics rules. The Cyberspace Administration of China and six other departments have issued the “Interim Measures for the Management of Generative Artificial Intelligence Services”, which proposes principles such as balancing safety and development and combining innovation with governance, in line with the regulatory requirements of the two General Offices. In practice, this data cleaning work is often carried out in the database management system.
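Desensitization of raw records before they reach a model or knowledge base might look like the sketch below. The two patterns are assumptions chosen for illustration (18-character resident ID numbers and 11-digit mobile numbers); real data would need a fuller rule set, and the sample record is invented.

```python
import re

# Assumed patterns: an 18-character ID (17 digits plus a digit or X) and an
# 11-digit mobile number starting with 1. Lookarounds avoid matching inside
# longer digit runs such as account numbers.
ID_PATTERN = re.compile(r"(?<!\d)(\d{6})\d{8}(\d{3}[\dXx])(?!\d)")
PHONE_PATTERN = re.compile(r"(?<!\d)(1\d{2})\d{4}(\d{4})(?!\d)")


def desensitize(text):
    """Mask the middle of ID and phone numbers while keeping them recognizable."""
    text = ID_PATTERN.sub(lambda m: m.group(1) + "*" * 8 + m.group(2), text)
    text = PHONE_PATTERN.sub(lambda m: m.group(1) + "****" + m.group(2), text)
    return text


# Invented record, for illustration only.
record = "Payee Zhang, ID 320700199001011234, phone 13812345678, amount 5000"
clean = desensitize(record)
print(clean)
```

Masking rather than deleting the fields keeps records distinguishable for analysis while removing the parts that identify an individual.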
(3) Vectorized storage
Vectorized storage is spatialized storage that addresses the problem of fast queries over massive databases. It shifts the digitization of structured database text from the traditional storage stage to the application writing stage (digitization at the application writing stage is more efficient than during the traditional storage phase). This transformation enhances data query capabilities in the spatial dimension and may be the only effective way to store and analyze large amounts of data. Therefore, intelligent application data acquisition, local knowledge base construction, and the recognition, storage, and efficient analysis of images and text all rely on establishing local vector databases. Information stored in vector databases will be a crucial condition for improving audit efficiency through artificial intelligence.
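The core operation of a vector database, nearest-neighbor search over stored vectors, can be shown with a small in-memory sketch. The class name, the three-dimensional vectors, and the payload labels are hypothetical; a real deployment would use model-generated embeddings and an indexed vector database rather than a linear scan.

```python
import math


class TinyVectorStore:
    """In-memory store that ranks (vector, payload) pairs by cosine similarity."""

    def __init__(self):
        self._items = []

    @staticmethod
    def _unit(vector):
        """Scale a vector to unit length so a dot product equals cosine similarity."""
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("zero vector has no direction")
        return [x / norm for x in vector]

    def add(self, vector, payload):
        self._items.append((self._unit(vector), payload))

    def search(self, query, k=1):
        """Return the k best (score, payload) pairs, highest score first."""
        unit = self._unit(query)
        scored = [
            (sum(a * b for a, b in zip(unit, vec)), payload)
            for vec, payload in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add([1.0, 0.0, 0.0], "procurement contract")
store.add([0.0, 1.0, 0.0], "payroll ledger")
store.add([0.9, 0.1, 0.0], "bidding notice")

hits = store.search([1.0, 0.05, 0.0], k=2)
print([payload for _, payload in hits])
```

Queries are answered by closeness in the vector space rather than by exact key match, which is what the text means by querying in the spatial dimension.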
(4) Adaptive fast intelligent analysis
Rapid analysis and timely warnings are the reasons the contemporary international financial industry has adopted RPA (robotic process automation) for automated auditing. They make it possible to prevent risks effectively, to analyze and judge issues promptly, and to stop problems before they occur. In the era of informatization and artificial intelligence, realtime auditing serves as the tentacles and antibodies that let the national audit authority function as an “immune system” for economic supervision. It is also a powerful tool in internal auditing, where figures play a key constructive role, enhancing management value and strengthening financial discipline. Audit departments have specific requirements for independence and for a data-driven foundation in their work. Driven by data analysis and artificial intelligence, realtime supervision of units and projects will bring significant development opportunities for audit reform and mark a substantial advance in economic governance capabilities.
(5) Providing assistance in audit analysis
In intelligent auditing, digital and intelligent methods will provide more effective auxiliary support than traditional digital auditing based on database analysis. Audit evidence sheets and audit working papers can both be understood as electronic text records that summarize identified issues and situations. Large language models (LLMs), which grew out of natural language processing (NLP) technology, have particular advantages in text analysis. By combining RAG technology with AI interaction, first, an assistant for the legal characterization of audit findings can be formed; second, intelligent analysis and summarization of documents can assist in forming audit conclusions for audit matters. For digital and intelligent auxiliary analysis, locally deployed software such as Ollama can be used as the LLM application framework. Models can be selected flexibly according to the following application scenarios: the Qianwen and Llama3 series as the main model base for analyzing and answering questions about financial texts, in quantized versions of 8B (8 billion parameters) and 70B (70 billion parameters); nomic-embed-text for vector embedding of texts, with 137M parameters; qwen2.5-coder as the database analysis model base for text-to-SQL auxiliary analysis and code completion, in quantized versions of 1.5B (1.5 billion parameters) and 7B (7 billion parameters); and Qwen2.5 and llama3.1, together with libraries of legal, regulatory, and normative documents, to build a legal classification analysis RAG, in quantized versions of 1.5B (1.5 billion parameters) and 8B (8 billion parameters).
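Scenario-based model selection with a local Ollama server could be wired up roughly as below. The task names and the mapping to model tags are assumptions for illustration and must match models that have actually been pulled locally; the endpoint is Ollama's default local address for text generation. The embedding model is left out because embeddings are served by a separate endpoint.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

# Assumed task-to-model mapping; adjust tags to the models installed locally.
MODEL_BY_TASK = {
    "financial_qa": "llama3.1:8b",
    "text_to_sql": "qwen2.5-coder:7b",
    "legal_rag": "qwen2.5:1.5b",
}


def build_request(task, prompt):
    """Build a non-streaming generation request for the model assigned to the task."""
    payload = {"model": MODEL_BY_TASK[task], "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(task, prompt, timeout=120):
    """Send the request; needs a running Ollama server with the model pulled."""
    with urllib.request.urlopen(build_request(task, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Building the request does not contact the server, so this runs offline.
req = build_request("text_to_sql", "List vouchers above 10000.")
print(json.loads(req.data)["model"])
```

Keeping the mapping in one table makes it easy to swap a model for a given scenario without touching the calling code.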
(6) Providing assistance in formulating audit conclusions and recommendations
Using an LLM to assist in generating audit reports relies primarily on the text summarization capabilities of large language models, which can condense complex evidence materials and extract key information and problem statements. Our research since 2024 has shown that large language models have inherited the comprehensive text processing ability of NLP, with strong literature synthesis and summarization functions. After Deepseek was launched in 2025, it also became able to capture, analyze, and interpret financial reports. Multiple organizations, domestic and international, now provide large language models for analyzing financial reports over the internet. Based on the author’s more than a year of practical experience using LLMs for summarization and text analysis, assisting in audit report generation is similar to RAG, and professional LLMs with strong capabilities are preferred for audit RAG tasks. Therefore, the local deployment primarily uses Qwen2.5 and the Llama3 family as its foundation, in quantized versions ranging from 1.5B (1.5 billion parameters) to 8B (8 billion parameters). This work flow runs from refining audit records to assisting in the generation of audit conclusions, and the results can be regenerated with the web version of Zhipu Qingyan AI for comparison and verification.
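The comparison-and-verification step can be partly automated by flagging statements that appear in one draft but have no counterpart in the other. The sketch below is a simple assumption-laden illustration: it splits drafts on periods and compares sentences by word-set overlap (Jaccard similarity), and the two sample drafts are invented. Flagged sentences would still be checked against the evidence by an auditor.

```python
def split_sentences(text):
    """Split on periods; adequate for short draft conclusions."""
    return [s.strip() for s in text.replace("\n", " ").split(".") if s.strip()]


def jaccard(a, b):
    """Word-set overlap between two sentences, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def unsupported_sentences(primary, secondary, threshold=0.5):
    """Return sentences of `primary` with no sufficiently similar sentence in `secondary`."""
    others = split_sentences(secondary)
    flagged = []
    for sentence in split_sentences(primary):
        best = max((jaccard(sentence, other) for other in others), default=0.0)
        if best < threshold:
            flagged.append(sentence)
    return flagged


# Invented drafts from two different models, for illustration only.
local_draft = "Three payments lacked invoices. The unit exceeded its travel budget."
web_draft = "Three payments lacked invoices. Fixed assets were not inventoried."

flagged = unsupported_sentences(local_draft, web_draft)
print(flagged)
```

Running the check in both directions shows which findings each model produced alone, which is where probabilistic errors are most likely to hide.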
2.2. Audit work flow and information flow coordination system of intelligent audit system
Setting up a digital and intelligent audit work flow inevitably requires considering the supporting technology chain (the technical software involved has been briefly described above). The main processes of a digital and intelligent audit system therefore include two levels: the digital and intelligent audit work flow system, and the information flow sustaining system that supports it. The dual-process framework for the digital and intelligent audit system, named the DIA system (DIA is short for digital intelligent audit, that is, digital and artificial intelligence audit), is as follows: