DARPA has developed scalable tools based on formal mathematical methods (formal methods) to secure software and prove the absence of exploitable vulnerabilities. We want to create a sustainable ecosystem of formal methods tools that are ready to use, while continuing to advance the state of the art.
Formal Methods
Formal methods are mathematically rigorous techniques for specifying, developing, analyzing, and verifying software and hardware systems. They produce mathematical proofs that a program behaves as specified, eliminating virtually all exploitable vulnerabilities. As in other engineering disciplines, applying formal methods during software development establishes a design's correctness and robustness, which are essential to a system's safety and overall security.
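As a small illustration of the idea, the sketch below uses the Python bindings of the Z3 SMT solver (an illustrative tool choice, not one prescribed here) to prove that a modeled maximum computation satisfies its specification for every possible input, something no finite test suite can establish:

```python
# A minimal formal verification sketch using the Z3 SMT solver's Python
# bindings (pip install z3-solver). The computation and its specification
# are invented for illustration.
from z3 import Int, If, And, Or, prove

x, y = Int('x'), Int('y')

# Model of the implementation: max(x, y) as a conditional expression.
result = If(x >= y, x, y)

# Specification: the result bounds both inputs and equals one of them.
# `prove` asks Z3 to show this holds for ALL integers x and y.
prove(And(result >= x, result >= y, Or(result == x, result == y)))
# prints: proved
```

Had the specification been violated, Z3 would instead print a concrete counterexample, which is how such tools surface bugs early in development.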
Employing formal methods to improve software security has a rich history spanning more than half a century. Even in the early days of computing, there were efforts to specify programs mathematically and prove properties of them. Several U.S. agencies invested in formal methods research, motivated by emerging uses of computing software and hardware in critical systems (e.g., space or aircraft flight control, communication security, and medical devices). For decades, however, formal methods tools and ecosystems could operate only on problems and systems of extremely modest scale.
The lack of broad adoption was also due to computer science students' limited exposure to formal methods techniques and tools. The techniques were widely viewed as purely theoretical, offering no tangible benefits for real-world software engineering projects. However, recent advances in formal methods tools, practices, training, and ecosystems have made it possible to apply formal methods at much larger scales, in a manner that is affordable and usable by software and hardware engineers.
Benefits
Successfully applying formal methods provides many benefits in the short and the long term, including the following:
- Discover software bugs early, eliminate them, and then verify their absence.
- Reduce the number of design iterations needed to achieve system quality goals, which reduces time-to-market and total cost of development.
- Enable reasoning about design choices early in the process.
- Provide a principled and efficient means to generate test plans.
- Aid integration efforts by producing stronger, more stable interface designs.
- Ensure that all system elements are appropriately adapted when design changes are needed.
- Integrate seamlessly with zero trust architectural approaches.
- Generate correct, easily readable text-based interpretations that allow stakeholders to quickly understand the underlying formal mathematical models and analyses.
- Automate key elements of the process of obtaining an authority to operate, including establishing high assurance of key properties of requirements, system architecture, design commitments, implementation choices, configuration settings, test coverage, and maintenance operations.
Drawbacks
- Applying some types of formal methods requires a high level of expertise, which can be challenging for software developers without a strong background in relevant areas.
- Training non-formal methods experts (e.g., software engineers and developers) can add time and resources to the development process due to a steep learning curve. However, DARPA’s PROVERS program is developing new tools to guide non-experts through designing proof-friendly software systems and reduce the proof repair workload.
- Developers accustomed to traditional software development methodologies may find it difficult to adapt to the rigorous, mathematical nature of formal verification, contributing to the shortage of trained formal methods users.
- Available formal methods tools are less polished than mainstream development tools and require a more significant upfront investment of time and effort compared with traditional software development approaches. However, this initial investment is offset by long-term benefits, including enhanced security, reduced development time, and improved software quality.
Continuing impact
- Major industrial use cases: Modern-day use cases go beyond traditional, narrowly targeted formal methods applications, such as flight controls and cryptography. We are now witnessing larger, more diverse, and more complex business-critical systems leveraging the benefits of formal methods. Project leaders report a strong business case with high return on investment, which is accelerating demand for formal methods expertise in these sectors. Government use cases are also emerging from an increasingly diverse set of research and development projects. Academic projects have focused on creating formally verified infrastructure that spans complete systems, the software that runs on them, and even the physical hardware. These infrastructures are becoming widely adopted, as signaled by the Linux Foundation's support for the formally verified seL4 microkernel.
- Maturing ecosystems: More mature capabilities mean organizations can train staff to leverage formal methods and achieve high levels of software assurance more easily than ever before.
- Broadening scope: There are increasing opportunities to integrate formal methods into widely adopted toolchains, languages, and engineering practices.
- Linking to modern engineering practices: There are increasing opportunities to link formal methods to existing systems engineering practices, including safety cases, security cases, hazard analysis, model-based approaches, and test-plan generation for critical systems. These include harmonization with tools, repository infrastructures, and process models.
Read on for additional examples of formal methods tools and techniques.
Parsers
Parsers are software components that serve as the "trusted gatekeepers" of high-assurance software; computer programs rely on them to consume input from untrusted sources.
They take input data (e.g., text or binary data from sensors) and build a data structure, often a parse tree, abstract syntax tree, or other hierarchical structure that provides a structural representation of the input while checking for correct syntax.
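As a concrete, hypothetical illustration (a toy nested-list format, not a SafeDocs artifact), the sketch below shows a parser turning raw text into a tree while rejecting ill-formed input:

```python
# A minimal recursive-descent parser sketch for a toy format such as
# "(a (b c))". The format and code are invented for illustration.

def tokenize(text: str) -> list[str]:
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens: list[str], pos: int = 0):
    """Return (subtree, next position); raise ValueError on bad syntax."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        children, pos = [], pos + 1
        while pos < len(tokens) and tokens[pos] != ")":
            child, pos = parse(tokens, pos)
            children.append(child)
        if pos >= len(tokens):
            raise ValueError("missing closing parenthesis")
        return children, pos + 1       # consume the ")"
    if tok == ")":
        raise ValueError("unmatched closing parenthesis")
    return tok, pos + 1                # a leaf atom

def parse_document(text: str):
    tokens = tokenize(text)
    tree, end = parse(tokens)
    if end != len(tokens):
        raise ValueError("trailing input after document")
    return tree

print(parse_document("(a (b c))"))     # ['a', ['b', 'c']]
```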
Typical parsers are written by hand in languages like C, with inevitable errors leading to security vulnerabilities. DARPA's Safe Documents (SafeDocs) program focused on advancing the state of the art in verification of the security of parsers and eliminating the primary source of parsing vulnerabilities.
Researchers developed a methodology and tools to capture de facto specifications of electronic data formats in machine-readable form. The process eliminated format ambiguity and reduced the syntactic complexity of document specifications to levels within reach of automated verification tools.
The methodology also created unambiguous definitions that help computers reason about document formats, and used automatically generated scanners (parsers that simply check whether a document is well formed, without creating a data structure) to reject malicious input and avoid the confusion caused by ambiguity.
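A scanner for the same hypothetical toy format is far simpler than the parser above, because it only tracks well-formedness and never builds a structure, shrinking the attack surface when a yes-or-no answer is all a consumer needs:

```python
# A minimal scanner sketch: accept or reject input without constructing
# any data structure. Again invented for illustration.

def scan(text: str) -> bool:
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a closer with no matching opener
                return False
    return depth == 0              # every opener was closed

print(scan("(a (b c))"))   # True
print(scan("(a (b c"))     # False: malformed input is rejected outright
```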
The program's verified parser construction kits grew out of a collaboration between big data and machine learning experts and experts in data format reverse engineering and exploitation. This interdisciplinary collaboration was essential in creating verified, memory-safe parsers for commonly used electronic document formats.
The successful prototype tools developed under the SafeDocs program have paved the way for capturing and describing the de facto syntax of data formats in human-intelligible, machine-readable form. The insights gained from the development approach, measured results, and accompanying analyses all point to a primary conclusion: SafeDocs solutions should be expanded beyond documents to other file formats.
It is difficult to overstate the impact of reduced system vulnerability. If every data format were designed with SafeDocs tools, this change alone could eliminate a significant fraction of known software vulnerabilities (roughly 80% of the software vulnerabilities in the MITRE CVE list are related to parsing).
seL4 microkernel
A kernel is a piece of software that runs at the heart of any software system and controls all access to resources, including the communication between system components. The kernel is the most critical part of the software system and runs in a privileged mode.
A microkernel is compact compared to a typical kernel, performing only the basic functions universal to all computers. Designed to be integrated into different operating systems, it works with OS-specific servers that provide higher-level functions.
The secure embedded L4 (seL4) is a small and simple microkernel that, at its core, manages a collection of partitions and the resources they require. It provides the highest assurance of isolation between applications running in its partitions: a compromise of an application in one partition can be contained and prevented from harming other, potentially more critical applications running in other partitions.
As the first operating system (OS) kernel with a complete, machine-checked proof of functional correctness, seL4 has maintained its position as the leading formally verified OS kernel through more than a decade of continued research and engineering.
Its formal verification, combined with its status as the fastest OS kernel on the Arm architecture, makes seL4 an ideal basis for high-assurance systems and demonstrates the level of fidelity and scale formal verification can achieve. It supports multiple architectures (Arm, x86-64, and RISC-V), provides deep security properties, such as integrity, confidentiality, and availability, and comes with formally verified user-level system initialization.
While an OS kernel is a crucial foundation for a high-assurance system, seL4 offers more than that for systems engineering. A simpler real-time OS might also be formally verified, yet still be insufficient for demanding engineering tasks. The true power of seL4 lies in its ability to scale formal analysis and verification to the much larger code bases that make up entire systems. It does so by providing strong isolation among user-level components.
This isolation means that components can be analyzed separately from one another and composed safely, as the sketch below illustrates.
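The following is a hypothetical illustration of that compositional style, using the Z3 solver's Python bindings rather than seL4's own proof tooling: each component is verified against its own contract, and the composition is sound because the declared channel is the only possible interaction.

```python
# A hypothetical assume-guarantee sketch. Strong isolation (of the kind
# seL4 enforces) lets each component be verified against its own contract;
# the results compose because the declared channel is the only interaction.
from z3 import Int, If, And, Implies, prove

x = Int('x')   # untrusted input arriving at component A
y = Int('y')   # the one value A may send to B over their shared channel

# Component A, verified alone: it clamps any input into [0, 100].
a_out = If(x < 0, 0, If(x > 100, 100, x))
prove(And(a_out >= 0, a_out <= 100))                 # A's guarantee

# Component B, verified alone: assuming its input honors A's guarantee,
# its own output bound follows.
b_out = y + 10
prove(Implies(And(y >= 0, y <= 100), b_out <= 110))  # B's guarantee

# A's guarantee matches B's assumption, and isolation rules out any other
# interaction, so the end-to-end bound holds without re-verifying the
# system as one monolith.
```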
Ultimately, seL4 provides the foundation for the soundness of highly automated analysis tools. Because of seL4, the following are all possible:
- Untrusted virtual machines can be run while being securely monitored.
- Filtering and monitoring code can be guaranteed not to have been tampered with.
- The limited communication channels assessed by analysis tools can be guaranteed to be the only ones that are available to the system's components.
Architecture Analysis & Design Language (AADL)
Model-based systems engineering (MBSE) tools and languages offer a common approach to system design.
Such MBSE methods have been extended with formal languages and automation, making the development process efficient, rigorous, and repeatable. One of the key technologies enabling the power of MBSE is the system architecture modeling language.
The Architecture Analysis and Design Language (AADL) is a prime example, capturing essential design concepts in complex systems. AADL presents the hardware and software architecture in a hierarchical format, offering a high degree of flexibility and supporting incremental development. This format means that an architecture can be refined to increase levels of detail, providing a clear path for system evolution.
AADL's architecture model includes component interfaces, connections, and execution characteristics, but not component implementations. It describes the interactions between components and their arrangement in the system, while keeping the lowest-level components as black boxes. Their implementations can be described separately using traditional programming languages, which may be included by reference in the architecture model.
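As a small, hypothetical illustration of the notation (all component names are invented for this sketch), an AADL model might declare a device, a process, and the connection between them, while deferring the process's implementation to ordinary source code:

```aadl
-- A hypothetical AADL sketch: interfaces, subcomponents, and connections
-- are modeled; the controller's implementation stays a black box.
package FlightExample
public
  device gps_unit
    features
      position: out data port;
  end gps_unit;

  process controller
    features
      position: in data port;
      command: out data port;
  end controller;

  system avionics
  end avionics;

  system implementation avionics.impl
    subcomponents
      gps: device gps_unit;
      ctrl: process controller;
    connections
      c0: port gps.position -> ctrl.position;
  end avionics.impl;
end FlightExample;
```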
AADL also provides rich semantics and precise run-time specifications, both of which are crucial for high-assurance systems.