Compile / build

From Things and Stuff Wiki
Revision as of 23:06, 18 November 2023 by Milk (talk | contribs) (→‎Build)
Jump to navigation Jump to search


Compile and link time

to resort



  • https://en.wikipedia.org/wiki/Preprocessor - a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, which is often used by some subsequent programs like compilers. The amount and kind of processing done depends on the nature of the preprocessor; some preprocessors are only capable of performing relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.


Compiler


  • http://c2.com/cgi/wiki?TheKenThompsonHack - Ken's acceptance speech Reflections On Trusting Trust describes a hack (in every sense), the most subversive ever perpetrated, nothing less than the root password of all evil. Ken describes how he injected a virus into a compiler. Not only did his compiler know it was compiling the login function and inject a backdoor, but it also knew when it was compiling itself and injected the backdoor generator into the compiler it was creating. The source code for the compiler thereafter contains no evidence of either virus.



  • Compiler Explorer - Run compilers interactively from your web browser and interact with the assembly] - an interactive compiler. The left-hand pane shows editable C, C++, Rust, Go, D, Haskell, Swift and Pascal code. The right, the assembly output of having compiled the code with a given compiler and settings. Multiple compilers are supported, and the UI layout is configurable (thanks to GoldenLayout). There is also an ispc compiler ? for a C variant with extensions for SPMD.


  • https://en.wikipedia.org/wiki/Object_file - a file containing object code, meaning relocatable format machine code that is usually not directly executable. There are various formats for object files, and the same object code can be packaged in different object files. An object file also works like an Application Extension (.dll). In addition to the object code itself, object files may contain metadata used for linking or debugging, including: information to resolve symbolic cross-references between different modules, relocation information, stack unwinding information, comments, program symbols, debugging or profiling information.



  • https://github.com/DQNEO/HowToWriteACompiler - This repository explains how to write a compiler from scratch by Go. The compiler has some constraints; Can compile only arithmetic operations; Runs only on Linux; Outputs X86-64 assembly (GAS)

Assembler

  • https://en.wikipedia.org/wiki/Assembler_(computing) - creates object code by translating combinations of mnemonics and syntax for operations and addressing modes into their numerical equivalents. This representation typically includes an operation code ("opcode") as well as other control bits. The assembler also calculates constant expressions and resolves symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications.




to sort

  • https://en.wikipedia.org/wiki/Compiler-compiler - or compiler generator is a programming tool that creates a parser, interpreter, or compiler from some form of formal description of a language and machine. The earliest and still most common form of compiler-compiler is a parser generator, whose input is a grammar (usually in BNF) of a programming language, and whose generated output is the source code of a parser often used as a component of a compiler.

The ideal compiler-compiler takes a description of a programming language and a target instruction set architecture, and automatically generates a usable compiler from them. In practice, the state of the art has yet to reach this degree of sophistication and most compiler generators are not capable of handling semantic or target architecture information.



  • https://en.wikipedia.org/wiki/Metacompiler - a software development tool used chiefly in the construction of compilers, translators, and interpreters for other programming languages. They are a subset of a specialized class of compiler writing tools called compiler-compilers that employ metaprogramming languages.




  • https://github.com/StanfordPL/stoke - a stochastic optimizer and program synthesizer for the x86-64 instruction set. STOKE uses random search to explore the extremely high-dimensional space of all possible program transformations. Although any one random transformation is unlikely to produce a code sequence that is desirable, the repeated application of millions of transformations is sufficient to produce novel and non-obvious code sequences. STOKE can be used in many different scenarios, such as optimizing code for performance or size, synthesizing an implementation from scratch or to trade accuracy of floating point computations for performance. As a superoptimizer, STOKE has been shown to outperform the code produced by general-purpose and domain-specific compilers, and in some cases expert hand-written code.

Linking


  • https://en.wikipedia.org/wiki/Relocation_(computing) - the process of assigning load addresses to various parts of a program and adjusting the code and data in the program to reflect the assigned addresses. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program. Relocation is typically done by the linker at link time, but it can also be done at run time by a relocating loader, or by the running program itself. Some architectures avoid relocation entirely by deferring address assignment to run time; this is known as zero address arithmetic.


  • https://en.wikipedia.org/wiki/Runtime_library - a set of low-level routines used by a compiler to invoke some of the behaviors of a runtime environment, by inserting calls to the runtime library into compiled executable binary. The runtime environment implements the execution model, built-in functions, and other fundamental behaviors of a programming language. During execution (run time) of that computer program, execution of those calls to the runtime library cause communication between the executable binary and the runtime environment. A runtime library often includes built-in functions for memory management or exception handling. Therefore, a runtime library is always specific to the platform and compiler.




  • https://github.com/rui314/mold - a faster drop-in replacement for existing Unix linkers. It is several times faster than LLVM lld linker, the second-fastest open-source linker which I originally created a few years ago. mold is created for increasing developer productivity by reducing build time especially in rapid debug-edit-rebuild cycles.Here is a performance comparison of GNU gold, LLVM lld, and mold for linking final debuginfo-enabled executables of major large programs on a simulated 8-core 16-threads machine.

Software

  • https://github.com/mattgodbolt/compiler-explorer - an interactive compiler. The left-hand pane shows editable C, C++, Rust, Go, D, Haskell, Swift, Pascal (and some more!) code. The right, the assembly output of having compiled the code with a given compiler and settings. Multiple compilers are supported, and the UI layout is configurable (thanks to GoldenLayout). There is also an ispc compiler ? for a C variant with extensions for SPMD.



pcc

  • https://en.wikipedia.org/wiki/Portable_C_Compiler - an early compiler for the C programming language written by Stephen C. Johnson of Bell Labs in the mid-1970s, based in part on ideas proposed by Alan Snyder in 1973, and "distributed as the C compiler by Bell Labs... with the blessing of Dennis Ritchie."

One of the first compilers that could easily be adapted to output code for different computer architectures, the compiler had a long life span. It debuted in Seventh Edition Unix and shipped with BSD Unix until the release of 4.4BSD in 1994, when it was replaced by the GNU C Compiler. It was very influential in its day, so much so that at the beginning of the 1980s, the majority of C compilers were based on it. Anders Magnusson and Peter A Jonsson restarted development of pcc in 2007, rewriting it extensively to support the C99 standard.

gcc

  • GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (libstdc++, libgcj,...). GCC was originally written as the compiler for the GNU operating system. The GNU system was developed to be 100% free software, free in the sense that it respects the user's freedom.



LLVM

  • LLVM - a collection of modular and reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines, though it does provide helpful libraries that can be used to build them. The name "LLVM" itself is not an acronym; it is the full name of the project.


  • https://github.com/fundamental/stoat - a tool to assert properties about the callgraph of programs/libraries. The primary usage of this tool is to analyze programs which need to perform hard realtime operations in a portion of a mixed codebase. This is done by identifying all functions which can transitively be called from some known root function which must be realtime. If any unsafe function which could block for an arbitrary amount of time is found in this transitive closure, then an error is emitted to indicate where the improper behavior can be found and what backtrace is responsible for it getting called.


  • Emscripten is an LLVM to JavaScript compiler. It takes LLVM bitcode (which can be generated from C/C++ using Clang, or any other language that can be converted into LLVM bitcode) and compiles that into JavaScript, which can be run on the web (or anywhere else JavaScript can run).

Clang

  • http://en.wikipedia.org/wiki/Clang - compiler front end for the C, C++, Objective-C, Objective-C++, OpenCL and CUDA programming languages. It uses LLVM as its back end and has been part of the LLVM release cycle since LLVM 2.6.

It is designed to offer a complete replacement to the GNU Compiler Collection (GCC). It is open-source,[4] and its contributors include Apple, Microsoft, Google, ARM, Sony, Intel and AMD. Its source code is available under the University of Illinois/NCSA License, a permissive free software licence.

Other

  • Parrot is a virtual machine designed to efficiently compile and execute bytecode for dynamic languages. Parrot currently hosts a variety of language implementations in various stages of completion, including Tcl, Javascript, Ruby, Lua, Scheme, PHP, Python, Perl 6, APL, and a .NET bytecode translator. Parrot is not about parrots, though we are rather fond of them for obvious reasons.







  • https://en.wikipedia.org/wiki/Yacc - a computer program for the Unix operating system. It is a LALR parser generator, generating a parser, the part of a compiler that tries to make syntactic sense of the source code, specifically a LALR parser, based on an analytic grammar written in a notation similar to BNF. Yacc itself used to be available as the default parser generator on most Unix systems, though it has since been supplanted as the default by more recent, largely compatible, programs.






  • https://github.com/imatix/gsl GSL/4.1 is a code construction tool. It will generate code in all languages and for all purposes. If this sounds too good to be true, welcome to 1996, when we invented these techniques. Magic is simply technology that is twenty years ahead of its time. In addition to code construction, GSL has been used to generate database schema definitions, user interfaces, reports, system administration tools and much more. [11]





icecream

  • https://github.com/icecc/icecream - created by SUSE based on distcc. Like distcc, Icecream takes compile jobs from a build and distributes it among remote machines allowing a parallel build. But unlike distcc, Icecream uses a central server that dynamically schedules the compile jobs to the fastest free server. This advantage pays off mostly for shared computers, if you're the only user on x machines, you have full control over them.

Cross-compilation

  • https://en.wikipedia.org/wiki/Cross_compiler#Canadian_Cross - a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a Windows 7 PC but generates code that runs on Android smartphone is a cross compiler. A cross compiler is necessary to compile code for multiple platforms from one development host. Direct compilation on the target platform might be infeasible, for example on a microcontroller of an embedded system, because those systems contain no operating system. In paravirtualization, one computer runs multiple operating systems and a cross compiler could generate an executable for each of them from one main source. Cross compilers are distinct from source-to-source compilers. A cross compiler is for cross-platform software development of machine code, while a source-to-source compiler translates from one programming language to another in text code. Both are programming tools.


distcc

  • distcc - a program to distribute builds of C, C++, Objective C or Objective C++ code across several machines on a network. distcc should always generate the same results as a local build, is simple to install and use, and is usually much faster than a local compile.distcc does not require all machines to share a filesystem, have synchronized clocks, or to have the same libraries or header files installed. They can even have different processors or operating systems, if cross-compilers are installed.

Icecream

  • Using Icecream - Mozilla | MDN - a tool used to create a pool of machines to distribute compilation initiated on one machine across other machines with idle cycles in the pool. It can significantly speed up your macOS or Linux builds. DistCC vs. IceCCBoth DistCC and IceCC are distributed compilers. They require similar tool chain on all target machines. However, IceCC has a built-in mechanism for distributing a custom toolchain on demand (in a chroot jail), which means that unlike DistCC (which requires careful management and setup of a toolchain) with IceCC each client can use their own toolchain without any setup. IceCC also has a scheduler so that it will only use idle cylces on other machines in the pool, avoiding overloading other people's machines.

XMLVM

  • XMLVM - flexible and extensible cross-compiler toolchain. Instead of cross-compiling on a source code level, XMLVM cross-compiles byte code instructions from Sun Microsystem's virtual machine and Microsoft's Common Language Runtime. The benefit of this approach is that byte code instructions are easier to cross-compile and the difficult parsing of a high-level programming language is left to a regular compiler. In XMLVM, byte code-based programs are represented as XML documents. This allows manipulation and translation of XMLVM-based programs using advanced XML technologies such as XSLT, XQuery, and XPath.

crosstool-NG

  • crosstool-NG - aims at building toolchains. Toolchains are an essential component in a software development project. It will compile, assemble and link the code that is being developed. Some pieces of the toolchain will eventually end up in the resulting binaries: static libraries are but an example.So, a toolchain is a very sensitive piece of software, as any bug in one of the components, or a poorly configured component, can lead to execution problems, ranging from poor performance, to applications ending unexpectedly, to misbehaving software (which more than often is hard to detect), to hardware damage, or even to human risks (which is more than regrettable).

Ccache

  • Ccache - or “ccache”, is a compiler cache. It speeds up recompilation by caching previous compilations and detecting when the same compilation is being done again. Ccache is free software, released under the GNU General Public License version 3 or later. See also the license page.


sccache

  • sccache - Ted's Mozilla Blog » Blog Archive » sccache, Mozilla’s distributed compiler cache, now written in Rust

Program synthesis

  • https://en.wikipedia.org/wiki/Program_synthesis - the task to automatically construct a program that satisfies a given high-level specification. In contrast to other automatic programming techniques, the specifications are usually non-algorithmic statements of an appropriate logical calculus. Often, program synthesis employs techniques from formal verification.


Build



  • Start-up Tools - Start-ups are always short on time: here are some tools that can save you time and avoid reinventing the wheel. This list is focused on technical tools to save development time.



  • Reproducible Builds - a set of software development practices that create an independently-verifiable path from source to binary code.
  • https://en.wikipedia.org/wiki/Reproducible_builds - also known as deterministic compilation, is a process of compiling software which ensures the resulting binary code can be reproduced. Source code compiled using deterministic compilation will always output the same binary.

Reproducible builds can act as part of a chain of trust; the source code can be signed, and deterministic compilation can prove that the binary was compiled from trusted source code. Verified reproducible builds provide a strong countermeasure against attacks where binaries do not match their source code, e.g., because an attacker has inserted malicious code into a binary. This is a relevant attack; attackers sometimes attack binaries but not the source code, e.g., because they can only change the distributed binary or to evade detection since it is the source code that developers normally review and modify. In a survey of 17 experts, reproducible builds had a very high utility rating from 58.8% participants, but also a high-cost rating from 70.6%. Various efforts are being made to modify software development tools to reduce these costs.


  • https://github.com/bmwiedemann/theunreproduciblepackage - is meant as a practical way to demonstrate the various ways that software can break reproducible builds using just low level primitives without requiring external existing programs that implement these primitives themselves. It is structured so that one subdirectory demonstrates one class of issues in some variants observed in the wild.


GNU toolchain








  • Jobserver Implementation - describes the GNU make “jobserver” implementation: it is meant mainly for people who want to understand the GNU make jobserver, but also for those interesting in a traditionally hairy problem in UNIX programming: how to wait for two different types of events (signals and file descriptors) at the same time. [14]


  • Why Not Make? - "Make is a conceptually simple build-system, and it is packaged with major Linux distributions. However, Make should rarely be your build-system of choice when starting a new C++ project.That’s a bold claim, so here’s why…" [15]



CMake

  • CMake - an open-source, cross-platform family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files, and generate native makefiles and workspaces that can be used in the compiler environment of your choice. The suite of CMake tools were created by Kitware in response to the need for a powerful, cross-platform build environment for open-source projects such as ITK and VTK.




  • An Introduction to Modern CMake - People love to hate build systems. Just watch the talks from CppCon17 to see examples of developers making the state of build systems the brunt of jokes. This raises the question: Why? Certainly there are no shortage of problems when building. But I think that, in 2020, we have a very good solution to quite a few of those problems. It's CMake. Not CMake 2.8 though; that was released before C++11 even existed! Nor the horrible examples out there for CMake (even those posted on KitWare's own tutorials list). I'm talking about Modern CMake. CMake 3.1+, maybe even CMake 3.16+! It's clean, powerful, and elegant, so you can spend most of your time coding, not adding lines to an unreadable, unmaintainable Make (Or CMake 2) file. And CMake 3.11+ is supposed to be significantly faster, as well! [18]

tup

  • tup - a file-based build system for Linux, OSX, and Windows. It inputs a list of file changes and a directed acyclic graph (DAG), then processes the DAG to execute the appropriate commands required to update dependent files. Updates are performed with very little overhead since tup implements powerful build algorithms to avoid doing unnecessary work. This means you can stay focused on your project rather than on your build system.


Qbs

Waf

  • The Waf framework - somewhat different from traditional build systems in the sense that it does not provide support for a specific language. Rather, the focus is to support the major usecases encountered when working on a software project. As such, it is essentially a library of components that are suitable for use in a build system, with an emphasis on extensibility. Although the default distribution contains various plugins for several programming languages and different tools (c, d, ocaml, java, etc), it is by no means a frozen product. Creating new extensions is both a standard and a recommended practice.


GYP

  • GYP - Generate Your Projects. a Meta-Build system: a build system that generates other build systems. GYP is intended to support large projects that need to be built on multiple platforms (e.g., Mac, Windows, Linux), and where it is important that the project can be built using the IDEs that are popular on each platform as if the project is a “native” one. It can be used to generate XCode projects, Visual Studio projects, Ninja build files, and Makefiles. In each case GYP's goal is to replicate as closely as possible the way one would set up a native build of the project using the IDE. GYP can also be used to generate “hybrid” projects that provide the IDE scaffolding for a nice user experience but call out to Ninja to do the actual building (which is usually much faster than the native build systems of the IDEs).


Ninja

  • Ninja - a small build system with a focus on speed. It differs from other build systems in two major respects: it is designed to have its input files generated by a higher-level build system, and it is designed to run builds as fast as possible.


Meson

  • Meson - an open source build system meant to be both extremely fast, and, even more importantly, as user friendly as possible.The main design point of Meson is that every moment a developer spends writing or debugging build definitions is a second wasted. So is every second spent waiting for the build system to actually start compiling code.


FASTBuild

  • FASTBuild - a high performance, open-source build system supporting highly scalable compilation, caching and network distribution. Take a look at the development status and detailed feature breakdown for more information.


xmake

redo

  • redo - a competitor to the long-lived, but sadly imperfect, make program. There are many such competitors, because many people over the years have been dissatisfied with make's limitations. However, of all the replacements I've seen, only redo captures the essential simplicity and flexibility of make, while avoiding its flaws. To my great surprise, it manages to do this while being simultaneously simpler than make, more flexible than make, and more powerful than make.

samurai