5 Known Problems and Workarounds
1. Introduction

These notes describe the Base Compiler Development portion (compiler_dev) of the 5.2 IRIS Development Option from Silicon Graphics, Inc. They include discussion of compiler tools, header files, libraries, dynamic shared objects, and KPIC directives.

Note: Packaged with the IRIS Development Option software is a separate sheet that contains the Software License Agreement. This software is provided to you solely under the terms and conditions of the Software License Agreement. Please take a few moments to review the Agreement.

This document contains the following chapters:

1. Introduction
2. Installation Information
3. Changes and Additions
4. Bug Fixes
5. Known Problems and Workarounds

In addition, Appendix A discusses dynamically shared objects (DSOs).

1.1 Release_Identification_Information

Following is the release identification information for the Base Compiler Development portion (compiler_dev) of the 5.2 IRIS Development Option:

Software Product                 Compiler_dev
Version                          3.18
Product Code                     SC4-IDO-5.2
System Software Requirements     IRIX 5.2 or later

1.2 Online_Release_Notes

After you install the online documentation for a product (the relnotes subsystem), you can view the release notes on your screen.

If you have a graphics system, select ``Release Notes'' from the Tools submenu of the Toolchest. This displays the grelnotes(1) graphical browser for the online release notes. Refer to the grelnotes(1) man page for information on options to this command.

If you have a nongraphics system, you can use the relnotes command. Refer to the relnotes(1) man page for accessing the online release notes.

1.3 Product_Support

Silicon Graphics, Inc., provides a comprehensive product support maintenance program for its products. If you are in the U.S. or Canada and would like support for your Silicon Graphics-supported products, contact the Technical Assistance Center at 1-800-800-4SGI.
If you are outside these areas, contact the Silicon Graphics subsidiary or authorized distributor in your country.

2. Installation_Information

The IRIS Software Installation Guide fully documents the process for installing the Base Compiler Development software. In addition, each compiler has its own set of release notes that describes product-specific installation information.

2.1 3.18_Base_Compiler_Development_Subsystems

The 3.18 Base Compiler Development software (compiler_dev) includes these subsystems:

compiler_dev.books            Base compiler books
compiler_dev.books.dbx        Base compiler dbx User's Guide
compiler_dev.hdr              Base compiler headers
compiler_dev.hdr.internal     Base compiler internal headers
compiler_dev.hdr.lib          Base compiler environment headers
compiler_dev.man.base         Base compiler components man pages
compiler_dev.man.ld           Base compiler loader man pages
compiler_dev.man.perf         Base compiler performance man pages
compiler_dev.man.util         Base compiler utility man pages
compiler_dev.sw               Base compiler software
compiler_dev.sw.abi           Base compiler ABI software
compiler_dev.sw.base          Base compiler components
compiler_dev.sw.ld            Base compiler loader
compiler_dev.sw.perf          Base compiler performance tools
compiler_dev.sw.util          Base compiler utilities
compiler_dev.man.dbx          dbx manual page
compiler_dev.man.lib          Development environment manual pages
compiler_dev.man.relnotes     These release notes
compiler_dev.sw.dbx           dbx debugger
compiler_dev.sw.lib           Development libraries

2.1.1 Subsystem_Disk_Space_Requirements

This section lists the compiler_dev subsystems (and their sizes).

If you are installing this software for the first time, the subsystems marked ``default'' are those selected for installation automatically. They will be installed when you give the go command unless you explicitly request (with the keep command) that they not be installed. Those marked ``miniroot'' must be installed from the miniroot.

Note: The listed subsystem sizes are approximate.
Refer to the IRIS Software Installation Guide for information on finding exact sizes.

Subsystem Name                          Subsystem Size (512-byte blocks)
compiler_dev.hdr.internal                 455
compiler_dev.man.base (default)            48
compiler_dev.man.ld (default)              50
compiler_dev.man.perf (default)            53
compiler_dev.man.util (default)            82
compiler_dev.sw.base (default)           8415
compiler_dev.sw.ld (default)             1549
compiler_dev.sw.perf (default)           2250
compiler_dev.sw.util (default)           2684
compiler_dev.hdr.lib (default)           2691
compiler_dev.man.dbx (default)             95
compiler_dev.man.lib (default)           5405
compiler_dev.man.relnotes (default)       147
compiler_dev.sw.dbx (default)            1656
compiler_dev.sw.lib (default)           12299

3. Changes_and_Additions

The features in this chapter are new or significantly changed in the Base Compiler Development software since the IRIX 4.0.5 Maintenance release. Except as noted, changes apply to all versions.

3.1 Compiler_System

This section lists changes and additions to compilers and development tools since the IRIX 4.0.5 Maintenance release.

3.1.1 Obsoleting_libmld

libmld, either in the form of an archive or a DSO, will not be released or supported in future releases. If you use functions in the existing libmld library, contact the Technical Assistance Center for details concerning the migration of your libmld function calls in your existing source code to other calls, probably in libraries such as libelf, the ELF object-file support library, and libraries containing symbol table manipulation routines.

3.1.2 Dynamic_Linking_and_DSOs

In earlier versions of IRIX (pre-5.0), executables were only statically linked. This means that all references must be resolved (and their addresses fixed) at link time (by ld(1)). In this release, such programs, although they might use pre-5.0 shared libraries (which are referred to now as static shared libraries) are referred to as non-shared. They are produced by compiling and linking with the -non_shared option.
The code so created is not position-independent (PIC).

In 5.0 and later IRIX releases, in addition to being statically linked by ld(1), programs are, by default, compiled as PIC code and dynamically linked, that is, part of the program may be relocated dynamically at run time. There are two types of dynamically linked objects:

o The executable itself. This consists of your main program and PIC code extracted from all archive libraries linked with it. Code within the executable is not relocated at run time, but some of its references will be. The executable is linked -call_shared.

o External sharable dynamically linked objects called dynamic shared objects (DSOs), which are not part of the executable itself. DSOs and their references may be dynamically relocated at run time. DSOs are linked -shared. DSOs by convention have the extension .so. A DSO may be shared by several users and/or programs, possibly at different addresses.

You cannot mix non-shared objects and PIC objects in the same executable.

In this and future releases, static shared libraries are supported only for the use of existing (pre-5.0) executables that reference them. You can neither create new static shared libraries nor link new code with existing static shared libraries.

PIC code satisfies references indirectly by using a Global Offset Table (GOT), which allows code to be relocated simply by updating the GOT. Your executable has one GOT, and each DSO it uses has one GOT.

When a dynamically linked executable is started, the runtime linker, rld(1), is invoked to prepare the program for execution. This preparation involves:

o Filling in certain global values.

o Relocating any dynamic shared objects (DSOs) that your program references.

o Resolving data symbols in DSOs that were unresolved at static link time by ld(1).

With very few exceptions, all executable objects in this release are dynamically linked.
A new component, the runtime linker /lib/rld, and all standard DSOs (file extension .so) are necessary for programs to execute.

More information about these types of objects appears in Appendix A, ``Frequently Asked Questions about DSOs,'' and in the IRIX System Programming Guide.

3.1.3 Object_File_Format_Changes

The compiler tools and the link editor now produce ELF format objects and executables by default. DSO is supported only in ELF executables and object files. COFF files are run on IRIX 5.0 and later releases with the IRIX 4.0.5 ABI, and ELF files are run with the IRIX 5.0 and later ABI; hence, the linker refuses to mix (pre-5.0) COFF and ELF objects.

Two new header files are associated with ELF objects:

/usr/include/elf.h contains definitions that are generic to all implementations.

/usr/include/sys/elf.h contains definitions specific to the MIPS architecture.

See the System V Application Binary Interface and System V Application Binary Interface MIPS Processor Supplement, published by Prentice Hall.

A new object file reader, elfdump(1), is associated with ELF format files. This program is known on some other SVR4-compliant systems as dump.

3.1.4 ABI_Development

For information about ABI development issues, see the man pages abicc(1), abild(1), check_abi_compliance, check_abi_interface and check_for_syscalls.

3.1.5 Versioning_of_Shared_Objects

In the 5.0.1 release, a mechanism for the versioning of shared objects was introduced for SGI-specific shared objects and executables. Note that this mechanism is outside the scope of the ABI, and, thus, must not be relied on for code that must be ABI-compliant and run on non-SGI platforms. Currently, all executables produced on SGI systems are marked SGI_ONLY, which allows use of the versioning mechanism.

Versioning allows the creator of a shared object to update it in a way that may be incompatible with executables previously linked against the shared object.
This is accomplished by renaming the original shared object and providing it along with the (incompatible) new version.

Versioning is mainly of interest to developers of shared objects. It may not be of interest to you if you simply use shared objects.

3.1.5.1 What_Is_a_Version?

A version is part or all of an identifying version_string that can be associated with a shared object by using the -set_version version_string option to ld(1) when the shared object is created.

A version_string consists of one or more versions separated by colons (:). A single version has the form:

COMMENTsgiMAJOR.MINOR

where

COMMENT is a comment string, which is ignored by the versioning mechanism. It consists of any sequence of characters followed by a #.

sgi is the literal string sgi.

MAJOR is the major version number, which is a string of digits [0-9].

. is a literal period.

MINOR is the minor version number, which is a string of digits [0-9].

Here is what to do when building your shared library:

o When you first build your shared library, give it an initial version, say sgi1.0. Thus, add the option -set_version sgi1.0 to the command to build your shared library (cc -shared, ld -shared).

o Whenever you make a compatible change to the shared object, create another version by changing the minor version number, for example, sgi1.1, and add it to the end of the version_string. The command to set the version of the shared library might now look like -set_version "sgi1.0:sgi1.1".

o When you make an incompatible change to the shared object:

- Change the filename of the old shared object by adding a dot followed by the major number of one of the versions to the filename of the shared object. Do not change the soname of the shared object or its contents. Simply rename the file.

- Update the major version number and set the version_string of the shared object when you create it to this new version, for example, -set_version sgi2.0.
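The bookkeeping in the steps above can be sketched in a few lines of Python. This is only an illustration of the version_string rules as described in this section, not code taken from ld(1) or rld(1). The function names and the file name libfoo.so are hypothetical, and the sketch assumes that comments contain no colon and that the last version in the string supplies the major number used for renaming.

```python
import re

# One version: optional "comment#" prefix, the literal "sgi",
# a major number, a period, and a minor number.
# Assumption: a comment never contains a colon.
_VERSION_RE = re.compile(r"^(?:[^:]*#)?sgi([0-9]+)\.([0-9]+)$")


def parse_version_string(version_string):
    """Return a list of (major, minor) pairs, ignoring comments."""
    versions = []
    for item in version_string.split(":"):
        match = _VERSION_RE.match(item)
        if match is None:
            raise ValueError("not a valid version: %r" % item)
        versions.append((int(match.group(1)), int(match.group(2))))
    return versions


def add_compatible_version(version_string):
    """Compatible change: append a version with the minor number bumped."""
    major, minor = parse_version_string(version_string)[-1]
    return "%s:sgi%d.%d" % (version_string, major, minor + 1)


def start_incompatible_version(filename, version_string):
    """Incompatible change: return the new file name for the old object
    and the version_string to give the new object."""
    major, _minor = parse_version_string(version_string)[-1]
    return "%s.%d" % (filename, major), "sgi%d.0" % (major + 1)


if __name__ == "__main__":
    vs = add_compatible_version("sgi1.0")
    print(vs)                                           # sgi1.0:sgi1.1
    print(start_incompatible_version("libfoo.so", vs))  # ('libfoo.so.1', 'sgi2.0')
```

The sketch only computes names and strings. Renaming the file and passing the new string to -set_version remain manual steps in the build.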
Here is how this versioning mechanism affects executables:

o When an executable is linked against a shared object, the last version in the shared object's version_string is recorded in the executable as part of the liblist. This can be examined by elfdump -Dl.

o When you run an executable, rld looks for the proper filename in its usual search routine.

o If a file with the correct name is found, the version specified in the executable for this shared object is compared to each of the versions in the version_string in the shared object. If one of the versions in the version_string matches the executable's version exactly (ignoring comments), then that library is used.

o If no proper match is found, a new filename for the shared object is built by taking the soname specified in the executable for this shared object and the major number found in the version specified in the executable for this shared object, and putting them together as soname.major. (Remember that you did not change the soname of the object, only the filename.) The new file is searched for using rld's usual search procedure.

3.1.5.2 Example:

Suppose you have a shared object foo.so with initial version sgi10.0. Over time, you make two compatible changes for foo.so, which result in the following final version_string for foo.so:

initial version #sgi10.0: upgrade I/O#sgi10.1:new devices#sgi10.2

You then link an executable that uses this shared object, useoldfoo. This executable specifies version sgi10.2 for soname foo.so. (Remember that the executable inherits the last version in the version_string of the shared object.)

The time comes to upgrade foo.so in an incompatible way. Note that the major version of foo.so is 10, so you move the existing foo.so to the filename foo.so.10 and create a new foo.so with the version_string:

efficient interfaces #sgi11.0

New executables linked with foo.so use it directly.
Older executables, like useoldfoo, attempt to use foo.so, but find that its version (sgi11.0) is not the version they need (sgi10.2). They then attempt to find a foo.so in the filename foo.so.10 with version sgi10.2.

3.1.6 Runtime_Link_Editor_rld(1) and libdl

o rld is a new program that is invoked when running a dynamic executable. It maps in shared objects used by this executable, resolves relocations as ld does at static link time, and allocates common if required. rld is mapped in at program startup time by the kernel. Its path is /lib/rld, but you can change it with the _RLD_PATH environment variable. There are two versions: rld and rld.debug. The first is faster; the second provides debugging support. Both are described on the rld(1) man page.

o Options to rld can be specified via the _RLD_ARGS environment variable. It is possible to replace libraries without recompiling, get extra information from the runtime linker, and alter some of the dynamic linking semantics by specifying arguments in this way. See the manual page rld(1) for details.

o The functionality previously available in /usr/lib/libdl.so, a user interface to the dynamic linker for manipulating the shared objects used by a dynamic executable, is now part of libc.so.1. Specifically, this includes the function calls dlopen(3), dlclose(3), dlsym(3), and dlerror(3).

The following change in rld(1) was made in the 5.0.1 release of IRIX:

o In release 5.0, rld zeroed the stack space it had used before invoking the main program. As of release 5.0.1, it no longer zeroes this space. If your program had a bug that relied on an uninitialized automatic variable being zero, the bug may be uncovered by this rld change. If you suspect this to be the case, the previous behavior (rld clearing its used stack space at exit) can be obtained temporarily by adding the option -clearstack to the environment variable _RLD_ARGS when you run the program.
However, do not rely on this mechanism; there is no guarantee that the stack space your program is relying on being zero will not be dirtied by other startup code in future releases. The buggy behavior in your program must be corrected. Note that these problems most often will occur relatively early in the call graph of your program.

The following change was made to functions in libdl in the 5.0.1 release:

o In the 5.0 release, when a shared object was opened via dlopen(3x), its symbols became globally visible. This behavior has been changed to be consistent with SVR4. As of the 5.0.1 release, objects loaded by one invocation of dlopen may not directly reference symbols from objects loaded by a different dlopen invocation. Those symbols may, however, be referenced indirectly using dlsym(3x). See the NOTES section of the dlopen(3x) manual page for further information.

3.1.7 Changes_to_dbx(1)

o In 5.0.1 and later, you can set the variable $assumenormalframe to decrease the time dbx takes to produce a stack trace (by the where command), by using:

set $assumenormalframe=1

This variable should be set to zero (the default) when requesting a stack trace if you are stopped in the function prologue.

o Two new commands in dbx(1) deal with shared objects: listobj and whichobj. There are three new printing commands: printo, printx, and printd. These print in octal, hexadecimal, and decimal, respectively. Command-line editing similar to that available in emacs(1) is now available in dbx. See /usr/lib/dbx.help for details on these new commands.

o The dbx help system has been enhanced.

o The -f and -F options to dbx have been removed. The readsyms and readglobals commands have been removed. dbx now always does fast startup (the -f option) so these options and commands are no longer needed.

3.1.8 Archiver_ar(1)

The default format for the archive symbol table has been changed. The default is now the same as ar E and produces an SVR4-compatible symbol table.
If you want to produce the old symbol table format, use ar C.

3.1.9 Link_Editor_ld(1)

The following changes have been made to the linker ld(1):

o As of release 5.0.1, the linker can adjust executables to avoid certain problems with early versions of the R4000. If the -no_jump_at_eop flag is on (it is on by default), small amounts of padding are added between component objects to avoid placing a branch instruction at the end of a page. Slightly smaller and faster executables can result from turning this option off (using the -allow_jump_at_eop flag). Binaries built either way should be compatible across all Silicon Graphics systems, but those made with -no_jump_at_eop (the default) avoid the R4000 problems described above.

o New options have been added to ld(1) for aligning variables in the global uninitialized data area (bss). See the manual page for ld(1) for options with names beginning with -X. These new options are unique to IRIX and might change across releases.

o The default object and executable file format has been changed to ELF. Under no circumstances can you link together ELF and (old) COFF objects.

o Static shared libraries are replaced by dynamic shared objects. The linker no longer supports linking with static shared libraries. However, existing executables linked with static shared libraries continue to work.

o By default, the linker reports all undefined and unresolved symbols and exits with non-zero status. However, for shared linking, it is possible to allow unresolved symbols at static link time and rely on the runtime linker to complete the resolution at run time. If you specify -ignore_unresolved, the linker does not consider unresolved symbols to be errors. This option is turned on by the driver if the environment variable SGI_SVR4 is set.

o The linker now reports a maximum of 50 warning messages. If you want all warning messages to be printed, specify -wall.
o The following new flags are related to DSO support. Please refer to the manual page for details: -B symbolic, -non_shared, -call_shared (default), -shared, -all, -exclude, -no_archive, -transitive_link (default), -check_registry, -update_registry, -set_version, -ignore_unresolved, -no_unresolved (default), -no_library_replacement, -soname, -delay_load, and -export.

3.1.10 Optimizer_(uopt(5))

New optimizations and improvements to existing optimizations have been added to uopt.

o -strictIEEE

The optimizer performs some floating point expression simplification in the presence of floating point constants, which can cause different behavior in programs that rely on strict adherence to the IEEE floating point standard. An example is the substitution of zero for multiplication by zero. This flag suppresses such optimizations.

o -Wo,-nomultibbunroll

The optimizer now unrolls loops whose bodies contain branches (that is, loop bodies made up of multiple basic blocks). This internal optimizer flag suppresses such unrolls.

o -noinline

This option disables the inlining operation performed by umerge under -O3. This flag is not meaningful if -O3 is not specified.

o -inline_to

The default value of this parameter is 0. A positive value of this parameter asks umerge to perform additional inlining of calls to leaf routines up to the specified level, in addition to its automatic decision mechanism. A value of 1 causes all calls to leaf procedures to be inlined. A value of 2 additionally causes all calls to procedures that became leaves due to level 1 inlining to be inlined, etc. Under this option, a procedure becomes a leaf in the inlined output code if and only if the procedure's maximum distance from a leaf in the call graph is less than or equal to the value of this parameter. This option is not affected by the -noinline option and is meaningful only if -O3 is specified.
o -nokpicopt

This option tells uopt not to perform the special optimization for accesses of global variables when compiling shared. (-kpicopt is the default for shared compilations.)

o -kpicopt

This option tells uopt to perform the special optimization for accesses of global variables that are not gp-relative whether compiling shared or non-shared. (-nokpicopt is the default for non-shared compilations; however, some programs, particularly if compiled -G 0, might benefit from this optimization even if compiled -non_shared.)

3.1.11 Assembler_(as(1))

o Several new assembler directives are added to support generation of PIC (Position-Independent Code). You should also become familiar with the MIPS ABI Supplement and the PIC coding model it describes. See Section 3.2, ``KPIC Directives.''

o The assembler generates ELF object file format. Whether the resulting object is PIC depends on whether an .option pic0 or .option pic2 directive appears in the assembler file and on command-line arguments. (The directive appearing in the .s file takes precedence.) In the .option directive, pic0 indicates non-PIC, and pic2 indicates PIC code. PIC code can also be specified on the command line (in the absence of an .option directive) by the switch -KPIC. If no .option is present in the assembler file and -KPIC does not appear on the command line, the default is non-PIC.

o A number of new optimizations have been added to the assembler. They are invoked automatically at optimization level 2 (-O2) and above. See the as man page for more information about -peep, -swpipe, and -symregs.

o Cross basic-block scheduling is now enabled by default at optimization levels 2 and above. It can be disabled with the -Wb,-noxbb option. This optimization moves instructions from one basic block to another to allow for better scheduling.

o Since the last release, enhancements have been made in the software pipelining and peephole optimizations in the assembler.
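The precedence rule for PIC output described in this section (an .option directive in the file overrides -KPIC, and the default is non-PIC) can be written down as a small decision function. The Python sketch below is only a model of that rule for checking your own build scripts; it is not how as(1) is implemented. The function name is hypothetical, and the sketch assumes that the first .option pic0 or .option pic2 line in the file is the one that counts.

```python
import re

# Matches a line such as "    .option pic2" and captures pic0 or pic2.
_OPTION_RE = re.compile(r"^\s*\.option\s+(pic0|pic2)\b")


def output_is_pic(source_text, kpic_flag=False):
    """Model of the rule: .option in the .s file wins over -KPIC."""
    for line in source_text.splitlines():
        match = _OPTION_RE.match(line)
        if match:
            # Assumption: the first directive found decides the result.
            return match.group(1) == "pic2"
    # No directive in the file: -KPIC decides, and the default is non-PIC.
    return kpic_flag


if __name__ == "__main__":
    print(output_is_pic(".text\n"))                         # False
    print(output_is_pic(".text\n", kpic_flag=True))         # True
    print(output_is_pic(".option pic0\n", kpic_flag=True))  # False
```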
3.1.12 Libraries

The following changes to the libraries that are part of the compiler system were made in the 5.0.1 release.

o The exception handling library, libexc.so, has been changed to allow for correct handling of exceptions in Ada code and for the correct functioning of non-local GOTOs in Pascal code. Before this release, non-local GOTOs appearing in Pascal code in a shared object did not function correctly. Due to implementation changes in the handling of non-local GOTOs necessary to correct this problem, all Pascal code, whether in a shared object or not, should be recompiled and relinked in 5.0.1 and later. If you are certain that none of your Pascal code uses non-local GOTOs, you can ignore this requirement.

o With the 5.0.1 and later releases, C++ code is linked by default with the new shared object libC.so, which is a shared version of libC.a. See the C++ release notes for further information.

3.1.13 Performance_Tools

This section includes changes to pixie(1), pixstats(1), and prof(1). It also includes a detailed note (with an example) on using these tools with DSOs.

o The program cord(1) is not provided in this release.

o These tools will not work on executables produced on IRIX 4 systems. For IRIX 4 functionality, you should invoke the IRIX 4 pixie, prof, and pixstats explicitly. They will not be run automatically under the IRIX compatibility mode.

The following changes to pixie have occurred in the 3.18 Base Compiler Development release. See the pixie(1) manual page for more information.

o pixie no longer produces a .Addrs file. This information is now contained in a section called ``.MIPS.Addrs'' in the instrumented object.

o During runtime, there will be only one .Counts generated per thread. Previously, there was one .Counts file for every DSO and main in a thread. Multiple .Counts files occur when forks and multiprocessing calls occur.

o pixie now instruments automatically all shared libraries in the program's internal liblist.
This means that for most shared programs, you only need to invoke pixie on the main executable.

o The old -o has been renamed -pixie_file. The -pixie_file option allows the user to rename the instrumented output executable. The default is to name the output file the same as the input file with the suffix .pixie added.

o The option previously named -bbcounts has been renamed -counts_file. The -counts_file option allows the user to rename the output counts file. The default is to name the output file the same as the input file with the suffix .Counts added.

o The -branchcounts option is now default. See the description of the -branchcounts option below.

o -verbose permits printing most pixie transformation messages.

o The new -liblist option causes pixie to write out a list of dependent dynamic shared libraries to a file with the same base name as the main executable with .liblist as the suffix. This has no effect when used on libraries or non-shared programs. The command:

pixie -liblist my_prog

generates a file my_prog.liblist.

o The new -autopixie option tells pixie to instrument all dependent dynamic shared libraries recursively. This has no effect when used on libraries or non-shared programs. -autopixie is on by default.

o When the new -longbranch option is used, pixie transforms branches into jumps. This should only be used when pixie complains about branches out of range. A branch can become out of range because pixie inserts code into the executable to gather performance data at run time, so branches that were previously within range can fall out of range.

In addition, the following changes of note have occurred in pixie in recent 5.x releases.

o When instrumenting a shared library, the text segment could grow to overlap with the data segment. In the current implementation, the data segment is moved to a higher region in the virtual address space to avoid this conflict.
For the main program, if the user compiled it with the -ld option to specify the text and data address, it is the user's responsibility to leave enough space in the data segment to have it instrumented properly. Otherwise, pixie will generate an error message.

o Signal handling is done by intercepting the ksigaction system call at runtime and instrumenting the sigreturn system call when pixie is run. Pixie image register values not saved in the sigcontext structure are thus saved and restored.

o The -branchcounts option causes pixie to add more counting code so the instrumented program produces specific information on branch use. pixstats automatically understands the new information. Specifically, information is produced for the following events:

- Branch to branch taken
- Branch to branch untaken
- Untaken conditional branches
- Taken conditional branches
- Taken conditional branches with branch nops
- Untaken conditional branches with branch nops
- Direction-predicted conditional branches with branch nops
- Non-sequential fetches
- Taken branches per conditional branch
- Forward taken branches per conditional branch
- Forward untaken branches per conditional branch
- Backward taken branches per conditional branch
- Backward untaken branches per conditional branch

o The -pids option tells pixie to append the process ID number on the end of the .Counts name. This is handy if you want to run the program instrumented with pixie through a variety of tests before generating the statistics with pixstats. This option should be used with the -pids option to pixstats, which is available on the 5.0.1 and later releases.

o -threeway may be used on the 5.0.1 and later releases to suppress pixie transformations on threeway transfers (low-level graphics hardware access). If you are instrumenting libgl.so with pixie on a system that has VGX, GTX or Reality Engine graphics, your program may use this special mechanism for some graphics operations.
If you experience problems running your instrumented graphics application on these systems (problems usually result in the graphics simply being black), re-instrument your libgl.so with the correct -threeway option. Use -threeway 3000 for RealityEngine systems and -threeway 6000 for VGX and GTX systems.

o -quiet was added in 5.0.1 to suppress most pixie transformation messages.

o -table can be used in 5.0.1 and later releases to cause pixie to write a copy of its translation table to the stdout device. The translation table is a map of the original addresses to the instrumented addresses.

o Static shared libraries are no longer supported.

o -oldtrace is no longer supported.

o Several options to pixie meant for internal use only are no longer available. These are:

- -get_shared_data
- -calculate_registers
- -sharedlib

The following changes to pixstats(1) have been made in the 5.x release. See the manual page for more information.

o -excludelibs tells pixstats to ignore statistics from libraries. By default, pixstats outputs statistics that include all libraries.

o -pids tells pixstats to combine the statistics found in the per-process counts files (prog.Counts.pid1, prog.Counts.pid2, etc., where prog is the program name and each pid is a process ID) in its output. If your program uses sproc(2), fork(2), is compiled with Power Fortran or Power C, or you used the -pids option when you instrumented it, the .Counts file resulting from its execution will be placed in prog.Counts.pid, and you must use pixstats -pids to process it.

o The .Counts and .Addrs files generated by 4.0.5 pixie are no longer supported. You cannot use old versions of these files with the performance tools on this release.

o pixstats now looks at the file header to choose the timing table. If the file header indicates:

MIPS3     r4000 timing is used
MIPS2     r6000 timing is used
MIPS1     r2000 timing is used

o -disassemble now also disassembles basic blocks with zero counts. The old behavior can be produced with -dislimit 1.

o -source or -S option has been added to provide source listing with disassembly.
o -mips2 has been added as a synonym for -r6000.

o -mips3 has been added as a synonym for -r4000.

3.1.13.1 Using_pixie(1)_and_pixstats(1)_with_DSOs

DSOs can be instrumented for basic block counting. All shared libraries used by an instrumented executable must also be instrumented.

3.1.13.1.1 Example: Instrument a Program with Shared Libraries

To run a program instrumented with pixie, you must instrument all the dependent DSOs. pixie will now instrument the main program and the needed libraries automatically:

pixie my_prog

Or, you can instrument each one individually:

pixie -noautopixie my_prog
pixie lib1
pixie lib2
   :
pixie libn

pixie tells you which libraries need to be instrumented if you use the -liblist option. With this option, pixie produces a file named my_prog.liblist that contains the names of the needed dynamic shared libraries with their full paths. This is convenient if you wish to build a dependency list for a makefile or shell script. For example:

pixie -liblist -noautopixie my_prog
foreach lib (`cat my_prog.liblist`)
    pixie $lib
end

WARNING: during static instrumenting, pixie cannot accurately detect dynamic shared libraries that are opened with calls to dlopen(). rld will detect that the main program has been instrumented and will append .pixie to the name of any file to be opened with dlopen(). However, you then still need to instrument these libraries.

The runtime linker (rld) needs to know where the instrumented libraries are. Set the environment variable LD_LIBRARY_PATH to the directory where you keep the libraries or put the instrumented libraries in the current default search path for rld.

setenv LD_LIBRARY_PATH `pwd`

or

setenv LD_LIBRARY_PATH .

tells rld to look in the current directory. You could just as easily put all of your instrumented libraries in a single directory and set LD_LIBRARY_PATH to that path.
Just remember that to profile the program, both pixstats and prof need either the original or a link to:

o The original DSOs and a.out

o The instrumented DSOs and a.out

o The .Counts files that were produced by running the instrumented program

You can gather statistics for the whole program or a specific DSO:

     pixstats gives the statistics (including DSOs).
     pixstats -excludelibs gives the statistics (excluding DSOs).
     pixstats gives the statistics of a DSO.

3.1.13.1.2 Example: Instrument a Program That Uses Multiple DSOs

1. Run pixie on the program to instrument both the main program and the shared libraries it depends on:

     pixie my_prog

2. Run the program to completion:

     my_prog.pixie file1 file2

   There should now be one .Counts file, my_prog.Counts. The .Counts file was created when the application ran.

3. Run pixstats to generate the statistics:

     pixstats my_prog > my_prog.stat

3.1.13.1.3 Example:_Instrument_an_MP_Program

In this example, you instrument a Fortran multiprocessing program.

1. Compile an MP Fortran program:

     f77 -o myprog -mp myprog.f

2. Instrument the program and its libraries:

     pixie myprog

3. Run the program to completion:

     setenv LD_LIBRARY_PATH .
     myprog.pixie

   There should be one .Counts file per thread per DSO. For example, running myprog.pixie with four threads produces:

     myprog.Counts.1001,
     myprog.Counts.1002,
     myprog.Counts.1003,
     myprog.Counts.1004,
     .
     .

4.
Analyze the output in one of the following ways:

   o To analyze each of the threads:

          pixstats myprog myprog.Counts.1001
          pixstats myprog myprog.Counts.1002
          pixstats myprog myprog.Counts.1003
          pixstats myprog myprog.Counts.1004

   o To analyze the sum of the threads:

          pixstats myprog myprog.Counts.*

   o To analyze the sum of the threads excluding all libraries:

          pixstats myprog -dso myprog

   o To analyze a thread using prof:

          prof -pixie myprog myprog.Counts.1004

   o To analyze all threads together using prof:

          prof -pixie myprog.Counts.*

3.2 KPIC_Directives

PIC code is generated if either the directive .option pic2 appears in the assembler file or the assembler (as(1)) is invoked with -KPIC in the absence of an explicit .option pic0 or .option pic2 in the assembler file. Unless PIC code is being generated, the other options in this section are ignored by the assembler, with the exception of .gpword, which becomes .word. Thus, you can easily use the same assembler file for generating PIC and non-PIC (that is, non-shared) objects by not placing .option pic0 or .option pic2 in the assembler file and invoking the assembler without -KPIC (for non-shared) or with -KPIC (for PIC code).

o .option pic2

  This directive forces the assembler to mark the output object file as containing PIC code and activates the following directives. It overrides the command line argument. Normally, you don't need to specify this directive. Instead, you should use -KPIC or -non_shared to toggle between generating PIC or non-PIC. Note that even though -KPIC is the default for the high-level language drivers (cc/pc/f77), it is not the default for assembly sources. In the absence of an .option pic0 or .option pic2, you must explicitly specify -KPIC for compiling .s files to get PIC code.

o .cpload reg

  This directive expands into three instructions that set the gp register to the context pointer value for the current function.
  It should always be placed in a noreorder area (that is, it should be preceded by .set noreorder and followed by .set reorder). This directive expands into:

          lui    gp,_gp_disp
          addiu  gp,gp,_gp_disp
          addu   gp,gp,reg

  _gp_disp is a reserved symbol that the linker sets to the distance between the lui instruction and the context pointer. This directive is required at the beginning of each subroutine that uses the gp register. You must add this directive at the beginning of every procedure, with the exception of leaf procedures that do not access any global variables and procedures that are static (that is, not marked .globl or .extern).

  Note: The MIPS ABI requires that .cpload use register $25.

o .cprestore offset

  This directive causes the assembler to issue:

          sw gp,offset(sp)

  where it appears. Additionally, it causes the assembler to emit:

          lw gp,offset(sp)

  after every jump-and-link (jal) (but not branch-and-link (bal)) operation, thereby restoring the gp register after function calls. You are responsible for allocating the stack space for the gp. This space should be in the saved register area of the stack frame to remain consistent with calling and debugger conventions.

o .gpword local-sym

  This directive is similar to .word, except that the relocation entry for local-sym has the R_MIPS_GPREL32 type. After linkage, this results in a 32-bit value that is the distance between local-sym and the context pointer (that is, the gp). local-sym must be local. It is currently used for PIC switch tables.

o .cpadd reg

  This directive adds the value of the context pointer (gp) to reg.
EXAMPLES:

The following is a simplified version of the hello world program:

             .option pic2
             .data
             .align 2
     $$5:
             .ascii "hello world\\X0A\\X00"
             .text
             .align 2
     main:
             .set noreorder
             .cpload $25
             .set reorder
             subu $sp, 40
             sw $31, 36($sp)
             .cprestore 32
             la $4, $$5
             jal printf
             move $2, $0
             lw $31, 36($sp)
             addu $sp, 40
             j $31

The actual instructions generated by the assembler will be:

             lui gp,0            #
             addiu gp,gp,0       # generated by .cpload
             addu gp,gp,t9       #
             lw a0,0(gp)         # gp-relative addressing used
             lw t9,0(gp)         # t9 is used for func. call
             addiu sp,sp,-40
             sw ra,36(sp)
             sw gp,32(sp)        # from .cprestore
             jalr ra,t9          # jal is changed to jalr
             addiu a0,a0,0
             lw ra,36(sp)
             lw gp,32(sp)        # activated by .cprestore
             move v0,zero
             jr ra
             addiu sp,sp,40
             nop

PIC Linkage Conventions

o The MIPS ABI requires register t9 ($25) to be used for indirect function calls, so .cpload should always use $25. Noreorder mode must be in effect when the .cpload directive is encountered. Also, make sure that t9 is not in use before any function call, as its value will be destroyed.

o If your program uses an indirect jump (jalr), you must also use t9 as the jump register.

o If you have an unconditional jump to an external label:

             j _cerror

  you have to rewrite it into an indirect jump via t9, that is:

             la t9,_cerror
             j t9

o If you use a branch-and-link (bal) instruction for calling a function in the same file, and the target procedure begins with a .cpload, your bal must be to an alternate entry point in the function after the .cpload:

     foo:    .set noreorder      # callee
             .cpload $25
             .set reorder
     $$1:    ...                 # alternate entry point
             ...
             j $31               # foo returns
     bar:    ...                 # caller
             ...
             bal $$1             # bypass the .cpload
             ...

  This is very important because .cpload assumes register $25 contains the address of foo, but in this case, $25 is not set up. Note that because both foo and bar reside in the same file, they must have the same value for $gp. So the .cpload instructions can be, and must be, bypassed.
  However, because foo can still be called from outside, the .cpload is still required. Alternatively, if you don't want to have an alternate entry point, you can set up register $25 before the bal:

             la t9,foo
             bal foo

  or, if foo is an external symbol, you can simply use a jal (and allow the assembler to set up t9 for you). Both of these methods are slightly less efficient than adding an alternate entry.

o .gpword and .cpadd are used together to implement a position-independent jump table (or any table of text addresses). Entries of the address table created by .gpword are converted into displacements from the context pointer. To get the correct text address, use .cpadd to add the value of gp back to them. Because the gp is updated by the runtime linker, the correct text address can be reconstructed regardless of the location of the DSO.

3.3 Library_and_System_Call_Functionality

The following additions and changes were made to library and system call functionality between versions 4.1 and 5.2 of the IRIS Development Option.

o The source programming interfaces to system calls and system libraries in IRIX 5.0.1 and later are compatible with those in IRIX 4.0. Code that compiled under IRIX 4.0 and uses commonly recognized practices for writing portable code should compile without modification on IRIX 5.0.1 and later.

o Reentrant versions of some libc functions have been provided. These correspond to the POSIX 1003.4a specification for reentrant functions. These functions are present in the default compilation mode. If you are compiling in POSIX-compliant mode (_POSIX_SOURCE defined), programs should be compiled with the feature test macro _SGI_REENTRANT_FUNCTIONS defined.

o The POSIX 1003.4a specification for making stdio multi-thread safe has been implemented. In the default compilation mode, all stdio functions are thread safe.
  In POSIX or ANSI compilation mode, the program must define the feature test macro _SGI_MP_SOURCE in order to get the thread safe versions of stdio functions and macros.

o The handling of the global error value, errno, has changed from IRIX 4.0. If the program includes and defines the feature test macro _SGI_MP_SOURCE, references to errno actually reference a per-thread errno; otherwise, the global variable errno is accessed. All system calls update both the per-thread and global versions of errno.

o The MIPS ABI mutual exclusion library libmutex.so is supported. The actual implementation of the routines is in libc.so.1. These routines, init_lock, acquire_lock, release_lock, and stat_lock, provide low-level portable access to a mutual exclusion primitive (see abilock(3x)).

o The math library libm.a has been carefully checked to ensure its conformance with both the SVID 3rd Edition and ANSI X3.159-1989. Specific information can be found in the man pages sinh, exp, bessel, floor, gamma, math, hypot, sinh, sqrt, and trig.

o The interface to the function scalb(3m) has changed to conform to SVR4. In previous releases, the type of the second argument to scalb (the exponent) was int. In this release, the type of the second argument is double. In addition, the functions scalb and rint have been moved from the math library to the C library.

o A new option, flush_to_zero, has been added to libfpe.a. On an R4000-based system, using this option can improve execution performance if many floating point underflows occur.

4. Bug_Fixes

This section lists the significant bugs fixed in the base compilers since the IRIX 4.0.1 release.

4.1 Compiler_Bug_Fixes

4.1.1 Linker_(ld(1))

o The default cache size was changed to the size of the R4000 cache (8K) in 5.0.1. This default may still be changed by use of the -Xcachesize size option to ld.

o The size of the bss is now one-half what it was in IRIX 4.0.1.
  The bss region in an a.out is now essentially the same size as it would have been in IRIX 3.3.3.

o Incremental linking using the -A command has been fixed. Adding a -allow_jump_at_eop to an ld -A link is no longer necessary.

o The -Xlocaldata option now works correctly, including its special symbols.

o Many memory leaks in the linker have been fixed. This regains most of the linker performance lost in the previous release.

4.1.2 Run-time_Linker_(rld(1))_and_libdl(3x)

The following bugs were fixed in the 5.0.1 release of rld and the dynamic linking library libdl.

o In 5.0.1, dlopen(3x) of a shared object which was created with the -init option calls the -init routine before dlopen returns. In 5.0, the -init routine was not called at dlopen.

o In 5.0, libdl routines could call exit(2) under certain circumstances (for example, if the desired library could not be opened). In 5.0.1, the libdl routines return an error value under these circumstances as documented in their manual pages.

o In the 5.0 release, when a shared object was opened via dlopen(3x), its symbols became globally visible. This behavior has been changed to be consistent with SVR4. In the 5.0.1 release, objects loaded by one invocation of dlopen may not directly reference symbols from objects loaded by a different dlopen invocation. Those symbols may, however, be referenced indirectly using dlsym(3x). See the NOTES section of the dlopen(3x) manual page for further information.

4.1.3 Assembler_(as(1,5))

Several bugs in the assembler have been fixed since the previous release. These include bugs in the various assembler optimizations such as software pipelining and peephole optimization.

4.1.4 Optimizer_(uopt(5))

Numerous significant bugs have been fixed since the IRIX 4.0.5 release.

4.1.5 Code_Generator_(ugen(5))

Several problems with code generation have been fixed since IRIX 4.0.5.

o Several problems with unaligned data accesses have been fixed.
  (1127521, 129034)

o Code generation for FORTRAN's SIGN function has been fixed.

o An overflow problem with Pascal passing large objects has been fixed (126986).

4.1.6 The_Debugger_dbx(1)

In the version of dbx released with 5.0, attempts to use the stop or trace constructs failed. The dbx documentation states:

     ``If an is given, that expression is assumed to be a pointer and the thing-pointed-at is inspected at the `appropriate' points.''

In the 5.0 version, the was inspected at `appropriate' points, rather than the thing-pointed-at by . The result was an inoperative trace or stop command. This problem was fixed in 5.0.1.

4.1.7 Performance_Tools

The stability of pixie was greatly improved in the 5.0.1 release. In addition, as of 5.0.1 it is possible to instrument a multiprocessing program with pixie.

As of the 3.18/5.2 release, prof can now collect statistics about dynamic shared libraries. In addition, multiprocessor support is now working.

4.1.8 Libraries

The following bugs have been fixed in libraries.

o The exception handling library, libexc, has been changed to allow for correct functioning of non-local GOTOs in Pascal code. In previous releases, non-local GOTOs appearing in Pascal code in a shared object did not function correctly. Due to implementation changes in the handling of non-local GOTOs necessary to correct this problem, all Pascal code, whether in a shared object or not, should be compiled and relinked in 5.0.1 and later. If you are certain that none of your Pascal code uses non-local GOTOs, you can ignore this requirement.

o The atof and strtod functions now return correctly signed HUGE_VAL for arguments too large in magnitude. In addition, strtod sets errno to ERANGE.

o The ldexp function now correctly returns HUGE_VAL and sets errno to ERANGE if the result overflows.

o The precision of conversion between ASCII and binary floating point has been significantly improved in this release.
o Rounding into the least-significant digit of an output floating point format is now done correctly in all cases. In previous releases, printing .00053 with a format of %.3f printed 0.000 instead of the (correct) 0.001.

o Various bugs against math library manual pages have been fixed.

5. Known_Problems_and_Workarounds

This section lists known problems with the 3.18 base compiler portion of the IRIS Development Option.

5.1 Optimizer_(uopt(5))

o In certain cases (usually with very large subroutines), uopt has grown unreasonably large while running (over 70 MB). This causes systems with smaller amounts of memory to thrash and, in extreme cases, to run out of available swap space. This should be suspected if uopt dies with a ``signal 9,'' which means that the process was killed externally (for example, by the operating system), rather than by a bug that caused an internal failure.

  Almost all optimizer problems can be narrowed to a single subroutine. By identifying the problem routine(s), you do not need to suppress optimization on the whole program, only on the smaller subset.

o A considerable number of new optimizations have been added to the assembler. These optimizations are turned on at level -O2; if they fail, they tend to look like optimizer problems.

5.2 Performance_Tools

o The following known problems exist in pixie(1):

  - Trace features are currently not supported. This is to say that they have not been tested and thus cannot be guaranteed to work.

  - Objects loaded using dlopen() cannot be instrumented automatically.

o The following problem exists in pixstats(1):

  - The DSOs must be in or linked to the current directory when executing pixstats.

o The following problems exist in prof(1):

  - prof (-pixie) -testcoverage or -gprof cannot process basic block counts for shared libraries. If you need to process basic block counts, compile the code with the -non_shared flag.
  - prof cannot process information from dynamic shared libraries that have been opened with dlopen() and have the same name but different paths, for example:

          /path1/libl.so
          /path2/libl.so

5.3 Libraries

These are known problems in compiler-associated libraries:

o In general, routines in the -lm43 library might not conform to either SVR4 or IEEE with respect to diagnostics or return values. These discrepancies are, however, described in the manual pages of the constituent functions. (See Section 3.5 for math library changes.) The following particular problems are known (these problems exist in -lm43 routines, but not in -lm routines):

  - The -lm43 functions pow, hypot, and cabs might fail to return NaN when given a NaN argument. The return value in these cases is Infinity for hypot and cabs and either Infinity or zero for pow.

  - If the magnitude of their argument is greater than one, the -lm43 functions acos and asin return zero, pi/2, or pi rather than the (correct) NaN.

  - The -lm43 y0, y1, and yn functions return NaN (instead of -Infinity) when the argument is zero. These functions also produce underflow inconsistently (with respect to -lm).

  - The version of gamma in the -lm43 library loops indefinitely if it is given Infinity as an argument.

o The single-precision version of log, logf, is imprecise. In particular, logf(x) might not approximate -logf(1/x) as well as expected. The double-precision version does not exhibit this behavior.

1. Dynamic_Shared_Objects

A Dynamic Shared Object, or DSO, is an ELF format object file, very similar in structure to an executable program but with no "main". It has a shared component, consisting of shared text and read-only data; a private component, consisting of data and the GOT (Global Offset Table); several sections that hold information necessary to load and link the object; and a liblist, the list of other shared objects referenced by this object.
Most of the libraries supplied by SGI are available as dynamic shared objects.

A DSO is relocatable at runtime; it can be loaded at any virtual address. A consequence of this is that all references to external symbols must be resolved at runtime. References from the private region (e.g., from private data) are resolved once at load time; references from the shared region (e.g., from shared text) must go through an indirection table (the GOT) and hence have a small performance penalty associated with them.

Code compiled for use in a shared object is referred to as Position Independent Code (PIC), whereas non-PIC is usually referred to as non-shared. Non-shared code and PIC cannot be mixed in the same object.

At runtime, exec loads the main program and then loads rld, the runtime linking loader, which finishes the exec operation. Starting with main's liblist, rld loads each shared object on the list, reads that object's liblist, and repeats the operation until all shared objects have been loaded. Next, rld allocates common and fixes up symbolic references in each loaded object. (This is necessary because we don't know until runtime where the object will be loaded.) Next, each object's init code is executed. Finally, control is transferred to "__start".

For a more complete discussion of DSOs, including answers to questions frequently asked about them, see the dso(5) man page.