Software Portability and Optimization

Monday, April 22, 2024

Project Stage 2 & 3: Command Line Parsing

Hey there, fellow tech enthusiasts! It's time for another update on my SPO600 Winter Project journey. Stage 2 has been quite a rollercoaster ride, filled with both triumphs and hurdles. Let's dive right in and explore the progress I've made and the challenges I've faced.

Realignment of Tasks: Discovering My Path

A couple of weeks ago, an unexpected twist occurred in our project journey. It turned out that my colleague had been working on the Version List Processing task, which I initially thought was assigned to me. After this realization, we swiftly realigned our tasks, and I found myself redirected to focus on Command-line Parsing instead.

Adding AFMV Capability: A Step Forward

Despite the shift in focus, I quickly immersed myself in the Command-line Parsing task. I did most of my research on gcc.gnu.org and made significant strides. One of the key milestones I've achieved in Stage 2 is the implementation of AFMV (automatic function multi-versioning) capability in the GNU Compiler Collection (GCC) for AArch64 systems. This involved significant code modifications and careful integration with existing GCC functionality.

One notable aspect of my implementation is the addition of support for specifying AFMV options via the command-line using the `afmv=` flag. This allows users to enable AFMV and provide a list of architectural feature versions, such as "sve" or "sve2", to be used as additional variants. I ensured robust parsing of these options to validate architectural features and handle error cases gracefully.

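This is the option record I created in common.opt to pick up the version list for further processing. Option names in .opt files are written without the leading dash, so the record for `-afmv=` starts with `afmv=`, which also matches the `OPT_afmv_` case label used in opts.cc.

afmv=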

Common Joined Var(flag_afmv) Optimization

Enable automatic function multiversioning with specified architectural features.
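To show what "getting the version list" involves once the option string arrives, here is a minimal sketch in plain C of splitting a comma-separated value such as "sve,sve2" into individual names. It is not GCC code: `split_feature_list`, `MAX_FEATURES`, and `MAX_NAME_LEN` are hypothetical names chosen for illustration, and the comma separator is an assumption about the list syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FEATURES 8    /* hypothetical limit for this sketch */
#define MAX_NAME_LEN 16   /* hypothetical limit, including the NUL */

/* Hypothetical helper, not part of GCC: split a comma-separated list such
   as "sve,sve2" into names[]. Returns the number of names, or -1 if the
   list is empty, contains an empty entry, or exceeds the fixed limits. */
int split_feature_list(const char *arg,
                       char names[MAX_FEATURES][MAX_NAME_LEN])
{
    assert(names != NULL);
    if (arg == NULL || *arg == '\0')
        return -1;                      /* nothing after "-afmv=" */

    int count = 0;
    const char *start = arg;
    for (;;) {
        const char *comma = strchr(start, ',');
        size_t len = comma ? (size_t)(comma - start) : strlen(start);

        /* Reject empty entries, over-long names, and too many entries. */
        if (len == 0 || len >= MAX_NAME_LEN || count == MAX_FEATURES)
            return -1;

        memcpy(names[count], start, len);
        names[count][len] = '\0';
        count++;

        if (comma == NULL)
            break;                      /* last entry consumed */
        start = comma + 1;
    }
    return count;
}
```

Calling `split_feature_list("sve,sve2", names)` fills `names[0]` and `names[1]` and returns 2, while an empty string or a trailing comma returns -1 so the caller can report an error.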

Handling Errors and Mac Build Woes

I also added error handling to the code. For example, if the user passes the afmv option without any values, the compiler reports an error. I did this in the opts.cc file, and this is the code:

case OPT_afmv_:

      /* Reject the option when no feature list follows it. */

      if (!arg || !*arg) {

          error_at(loc, "Please enter values for afmv option");

      }

      break;

During the implementation phase, I encountered a couple of roadblocks that tested my problem-solving skills. One challenge arose when attempting to build GCC on a macOS environment. I stumbled upon the dreaded error message:

The directory (BUILD_SYSTEM_HEADER_DIR) that should contain system headers does not exist:

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/usr/include

(on Darwin this usually means you need to pass the --with-sysroot= flag to point to a valid MacOS SDK)

make[3]: *** [stmp-fixinc] Error 1

To overcome this obstacle, I decided to switch gears and build GCC on an AArch64 server. However, this decision came with its own set of challenges. The server turned out to be extremely slow, which dragged out every build and test cycle. To add to the woes, the server was intermittently down, and at times I had to resort to using IP addresses instead of my usual user ID.

Looking Ahead

Despite the hurdles I faced, I'm proud of the progress I've made in Stage 2. The AFMV capability is starting to take shape, paving the way for further enhancements and optimizations. As I move forward, I'll continue to refine my implementation, address any remaining issues, and ensure seamless integration with the GCC codebase.

For Stage 3, I uploaded the code to GitHub and stayed in close contact with my colleagues to coordinate anything they needed for their parts of the implementation.

Cheers, Muhammad Wajih

Tuesday, March 26, 2024

Approach for Version List Processing

This is the step-by-step approach I will follow to implement the Version List Processing task:

1. Understand the Requirements: Make sure to fully understand what the version list processing task entails. Review the project goal and the specific requirements outlined for this task.

2. Identify Dependencies: Determine if there are any dependencies or prerequisites for this task. For example, does it require knowledge of other parts of the codebase, or does it depend on completion of other tasks?

3. Set Up My Development Environment: Ensure I have the necessary tools and resources to work on the task. This might include setting up a development environment with GCC and any other required software.

4. Familiarize Myself with the Codebase: Spend some time familiarizing myself with the relevant parts of the GCC codebase. Understand where the version list processing code will fit in and how it will interact with other components.

5. Implement Command-line Parsing: Start by implementing the command-line parsing functionality if Cameron does not already have it in place, or ask him what output I can expect from his part so that I can put an example in place. This will involve parsing the compiler command-line arguments to pick up AFMV options and storing them in appropriate data structures.

6. Stub Out Version List Processing: Create stub functions or placeholders for the version list processing logic. These can be basic functions that accept hard-coded input values or read values from a file for testing purposes.

7. Test My Implementation: Test the command-line parsing functionality and the stubbed-out version list processing logic to ensure they're working correctly. This might involve writing test cases to verify that the expected input values are being parsed and processed correctly.

8. Iterate and Refine: Iterate on my implementation, refining and improving it as needed. As I gain a better understanding of the codebase and the requirements of the task, I may need to make adjustments to my implementation.

9. Connect with Other Tasks: Once I have a basic version of the version list processing functionality in place, I can start connecting it with other tasks as needed. This might involve integrating it with the FMV Cloning functionality or other parts of the compiler codebase.

10. Document My Work: Document my implementation, including any design decisions, assumptions, and testing procedures. This documentation will be valuable for my own reference and for others who may need to work with or review my code.

Project Stage 1: Version List Processing and Update Documentation

Version List Processing

Lead: Muhammad Wajih Rajani

Description:

- The task involves processing the version list received from the command-line to validate the architectural features.

- This includes parsing the list of architectural feature versions provided as command-line options.

- The parsed features need to be validated against the existing FMV code to ensure compatibility and correctness.

Approach:

1. Parsing Command-line Options: Use command-line parsing libraries or write custom code to extract the AFMV options provided by the user.

2. Validation of Architectural Features: Compare the parsed features against the existing FMV codebase to ensure compatibility and correctness.

3. Error Handling: Implement appropriate error handling mechanisms to deal with invalid or unsupported architectural feature versions.
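Steps 2 and 3 above can be sketched as a lookup against a table of accepted names that produces a message for the first name it does not recognise. This is only an illustration in plain C: the two-entry table is a stand-in, since the real list would come from the existing FMV code, and `is_known_feature` and `validate_features` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative subset only; the real names would be taken from the
   existing FMV feature list, which this sketch does not reproduce. */
static const char *const known_features[] = { "sve", "sve2" };

/* Hypothetical helper: returns 1 if name is in known_features, else 0. */
int is_known_feature(const char *name)
{
    assert(name != NULL);
    size_t n = sizeof known_features / sizeof known_features[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, known_features[i]) == 0)
            return 1;
    return 0;
}

/* Check every name in order. On the first unknown name, write a message
   into err and return 0. Returns 1 when all names are valid. */
int validate_features(const char *const *names, size_t count,
                      char *err, size_t err_size)
{
    assert(names != NULL && err != NULL);
    for (size_t i = 0; i < count; i++) {
        if (!is_known_feature(names[i])) {
            snprintf(err, err_size,
                     "unknown architectural feature '%s' in afmv list",
                     names[i]);
            return 0;
        }
    }
    return 1;
}
```

Naming the offending entry in the message is a deliberate choice here: it lets the user see which item in a longer list was rejected.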

Testing:

- Test the version list processing functionality with various combinations of AFMV options.

- Verify that valid options are accepted, and invalid options are rejected with informative error messages.

Skills and Knowledge Required:

- Proficiency in C/C++ programming language.

- Understanding of GCC codebase structure and organization.

- Familiarity with command-line parsing techniques and error handling in C/C++.

Independence from Other Tasks:

- Version list processing can be developed independently of other tasks as it primarily involves working with command-line arguments and validating feature versions.

- It does not depend on the completion of other tasks such as automatic cloning or diagnostic output.

Interface with Other Tasks:

- The version list processing functionality will interface with the command-line parsing module developed by Cameron Coenjarts to retrieve AFMV options.

- It may also interact with the automatic cloning task to ensure that the validated architectural features are correctly applied during function cloning.

Estimated Work Hours:

- Parsing Command-line Options: 4 hours

- Validation of Architectural Features: 6 hours

- Error Handling: 2 hours

- Total: 12 hours

Update Documentation

Description:

- The task involves updating the existing GCC IFUNC and FMV documentation for all architectures.

- Additionally, it requires creating documentation specifically for the AFMV feature introduced in this project.

Approach:

1. Review Existing Documentation: Familiarize yourself with the current IFUNC and FMV documentation available in the GCC codebase.

2. Identify Changes: Identify sections of the documentation that need to be updated to reflect the changes introduced by the AFMV feature.

3. Create New Documentation: Develop new documentation specifically detailing the AFMV feature, including usage instructions and examples.

Testing:

- Review the updated and new documentation to ensure clarity, correctness, and completeness.

- Seek feedback from team members and potential users to refine the documentation further.

Skills and Knowledge Required:

- Proficiency in technical writing and documentation.

- Understanding of GCC architecture and feature sets.

- Ability to translate technical concepts into clear and concise documentation for users.

Independence from Other Tasks:

- Documentation can be updated and created independently of other tasks, as it primarily involves writing and editing text.

- It does not require changes to the codebase or interaction with other modules.

Interface with Other Tasks:

- The updated documentation will serve as a reference for users and developers implementing AFMV and related features.

- Feedback from other task leads and developers may inform updates and improvements to the documentation.

Estimated Work Hours:

- Review Existing Documentation: 2 hours

- Identify Changes: 2 hours

- Create New Documentation: 8 hours

- Total: 12 hours

Preference Task:

My preference from the tasks above would be version list processing, as it is more integral to the course and crucial for the completion of the project. I hope to achieve a higher score by focusing on this task.

I am very confident that I will be able to complete my work successfully. It is of significant interest to me, and completing it will improve my chances of receiving a better grade for university applications.

Writing Assembly Code: AArch64 vs x86_64

Writing assembly code can be an enlightening experience, offering a deep dive into the inner workings of a computer's architecture. In this blog post on Lab 3, we'll explore the process of writing and debugging assembly code for two different architectures: AArch64 and x86_64. We'll contrast the two and share our experiences with each.

AArch64 Assembly Code

Here's a snippet of assembly code written for the AArch64 architecture:

.text

.globl _start

min_val = 0

max_val = 30

zero_char = 0x30

_start:

mov x19, min_val

mov w6, zero_char

mov w7, zero_char

loop:

mov x0, 1

adr x1, msg

mov x2, len

mov x8, 64

mov x23, 0xA

udiv x24, x19, x23

msub x25, x24, x23, x19

cmp x24, 0x0

b.eq setup_second_digit

mov w6, w24

mov w7, w25

mov w26, zero_char

add w6, w6, w26

add w7, w7, w26

add x3, x1, 6

strb w6, [x3]

setup_second_digit:

add x3, x1, 7

strb w7, [x3]

svc 0

add x19, x19, 1

add w7, w7, 1

cmp x19, max_val

b.ne loop

mov x0, 0

mov x8, 93

svc 0

.data

/* Offsets 6 and 7 are placeholder bytes for the two digits. */

msg: .ascii "Loop:   \n"

len= . - msg

Explanation:

- The code begins with the `.text` directive, indicating the start of executable instructions.

- `_start` is declared as the entry point of the program.

- Constants such as `min_val`, `max_val`, and `zero_char` are defined.

- The loop counts from `min_val` up to, but not including, `max_val`, keeping the counter in `x19`.

- Inside the loop, the current index is divided by 10: `udiv` gives the quotient (the first digit) and `msub` computes the remainder (the second digit).

- ASCII characters for the digits are obtained by adding `zero_char` and are stored in the `msg` string with `strb`.

- The message is printed to the standard output using the `svc` instruction, with the write system call number 64 in `x8`.

- The loop continues until the counter in `x19` reaches `max_val`.

- Finally, the program exits with a system call.

- This is what the loop prints on screen:

[mwrajani@aarch64-001 lab3]$ ./loop

Loop: 0

Loop: 1

Loop: 2

Loop: 3

Loop: 4

Loop: 5

Loop: 6

Loop: 7

Loop: 8

Loop: 9

Loop: 10

Loop: 11

Loop: 12

Loop: 13

Loop: 14

Loop: 15

Loop: 16

Loop: 17

Loop: 18

Loop: 19

Loop: 20

Loop: 21

Loop: 22

Loop: 23

Loop: 24

Loop: 25

Loop: 26

Loop: 27

Loop: 28

Loop: 29
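The digit handling described in the explanation above can be mirrored in a few lines of C. This is a sketch rather than a translation of the assembly: `format_index` is a hypothetical name, and it assumes a message buffer whose bytes at offsets 6 and 7 are placeholders for the two digits.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout: "Loop: " followed by two digit slots and a newline. */
#define TENS_OFFSET 6
#define ONES_OFFSET 7

/* Fill the digit slots of buf for a value in 0..99. The tens slot is left
   untouched when the quotient is zero, as in the assembly. */
void format_index(char *buf, unsigned value)
{
    assert(buf != NULL && value < 100);
    unsigned quotient  = value / 10;              /* what udiv computes */
    unsigned remainder = value - quotient * 10;   /* what msub computes */

    if (quotient != 0)
        buf[TENS_OFFSET] = (char)('0' + quotient);
    buf[ONES_OFFSET] = (char)('0' + remainder);
}
```

Adding `'0'` (0x30) turns a digit value from 0 to 9 into its ASCII character, which is the role `zero_char` plays in the assembly.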

x86_64 Assembly Code

Here's a snippet of assembly code written for the x86_64 architecture:

.text

.globl _start

loop_index = 0

loop_max = 30

zero_char = 48

_start:

movq $loop_index, %rbx

loop:

movq %rbx, %rax

movq $10, %rcx

xor %rdx, %rdx

div %rcx

add $zero_char, %rdx

cmp $0, %rax

je setup_second_digit

add $zero_char, %rax

movq $msg, %rdi

add $6, %rdi

stosb

setup_second_digit:

movq $msg, %rdi

add $7, %rdi

movq %rdx, %rax

stosb

movq $1, %rdi

movq $msg, %rsi

movq $len, %rdx

movq $1, %rax

syscall

inc %rbx

cmp $loop_max, %rbx

jne loop

mov $0, %rdi

mov $60, %rax

syscall

.data

/* Offsets 6 and 7 are placeholder bytes for the two digits. */

msg: .ascii "Loop:   \n"

len= . - msg

Explanation:

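- The code starts with the `.text` directive and declares `_start` as the entry point.

- Constants such as `loop_index`, `loop_max`, and `zero_char` are defined.

- The loop counts from `loop_index` up to, but not including, `loop_max`, keeping the counter in `%rbx`.

- Inside the loop, the current index is divided by 10 with `div`. The quotient (the first digit) lands in `%rax` and the remainder (the second digit) in `%rdx`. `%rdx` is cleared first because `div` treats `%rdx:%rax` as the dividend.

- ASCII characters for the digits are obtained by adding `zero_char` and are stored in the `msg` string with `stosb`.

- The message is printed to the standard output using the `syscall` instruction, with the write system call number 1 in `%rax`.

- The loop continues until the counter in `%rbx` reaches `loop_max`.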

- Finally, the program exits with a system call.

- This is what the loop prints on screen. The output is the same as on AArch64, which is expected because both programs do the same thing:

[mwrajani@x86-001 lab3]$ ./loop

Loop: 0

Loop: 1

Loop: 2

Loop: 3

Loop: 4

Loop: 5

Loop: 6

Loop: 7

Loop: 8

Loop: 9

Loop: 10

Loop: 11

Loop: 12

Loop: 13

Loop: 14

Loop: 15

Loop: 16

Loop: 17

Loop: 18

Loop: 19

Loop: 20

Loop: 21

Loop: 22

Loop: 23

Loop: 24

Loop: 25

Loop: 26

Loop: 27

Loop: 28

Loop: 29

Writing and Debugging Experience

Writing assembly code for both architectures offers a unique insight into how the underlying hardware operates. AArch64 assembly, with its distinct syntax and register names, requires a thorough understanding of the architecture's instruction set. Debugging AArch64 code often involves meticulous examination of register values and memory accesses.

On the other hand, x86_64 assembly, being more ubiquitous, may feel more familiar to those accustomed to Intel-based systems. Debugging x86_64 code is aided by the availability of numerous tools and resources, making it relatively straightforward to identify and fix issues.
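One concrete difference between the two listings is the system call convention: the AArch64 code puts the call number in `x8` and uses `svc 0`, while the x86_64 code puts it in `%rax` and uses `syscall`. From C on Linux, the same call can be made through the `syscall()` wrapper, where `SYS_write` expands to the right number for whichever architecture is being compiled. This is a minimal sketch assuming a Linux system, and `raw_write` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue the write system call directly. SYS_write is the per-architecture
   number (64 on AArch64 Linux, 1 on x86_64 Linux), the same values the
   assembly listings load into x8 and %rax. Returns the byte count written,
   or -1 on error. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

Calling `raw_write(1, "Loop: 0\n", 8)` writes the line to standard output on either architecture without changing the source, which is the portability that the hand-written assembly gives up.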

Conclusion

In conclusion, writing assembly code for AArch64 and x86_64 architectures provides valuable insights into low-level system operations. While each architecture has its own syntax and nuances, the fundamentals of assembly programming remain consistent. By understanding these differences, developers can gain a deeper appreciation for the inner workings of modern computing systems.