Natchatran

Natchatran Blogs includes technical tutorials, e-books, notes, lab manuals, question banks, viva questions and interview questions for engineering students, and provides all study material for CSE students.

-Natchatran(Prem Anandh.J)

Friday, March 7, 2014

CS2354 - Advanced Computer Architecture

CS2354 - Advanced Computer Architecture - PART A Questions and Answers

CS2354 - ADVANCED COMPUTER ARCHITECTURE
PART A Question and Answer
2 Marks
UNIT I
1. What is Instruction Level Parallelism?
The potential to overlap the execution of instructions and thereby improve
performance is called instruction-level parallelism (ILP).
2. What are the approaches to exploit ILP?
The two separable approaches to exploit ILP are,
Dynamic or hardware intensive approach
Static or Compiler intensive approach
3. What is pipelining?
Pipelining is an implementation technique whereby multiple instructions are overlapped
in execution when they are independent of one another.
4. Write down the formula to calculate the pipeline CPI?
The CPI (cycles per instruction) for a pipelined processor is the sum of the ideal pipeline
CPI and all contributions from stalls:
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls.
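As a quick worked instance of this formula (the stall contributions below are assumed values for illustration, not figures from the syllabus):

```latex
\text{Pipeline CPI} = \underbrace{1.0}_{\text{ideal}}
  + \underbrace{0.10}_{\text{structural}}
  + \underbrace{0.35}_{\text{data hazard}}
  + \underbrace{0.15}_{\text{control}}
  = 1.6
```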
5. What is loop level parallelism?
Loop-level parallelism is a way to increase the amount of parallelism available among
instructions by exploiting parallelism among the iterations of a loop.
6. Give the methods to enhance performance of ILP?
To obtain substantial performance enhancements, the ILP across multiple basic blocks
is exploited using
loop level parallelism
vector instructions
7. List out the types of dependences.
There are three different types of dependences
Data dependences
Name dependences
Control dependences
8. What is Data hazard?
A hazard is created whenever there is a dependence between instructions and they are close
enough that the overlap caused by pipelining, or other reordering of instructions, would change
the order of access to the operands involved in the dependence.
9. Give the classification of Data hazards
Data Hazards are classified into three types depending on the order of read and write
accesses in the instructions
RAW (Read After Write)
WAW (Write After Write)
WAR (Write After Read)
10. List out the constraints imposed by control dependences?
The two constraints imposed by control dependences are
An instruction that is control dependent on a branch cannot be moved before the branch,
so that its execution is no longer controlled by the branch.
An instruction that is not control dependent on a branch cannot be moved after the
branch, so that its execution is controlled by the branch.
11. What are the properties used for preserving control dependence?
Control dependence is preserved by two properties in a simple pipeline.
Instructions execute in program order
Detection of control or branch hazards
12. Define Dynamic Scheduling?
Dynamic scheduling is a technique in which the hardware rearranges the instruction
execution to reduce the stalls while maintaining data flow and exception behavior.
13. List the advantages of dynamic scheduling?
It handles dependences that are unknown at compile time.
It simplifies the compiler.
Uses speculation techniques to improve the performance.
14. What is scoreboarding?
Scoreboarding is a technique that allows instructions to execute out of order when sufficient
resources are available and there are no data dependences. It does not eliminate WAW and
WAR hazards; instructions stall until those hazards are cleared.
15. What are the advantages of Tomasulo's approach?
Distribution of the hazard detection logic
Elimination of WAR and WAW hazards
16. What are the types of branch prediction?
There are two types of branch prediction. They are,
Dynamic branch prediction
Static branch prediction
17. Define Amdahl’s Law?
Amdahl's Law states that the performance improvement to be gained by improving some
portion of a computer is limited by the fraction of the time that the improved portion is used.
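In formula form, with a small worked example (the 40% fraction and the 10x enhancement speedup are assumed values):

```latex
\text{Speedup}_{\text{overall}}
  = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}})
      + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}
  \qquad \text{e.g. } \frac{1}{(1 - 0.4) + 0.4/10} = \frac{1}{0.64} \approx 1.56
```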
18. What are the things present in Dynamic branch prediction?
Dynamic branch prediction uses two structures:
Branch prediction buffer
Branch history table
19. Define Correlating branch prediction?
Branch prediction that uses the behavior of other branches to make a prediction is called
correlating branch prediction.
20. What are the basic ideas of pipeline scheduling?
The basic ideas of pipeline scheduling are,
To keep the pipeline full: find sequences of unrelated instructions that can be
overlapped in the pipeline.
To avoid pipeline stalls: separate a dependent instruction from its source instruction
by a distance in clock cycles equal to the pipeline latency of that source
instruction.
21. What are the four fields involved in ROB?
ROB contains four fields,
Instruction type
Destination field
Value field
Ready field
22. What is reservation station?
In Tomasulo’s scheme register renaming is provided by reservation station. The basic
idea is that the reservation station fetches and buffers an operand as soon as it is available,
eliminating the need to get the operand from a register.
23. What is ROB?
ROB stands for reorder buffer. It supplies operands in the interval between completion of
instruction execution and instruction commit. ROB is similar to the store buffer in Tomasulo’s
algorithm.
24. What is imprecise exception?
An exception is imprecise if the processor state when an exception is raised does not look
exactly as if the instructions were executed sequentially in strict program order.
25. What are the two possibilities of imprecise exceptions?
The pipeline may have already completed instructions that are later in program order
than the instruction causing the exception.
The pipeline may not yet have completed some instructions that are earlier in program
order than the instruction causing the exception.
26. What are the two main features preserved by maintaining both data and control dependence?
Exception behavior
Data flow
27. What are the two types of name dependence?
Anti dependence
Output dependence
28. What is anti dependence?
An anti dependence between instruction i and instruction j occurs when instruction j
writes a register or memory location that instruction i reads. The original ordering must be
preserved to ensure that i reads the correct value.
29. What is output dependence?
An output dependence occurs when instruction i and instruction j write the same register
or memory location. The ordering between the instructions must be preserved to ensure that the
value finally written corresponds to instruction j.
30. What is register renaming?
Renaming of register operands is called register renaming. It can be done either statically
by the compiler or dynamically by the hardware.
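A minimal C-level sketch of the idea (illustrative only: real renaming is applied to machine registers by the compiler or hardware, and the variable names below are assumed):

```c
#include <stdio.h>

int main(void) {
    int r2 = 2, r3 = 3, r5 = 5, r6 = 6, r7 = 7;

    /* Before renaming, one name (r1) is reused: */
    int r1 = r2 + r3;   /* I1: writes r1                              */
    int r4 = r1 * r5;   /* I2: reads r1 (true RAW dependence on I1)   */
    r1     = r6 - r7;   /* I3: writes r1 again -> WAW with I1 and
                           WAR with I2 (name dependences)             */

    /* After renaming, I3 writes a fresh name t1, so only the true
       dependence I1 -> I2 remains and I3 can be reordered freely.    */
    int t1 = r6 - r7;

    printf("r1=%d r4=%d t1=%d\n", r1, r4, t1);
    return 0;
}
```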
UNIT II
1. Define VLIW.
VLIW is a technique for exploiting ILP by executing instructions that have no dependences in
parallel. The compiler analyses the program and detects operations to be executed in parallel;
such operations are packed into one "large" instruction.
2. List out the advantages of VLIW processors.
* Simple hardware
The number of functional units can be increased without needing additional
sophisticated hardware to detect parallelism, as in superscalars.
* Good compilers can detect parallelism based on global analysis of the whole
program.
3. Define EPIC
* EPIC stands for Explicitly Parallel Instruction Computing.
* It is an architecture framework proposed by HP.
* It is based on VLIW and was designed to overcome the key limitations of VLIW
while simultaneously giving more flexibility to compiler writers.
4. What is loop level analysis?
Loop-level analysis involves determining what dependences exist among the operands in a
loop across its iterations, in particular whether data accesses in later iterations are
data dependent on data values produced in earlier iterations.
5. What are the types of data dependences in loops?
* Loop-carried dependences
* Non-loop-carried dependences
6. What is loop carried dependence?
Data dependence between different loop iterations (data produced in earlier iterations
used in a later one) is called a loop carried dependence.
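A small C sketch contrasting the two cases (the array names and loop bound are assumed for illustration):

```c
#define N 100

/* Iteration i reads a[i-1], which iteration i-1 wrote: a loop-carried
   dependence, so the iterations cannot simply be run in parallel.     */
void loop_carried(double a[N], double b[N]) {
    for (int i = 1; i < N; i++)
        a[i] = a[i - 1] + b[i];
}

/* Each iteration touches only its own elements: no loop-carried
   dependence, so all iterations are independent of one another.       */
void not_loop_carried(double a[N], double b[N]) {
    for (int i = 0; i < N; i++)
        a[i] = a[i] + b[i];
}
```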
7. What are the tasks in finding the dependence in a program?
There are 3 tasks. They are
* Have good scheduling of code
* Determine which loops might contain parallelism
* Eliminate name dependences
8. Define dependence analysis algorithm.
A dependence analysis algorithm is an algorithm used by the compiler to detect dependences,
based on the assumptions that
* Array indices are affine (of the form a*i + b for a loop index i)
* The GCD of the two affine indices determines whether a dependence can exist
(the GCD test)
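As a hedged illustration (the index forms and constants below are assumed for the example), the standard GCD test works as follows: for a store to x[a*i + b] and a load from x[c*i + d] in the same loop, a loop-carried dependence can exist only if

```latex
\gcd(c, a) \ \text{divides} \ (d - b)
```

For example, with a = 2, b = 3, c = 2, d = 0: gcd(2, 2) = 2 does not divide (0 - 3) = -3, so no loop-carried dependence is possible.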
9. What is copy propagation?
Copy propagation is an optimization, used together with algebraic simplification of
expressions, which eliminates operations that merely copy values.
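A tiny C sketch of the idea (the function and variable names are assumed; real compilers apply this to intermediate code rather than source):

```c
/* Before the optimization: y is only a copy of x. */
int before(int x) {
    int y = x;       /* copy operation           */
    int z = y + 4;   /* uses the copied value    */
    return z;
}

/* After copy propagation the copy is eliminated. */
int after(int x) {
    int z = x + 4;   /* y has been replaced by x */
    return z;
}
```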
10. What is tree-height reduction technique?
Tree-height reduction is an optimization that reduces the height of the tree structure
representing a computation, making it wider but shorter.
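A minimal C sketch (operand and function names are assumed):

```c
/* Height-3 chain: every addition depends on the one before it. */
int tall_tree(int a, int b, int c, int d) {
    return ((a + b) + c) + d;
}

/* Height-2 tree: (a + b) and (c + d) are independent and can be
   evaluated in parallel; only the final addition depends on both. */
int reduced_tree(int a, int b, int c, int d) {
    return (a + b) + (c + d);
}
```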
11. What are the components of a software-pipelined loop?
* A software-pipelined loop consists of a loop body, start-up code and clean-up
code.
* The start-up code executes the instructions left out of the first original loop iterations.
* The clean-up (finish) code executes the instructions from the last original iterations.
12. What is trace scheduling?
Trace scheduling is a way to organize the process of global code motion; it simplifies
instruction scheduling by incurring the cost of possible code motion on the less critical
paths.
13. List out steps used for trace scheduling.
* Trace selection
* Trace compaction
14. Define Inter-procedural analysis.
Analysing a procedure, such as one with pointer parameters, across the boundaries of that
particular procedure is called interprocedural analysis.
15. What is software pipelining?
It is a technique for reorganizing a loop such that each iteration of the new code is made from
instructions chosen from different iterations of the original loop.
16. Define critical path.
Critical path is defined as the longest sequence of dependent instructions in a program.
17. Define IA-64 processor.
The IA-64 is a RISC-Style, register-register instruction set with the features designed to
support compiler based exploitation of ILP.
18. What is CFM and what is its use?
* CFM stands for Current Frame Pointer.
* The CFM pointer points to the set of registers to be used by a given procedure.
19. What are the parts of CFM pointer?
There are two parts. They are
* Local area - used for local storage
* Output area - used to pass values to any called procedure.
20. What is Itanium processor?
The Itanium processor is an implementation of the Intel IA-64 architecture. It is capable of 6
issues per clock cycle; the 6 issues include up to 3 branches and 2 memory references.
21. What are the parts of the 10-stage pipeline in the Itanium processor?
* Front end
* Instruction delivery (EXP, REN)
* Operand delivery (WLD, REG)
* Execution (EXE, DET, WRB)
22. What are the limitations of ILP?
* Limitations of the hardware model
* Limitations on window size and maximum issue count
* Effects of finite registers
* Effects of imperfect alias analysis
23. List the two techniques for eliminating dependent computations
* Copy propagation
* Tree-height reduction
24. Define Trace selection and Trace compaction
Trace Selection
Trace selection tries to find a likely sequence of basic blocks whose operations will be
put into a small number of instructions; this sequence is called a trace.
Trace Compaction
Trace compaction tries to squeeze the trace into a small number
of wide instructions. Trace compaction is code scheduling; hence it attempts to move
operations as early as it can in the sequence, packing the operations into as few wide
instructions as possible.
25. Define Superblocks.
Superblocks are formed by a process similar to that used for traces, but are a form of
extended basic blocks, which are restricted to a single entry point but allow multiple exits.
26. Use of conditional or predicated instructions.
Conditional or predicated instructions are used to eliminate branches, converting a control
dependence into a data dependence and potentially improving performance.
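A small C sketch of the conversion (function names are assumed; the branch-free form is what a conditional-move or predicated instruction implements):

```c
/* Branch form: the assignment x = b is control dependent on the branch. */
int with_branch(int a, int b, int x) {
    if (a == 0)
        x = b;
    return x;
}

/* Branch-free (predicated) form: the selection becomes a data
   dependence that maps naturally onto a conditional-move or
   predicated instruction.                                      */
int predicated(int a, int b, int x) {
    return (a == 0) ? b : x;
}
```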
27. Define Instruction Group
Instruction group is a sequence of consecutive instructions with no register data dependencies
among them. All the instructions in a group could be executed in parallel if sufficient
hardware resources existed and if any dependences through memory were preserved.
28. Use of template field in bundle.
The 5 bit template field within each bundle describes both the presence of any stops
associated with the bundle and the execution unit type required by each instruction within
the bundle.
29. List the two types of speculation supported by the IA-64 processor.
* Control speculation
* Memory reference speculation
30. Define Advance loads.
Memory reference support in the IA-64 uses a concept called advanced loads. An advanced
load is a load that has been speculatively moved above store instructions on which it is
potentially dependent. To perform a load speculatively, the ld.a instruction is used.
31. Define ALAT
Executing an advanced load instruction creates an entry in a special table called the ALAT
(Advanced Load Address Table). It stores both the register destination of the load and the
address of the accessed memory location. When a store is executed, an associative lookup
against the active ALAT entries is performed. If there is an ALAT entry with the same memory
address as the store, that ALAT entry is marked invalid.
32. What are the functional units in Itanium Processor?
There are nine functional units in the Itanium processor.
Two I units
Two M units
Three B units
Two F units
All the functional units are pipelined.
33. Define Scoreboard
In the Itanium processor the 10-stage pipeline is divided into 4 parts. In the operand delivery
part a scoreboard is used to detect when individual instructions can proceed, so that a stall of
one instruction in a bundle need not cause the entire bundle to stall.
34. Define Book Keeping Code
A basic block has exactly one entry and one exit. When trace scheduling moves code across
such block boundaries, the extra compensating code that must be added is known as
bookkeeping code.
UNIT III
1. Define cache coherence problem.
The cache coherence problem describes how two different processors can have two different
values for the same memory location.
2. What are the two aspects of cache coherence problem?
i. Coherence - It determines what value can be returned by a read
operation.
ii. Consistency - It determines when a written value will be returned by a read
operation.
3. What are the two types of cache coherence protocol?
i. Directory based protocol.
ii. Snooping protocol.
4. Define Directory based protocol.
The sharing status of every block of main memory is kept in one common place called
the directory. From this directory we can find where each block is cached.
5. Name the different types of snooping protocol.
i. Write invalidate protocol
ii. Write update (write broadcast) protocol
6. Difference between write Update and invalidate protocol.
Write update:
i. Multiple writes to the same word require multiple write broadcasts.
ii. The update is performed on individual words of a cache block.
iii. The delay before another processor can read the written value is lower.
Invalidate:
i. Multiple writes to the same word require only one initial invalidation.
ii. Invalidation is performed on the entire cache block.
iii. The delay before another processor can read the written value is higher.
7. What are the different types of access in distributed shared memory architecture?
i. Local:
If the processor refers to its own local memory, then it is called a local access.
ii. Remote:
If the processor refers to the memory of another node, then it is called a remote access.
8. What are the disadvantages of remote access?
Compiler mechanisms for cache coherence are very limited.
Without the cache coherence property the multiprocessor loses the advantage of
fetching and using multiple words in a cache block.
Prefetching is useful only when the multiprocessor can fetch and reuse multiple words.
9. What are the states available in directory based protocol?
i. Shared:- One or more processors can have copies of the same data block.
ii. Uncached:- No processor has a copy of the data block.
iii. Exclusive:- Exactly one processor has a copy of the data block.
10. What are the nodes available in distributed system?
i. Local Node
ii. Home Node
iii. Remote Node
11. Define Synchronization.
Synchronization is a mechanism that is built with user-level software routines,
which depend on hardware-supplied synchronization instructions.
12. Name the basic hardware primitives.
i. Atomic Exchange
ii. Test and set
iii. Fetch and Increment
13. Define spinlock.
It is a lock that a processor continuously tries to acquire, spinning around a loop until it
succeeds.
It is mainly used when the programmer expects the lock to be held for only a short period of time.
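A minimal spin-lock sketch built on the atomic exchange primitive from question 12 (C11 atomics; the lock variable and function names are assumed):

```c
#include <stdatomic.h>

/* 0 = free, 1 = held */
static atomic_int lock_var = 0;

void acquire(void) {
    /* Atomic exchange returns the old value: keep spinning until we
       are the one who changed the lock from 0 (free) to 1 (held).    */
    while (atomic_exchange(&lock_var, 1) != 0)
        ;   /* busy-wait (spin) */
}

void release(void) {
    atomic_store(&lock_var, 0);   /* mark the lock free again */
}
```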
14. What are the mechanisms used to implement locks?
There are two methods to implement the locks.
i. Implementing lock without using cache coherence
ii. Implementing lock using cache coherence.
15. What are the advantages of using spin locks?
There are two advantages of using spin lock
i. They have low overhead
ii. Performance is high
16. Name the synchronization mechanisms for large scale multiprocessor.
i. Exponential back off
ii. queuing locks
iii. combining tree
17. What are the two primitives used for implementing synchronization?
Lock Based Implementation
Barrier based Implementation
18. Define sequential consistency.
It requires that the result of any execution be the same as if the memory accesses executed
by each processor were kept in order and the accesses among different processors were
interleaved.
It reduces the possibility of incorrect execution.
19. Define multithreading.
The process of executing multiple threads on a common processor with common memory,
in which execution is done in an overlapping fashion, is called multithreading.
20. What are the types of multithreading?
i. Fine-grained multithreading:- It has the ability to switch threads on each
instruction.
ii. Coarse-grained multithreading:- It has the ability to switch threads only on
costly stalls.
UNIT IV
1. Define cache.
Cache is the name given to the first level of the memory hierarchy encountered once
the address leaves the CPU.
Eg: file caches, name caches.
2. What are the factors on which the cache miss depends on?
The time required for the cache miss depends on both
Latency
Bandwidth
3. What is the principle of locality?
Programs access a relatively small portion of the address space at any instant of
time; this is called the principle of locality.
4. What are called pages?
The address space is usually broken into fixed-size blocks, called pages. Each
page resides either in main memory or on disk.
5. What is called memory stall cycles?
The number of cycles during which the CPU is stalled waiting for a memory
access is called memory stall cycles.
6. Write down the formula for calculating average memory access time?
Average memory access time = Hit time + Miss rate * Miss penalty,
where hit time is the time to hit in the cache. The formula can help us decide
between split caches and a unified cache.
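A worked instance with assumed values (1-cycle hit time, 2% miss rate, 100-cycle miss penalty):

```latex
\text{AMAT} = 1\ \text{cycle} + 0.02 \times 100\ \text{cycles} = 3\ \text{cycles}
```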
7. What are the techniques to reduce the miss rate?
Larger block size
Larger caches
Higher associativity
Way prediction and pseudo associative caches
Compiler optimizations.
8. What are the techniques to reduce hit time?
Small and simple cache: direct mapped
Avoid address translation during indexing of the cache
Pipelined cache access
Trace cache
9. List out the types of storage devices.
Magnetic storage : disks, floppy disks, tapes
Optical storage : compact disks (CD), digital versatile disks (DVD)
Electrical storage : flash memory
10. What is the sequence recorded on a disk?
The sequence recorded on the magnetic media is a sector number, a gap, the
information for that sector including error correction code, a gap, the sector number of
the next sector, and so on.
11. What is termed as cylinder?
The term cylinder is used to refer to all the tracks under the arms at a given point
on all surfaces.
12. List the components to a disk access.
There are three mechanical components to a disk access:
Rotation latency
Transfer time
Seek time
13. What is average seek time?
Average seek time is the sum of the times for all possible seeks divided by the
number of possible seeks. Average seek times are advertised to be 5 ms to 12 ms.
14. What is transfer time?
Transfer time is the time it takes to transfer a block of bits, typically a sector,
under the read-write head. This time is a function of the block size, disk size, rotation
speed, recording density of the track, and speed of the electronics connecting the disk to
computer.
15. Write the formula to calculate the CPU execution time.
CPU execution time=(CPU clock cycles+ memory stall cycles)*clock cycle time.
16. Write the formula to calculate the CPU time.
CPU time=(CPU execution clock cycles+ memory stall clock cycles)* clock cycle
time.
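A worked instance with assumed values (1,000,000 CPU clock cycles, 200,000 memory stall cycles, 1 ns clock cycle time):

```latex
\text{CPU time} = (1{,}000{,}000 + 200{,}000) \times 1\,\text{ns} = 1.2\ \text{ms}
```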
17. Define miss penalty for an out of order execution processor.
For an out-of-order execution processor, the miss penalty is redefined as the total miss
latency minus the overlapped miss latency, giving:
Memory stall cycles / Instruction = (Misses / Instruction) * (Total miss latency - Overlapped
miss latency)
18. What are the techniques available to reduce cache penalty or miss rate via parallelism?
The three techniques that overlap the execution of instructions are
1.Non blocking caches to reduce stalls on cache miss- to match the out of
order processors
2.Hardware prefetching of instructions and data
3.Compiler- controlled prefetching.
19. How are the conflict misses divided?
The four divisions of conflict misses are,
Eight way
Four way
Two way
One way
20. List the advantages of memory hierarchy?
The memory hierarchy takes advantage of
a. Locality
b. The cost/performance of memory technologies
22. What is the goal of memory hierarchy?
The goal is to provide a memory system with
*cost almost as low as the cheapest level of memory
*speed almost as fast as the fastest level
23. Define cache hit?
When the CPU finds a requested data item in the cache, it is called a cache hit.
*Hit rate: the fraction of cache accesses found in the cache
*Hit time: the time to access the upper level, which consists of the RAM access
time + the time to determine hit/miss
24. Define cache miss?
When the CPU does not find a data item it needs in the cache, a cache miss occurs.
*Miss rate = 1 - (Hit rate)
*Miss penalty: the time to replace a block in the cache + the time to deliver the block to the
processor
25. What does Latency and Bandwidth determine?
-Latency determines the time to retrieve the first word of the block
-Bandwidth determines the time to retrieve the rest of the block
26. What are the types of locality?
*Temporal locality(Locality in time)
*Spatial locality(Locality in space)
27. How does page fault occur?
When the CPU references an item within a page that is not present in the cache or main
memory, a page fault occurs, and the entire page is moved from disk to main memory.
28. What is called the miss penalty?
The number of memory stall cycles depends on both the number of misses and the
cost per miss, which is called the miss penalty
29. What is Average memory access time?
The average memory access time is a better measure of memory hierarchy performance for
processors with in-order execution.
30. What are the categories of cache misses (the 3 Cs)?
*Compulsory
*Capacity
*Conflict
31. What are the techniques to reduce miss penalty?
*multi-level caches
*critical word first and early restart
*giving priority to read misses over writes
*Merging write buffer
*victim caches
UNIT V
1) What does the Power Processor Unit (PPU) contain?
*A full set of 64-bit PowerPC registers.
*32 128-bit vector multimedia registers.
*A 32 KB L1 data cache.
*A 32 KB L1 instruction cache.
2) List out the disadvantages of Heterogeneous multi-core processors?
*Developer productivity.
*Portability.
*Manageability.
3) Define Software Multithreading
Software multithreading is a piece of software that is aware of more than one
core/processor and can use these to be able to simultaneously complete multiple tasks.
4) Define Hardware Multithreading
Hardware multithreading allows multiple threads to share the functional
units of a single processor in an overlapping fashion.
5) Difference between Software and Hardware Multithreading
*Multithreading (computer architecture) refers to multithreading in hardware.
*Thread (computer science) refers to multithreading in software.
6) List some advantages of Software Multithreading.
*Increased responsiveness and worker productivity.
-Increased application responsiveness when different tasks run in parallel.
*Improved performance in parallel environments.
-When running computations on multiple processors.
*More computations per cubic foot of data center.
-Web based applications are often multi-threaded in nature.
7) List out the two approaches of Hardware Multithreading.
The two main approaches in Hardware multithreading are
*Fine-grain Multithreading.
*Coarse-grain Multithreading.
8) Define Simultaneous Multithreading(SMT)
SMT is a variation on multithreading that uses the resources of a multiple-issue,
dynamically scheduled processor to exploit thread-level parallelism at the same time it exploits
ILP, i.e., it converts thread-level parallelism into more ILP.
9) Give the features exploited by SMT.
It exploits the following features of modern processors
*Multiple Functional Units.
-Modern Processors typically have more functional units available than a
single thread can utilize.
*Register Renaming and Dynamic Scheduling.
-Multiple instructions from independent threads can co-exist and co-execute.
10) What are the Design challenges of SMT?
The Design Challenges of SMT processor includes the following-
*Larger register files needed to hold multiple contexts.
*Not affecting clock cycle time.
*Instruction issue-more candidate instructions need to be considered.
*Instruction completion-choosing which instructions to commit may be challenging.
*Ensuring that cache and TLB conflicts generated by SMT do not degrade performance.
11) Compare the SMT processor with the base Superscalar Processor
The SMT processor is compared to the base superscalar processor on several key
measures:
*Utilization of functional units.
*Utilization of Fetch units.
*Accuracy of branch predictors.
*Hit rates of primary caches.
*Hit rates of secondary caches.
12. List the factors that limit the issue slot usage
The issue slot usage is limited by the following factors.
*Imbalances in resource needs.
*Resource availability over multiple threads.
*Number of active threads considered.
*Finite limitations of buffers.
*Ability to fetch enough instructions from multiple threads.
13) Define Multi-core microprocessor
A multi-core microprocessor is one that combines two or more separate processors in one
package.
14) What is a Heterogeneous Multi-core processor?
A heterogeneous multi-core processor is a processor in which multiple cores of different
types are implemented in one CPU.
15) List out the advantages of Heterogeneous Multi-core processors.
*Massive parallelism.
*Specialization of Hardware for tools.
16) List out the Disadvantages of Heterogeneous Multi-core processors.
*Developer productivity.
*Portability.
*Manageability.
17) What is IBM cell processor?
The IBM Cell processor is a heterogeneous multi-core processor composed of a control-intensive
processor core and compute-intensive SIMD processor cores, each with its own
distinguishing features.
18) List the components of IBM cell architecture
*Power Processing Elements(PPE).
*Synergistic Processor Elements(SPE).
*I/O controller.
*Element Interconnect Bus(EIB).
19) What are the components of PPE?
The PPE is made out of two main units:
1.Power Processor Unit(PPU)
2.Power Processor Storage Subsystem(PPSS)
20) What is Memory Flow Controller(MFC)?
The Memory Flow Controller is actually the interface between the Synergistic Processor
Unit (SPU) and the rest of the Cell chip. Specifically, the MFC interfaces the SPU with the EIB.