Natchatran
Natchatran Blogs includes Technical Tutorials, E-books, Notes, Lab Manual, Question Banks, Viva questions and Interview questions for engineering students, provides all study material for cse students.
-Natchatran(Prem Anandh.J)
Friday, March 7, 2014
CS2354 - Advanced Computer Architecture- PART A Question and Answer
2 Marks
UNIT 1
1. What is Instruction Level
parallelism?
The technique which is used to
overlap the execution of instructions and improve
performance is called ILP.
2. What are the approaches to
exploit ILP?
The two separable approaches to
exploit ILP are,
• Dynamic or
hardware intensive approach
• Static or Compiler
intensive approach
3. What is pipelining?
Pipelining is an implementation
technique whereby multiple instructions are overlapped
in execution when they are
independent of one another.
4. Write down the formula to
calculate the pipeline CPI?
The value of the CPI (Cycles per Instruction) for a pipelined processor is the sum of the ideal (base) CPI and all contributions from stalls.
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls.
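The formula above can be sketched in Python as follows. This is only an illustration: the function name and the stall values (average stall cycles per instruction) are assumptions, not figures from these notes.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI = ideal CPI + average stall cycles per instruction of each kind."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


# Hypothetical workload: ideal CPI of 1 plus three kinds of stalls.
cpi = pipeline_cpi(1.0, 0.05, 0.20, 0.15)
print(f"Pipeline CPI = {cpi:.2f}")
```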
5. What is loop level
parallelism?
Loop level parallelism is parallelism among the iterations of a loop. Exploiting it is a common way to increase the amount of parallelism available among instructions.
6. Give the methods to enhance
performance of ILP?
To obtain substantial performance
enhancements, the ILP across multiple basic blocks
are exploited using
• loop level
parallelism
• vector
instructions
7. List out the types of
dependences.
There are three different types
of dependences
• Data dependences
• Name dependences
• Control
dependences
8. What is Data hazard?
A hazard is created whenever
there is dependence between instructions and they are close
enough that the overlap caused by
pipelining, or other reordering of instructions, would change
the order of access to the
operand involved in the dependence.
9. Give the classification of
Data hazards
Data Hazards are classified into
three types depending on the order of read and write
accesses in the instructions
• RAW (Read After
Write)
• WAW (Write After
Write)
• WAR (Write After
Read)
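The three classes can be illustrated with a small Python sketch. It assumes a simplified instruction format, a (destination, sources) pair, and the function name is hypothetical. It only checks register names and ignores pipeline timing.

```python
def classify_hazards(first, second):
    """Classify possible data hazards between two instructions in program order.

    Each instruction is a (destination, sources) pair; `first` precedes `second`.
    """
    first_dest, first_srcs = first
    second_dest, second_srcs = second
    hazards = []
    if first_dest is not None and first_dest in second_srcs:
        hazards.append("RAW")  # second reads what first writes
    if first_dest is not None and first_dest == second_dest:
        hazards.append("WAW")  # both write the same register
    if second_dest is not None and second_dest in first_srcs:
        hazards.append("WAR")  # second writes what first reads
    return hazards


add = ("R1", ("R2", "R3"))  # ADD R1, R2, R3
sub = ("R4", ("R1", "R5"))  # SUB R4, R1, R5 reads R1
print(classify_hazards(add, sub))
```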
10. List out the constraints
imposed by control dependences?
The two constraints imposed by
control dependencies are
• An instruction
that is control dependent on branch cannot be moved before the branch
so that its execution is no
longer controlled by the branch.
• An instruction
that is not control dependent on a branch cannot be moved after the
branch so that its execution is
controlled by the branch.
11. What are the properties used
for preserving control dependence?
Control dependence is preserved
by two properties in a simple pipeline.
• Instructions execute in program order
• Detection of
control or branch hazards
12. Define Dynamic Scheduling?
Dynamic scheduling is a technique
in which the hardware rearranges the instruction
execution to reduce the stalls while
maintaining data flow and exception behavior.
13. List the advantages of
dynamic scheduling?
• It handles
dependences that are unknown at compile time.
• It simplifies
the compiler.
• Uses speculation
techniques to improve the performance.
14. What is scoreboarding?
Scoreboarding is a technique that allows instructions to execute out of order when sufficient resources are available and there are no data dependences. An instruction involved in a WAW or WAR hazard is stalled by the scoreboard until that hazard has cleared.
15. What are the advantages of Tomasulo’s Approach?
• Distribution of the hazard detection logic
• Elimination of stalls for WAR and WAW hazards
16. What are the types of branch
prediction?
There are two types of branch
prediction. They are,
• Dynamic branch
prediction
• Static branch
prediction
17. Define Amdahl’s Law?
Amdahl’s Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
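Amdahl’s Law is usually written as Speedup = 1 / ((1 - fraction enhanced) + fraction enhanced / speedup enhanced). A minimal Python sketch of that standard form follows; the function name and the example values are assumptions chosen for illustration.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only `fraction_enhanced` of the time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical case: 40% of execution time is made 10 times faster.
print(amdahl_speedup(0.4, 10))
```

Even with a 10 times faster enhancement, the overall gain stays small because 60% of the time is unaffected.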
18. What are the things present
in Dynamic branch prediction?
It uses two things they are,
• Branch
prediction buffer
• Branch history
table
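A branch history table can be sketched in Python as below. The 2-bit saturating counter scheme, the table size, and the 4-byte instruction assumption used for indexing are illustrative choices; the notes above do not fix any of them.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters indexed by low-order branch address bits."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # 0-1 predict not taken, 2-3 predict taken

    def _index(self, pc):
        return (pc >> 2) % self.entries  # assumes 4-byte instructions

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(outcomes, pc=0x40):
    """Run one branch through the table and count correct predictions."""
    bht = BranchHistoryTable()
    correct = 0
    for taken in outcomes:
        if bht.predict(pc) == taken:
            correct += 1
        bht.update(pc, taken)
    return correct


loop_branch = [True] * 9 + [False]  # loop branch: taken 9 times, then exits
print(count_correct(loop_branch), "of", len(loop_branch), "predicted correctly")
```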
19. Define Correlating branch
prediction?
Branch prediction that uses the
behavior of other branches to make a prediction is called
correlating branch prediction.
20. What are the basic ideas of
pipeline scheduling?
The basic ideas of pipeline
scheduling are,
• To keep pipeline
full: Find sequence of unrelated instructions that can be
overlapped in the pipeline.
• To avoid pipeline stall: Separate dependent instructions by a distance in clock cycles equal to the pipeline latency of the source instruction.
21. What are the four fields
involved in ROB?
ROB contains four fields,
• Instruction type
• Destination
field
• Value field
• Ready field
22. What is reservation station?
In Tomasulo’s scheme register
renaming is provided by reservation station. The basic
idea is that the reservation
station fetches and buffers an operand as soon as it is available,
eliminating the need to get the
operand from a register.
23. What is ROB?
ROB stands for reorder buffer. It
supplies operands in the interval between completion of
instruction execution and
instruction commit. ROB is similar to the store buffer in Tomasulo’s
algorithm.
24. What is imprecise exception?
An exception is imprecise if the
processor state when an exception is raised does not look
exactly as if the instructions
were executed sequentially in strict program order.
25. What are the two
possibilities of imprecise exceptions?
• The pipeline may have already completed instructions that are later in program order than the instruction causing the exception.
• The pipeline may not yet have completed some instructions that are earlier in program order than the instruction causing the exception.
26. What are the two main
features preserved by maintaining both data and control dependence?
• Exception
behavior
• Data flow
27. What are the types of name dependence?
• Anti dependence
• Output dependence
28. What is anti dependence?
An anti dependence between instruction i and instruction j occurs when instruction j writes a register or memory location that instruction i reads. The original ordering must be preserved to ensure that i reads the correct value.
29. What is output dependence?
An output dependence occurs when
instruction i and instruction j write the same register
or memory location. The ordering
between the instructions must be preserved to ensure that the
value finally written corresponds
to instruction j.
30. What is register renaming?
Renaming of register operand is
called register renaming. It can be either done statically
by the compiler or dynamically by
the hardware.
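A minimal Python sketch of dynamic-style renaming, assuming a hypothetical three-operand instruction format (dest, src1, src2) and an unlimited supply of physical registers named P0, P1, and so on. Every destination gets a fresh name, so WAR and WAW name dependences disappear while true (RAW) dependences are kept.

```python
def rename(instructions, arch_regs):
    """Rename each destination to a fresh physical register."""
    table = {r: r for r in arch_regs}  # architectural name -> current physical name
    next_free = 0
    renamed = []
    for dest, src1, src2 in instructions:
        # Sources use the latest mapping, which preserves RAW dependences.
        s1, s2 = table[src1], table[src2]
        new_dest = f"P{next_free}"
        next_free += 1
        table[dest] = new_dest
        renamed.append((new_dest, s1, s2))
    return renamed


code = [
    ("F0", "F2", "F4"),   # writes F0
    ("F6", "F0", "F8"),   # reads F0 (RAW), reads F8
    ("F8", "F10", "F2"),  # writes F8 (WAR with the previous instruction)
    ("F6", "F10", "F8"),  # writes F6 (WAW with the second instruction)
]
for instruction in rename(code, ["F0", "F2", "F4", "F6", "F8", "F10"]):
    print(instruction)
```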
UNIT 2
1. Define VLIW.
VLIW (Very Long Instruction Word) is a technique for exploiting ILP by executing instructions without dependences in parallel. The compiler analyzes the program and detects operations that can be executed in parallel; such operations are packed into one “large” instruction.
2. List out the advantages of
VLIW processor.
• Simple hardware
• Number of functional units can be increased without needing additional sophisticated hardware to detect parallelism, as in superscalar processors.
• Good compilers can detect parallelism based on global analysis of the whole program.
3. Define EPIC
• EPIC stands for Explicit Parallel Instruction Computing.
• It is an architecture framework proposed by HP.
• It is based on VLIW and was designed to overcome the key limitations of VLIW while simultaneously giving more flexibility to compiler writers.
4. What is loop level analysis?
Loop level analysis involves determining what dependences exist among the operands in a loop across the iterations of that loop, that is, whether data accesses in later iterations are data dependent on data values produced in earlier iterations.
5. What are the types of Data
dependencies in loops?
• Loop carried dependence
• Not loop carried dependence
6. What is loop carried
dependence?
Data dependence between different
loop iterations (data produced in earlier iterations
used in a later one) is called a
loop carried dependence.
7. What are the tasks in finding
the dependence in a program?
There are 3 tasks. They are
• Have good scheduling of code
• Determine which loops might contain parallelism
• Eliminate name dependences
8. Define dependence analysis
algorithm.
A dependence analysis algorithm is an algorithm used by the compiler to detect dependences, based on the assumptions that
• Array indices are affine
• The GCD (greatest common divisor) test can be applied to the two affine indices
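The GCD condition on affine indices can be sketched in Python. The sketch assumes a loop that stores to x[a*i + b] and loads from x[c*i + d]; a dependence is possible only if GCD(c, a) divides (d - b). The function name is hypothetical, and the test is conservative: True means a dependence may exist, not that it does.

```python
from math import gcd


def gcd_test_may_depend(a, b, c, d):
    """Return True if a store to x[a*i + b] and a load from x[c*i + d] may depend."""
    g = gcd(a, c)
    if g == 0:
        # Both indices are constants; they conflict only if equal.
        return b == d
    return (d - b) % g == 0


# x[2*i + 3] = x[2*i] * 5.0  ->  a=2, b=3, c=2, d=0
print(gcd_test_may_depend(2, 3, 2, 0))
```

In the example, GCD(2, 2) = 2 does not divide -3, so no dependence is possible.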
9. What is copy propagation?
Copy propagation is an optimization, used together with algebraic simplification of expressions, that eliminates operations that copy values.
10. What is tree-height reduction
technique?
Tree-height reduction is
optimization which reduces the height of the tree structure
representing a computation,
making it wider but shorter.
11. What are the components of
software pipeline loop?
• A software pipelined loop consists of a loop body, start-up code and clean-up code.
• Start-up code executes the code left out from the first original loop iterations.
• Clean-up code executes the instructions left over from the last original loop iterations.
12. What is trace scheduling?
Trace scheduling is a way to organize the process of global code motion. It simplifies instruction scheduling by incurring the cost of possible code motion on the less critical paths.
13. List out steps used for trace
scheduling.
• Trace selection
• Trace compaction
14. Define Inter-procedural
analysis.
When a procedure has pointer parameters, analyzing it requires looking across the boundaries of that procedure. This is called interprocedural analysis.
15. What is software pipelining?
It is a technique for
reorganizing loop such that each iteration in the code is made from
instructions chosen from
different iterations of original loop.
16. Define critical path.
Critical path is defined as the
longest sequence of dependent instructions in a program.
17. Define IA-64 processor.
The IA-64 is a RISC-Style,
register-register instruction set with the features designed to
support compiler based exploitation
of ILP.
18. What is CFM and what is its
use?
CFM
stands for Current Frame Pointer
CFM
pointer points to the set of registers to be used by a given procedure.
19. What are the parts of CFM
pointer?
There are two parts. They are
Local
area – Used for local storage
Output
area - Used to pass values to any called procedure.
20. What is Itanium processor?
The Itanium processor is an implementation of the Intel IA-64 architecture. It is capable of up to 6 issues per clock cycle, which can include up to 3 branches and 2 memory references.
21. What are the parts of 10
stage pipeline in Itanium processor?
• Front end
• Instruction delivery (EXP, REN)
• Operand delivery (WLD, REG)
• Execution (EXE, DET, WRB)
22. What are the limitations of
ILP?
• Limitations on hardware model
• Limitations on window size and maximum issue count
• Effects of finite registers
• Effects of imperfect alias analysis
23. List the two techniques for
eliminating dependent computations
Software
pipelining
Trace
scheduling
24. Define Trace selection and
Trace compaction
Trace Selection
Trace selection tries to find a likely sequence of basic blocks whose operations will be put together into a small number of instructions. This sequence is called a trace.
Trace Compaction
Trace compaction tries to squeeze
the trace into a small number
of wide instructions. Trace
compaction is code scheduling hence it attempts to move
operations as early as it can in
a sequence packing the operations into as few wide
instructions as possible.
25. Define Superblocks.
Superblocks are formed by a
process similar to that used for traces, but are a form of
extended basic blocks, which are
restricted to a single entry point but allow multiple exits.
26. Use of conditional or predicated instructions.
Conditional or predicated instructions are used to eliminate branches, converting a control dependence into a data dependence and potentially improving performance.
27. Define Instruction Group
Instruction group is a sequence
of consecutive instructions with no register data dependencies
among them. All the instructions
in a group could be executed in parallel if sufficient
hardware resources existed and if
any dependences through memory were preserved.
28. Use of template field in
bundle.
The 5 bit template field within
each bundle describes both the presence of any stops
associated with the bundle and
the execution unit type required by each instruction within
the bundle.
29. List the two types of
speculation supported by IA 64 processor.
Control
Speculation
Memory
reference speculation
30. Define Advance loads.
Memory reference support in the
IA 64 uses a concept called advanced loads. Advance load
is a load that has been
speculatively moved above store instructions on which it is potentially
dependent. To speculatively
perform a load the ld.a instruction is used.
31. Define ALAT
Executing an advanced load instruction creates an entry in a special table called the ALAT (Advanced Load Address Table). The entry stores both the register destination of the load and the address of the accessed memory location. When a store is executed, an associative lookup against the active ALAT entries is performed. If there is an ALAT entry with the same memory address as the store, that ALAT entry is marked as invalid.
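The ALAT behaviour described above can be modelled with a toy Python class. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, invalidation is modelled by removing the entry, addresses are compared for exact equality, and the `check` method stands in for the later check of the speculative load.

```python
class ALAT:
    """Toy advanced load address table: destination register -> load address."""

    def __init__(self):
        self.entries = {}

    def advanced_load(self, dest_reg, address):
        # ld.a: record the destination register and the address loaded from.
        self.entries[dest_reg] = address

    def store(self, address):
        # A store invalidates (here: removes) every entry with a matching address.
        for reg in [r for r, a in self.entries.items() if a == address]:
            del self.entries[reg]

    def check(self, dest_reg):
        # True: the speculative load is still valid. False: it must be redone.
        return dest_reg in self.entries


alat = ALAT()
alat.advanced_load("r8", 0x1000)
alat.advanced_load("r9", 0x2000)
alat.store(0x1000)  # conflicts with the load into r8
print(alat.check("r8"), alat.check("r9"))
```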
32. What are the functional units
in Itanium Processor?
There are nine functional units
in the Itanium processor.
Two I units
Two M units
Three B units
Two F units
All the functional units are
pipelined.
33. Define Scoreboard
The 10-stage pipeline of the Itanium processor is divided into 4 parts. In the operand delivery part, a scoreboard is used to detect when individual instructions can proceed, so that a stall of one instruction in a bundle need not cause the entire bundle to stall.
34. Define Book Keeping Code
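In trace scheduling, moving code across the entry and exit points of a trace requires additional code to be inserted at those points to keep the program correct. This code is known as Book Keeping Code.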
Unit-3
1. Define cache coherence problem?
The cache coherence problem describes how two different processors can see two different values for the same memory location.
2. What are the two aspects of
cache coherence problem?
i. Coherence- It determines what value can be returned by a particular read operation.
ii. Consistency- It determines when a written value will be returned by a read operation.
3. What are the two types of
cache coherence protocol?
i. Directory based protocol.
ii. Snooping protocol.
4. Define Directory based
protocol.
In a directory based protocol, the sharing status of each block of physical memory is kept in one common place called the directory. The directory is consulted to find which processors hold copies of a block.
5. Name the different types of
snooping protocol.
i. invalidate protocol
ii. update/write broadcast
protocol.
6. Difference between write
Update and invalidate protocol.
Write update:
i. Multiple write broadcast is
present
ii. Here they consider separate
word for each cache block
iii. Access time is less
Invalidate:
i. Only one invalidation is
present
ii. Invalidation is performed for
entire cache block
iii. Access time is high
7. What are the different types
of access in distributed shared memory architecture?
i. Local:
If the processor refers to its local memory then it is called local access.
ii. Remote:
If the processor refers to another processor’s memory then it is called remote access.
8. What are the disadvantages of
remote access?
• Compiler mechanisms for cache coherence are very limited
• Without the cache coherence property the multiprocessor system loses the advantage of fetching and using multiple words
• Prefetch is useful only when the multiprocessor fetches multiple words
9. What are the states available
in directory based protocol?
i. Shared:- One or more processors have copies of the same data block.
ii. Uncached:- No processor has a copy of the data block.
iii. Exclusive:- Exactly one processor has a copy of the data block.
10. What are the nodes available
in distributed system?
i. Local Node
ii. Home Node
iii. Remote Node
11. Define Synchronization.
Synchronization mechanisms are typically built with user level software routines that rely on hardware supplied synchronization instructions.
12. Name the basic hardware
primitives.
i. Atomic Exchange
ii. Test and set
iii. Fetch and Increment
13. Define spinlock.
It is a lock that a processor continuously tries to acquire, spinning around a loop until it succeeds. It is mainly used when the programmer expects the lock to be held for a short period of time.
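A spin lock built on an atomic exchange can be sketched in Python. Python has no hardware atomic exchange instruction, so this sketch assumes a small `threading.Lock` as a stand-in for hardware atomicity; the class and function names are hypothetical.

```python
import threading


class SpinLock:
    """Spin lock built on a simulated atomic exchange (0 = free, 1 = held)."""

    def __init__(self):
        self._flag = 0
        self._guard = threading.Lock()  # stands in for hardware atomicity

    def _atomic_exchange(self, new_value):
        with self._guard:
            old = self._flag
            self._flag = new_value
            return old

    def acquire(self):
        # Keep exchanging 1 into the flag until the old value was 0 (free).
        while self._atomic_exchange(1) == 1:
            pass  # spin

    def release(self):
        self._atomic_exchange(0)


def run_workers(num_threads=4, increments=200):
    """Increment a shared counter from several threads under the spin lock."""
    lock = SpinLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            counter[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


print(run_workers())
```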
14. What are the mechanisms to implement locks?
There are two methods to
implement the locks.
i. Implementing lock without
using cache coherence
ii. Implementing lock using cache
coherence.
15. What are the advantages of using spin lock?
There are two advantages of using
spin lock
i. They have low overhead
ii. Performance is high
16. Name the synchronization
mechanisms for large scale multiprocessor.
i. Exponential back off
ii. queuing locks
iii. combining tree
17. What are the two primitives
used for implementing synchronization?
• Lock Based
Implementation
• Barrier based
Implementation
18. Define sequential
consistency.
It requires that the result of any execution be the same as if the memory accesses executed by each processor were kept in order and the accesses among different processors were arbitrarily interleaved.
It reduces the possibility of incorrect execution.
19. Define multithreading.
Multithreading is the execution of multiple threads that share a common memory or a common processor, with their execution overlapped.
20. What are the types of multi
threading?
i. Fine grained multithreading:-
It has the ability to switch threads for each
instruction
ii. coarse grained
multithreading:- It has the ability to switch the threads only for
costly stalls.
Unit-4
1. Define cache.
Cache is the name given to the
first level of the memory hierarchy encountered once
the address leaves the CPU.
Eg: file caches, name caches.
2. What are the factors on which
the cache miss depends on?
The time required for the cache
miss depends on both
• Latency
• Bandwidth
3. What is the principle of
locality?
Programs access a relatively small portion of the address space at any instant of time. This is called the principle of locality.
4. What is called pages?
The address space is usually
broken into fixed-size blocks, called pages. Each
page resides either in main
memory or on disk.
5. What is called memory stall
cycles?
The number of cycles during which
the CPU is stalled waiting for a memory
access is called memory stall
cycles.
6. Write down the formula for
calculating average memory access time?
Average memory access time = Hit time + Miss rate * Miss penalty.
Here hit time is the time to hit in the cache. The formula can help us decide between split caches and a unified cache.
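A minimal Python sketch of the formula above. The function name and the cache parameters (1-cycle hit, 5% miss rate, 100-cycle miss penalty) are assumptions chosen for illustration.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (all times in the same unit)."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical cache: 1-cycle hit, 5% miss rate, 100-cycle miss penalty.
amat = average_memory_access_time(1, 0.05, 100)
print(f"AMAT = {amat:.1f} cycles")
```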
7. What are the techniques to
reduce the miss rate?
• Larger block
size
• Larger caches
• Higher
associativity
• Way prediction
and pseudo associative caches
• Compiler
optimizations.
8. What are the techniques to
reduce hit time?
• Small and simple
cache: direct mapped
• Avoid address
translation during indexing of the cache
• Pipelined cache
access
• Trace cache
9. List out the types of storage
devices.
• Magnetic storage: disk, floppy, tape
• Optical storage: compact disks (CD), digital video/versatile disks (DVD)
• Electrical storage: flash memory
10. What is sequence recorded?
The sequence recorded on the magnetic media is a sector number, a gap, the information for that sector including error correction code, a gap, the sector number of the next sector and so on.
11. What is termed as cylinder?
The term cylinder is used to
refer to all the tracks under the arms at a given point
on all surfaces.
12. List the components to a disk
access.
There are three mechanical
components to a disk access:
• Rotation latency
• Transfer time
• Seek time
13. What is average seek time?
Average seek time is the sum of the times for all possible seeks divided by the number of possible seeks. Average seek times are advertised to be 5 ms to 12 ms.
14. What is transfer time?
Transfer time is the time it takes to transfer a block of bits, typically a sector, under the read-write head. This time is a function of the block size, disk size, rotation speed, recording density of the track, and speed of the electronics connecting the disk to the computer.
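The three mechanical components (seek time, rotation latency, transfer time) can be combined into a rough access-time estimate. The Python sketch below assumes that average rotation latency is half a rotation and that transfer time is block size divided by a sustained transfer rate; the function name and the disk parameters are hypothetical.

```python
def disk_access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_s):
    """Estimate access time in ms as seek + average rotation latency + transfer."""
    rotation_ms = 0.5 * 60_000 / rpm  # half a rotation on average
    transfer_ms = block_bytes / (transfer_mb_per_s * 1_000_000) * 1000
    return seek_ms + rotation_ms + transfer_ms


# Hypothetical disk: 5 ms seek, 10000 RPM, 512-byte sector, 40 MB/s.
print(f"{disk_access_time_ms(5, 10000, 512, 40):.4f} ms")
```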
15. Write the formula to
calculate the CPU execution time.
CPU execution time=(CPU clock
cycles+ memory stall cycles)*clock cycle time.
16. Write the formula to
calculate the CPU time.
CPU time=(CPU execution clock
cycles+ memory stall clock cycles)* clock cycle
time.
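The CPU time formula can be sketched in Python. Memory stall cycles are taken here as number of misses times miss penalty; the function names and the example values are assumptions.

```python
def memory_stall_cycles(number_of_misses, miss_penalty):
    """Stall cycles = number of misses * cost per miss (the miss penalty)."""
    return number_of_misses * miss_penalty


def cpu_time(cpu_clock_cycles, stall_cycles, clock_cycle_time):
    """CPU time = (CPU execution clock cycles + memory stall cycles) * clock cycle time."""
    return (cpu_clock_cycles + stall_cycles) * clock_cycle_time


# Hypothetical program: 1,000,000 execution cycles, 20,000 misses of 100 cycles, 1 ns clock.
stalls = memory_stall_cycles(20_000, 100)
print(cpu_time(1_000_000, stalls, 1e-9), "seconds")
```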
17. Define miss penalty for an
out of order execution processor.
For an out of order execution processor, the miss penalty counts only the non-overlapped part of the miss latency:
Memory stall cycles/Instruction = (Misses/Instruction) * (Total miss latency - Overlapped miss latency).
18. What are the techniques available to reduce cache miss penalty or miss rate via parallelism?
The three techniques that overlap the execution of instructions with memory accesses are
1. Non-blocking caches to reduce stalls on cache misses, to match out of order processors
2. Hardware prefetching of instructions and data
3. Compiler-controlled prefetching.
19. How are the conflict misses
divided?
The four divisions of conflict
misses are,
• Eight way
• Four way
• Two way
• One way
20. List the advantage of memory
hierarchy?
Memory hierarchy takes advantage of
a. locality
b. cost/performance of memory technologies
22. What is the goal of memory
hierarchy?
The goal is to provide a memory system with
*cost almost as low as the cheapest level of memory
*speed almost as fast as the fastest level
23. Define cache hit ?
When the CPU finds a requested data item in the cache, it is called a cache hit.
*Hit Rate: the fraction of cache accesses found in the cache
*Hit Time: time to access the upper level, which consists of RAM access time + time to determine hit/miss
24. Define cache miss?
When the CPU does not find a data item it needs in the cache, a cache miss occurs.
*Miss Rate = 1 - (Hit Rate)
*Miss Penalty: time to replace a block in the cache + time to deliver the block to the processor
25. What does Latency and
Bandwidth determine?
-Latency determines the time to retrieve the first word of the block
-Bandwidth determines the time to retrieve the rest of this block
26. What are the types of
locality?
*Temporal locality(Locality in
time)
*Spatial locality(Locality in
space)
27. How does page fault occur?
When the cpu references an item
within a page that is not present in the cache or main
memory, a page fault occurs, and
the entire page is moved from the disk to main memory
28. What is called the miss
penalty?
The number of memory stall cycles
depends on both the number of misses and the
cost per miss, which is called
the miss penalty
29. What is Average memory access
time?
Average memory access time is the average time per memory access, accounting for both hits and misses. For processors with in-order execution it is a better measure of memory hierarchy performance than miss rate alone.
30. What are the categories of
cache miss(3cs of cache miss)
*compulsory
*capacity
*conflict
31. What are the techniques to
reduce miss penalty?
*multi-level caches
*critical word first and early
restart
*giving priority to read misses
over writes
*Merging write buffer
*victim caches
UNIT-5
1) What is the function of Power
Processing Unit?
*a full set of 64-bit PowerPC registers.
*32 128-bit vector multimedia registers.
*a 32 KB L1 data cache.
*a 32 KB L1 instruction cache.
2) List out the disadvantages of
Heterogeneous multi-core processors?
*Developer productivity.
*Portability.
*Manageability.
3) Define Software Multithreading
Software multithreading is a
piece of software that is aware of more than one
core/processor and can use these
to be able to simultaneously complete multiple tasks.
4) Define Hardware Multithreading
Hardware multithreading is a form of multithreading that allows multiple threads to share the functional units of a single processor in an overlapping fashion.
5) Difference between Software
and Hardware Multithreading
*Multithreading(Computer
Architecture), multithreading in hardware.
*Thread(Computer Science),
multithreading in software.
6) List some advantages of
Software Multithreading.
*Increased responsiveness and
worker productivity.
-Increased application
responsiveness when different tasks run in parallel.
*Improved performance in parallel
environments.
-When running computations on
multiple processors.
*More computations per cubic foot
of data center.
-Web based applications are often
multi-threaded in nature.
7) List out the two approaches of
Hardware Multithreading.
The two main approaches in
Hardware multithreading are
*Fine-grain Multithreading.
*Coarse-grain Multithreading.
8) Define Simultaneous
Multithreading(SMT)
SMT is a variation on multithreading that uses the resources of a multiple-issue, dynamically scheduled processor to exploit TLP (thread-level parallelism) at the same time it exploits ILP, i.e., it converts thread-level parallelism into more ILP.
9) Give the features exploited by
SMT.
It exploits the following
features of modern processors
*Multiple Functional Units.
-Modern Processors typically have
more functional units available than a
single thread can utilize.
*Register Renaming and Dynamic Scheduling.
-Multiple instructions from independent threads can co-exist and co-execute.
10) What are the Design
challenges of SMT?
The design challenges of an SMT processor include the following:
*Larger register files needed to hold multiple contexts.
*Not affecting clock cycle time.
*Instruction issue- more candidate instructions need to be considered.
*Instruction completion- choosing which instructions to commit may be challenging.
*Ensuring that cache and TLB conflicts generated by SMT do not degrade performance.
11) Compare the SMT processor
with the base Superscalar Processor
The SMT processor are compared to
the base superscalar processor in several key
measures
*Utilization of functional units.
*Utilization of Fetch units.
*Accuracy of branch predictors.
*Hit rates of primary caches.
*Hit rates of secondary caches.
12) List the factors that limit the issue slot usage
The issue slot usage is limited
by the following factors.
*Imbalances in resource needs.
*Resource availability over multiple threads.
*Number of active threads
considered.
*Finite Limitations of buffer.
*Ability to fetch enough
instruction from multiple threads.
13) Define Multi-core
microprocessor
A multi-core microprocessor is
one that combines two or more separate processors in one
package.
14) What is Heterogeneous
Multi-core processors?
A heterogeneous multi-core processor is a processor in which multiple cores of different types are implemented in one CPU.
15) List out the advantages of Heterogeneous Multi-core processors.
*Massive parallelism.
*Specialization of Hardware for
tools.
16) List out the Disadvantages of Heterogeneous Multi-core processors.
*Developer productivity.
*Portability.
*Manageability.
17) What is IBM cell processor?
The IBM Cell processor is a heterogeneous multi-core processor comprising a control-intensive processor and compute-intensive SIMD processor cores, each with its own distinguishing features.
18) List the components of IBM
cell architecture
*Power Processing Elements(PPE).
*Synergistic Processor
Elements(SPE).
*I/O controller.
*Element Interconnect Bus(EIB).
19) What are the components of
PPE?
The PPE is made out of two main
units..
1.Power Processor Unit(PPU)
2.Power Processor Storage
Subsystem(PPSS)
20) What is Memory Flow
Controller(MFC)?
The Memory Flow Controller is the interface between the Synergistic Processor Unit (SPU) and the rest of the Cell chip. Specifically, the MFC interfaces the SPU with the EIB.