Topic: Automatic multithreading support in encoding?

Automatic multithreading support in encoding?

While reading the dual core P4 reviews, I found this:
http://www.hexus.net/content/reviews/revie...nVybF9wYWdlPTk=

On this page, there is a comparison of Lame with and without multithreading support.
It appears that multithreading provides benefits on dual cores and hyperthreaded processors.

The interesting part is that there is ZERO multithreading support in Lame's source code. All the benefits shown in this comparison are caused by automatic compiler optimisations.

I checked an Intel compiler doc, and there is indeed a /Qparallel switch to automatically use threads:
http://www.intel.com/software/products/com...ptimization.pdf

I am quite surprised by the results, as the speed increase is substantial, especially for something handled entirely on the compiler side.
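
For reference, a minimal sketch (my own, not from the review) of the kind of loop the auto-paralleliser targets; note there is no threading code anywhere in the source, and the compile switches are the ones named in the Intel doc (exact spelling may vary between compiler versions):

Code
/* autopar_demo.c */
#include <stdio.h>

#define N 1000000

static float a[N], b[N], c[N];

int main(void)
{
    int i;

    /* Assumed compile lines:
     *   Windows: icl /Qparallel autopar_demo.c
     *   Linux:   icc -parallel autopar_demo.c
     * The trip count (N) is a compile-time constant and every
     * iteration touches only its own array elements, so the
     * compiler is free to split the loop across threads. */
    for (i = 0; i < N; i++)
        c[i] = a[i] * b[i];

    /* Use a result so the loop is not optimised away. */
    printf("%f\n", c[0]);
    return 0;
}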

Automatic multithreading support in encoding?

Reply #1
Aren't applications running slower on an HT-enabled Pentium 4? Maybe this switch just compensates for that by assigning the complete resources of the P4 to one application?

Automatic multithreading support in encoding?

Reply #2
Are those results reproducible?

Sometimes I find those (p)reviews and tests quite "erratic".
Vital papers will demonstrate their vitality by spontaneously moving from where you left them to where you can't find them.

Automatic multithreading support in encoding?

Reply #3
Intel's documentation has something more about it than that pdf.
Quote
Programming with Auto-parallelization
The auto-parallelization feature implements some concepts of OpenMP*, such as the worksharing construct (with the parallel for directive). This section provides specifics of auto-parallelization.

Guidelines for Effective Auto-parallelization Usage
A loop is parallelizable if:

The loop is countable at compile time. This means that an expression representing how many times the loop will execute (also called "the loop trip count") can be generated just before entering the loop.
There are no FLOW (READ after WRITE), OUTPUT (WRITE after WRITE) or ANTI (WRITE after READ) loop-carried data dependences. A loop-carried data dependence occurs when the same memory location is referenced in different iterations of the loop. At the compiler's discretion, a loop may be parallelized if any assumed inhibiting loop-carried dependences can be resolved by run-time dependence testing.

The compiler may generate a run-time test for the profitability of executing the loop in parallel when loop parameters are not compile-time constants.

Coding Guidelines
Enhance the power and effectiveness of the auto-parallelizer by following these coding guidelines:

Expose the trip count of loops whenever possible. Specifically, use constants where the trip count is known and save loop parameters in local variables.
Avoid placing structures inside loop bodies that the compiler may assume to carry dependent data, for example, function calls, ambiguous indirect references, or global references.

Auto-parallelization Data Flow
For auto-parallelization processing, the compiler performs the following steps:

Data flow analysis
Loop classification
Dependence analysis
High-level parallelization
Data partitioning
Multi-threaded code generation
These steps include:

Data flow analysis: compute the flow of data through the program
Loop classification: determine loop candidates for parallelization based on correctness and efficiency as shown by threshold analysis
Dependence analysis: compute the dependence analysis for references in each loop nest
High-level parallelization:
analyze the dependence graph to determine loops that can execute in parallel
compute run-time dependences
Data partitioning: examine data reference and partition based on the following types of access: shared, private, and firstprivate.
Multi-threaded code generation:
modify loop parameters
generate entry/exit per threaded task
generate calls to parallel runtime routines for thread creation and synchronization
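
To make the quoted rules concrete, here is a small sketch (my own illustration, not from the Intel doc) of loops the auto-paralleliser can and cannot handle:

Code
#define N 100000
static double a[N], b[N], s[N];

void parallelizable(void)
{
    int i;
    /* Countable trip count, and each iteration writes only its own
     * element: no loop-carried dependence, so the iterations can be
     * divided among threads. */
    for (i = 0; i < N; i++)
        a[i] = b[i] * 2.0;
}

void flow_dependence(void)
{
    int i;
    /* FLOW (READ after WRITE) dependence: iteration i reads s[i-1],
     * which iteration i-1 wrote, so iterations must run in order and
     * the loop is not parallelizable as written. */
    for (i = 1; i < N; i++)
        s[i] = s[i - 1] + b[i];
}

void opaque_loop(int n, double *p)
{
    int i;
    /* Violates the coding guidelines: the trip count n is not
     * exposed, and p is an ambiguous indirect reference (it might
     * alias a or b), so the compiler must either give up or fall
     * back to a run-time dependence test. */
    for (i = 0; i < n; i++)
        p[i] = a[i] + b[i];
}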

 

Automatic multithreading support in encoding?

Reply #4
I've used the auto-paralleliser in ICC (for Linux) and tried to measure a speed difference on an HT-enabled P4. It ran faster, but the results of the experiment were totally wrong. So it looks like it messed up the code a bit.