Quantization Pipeline Exercise

The current pipeline implementation contains several hardcoded values that could be made configurable to increase its flexibility and usability. This exercise encourages you to enhance the pipeline by parametrizing these values.

Enhancement Opportunities

1. Quantization Format Support

Currently supports only INT8 and INT4 quantization

Potential enhancement: Add support for FP8 format

Consider implementing a configurable quantization format selector

2. Data Calibration Parameters

The following calibration-related values are currently hardcoded and could be made configurable:

NUM_CALIBRATION_SAMPLES: Number of samples used for calibration
DATASET_ID: The dataset identifier used for calibration
DATASET_SPLIT: The specific split of the dataset to use

3. Quantization Configuration

Several quantization parameters could be exposed for customization:

DAMPENING_FRAC: Dampening fraction for the quantization process
OBSERVER: The type of observer used for quantization
GROUP_SIZE: Size of the quantization groups
Quantization mappings for different model components

4. Model Evaluation

The current evaluation is specifically configured for GSM8K parameters. Possible improvements include:

Adding support for different evaluation tasks
Making evaluation parameters configurable
Supporting the full range of lm_eval tasks

5. Hardware Acceleration

Current implementation hardcodes nvidia.com/gpu as the accelerator type

Potential enhancement: enhanced to support different accelerator types.

Exercise Goals

Choose one or more of these areas for improvement
Implement the parametrization of your chosen components
Test the enhanced pipeline with different configurations
Bonus: Send a PR to add it to the official material at pull request

This exercise will help you understand the pipeline’s architecture while making it more versatile for different use cases and environments.