Quantization Pipeline Exercise

The current pipeline implementation contains several hardcoded values that could be made configurable to increase its flexibility and usability. This exercise encourages you to enhance the pipeline by parametrizing these values.

Enhancement Opportunities

1. Quantization Format Support

Currently supports only INT8 and INT4 quantization

Potential enhancement: Add support for FP8 format

  • Consider implementing a configurable quantization format selector

2. Data Calibration Parameters

The following calibration-related values are currently hardcoded and could be made configurable:

  • NUM_CALIBRATION_SAMPLES: Number of samples used for calibration

  • DATASET_ID: The dataset identifier used for calibration

  • DATASET_SPLIT: The specific split of the dataset to use

3. Quantization Configuration

Several quantization parameters could be exposed for customization:

  • DAMPENING_FRAC: Dampening fraction for the quantization process

  • OBSERVER: The type of observer used for quantization

  • GROUP_SIZE: Size of the quantization groups

  • Quantization mappings for different model components

4. Model Evaluation

The current evaluation is specifically configured for GSM8K parameters. Possible improvements include:

  • Adding support for different evaluation tasks

  • Making evaluation parameters configurable

  • Supporting the full range of lm_eval tasks

5. Hardware Acceleration

Current implementation hardcodes nvidia.com/gpu as the accelerator type

Potential enhancement: enhanced to support different accelerator types.

Exercise Goals

  1. Choose one or more of these areas for improvement

  2. Implement the parametrization of your chosen components

  3. Test the enhanced pipeline with different configurations

  4. Bonus: Send a PR to add it to the official material at pull request

This exercise will help you understand the pipeline’s architecture while making it more versatile for different use cases and environments.