
Module

lazyllm.module.ModuleBase

Module is the top-level component in LazyLLM, possessing four key capabilities: training, deployment, inference, and evaluation. Each module can choose to implement some or all of these capabilities, and each capability can be composed of one or more components. ModuleBase itself cannot be instantiated directly; subclasses that inherit and implement the forward function can be used as a functor. Similar to PyTorch's Module, when a Module A holds an instance of another Module B as a member variable, B will be automatically added to A's submodules. If you need the following capabilities, please have your custom class inherit from ModuleBase:

  1. Combine some or all of the training, deployment, inference, and evaluation capabilities. For example, an Embedding model requires training and inference.

  2. Have member variables possess some or all of the capabilities for training, deployment, and evaluation, so that these members can be trained, deployed, and evaluated through the start, update, eval, and other methods of the Module's root node.

  3. Pass user-set parameters directly to your custom module from the outermost layer (refer to WebModule).

  4. Make it usable by the parameter grid search module (refer to TrialModule).

Examples:

>>> import lazyllm
>>> class Module(lazyllm.module.ModuleBase):
...     pass
... 
>>> class Module2(lazyllm.module.ModuleBase):
...     def __init__(self):
...         super(__class__, self).__init__()
...         self.m = Module()
... 
>>> m = Module2()
>>> m.submodules
[<Module type=Module>]
>>> m.m3 = Module()
>>> m.submodules
[<Module type=Module>, <Module type=Module>]
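
For instance, a module that implements both forward and _get_deploy_tasks combines points 1 and 2: when start is called on the root module, deploy tasks of all submodules are run recursively, and the members are then invoked through the root's forward. The sketch below is illustrative only; ChildModule and ParentModule are hypothetical classes, not part of LazyLLM:

>>> import lazyllm
>>> class ChildModule(lazyllm.module.ModuleBase):
...     def _get_deploy_tasks(self):
...         return lazyllm.pipeline(lambda: print('child deployed!'))
...     def forward(self, input):
...         return input * 2
... 
>>> class ParentModule(lazyllm.module.ModuleBase):
...     def __init__(self):
...         super().__init__()
...         self.child = ChildModule()
...     def forward(self, input):
...         return self.child(input) + 1
... 
>>> p = ParentModule()
>>> p.start()
child deployed!
<Module type=ParentModule>
>>> p(3)
7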
Source code in lazyllm/module/module.py
class ModuleBase(metaclass=_MetaBind):
    """Module is the top-level component in LazyLLM, possessing four key capabilities: training, deployment, inference, and evaluation. Each module can choose to implement some or all of these capabilities, and each capability can be composed of one or more components.
ModuleBase itself cannot be instantiated directly; subclasses that inherit and implement the forward function can be used as a functor.
Similar to PyTorch's Module, when a Module A holds an instance of another Module B as a member variable, B will be automatically added to A's submodules.
If you need the following capabilities, please have your custom class inherit from ModuleBase:

1. Combine some or all of the training, deployment, inference, and evaluation capabilities. For example, an Embedding model requires training and inference.

2. Have member variables possess some or all of the capabilities for training, deployment, and evaluation, so that these members can be trained, deployed, and evaluated through the start, update, eval, and other methods of the Module's root node.

3. Pass user-set parameters directly to your custom module from the outermost layer (refer to WebModule).

4. Make it usable by the parameter grid search module (refer to TrialModule).


Examples:
    >>> import lazyllm
    >>> class Module(lazyllm.module.ModuleBase):
    ...     pass
    ... 
    >>> class Module2(lazyllm.module.ModuleBase):
    ...     def __init__(self):
    ...         super(__class__, self).__init__()
    ...         self.m = Module()
    ... 
    >>> m = Module2()
    >>> m.submodules
    [<Module type=Module>]
    >>> m.m3 = Module()
    >>> m.submodules
    [<Module type=Module>, <Module type=Module>]
    """
    builder_keys = []  # keys in builder support Option by default

    def __new__(cls, *args, **kw):
        sig = inspect.signature(cls.__init__)
        paras = sig.parameters
        values = list(paras.values())[1:]  # paras.value()[0] is self
        for i, p in enumerate(args):
            if isinstance(p, Option):
                ann = values[i].annotation
                assert ann == Option or (isinstance(ann, (tuple, list)) and Option in ann), \
                    f'{values[i].name} cannot accept Option'
        for k, v in kw.items():
            if isinstance(v, Option):
                ann = paras[k].annotation
                assert ann == Option or (isinstance(ann, (tuple, list)) and Option in ann), \
                    f'{k} cannot accept Option'
        return object.__new__(cls)

    def __init__(self, *, return_trace=False):
        self._submodules = []
        self._evalset = None
        self._return_trace = return_trace
        self.mode_list = ('train', 'server', 'eval')
        self._set_mid()
        self._used_by_moduleid = None
        self._module_name = None
        self._options = []
        self.eval_result = None
        self._hooks = set()

    def __setattr__(self, name: str, value):
        if isinstance(value, ModuleBase):
            self._submodules.append(value)
        elif isinstance(value, Option):
            self._options.append(value)
        elif name.endswith('_args') and isinstance(value, dict):
            for v in value.values():
                if isinstance(v, Option):
                    self._options.append(v)
        return super().__setattr__(name, value)

    def __getattr__(self, key):
        def _setattr(v, *, _return_value=self, **kw):
            k = key[:-7] if key.endswith('_method') else key
            if isinstance(v, tuple) and len(v) == 2 and isinstance(v[1], dict):
                kw.update(v[1])
                v = v[0]
            if len(kw) > 0:
                setattr(self, f'_{k}_args', kw)
            setattr(self, f'_{k}', v)
            if hasattr(self, f'_{k}_setter_hook'): getattr(self, f'_{k}_setter_hook')()
            return _return_value
        keys = self.__class__.builder_keys
        if key in keys:
            return _setattr
        elif key.startswith('_') and key[1:] in keys:
            return None
        elif key.startswith('_') and key.endswith('_args') and (key[1:-5] in keys or f'{key[1:-4]}method' in keys):
            return dict()
        raise AttributeError(f'{self.__class__} object has no attribute {key}')

    def __call__(self, *args, **kw):
        hook_objs = []
        for hook_type in self._hooks:
            if isinstance(hook_type, LazyLLMHook):
                hook_objs.append(hook_type)
            else:
                hook_objs.append(hook_type(self))
            hook_objs[-1].pre_hook(*args, **kw)
        try:
            kw.update(globals['global_parameters'].get(self._module_id, dict()))
            if (files := globals['lazyllm_files'].get(self._module_id)) is not None: kw['lazyllm_files'] = files
            if (history := globals['chat_history'].get(self._module_id)) is not None: kw['llm_chat_history'] = history
            r = (
                self.forward(**args[0], **kw)
                if args and isinstance(args[0], kwargs)
                else self.forward(*args, **kw)
            )
            if self._return_trace:
                lazyllm.FileSystemQueue.get_instance('lazy_trace').enqueue(str(r))
        except Exception as e:
            raise RuntimeError(
                f"\nAn error occured in {self.__class__} with name {self.name}.\n"
                f"Args:\n{args}\nKwargs\n{kw}\nError messages:\n{e}\n"
            )
        for hook_obj in hook_objs[::-1]:
            hook_obj.post_hook(r)
        for hook_obj in hook_objs:
            hook_obj.report()
        self._clear_usage()
        return r

    def used_by(self, module_id):
        self._used_by_moduleid = module_id
        return self

    def _clear_usage(self):
        globals["usage"].pop(self._module_id, None)

    # interfaces
    def forward(self, *args, **kw):
        """Define computation steps executed each time, all subclasses of ModuleBase need to override.


Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...     def forward(self, input):
    ...         return input + 1
    ... 
    >>> MyModule()(1)
    2   
    """
        raise NotImplementedError

    def register_hook(self, hook_type: LazyLLMHook):
        self._hooks.add(hook_type)

    def unregister_hook(self, hook_type: LazyLLMHook):
        if hook_type in self._hooks:
            self._hooks.remove(hook_type)

    def clear_hooks(self):
        self._hooks = set()

    def _get_train_tasks(self):
        """Define a training task. This function returns a training pipeline. Subclasses that override this function can be trained or fine-tuned during the update phase.


Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...     def _get_train_tasks(self):
    ...         return lazyllm.pipeline(lambda : 1, lambda x: print(x))
    ... 
    >>> MyModule().update()
    1
    """
        return None
    def _get_deploy_tasks(self):
        """Define a deployment task. This function returns a deployment pipeline. Subclasses that override this function can be deployed during the update/start phase.


Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...     def _get_deploy_tasks(self):
    ...         return lazyllm.pipeline(lambda : 1, lambda x: print(x))
    ... 
    >>> MyModule().start()
    1
    """
        return None
    def _get_post_process_tasks(self): return None

    def _set_mid(self, mid=None):
        self._module_id = mid if mid else str(uuid.uuid4().hex)
        return self

    _url_id = property(lambda self: self._module_id)

    @property
    def name(self):
        return self._module_name

    @name.setter
    def name(self, name):
        self._module_name = name

    @property
    def submodules(self):
        return self._submodules

    def evalset(self, evalset, load_f=None, collect_f=lambda x: x):
        """during update or eval, and the results will be stored in the eval_result variable.


Examples:
    >>> import lazyllm
    >>> m = lazyllm.module.TrainableModule().deploy_method(lazyllm.deploy.dummy).finetune_method(lazyllm.finetune.dummy).trainset("").mode("finetune").prompt(None)
    >>> m.evalset([1, 2, 3])
    >>> m.update()
    INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
    >>> print(m.eval_result)
    ["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
    """
        if isinstance(evalset, str) and os.path.exists(evalset):
            with open(evalset) as f:
                assert callable(load_f)
                self._evalset = load_f(f)
        else:
            self._evalset = evalset
        self.eval_result_collet_f = collect_f

    # TODO: add lazyllm.eval
    def _get_eval_tasks(self):
        def set_result(x): self.eval_result = x

        def parallel_infer():
            with ThreadPoolExecutor(max_workers=200) as executor:
                results = list(executor.map(lambda item: self(**item)
                                            if isinstance(item, dict) else self(item), self._evalset))
            return results
        if self._evalset:
            return Pipeline(parallel_infer,
                            lambda x: self.eval_result_collet_f(x),
                            set_result)
        return None

    # update module(train or finetune),
    def _update(self, *, mode=None, recursive=True):  # noqa C901
        if not mode: mode = list(self.mode_list)
        if type(mode) is not list: mode = [mode]
        for item in mode:
            assert item in self.mode_list, f"Cannot find {item} in mode list: {self.mode_list}"
        # dfs to get all train tasks
        train_tasks, deploy_tasks, eval_tasks, post_process_tasks = FlatList(), FlatList(), FlatList(), FlatList()
        stack, visited = [(self, iter(self.submodules if recursive else []))], set()
        while len(stack) > 0:
            try:
                top = next(stack[-1][1])
                stack.append((top, iter(top.submodules)))
            except StopIteration:
                top = stack.pop()[0]
                if top._module_id in visited: continue
                visited.add(top._module_id)
                if 'train' in mode: train_tasks.absorb(top._get_train_tasks())
                if 'server' in mode: deploy_tasks.absorb(top._get_deploy_tasks())
                if 'eval' in mode: eval_tasks.absorb(top._get_eval_tasks())
                post_process_tasks.absorb(top._get_post_process_tasks())

        if proxy := os.getenv('http_proxy', None):
            os.environ['LAZYLLM_HTTP_PROXY'] = proxy
            lazyllm.config.refresh('LAZYLLM_HTTP_PROXY')
            del os.environ['http_proxy']
        if proxy := os.getenv('https_proxy', None):
            os.environ['LAZYLLM_HTTPS_PROXY'] = proxy
            lazyllm.config.refresh('LAZYLLM_HTTPS_PROXY')
            del os.environ['https_proxy']

        if 'train' in mode and len(train_tasks) > 0:
            Parallel(*train_tasks).set_sync(True)()
        if 'server' in mode and len(deploy_tasks) > 0:
            if redis_client:
                Parallel(*deploy_tasks).set_sync(False)()
            else:
                Parallel.sequential(*deploy_tasks)()
        if 'eval' in mode and len(eval_tasks) > 0:
            Parallel.sequential(*eval_tasks)()
        Parallel.sequential(*post_process_tasks)()
        return self

    def update(self, *, recursive=True):
        """Update the module (and all its submodules). The module will be updated when the ``_get_train_tasks`` method is overridden.

Args:
    recursive (bool): Whether to recursively update all submodules, default is True.


Examples:
    >>> import lazyllm
    >>> m = lazyllm.module.TrainableModule().finetune_method(lazyllm.finetune.dummy).trainset("").deploy_method(lazyllm.deploy.dummy).mode('finetune').prompt(None)
    >>> m.evalset([1, 2, 3])
    >>> m.update()
    INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
    >>> print(m.eval_result)
    ["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
    """
        return self._update(mode=['train', 'server', 'eval'], recursive=recursive)
    def update_server(self, *, recursive=True): return self._update(mode=['server'], recursive=recursive)
    def eval(self, *, recursive=True):
        """Evaluate the module (and all its submodules). This function takes effect after the module has been set with an evaluation set using 'evalset'.

Args:
    recursive (bool): Whether to recursively evaluate all submodules. Defaults to True.


Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...     def forward(self, input):
    ...         return f'reply for input'
    ... 
    >>> m = MyModule()
    >>> m.evalset([1, 2, 3])
    >>> m.eval().eval_result
    ['reply for input', 'reply for input', 'reply for input']
    """
        return self._update(mode=['eval'], recursive=recursive)
    def start(self):
        """Deploy the module and all its submodules.


Examples:
    >>> import lazyllm
    >>> m = lazyllm.TrainableModule().deploy_method(lazyllm.deploy.dummy).prompt(None)
    >>> m.start()
    <Module type=Trainable mode=None basemodel= target= stream=False return_trace=False>
    >>> m(1)
    "reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
    """
        return self._update(mode=['server'], recursive=True)
    def restart(self):
        """Re-deploy the module and all its submodules.


Examples:
    >>> import lazyllm
    >>> m = lazyllm.TrainableModule().deploy_method(lazyllm.deploy.dummy).prompt(None)
    >>> m.restart()
    <Module type=Trainable mode=None basemodel= target= stream=False return_trace=False>
    >>> m(1)
    "reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
    """
        return self.start()
    def wait(self): pass

    def stop(self):
        for m in self.submodules:
            m.stop()

    @property
    def options(self):
        options = self._options.copy()
        for m in self.submodules:
            options += m.options
        return options

    def _overwrote(self, f):
        return getattr(self.__class__, f) is not getattr(__class__, f)

    def __repr__(self):
        return lazyllm.make_repr('Module', self.__class__, name=self.name)

    def for_each(self, filter, action):
        for submodule in self.submodules:
            if filter(submodule):
                action(submodule)
            submodule.for_each(filter, action)

_get_deploy_tasks()

Define a deployment task. This function returns a deployment pipeline. Subclasses that override this function can be deployed during the update/start phase.

Examples:

>>> import lazyllm
>>> class MyModule(lazyllm.module.ModuleBase):
...     def _get_deploy_tasks(self):
...         return lazyllm.pipeline(lambda : 1, lambda x: print(x))
... 
>>> MyModule().start()
1
Source code in lazyllm/module/module.py
    def _get_deploy_tasks(self):
        """Define a deployment task. This function returns a deployment pipeline. Subclasses that override this function can be deployed during the update/start phase.


Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...     def _get_deploy_tasks(self):
    ...         return lazyllm.pipeline(lambda : 1, lambda x: print(x))
    ... 
    >>> MyModule().start()
    1
    """
        return None

_get_train_tasks()

Define a training task. This function returns a training pipeline. Subclasses that override this function can be trained or fine-tuned during the update phase.

Examples:

>>> import lazyllm
>>> class MyModule(lazyllm.module.ModuleBase):
...     def _get_train_tasks(self):
...         return lazyllm.pipeline(lambda : 1, lambda x: print(x))
... 
>>> MyModule().update()
1
Source code in lazyllm/module/module.py
    def _get_train_tasks(self):
        """Define a training task. This function returns a training pipeline. Subclasses that override this function can be trained or fine-tuned during the update phase.


Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...     def _get_train_tasks(self):
    ...         return lazyllm.pipeline(lambda : 1, lambda x: print(x))
    ... 
    >>> MyModule().update()
    1
    """
        return None

eval(*, recursive=True)

Evaluate the module (and all its submodules). This function takes effect after the module has been set with an evaluation set using 'evalset'.

Parameters:

  • recursive (bool, default: True ) –

    Whether to recursively evaluate all submodules. Defaults to True.

Examples:

>>> import lazyllm
>>> class MyModule(lazyllm.module.ModuleBase):
...     def forward(self, input):
...         return f'reply for input'
... 
>>> m = MyModule()
>>> m.evalset([1, 2, 3])
>>> m.eval().eval_result
['reply for input', 'reply for input', 'reply for input']
Source code in lazyllm/module/module.py
    def eval(self, *, recursive=True):
        """Evaluate the module (and all its submodules). This function takes effect after the module has been set with an evaluation set using 'evalset'.

Args:
    recursive (bool): Whether to recursively evaluate all submodules. Defaults to True.


Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...     def forward(self, input):
    ...         return f'reply for input'
    ... 
    >>> m = MyModule()
    >>> m.evalset([1, 2, 3])
    >>> m.eval().eval_result
    ['reply for input', 'reply for input', 'reply for input']
    """
        return self._update(mode=['eval'], recursive=recursive)

evalset(evalset, load_f=None, collect_f=lambda x: x)

Set the evaluation set for the Module. Modules that have been set with an evaluation set will be evaluated during update or eval, and the results will be stored in the eval_result variable.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(lazyllm.deploy.dummy).finetune_method(lazyllm.finetune.dummy).trainset("").mode("finetune").prompt(None)
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> print(m.eval_result)
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
Source code in lazyllm/module/module.py
    def evalset(self, evalset, load_f=None, collect_f=lambda x: x):
        """during update or eval, and the results will be stored in the eval_result variable.


Examples:
    >>> import lazyllm
    >>> m = lazyllm.module.TrainableModule().deploy_method(lazyllm.deploy.dummy).finetune_method(lazyllm.finetune.dummy).trainset("").mode("finetune").prompt(None)
    >>> m.evalset([1, 2, 3])
    >>> m.update()
    INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
    >>> print(m.eval_result)
    ["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
    """
        if isinstance(evalset, str) and os.path.exists(evalset):
            with open(evalset) as f:
                assert callable(load_f)
                self._evalset = load_f(f)
        else:
            self._evalset = evalset
        self.eval_result_collet_f = collect_f

forward(*args, **kw)

Define the computation steps executed on each call; all subclasses of ModuleBase need to override this function.

Examples:

>>> import lazyllm
>>> class MyModule(lazyllm.module.ModuleBase):
...     def forward(self, input):
...         return input + 1
... 
>>> MyModule()(1)
2
Source code in lazyllm/module/module.py
    def forward(self, *args, **kw):
        """Define computation steps executed each time, all subclasses of ModuleBase need to override.


Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...     def forward(self, input):
    ...         return input + 1
    ... 
    >>> MyModule()(1)
    2   
    """
        raise NotImplementedError

start()

Deploy the module and all its submodules.

Examples:

>>> import lazyllm
>>> m = lazyllm.TrainableModule().deploy_method(lazyllm.deploy.dummy).prompt(None)
>>> m.start()
<Module type=Trainable mode=None basemodel= target= stream=False return_trace=False>
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
Source code in lazyllm/module/module.py
    def start(self):
        """Deploy the module and all its submodules.


Examples:
    >>> import lazyllm
    >>> m = lazyllm.TrainableModule().deploy_method(lazyllm.deploy.dummy).prompt(None)
    >>> m.start()
    <Module type=Trainable mode=None basemodel= target= stream=False return_trace=False>
    >>> m(1)
    "reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
    """
        return self._update(mode=['server'], recursive=True)

restart()

Re-deploy the module and all its submodules.

Examples:

>>> import lazyllm
>>> m = lazyllm.TrainableModule().deploy_method(lazyllm.deploy.dummy).prompt(None)
>>> m.restart()
<Module type=Trainable mode=None basemodel= target= stream=False return_trace=False>
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
Source code in lazyllm/module/module.py
    def restart(self):
        """Re-deploy the module and all its submodules.


Examples:
    >>> import lazyllm
    >>> m = lazyllm.TrainableModule().deploy_method(lazyllm.deploy.dummy).prompt(None)
    >>> m.restart()
    <Module type=Trainable mode=None basemodel= target= stream=False return_trace=False>
    >>> m(1)
    "reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
    """
        return self.start()

update(*, recursive=True)

Update the module (and all its submodules). The module will be updated when the _get_train_tasks method is overridden.

Parameters:

  • recursive (bool, default: True ) –

    Whether to recursively update all submodules, default is True.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().finetune_method(lazyllm.finetune.dummy).trainset("").deploy_method(lazyllm.deploy.dummy).mode('finetune').prompt(None)
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> print(m.eval_result)
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
Source code in lazyllm/module/module.py
    def update(self, *, recursive=True):
        """Update the module (and all its submodules). The module will be updated when the ``_get_train_tasks`` method is overridden.

Args:
    recursive (bool): Whether to recursively update all submodules, default is True.


Examples:
    >>> import lazyllm
    >>> m = lazyllm.module.TrainableModule().finetune_method(lazyllm.finetune.dummy).trainset("").deploy_method(lazyllm.deploy.dummy).mode('finetune').prompt(None)
    >>> m.evalset([1, 2, 3])
    >>> m.update()
    INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
    >>> print(m.eval_result)
    ["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
    """
        return self._update(mode=['train', 'server', 'eval'], recursive=recursive)

lazyllm.module.ActionModule

Bases: ModuleBase

Used to wrap functions, modules, flows, and other callable objects into a Module. Any wrapped Module (including Modules inside a flow) becomes a submodule of this Module.

Parameters:

  • action (Callable | list[Callable], default: () ) –

    The object to be wrapped, which is one or a set of callable objects.

Examples:

>>> import lazyllm
>>> def myfunc(input): return input + 1
... 
>>> class MyModule1(lazyllm.module.ModuleBase):
...     def forward(self, input): return input * 2
... 
>>> class MyModule2(lazyllm.module.ModuleBase):
...     def _get_deploy_tasks(self): return lazyllm.pipeline(lambda : print('MyModule2 deployed!'))
...     def forward(self, input): return input * 4
... 
>>> class MyModule3(lazyllm.module.ModuleBase):
...     def _get_deploy_tasks(self): return lazyllm.pipeline(lambda : print('MyModule3 deployed!'))
...     def forward(self, input): return f'get {input}'
... 
>>> m = lazyllm.ActionModule(myfunc, lazyllm.pipeline(MyModule1(), MyModule2), MyModule3())
>>> print(m(1))
get 16
>>> 
>>> m.evalset([1, 2, 3])
>>> m.update()
MyModule2 deployed!
MyModule3 deployed!
>>> print(m.eval_result)
['get 16', 'get 24', 'get 32']
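
A single flow can also be wrapped directly; its Modules still become submodules and are invoked through forward. A brief sketch reusing the MyModule1 class defined in the example above:

>>> am = lazyllm.ActionModule(lazyllm.pipeline(MyModule1(), MyModule1()))
>>> print(am(2))
8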

evalset(evalset, load_f=None, collect_f=<function ModuleBase.<lambda>>)

Set the evaluation set for the Module. Modules that have been set with an evaluation set will be evaluated during update or eval, and the evaluation results will be stored in the eval_result variable.

evalset(evalset, collect_f=lambda x: ...)→ None

Parameters:

  • evalset (list) –

    Evaluation set

  • collect_f (Callable) –

    Post-processing method for evaluation results, no post-processing by default.

evalset(evalset, load_f=None, collect_f=lambda x: ...)→ None

Parameters:

  • evalset (str) –

    Path to the evaluation set

  • load_f (Callable) –

    Method for loading the evaluation set, including parsing file formats and converting to a list

  • collect_f (Callable) –

    Post-processing method for evaluation results, no post-processing by default.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> m.eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
Source code in lazyllm/module/module.py
class ActionModule(ModuleBase):
    """Used to wrap a Module around functions, modules, flows, Module, and other callable objects. The wrapped Module (including the Module within the flow) will become a submodule of this Module.

Args:
    action (Callable|list[Callable]): The object to be wrapped, which is one or a set of callable objects.

**Examples:**

```python
>>> import lazyllm
>>> def myfunc(input): return input + 1
... 
>>> class MyModule1(lazyllm.module.ModuleBase):
...     def forward(self, input): return input * 2
... 
>>> class MyModule2(lazyllm.module.ModuleBase):
...     def _get_deploy_tasks(self): return lazyllm.pipeline(lambda : print('MyModule2 deployed!'))
...     def forward(self, input): return input * 4
... 
>>> class MyModule3(lazyllm.module.ModuleBase):
...     def _get_deploy_tasks(self): return lazyllm.pipeline(lambda : print('MyModule3 deployed!'))
...     def forward(self, input): return f'get {input}'
... 
>>> m = lazyllm.ActionModule(myfunc, lazyllm.pipeline(MyModule1(), MyModule2), MyModule3())
>>> print(m(1))
get 16
>>> 
>>> m.evalset([1, 2, 3])
>>> m.update()
MyModule2 deployed!
MyModule3 deployed!
>>> print(m.eval_result)
['get 16', 'get 24', 'get 32']
```


<span style="font-size: 20px;">**`evalset(evalset, load_f=None, collect_f=<function ModuleBase.<lambda>>)`**</span>

Set the evaluation set for the Module. Modules that have been set with an evaluation set will be evaluated during ``update`` or ``eval``, and the evaluation results will be stored in the eval_result variable. 


<span style="font-size: 18px;">&ensp;**`evalset(evalset, collect_f=lambda x: ...)→ None `**</span>


Args:
    evalset (list) :Evaluation set
    collect_f (Callable) :Post-processing method for evaluation results, no post-processing by default.



<span style="font-size: 18px;">&ensp;**`evalset(evalset, load_f=None, collect_f=lambda x: ...)→ None`**</span>


Args:
    evalset (str) :Path to the evaluation set
    load_f (Callable) :Method for loading the evaluation set, including parsing file formats and converting to a list
    collect_f (Callable) :Post-processing method for evaluation results, no post-processing by default.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> m.eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
```


"""
    def __init__(self, *action, return_trace=False):
        super().__init__(return_trace=return_trace)
        if len(action) == 1 and isinstance(action, FlowBase): action = action[0]
        if isinstance(action, (tuple, list)):
            action = Pipeline(*action)
        assert isinstance(action, FlowBase), f'Invalid action type {type(action)}'
        self.action = action

    def forward(self, *args, **kw):
        return self.action(*args, **kw)

    @property
    def submodules(self):
        try:
            if isinstance(self.action, FlowBase):
                submodule = []
                self.action.for_each(lambda x: isinstance(x, ModuleBase), lambda x: submodule.append(x))
                return submodule
        except Exception as e:
            raise RuntimeError(str(e))
        return super().submodules

    def __repr__(self):
        return lazyllm.make_repr('Module', 'Action', subs=[repr(self.action)],
                                 name=self._module_name, return_trace=self._return_trace)

lazyllm.module.TrainableModule

Bases: UrlModule

Trainable module. All models (including LLM, Embedding, etc.) are served through TrainableModule.

TrainableModule(base_model='', target_path='', *, stream=False, return_trace=False)

Parameters:

  • base_model (str, default: '' ) –

    Name or path of the base model. If the model is not available locally, it will be automatically downloaded from the model source.

  • target_path (str, default: '' ) –

    Path to save the fine-tuning task. Can be left empty if only performing inference.

  • source (str) –

    Model source; optional values include huggingface. If not set, the value is read from the environment variable LAZYLLM_MODEL_SOURCE.

  • stream (bool, default: False ) –

    Whether to output stream. If the inference engine used does not support streaming, this parameter will be ignored.

  • return_trace (bool, default: False ) –

    Whether to record the results in trace.

TrainableModule.trainset(v):

Set the training set for TrainableModule

Parameters:

  • v (str) –

    Path to the training/fine-tuning dataset.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().finetune_method(finetune.dummy).trainset('/file/to/path').deploy_method(None).mode('finetune')
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}

TrainableModule.train_method(v, **kw):

Set the training method for TrainableModule. Continued pre-training is not supported yet, expected to be available in the next version.

Parameters:

  • v (LazyLLMTrainBase) –

    Training method, options include train.auto etc.

  • kw (**dict) –

    Parameters required by the training method, corresponding to v.

TrainableModule.finetune_method(v, **kw):

Set the fine-tuning method and its parameters for TrainableModule.

Parameters:

  • v (LazyLLMFinetuneBase) –

    Fine-tuning method, options include finetune.auto / finetune.alpacalora / finetune.collie etc.

  • kw (**dict) –

    Parameters required by the fine-tuning method, corresponding to v.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().finetune_method(finetune.dummy).deploy_method(None).mode('finetune')
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}                

TrainableModule.deploy_method(v, **kw):

Set the deployment method and its parameters for TrainableModule.

Parameters:

  • v (LazyLLMDeployBase) –

    Deployment method, options include deploy.auto / deploy.lightllm / deploy.vllm etc.

  • kw (**dict) –

    Parameters required by the deployment method, corresponding to v.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy).mode('finetune')
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> m.eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]

TrainableModule.mode(v):

Set whether to execute training or fine-tuning during update for TrainableModule.

Parameters:

  • v (str) –

    Sets whether to execute training or fine-tuning during update, options are 'finetune' and 'train', default is 'finetune'.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().finetune_method(finetune.dummy).deploy_method(None).mode('finetune')
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}

eval(*, recursive=True)

Evaluate the module (and all its submodules). This function takes effect after the module has set an evaluation set through evalset.

Parameters:

  • recursive (bool) –

    Whether to recursively evaluate all submodules, default is True.
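
For example, mirroring the dummy-deployment pattern used elsewhere on this page (the module is started first so that evaluation can call the service):

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.evalset([1, 2, 3])
>>> m.start()
<Module type=Trainable mode=None basemodel= target= stream=False return_trace=False>
>>> m.eval().eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]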

evalset(evalset, load_f=None, collect_f=<function ModuleBase.<lambda>>)

Set the evaluation set for the Module. Modules that have been set with an evaluation set will be evaluated during update or eval, and the evaluation results will be stored in the eval_result variable.

evalset(evalset, collect_f=lambda x: ...)→ None

Parameters:

  • evalset (list) –

    Evaluation set

  • collect_f (Callable) –

    Post-processing method for evaluation results, no post-processing by default.

evalset(evalset, load_f=None, collect_f=lambda x: ...)→ None

Parameters:

  • evalset (str) –

    Path to the evaluation set

  • load_f (Callable) –

    Method for loading the evaluation set, including parsing file formats and converting to a list

  • collect_f (Callable) –

    Post-processing method for evaluation results, no post-processing by default.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> m.eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]

restart()

Restart the module and all its submodules.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.restart()
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"

start()

Deploy the module and all its submodules.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.start()
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
Source code in lazyllm/module/module.py
class TrainableModule(UrlModule):
    """Trainable module, all models (including LLM, Embedding, etc.) are served through TrainableModule

<span style="font-size: 20px;">**`TrainableModule(base_model='', target_path='', *, stream=False, return_trace=False)`**</span>


Args:
    base_model (str): Name or path of the base model. If the model is not available locally, it will be automatically downloaded from the model source.
    target_path (str): Path to save the fine-tuning task. Can be left empty if only performing inference.
    source (str): Model source; optional values include huggingface. If not set, the value is read from the environment variable LAZYLLM_MODEL_SOURCE.
    stream (bool): Whether to output stream. If the inference engine used does not support streaming, this parameter will be ignored.
    return_trace (bool): Whether to record the results in trace.

<span style="font-size: 20px;">**`TrainableModule.trainset(v):`**</span>

Set the training set for TrainableModule


Args:
    v (str): Path to the training/fine-tuning dataset.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().finetune_method(finetune.dummy).trainset('/file/to/path').deploy_method(None).mode('finetune')
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
```

<span style="font-size: 20px;">**`TrainableModule.train_method(v, **kw):`**</span>

Set the training method for TrainableModule. Continued pre-training is not supported yet, expected to be available in the next version.

Args:
    v (LazyLLMTrainBase): Training method, options include ``train.auto`` etc.
    kw (**dict): Parameters required by the training method, corresponding to v.

<span style="font-size: 20px;">**`TrainableModule.finetune_method(v, **kw):`**</span>

Set the fine-tuning method and its parameters for TrainableModule.

Args:
    v (LazyLLMFinetuneBase): Fine-tuning method, options include ``finetune.auto`` / ``finetune.alpacalora`` / ``finetune.collie`` etc.
    kw (**dict): Parameters required by the fine-tuning method, corresponding to v.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().finetune_method(finetune.dummy).deploy_method(None).mode('finetune')
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}                
```

<span style="font-size: 20px;">**`TrainableModule.deploy_method(v, **kw):`**</span>

Set the deployment method and its parameters for TrainableModule.

Args:
    v (LazyLLMDeployBase): Deployment method, options include ``deploy.auto`` / ``deploy.lightllm`` / ``deploy.vllm`` etc.
    kw (**dict): Parameters required by the deployment method, corresponding to v.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy).mode('finetune')
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> m.eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
```                


<span style="font-size: 20px;">**`TrainableModule.mode(v):`**</span>

Set whether to execute training or fine-tuning during update for TrainableModule.

Args:
    v (str): Sets whether to execute training or fine-tuning during update, options are 'finetune' and 'train', default is 'finetune'.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().finetune_method(finetune.dummy).deploy_method(None).mode('finetune')
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
```    

<span style="font-size: 20px;">**`eval(*, recursive=True)`**</span>
Evaluate the module (and all its submodules). This function takes effect after the module has set an evaluation set through evalset.

Args:
    recursive (bool) :Whether to recursively evaluate all submodules, default is True.                         

<span style="font-size: 20px;">**`evalset(evalset, load_f=None, collect_f=<function ModuleBase.<lambda>>)`**</span>

Set the evaluation set for the Module. Modules that have been set with an evaluation set will be evaluated during ``update`` or ``eval``, and the evaluation results will be stored in the eval_result variable. 


<span style="font-size: 18px;">&ensp;**`evalset(evalset, collect_f=lambda x: ...)→ None `**</span>


Args:
    evalset (list) :Evaluation set
    collect_f (Callable) :Post-processing method for evaluation results, no post-processing by default.



<span style="font-size: 18px;">&ensp;**`evalset(evalset, load_f=None, collect_f=lambda x: ...)→ None`**</span>


Args:
    evalset (str) :Path to the evaluation set
    load_f (Callable) :Method for loading the evaluation set, including parsing file formats and converting to a list
    collect_f (Callable) :Post-processing method for evaluation results, no post-processing by default.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> m.eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
```

<span style="font-size: 20px;">**`restart() `**</span>

Restart the module and all its submodules.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.restart()
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
```

<span style="font-size: 20px;">**`start() `**</span> 

Deploy the module and all its submodules.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.start()
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
```                                  
"""
    builder_keys = _TrainableModuleImpl.builder_keys

    def __init__(self, base_model: Option = '', target_path='', *,
                 stream: Union[bool, Dict[str, str]] = False, return_trace: bool = False):
        super().__init__(url=None, stream=stream, return_trace=return_trace)
        self._impl = _TrainableModuleImpl(base_model, target_path, stream,
                                          None, lazyllm.finetune.auto, lazyllm.deploy.auto)
        self._impl._add_father(self)
        self.prompt()
        self._stream = stream

    base_model = property(lambda self: self._impl._base_model)
    target_path = property(lambda self: self._impl._target_path)
    _url_id = property(lambda self: self._impl._module_id)

    @property
    def series(self):
        return re.sub(r'\d+$', '', ModelManager.get_model_name(self.base_model).split('-')[0].upper())

    @property
    def type(self):
        return ModelManager.get_model_type(self.base_model).upper()

    @property
    def stream(self):
        return self._stream

    @stream.setter
    def stream(self, v: Union[bool, Dict[str, str]]):
        self._stream = v

    def get_all_models(self):
        return self._impl._get_all_finetuned_models()

    def set_specific_finetuned_model(self, model_path):
        return self._impl._set_specific_finetuned_model(model_path)

    @property
    def _deploy_type(self):
        if self._impl._deploy is not lazyllm.deploy.AutoDeploy:
            return self._impl._deploy
        elif self._impl._deployer:
            return type(self._impl._deployer)
        else:
            return lazyllm.deploy.AutoDeploy

    def wait(self):
        if launcher := self._impl._launchers['default'].get('deploy'):
            launcher.wait()

    def stop(self, task_name: Optional[str] = None):
        try:
            launcher = self._impl._launchers['manual' if task_name else 'default'][task_name or 'deploy']
        except KeyError:
            raise RuntimeError('Cannot stop an unstarted task')
        if not task_name: self._impl._get_deploy_tasks.flag.reset()
        launcher.cleanup()

    def status(self, task_name: Optional[str] = None):
        launcher = self._impl._launchers['manual' if task_name else 'default'][task_name or 'deploy']
        return launcher.status

    # modify default value to ''
    def prompt(self, prompt: str = '', history: Optional[List[List[str]]] = None):
        if self.base_model != '' and prompt == '' and ModelManager.get_model_type(self.base_model) != 'llm':
            prompt = None
        clear_system = isinstance(prompt, dict) and prompt.get('drop_builtin_system')
        prompt = super(__class__, self).prompt(prompt, history)._prompt
        self._tools = getattr(prompt, "_tools", None)
        keys = ModelManager.get_model_prompt_keys(self.base_model).copy()
        if keys:
            if clear_system: keys['system'] = ''
            prompt._set_model_configs(**keys)
            for key in ["tool_start_token", "tool_args_token", "tool_end_token"]:
                if key in keys: setattr(self, f"_{key}", keys[key])
        return self

    def _loads_str(self, text: str) -> Union[str, Dict]:
        try:
            ret = json.loads(text)
            return self._loads_str(ret) if isinstance(ret, str) else ret
        except Exception:
            LOG.error(f"{text} is not a valid json string.")
            return text

    def _parse_arguments_with_args_token(self, output: str) -> tuple[str, dict]:
        items = output.split(self._tool_args_token)
        func_name = items[0].strip()
        if len(items) == 1:
            return func_name.split(self._tool_end_token)[0].strip() if getattr(self, "_tool_end_token", None)\
                else func_name, {}
        args = (items[1].split(self._tool_end_token)[0].strip() if getattr(self, "_tool_end_token", None)
                else items[1].strip())
        return func_name, self._loads_str(args) if isinstance(args, str) else args

    def _parse_arguments_without_args_token(self, output: str) -> tuple[str, dict]:
        items = output.split(self._tool_end_token)[0] if getattr(self, "_tool_end_token", None) else output
        func_name = ""
        args = {}
        try:
            items = json.loads(items.strip())
            func_name = items.get('name', '')
            args = items.get("parameters", items.get("arguments", {}))
        except Exception:
            LOG.error(f"tool calls info {items} parse error")

        return func_name, self._loads_str(args) if isinstance(args, str) else args

    def _parse_arguments_with_tools(self, output: Dict[str, Any], tools: List[str]) -> bool:
        func_name = ''
        args = {}
        is_tc = False
        tc = {}
        if output.get('name', '') in tools:
            is_tc = True
            func_name = output.get('name', '')
            args = output.get("parameters", output.get("arguments", {}))
            tc = {'name': func_name, 'arguments': self._loads_str(args) if isinstance(args, str) else args}
            return is_tc, tc
        return is_tc, tc

    def _parse_tool_start_token(self, output: str) -> tuple[str, List[Dict]]:
        tool_calls = []
        segs = output.split(self._tool_start_token)
        content = segs[0]
        for seg in segs[1:]:
            func_name, arguments = self._parse_arguments_with_args_token(seg.strip())\
                if getattr(self, "_tool_args_token", None)\
                else self._parse_arguments_without_args_token(seg.strip())
            if func_name:
                tool_calls.append({"name": func_name, "arguments": arguments})

        return content, tool_calls

    def _parse_tools(self, output: str) -> tuple[str, List[Dict]]:
        tool_calls = []
        tools = {tool['function']['name'] for tool in self._tools}
        lines = output.strip().split("\n")
        content = []
        is_tool_call = False
        for idx, line in enumerate(lines):
            if line.startswith("{") and idx > 0:
                func_name = lines[idx - 1].strip()
                if func_name in tools:
                    is_tool_call = True
                    if func_name == content[-1].strip():
                        content.pop()
                    arguments = "\n".join(lines[idx:]).strip()
                    tool_calls.append({'name': func_name, "arguments": arguments})
                    continue
            if "{" in line and 'name' in line:
                try:
                    items = json.loads(line.strip())
                    items = [items] if isinstance(items, dict) else items
                    if isinstance(items, list):
                        for item in items:
                            is_tool_call, tc = self._parse_arguments_with_tools(item, tools)
                            if is_tool_call:
                                tool_calls.append(tc)
                except Exception:
                    LOG.error(f"tool calls info {line} parse error")
            if not is_tool_call:
                content.append(line)
        content = "\n".join(content) if len(content) > 0 else ''
        return content, tool_calls

    def _extract_tool_calls(self, output: str) -> tuple[str, List[Dict]]:
        tool_calls = []
        content = ''
        if getattr(self, "_tool_start_token", None) and self._tool_start_token in output:
            content, tool_calls = self._parse_tool_start_token(output)
        elif self._tools:
            content, tool_calls = self._parse_tools(output)
        else:
            content = output

        return content, tool_calls

    def _build_response(self, content: str, tool_calls: List[Dict[str, str]]) -> str:
        tc = [{'id': str(uuid.uuid4().hex), 'type': 'function', 'function': tool_call} for tool_call in tool_calls]
        if content and tc:
            return globals["tool_delimiter"].join([content, json.dumps(tc, ensure_ascii=False)])
        elif not content and tc:
            return globals["tool_delimiter"] + json.dumps(tc, ensure_ascii=False)
        else:
            return content

    def _extract_and_format(self, output: str) -> str:
        """
        1.extract tool calls information;
            a. If 'tool_start_token' exists, the boundary of tool_calls can be found according to 'tool_start_token',
               and then the function name and arguments of tool_calls can be extracted according to 'tool_args_token'
               and 'tool_end_token'.
            b. If 'tool_start_token' does not exist, the text is segmented using '\n' according to the incoming tools
               information, and then processed according to the rules.
        """
        content, tool_calls = self._extract_tool_calls(output)
        return self._build_response(content, tool_calls)

    def __repr__(self):
        return lazyllm.make_repr('Module', 'Trainable', mode=self._impl._mode, basemodel=self.base_model,
                                 target=self.target_path, name=self._module_name, deploy_type=self._deploy_type,
                                 stream=bool(self._stream), return_trace=self._return_trace)

    def __getattr__(self, key):
        if key in self.__class__.builder_keys:
            return functools.partial(getattr(self._impl, key), _return_value=self)
        raise AttributeError(f'{__class__} object has no attribute {key}')

    def share(self, prompt=None, format=None, stream=None, history=None):
        new = copy.copy(self)
        new._hooks = set()
        new._set_mid()
        if prompt is not None: new.prompt(prompt, history=history)
        if format is not None: new.formatter(format)
        if stream is not None: new.stream = stream
        new._impl._add_father(new)
        return new

lazyllm.module.UrlModule

Bases: ModuleBase, UrlTemplate

The URL obtained from deploying the ServerModule can be wrapped into a Module. When calling __call__, it will access the service.

Parameters:

  • url (str, default: '' ) –

    The URL of the service to be wrapped.

  • stream (bool, default: False ) –

    Whether to request and output in streaming mode, default is non-streaming.

  • return_trace (bool, default: False ) –

    Whether to record the results in trace, default is False.

Examples:

>>> import lazyllm
>>> def demo(input): return input * 2
... 
>>> s = lazyllm.ServerModule(demo, launcher=lazyllm.launchers.empty(sync=False))
>>> s.start()
INFO:     Uvicorn running on http://0.0.0.0:35485
>>> u = lazyllm.UrlModule(url=s._url)
>>> print(u(1))
2
Source code in lazyllm/module/module.py
class UrlModule(ModuleBase, UrlTemplate):
    """The URL obtained from deploying the ServerModule can be wrapped into a Module. When calling ``__call__`` , it will access the service.

Args:
    url (str): The URL of the service to be wrapped.
    stream (bool): Whether to request and output in streaming mode, default is non-streaming.
    return_trace (bool): Whether to record the results in trace, default is False.


Examples:
    >>> import lazyllm
    >>> def demo(input): return input * 2
    ... 
    >>> s = lazyllm.ServerModule(demo, launcher=lazyllm.launchers.empty(sync=False))
    >>> s.start()
    INFO:     Uvicorn running on http://0.0.0.0:35485
    >>> u = lazyllm.UrlModule(url=s._url)
    >>> print(u(1))
    2
    """
    def __init__(self, *, url='', stream=False, return_trace=False):
        super().__init__(return_trace=return_trace)
        self.__url = url
        self._stream = stream
        # Set for request by specific deploy:
        UrlTemplate.__init__(self)
        self._extract_result_func = lambda x, inputs: x
        self._stream_parse_parameters = {}
        self._stream_url_suffix = ''
        __class__.prompt(self)
        __class__.formatter(self)

    @property
    def _url(self):
        if redis_client:
            try:
                while not self.__url:
                    self.__url = get_redis(self._url_id)
                    if self.__url: break
                    time.sleep(lazyllm.config["redis_recheck_delay"])
            except Exception as e:
                LOG.error(f"Error accessing Redis: {e}")
                raise
        return self.__url

    def _set_url(self, url):
        if redis_client:
            redis_client.set(self._module_id, url)
        LOG.debug(f'url: {url}')
        self.__url = url

    def _estimate_token_usage(self, text):
        if not isinstance(text, str):
            return 0
        # extract english words, number and comma
        pattern = r"\b[a-zA-Z0-9]+\b|,"
        ascii_words = re.findall(pattern, text)
        ascii_ch_count = sum(len(ele) for ele in ascii_words)
        non_ascii_pattern = r"[^\x00-\x7F]"
        non_ascii_chars = re.findall(non_ascii_pattern, text)
        non_ascii_char_count = len(non_ascii_chars)
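        # rough heuristic: about 3 ASCII characters per token, plus one token per non-ASCII character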
        return int(ascii_ch_count / 3.0 + non_ascii_char_count + 1)

    def _record_usage(self, usage: dict):
        globals["usage"][self._module_id] = usage
        par_muduleid = self._used_by_moduleid
        if par_muduleid is None:
            return
        if par_muduleid not in globals["usage"]:
            globals["usage"][par_muduleid] = usage
            return
        existing_usage = globals["usage"][par_muduleid]
        if existing_usage["prompt_tokens"] == -1 or usage["prompt_tokens"] == -1:
            globals["usage"][par_muduleid] = {"prompt_tokens": -1, "completion_tokens": -1}
        else:
            for k in globals["usage"][par_muduleid]:
                globals["usage"][par_muduleid][k] += usage[k]

    # Cannot modify or add any attribute of self
    # prompt keys (excluding history) are in __input (ATTENTION: dict, not kwargs)
    # deploy parameters keys are in **kw
    def forward(self, __input: Union[Tuple[Union[str, Dict], str], str, Dict] = package(),  # noqa C901
                *, llm_chat_history=None, lazyllm_files=None, tools=None, stream_output=False, **kw):
        """Defines the computation steps to be executed each time. All subclasses of ModuleBase need to override this function.



Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...    def forward(self, input):
    ...        return input + 1
    ...
    >>> MyModule()(1)
    2
    """
        assert self._url is not None, f'Please start {self.__class__} first'
        stream_output = stream_output or self._stream
        url = self._url

        if self.template_message:
            if isinstance(__input, package):
                assert not lazyllm_files, 'Duplicate `files` argument provided by args and kwargs'
                __input, lazyllm_files = __input
            if isinstance(__input, str) and __input.startswith(LAZYLLM_QUERY_PREFIX):
                assert not lazyllm_files, 'Argument `files` is already provided by query'
                deinput = decode_query_with_filepaths(__input)
                __input, files = deinput['query'], deinput['files']
            else:
                files = _lazyllm_get_file_list(lazyllm_files) if lazyllm_files else []

        query = __input
        __input = self._prompt.generate_prompt(query, llm_chat_history, tools)
        headers = {'Content-Type': 'application/json'}
        text_input_for_token_usage = __input

        if isinstance(self, ServerModule):
            assert llm_chat_history is None and tools is None
            headers['Global-Parameters'] = encode_request(globals._pickle_data)
            headers['Session-ID'] = encode_request(globals._sid)
            data = encode_request((__input, kw))
        elif self.template_message:
            data = self._modify_parameters(copy.deepcopy(self.template_message), kw)
            assert 'inputs' in self.keys_name_handle
            data[self.keys_name_handle['inputs']] = __input
            if 'image' in self.keys_name_handle and files:
                data[self.keys_name_handle['image']] = files
            elif 'audio' in self.keys_name_handle and files:
                data[self.keys_name_handle['audio']] = files
        else:
            if len(kw) != 0: raise NotImplementedError(f'kwargs ({kw}) are not allowed in UrlModule')
            data = __input

        if stream_output:
            if self._stream_url_suffix and not url.endswith(self._stream_url_suffix):
                url += self._stream_url_suffix
            if "stream" in data: data['stream'] = stream_output

            if isinstance(stream_output, dict):
                prefix, prefix_color = stream_output.get('prefix', ''), stream_output.get('prefix_color', '')
                if prefix: FileSystemQueue().enqueue(lazyllm.colored_text(prefix, prefix_color))

        parse_parameters = self._stream_parse_parameters if stream_output else {"delimiter": b"<|lazyllm_delimiter|>"}

        token = getattr(self, "_tool_start_token", '')
        cache = ""

        if kw.get("modality"):
            data["modality"] = kw["modality"]

        # context bug with httpx, so we use requests
        with requests.post(url, json=data, stream=True, headers=headers) as r:
            if r.status_code == 200:
                messages = ''
                for line in r.iter_lines(**parse_parameters):
                    if not line: continue
                    try:
                        line = pickle.loads(codecs.decode(line, "base64"))
                    except Exception:
                        line = line.decode('utf-8')
                    chunk = self._prompt.get_response(self._extract_result_func(line, data))
                    if isinstance(chunk, str):
                        if chunk.startswith(messages): chunk = chunk[len(messages):]
                        messages += chunk
                    else:
                        messages = chunk

                    if not stream_output: continue
                    color = stream_output.get('color') if isinstance(stream_output, dict) else None
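                    # buffer chunks that may form the tool-call start token so they are not streamed to the user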
                    if not cache:
                        if token.startswith(chunk.lstrip('\n') if not token.startswith('\n') else chunk) \
                           or token in chunk: cache = chunk
                        else: FileSystemQueue().enqueue(colored_text(chunk, color))
                    elif token in cache:
                        stream_output = False
                        if not cache.startswith(token):
                            FileSystemQueue().enqueue(colored_text(cache.split(token)[0], color))
                    else:
                        cache += chunk
                        if not (token.startswith(cache.lstrip('\n') if not token.startswith('\n') else cache)
                                or token in cache):
                            FileSystemQueue().enqueue(colored_text(cache, color))
                            cache = ""
                if isinstance(stream_output, dict):
                    suffix, suffix_color = stream_output.get('suffix', ''), stream_output.get('suffix_color', '')
                    if suffix: FileSystemQueue().enqueue(lazyllm.colored_text(suffix, suffix_color))
            else:
                raise requests.RequestException('\n'.join([c.decode('utf-8') for c in r.iter_content(None)]))
            temp_output = self._extract_and_format(messages)
            if isinstance(self, TrainableModule):
                usage = {"prompt_tokens": self._estimate_token_usage(text_input_for_token_usage)}
                usage["completion_tokens"] = self._estimate_token_usage(temp_output)
                self._record_usage(usage)
            return self._formatter(temp_output)

    def prompt(self, prompt: Optional[str] = None, history: Optional[List[List[str]]] = None):
        if prompt is None:
            assert not history, 'history is not supported in EmptyPrompter'
            self._prompt = EmptyPrompter()
        elif isinstance(prompt, PrompterBase):
            assert not history, 'history is not supported in user defined prompter'
            self._prompt = prompt
        elif isinstance(prompt, (str, dict)):
            self._prompt = ChatPrompter(prompt, history=history)
        return self

    def _extract_and_format(self, output: str) -> str:
        return output

    def formatter(self, format: FormatterBase = None):
        if isinstance(format, FormatterBase) or callable(format):
            self._formatter = format
        elif format is None:
            self._formatter = EmptyFormatter()
        else:
            raise TypeError("format must be a FormatterBase")
        return self

    def _modify_parameters(self, paras, kw):
        for key, value in paras.items():
            if key == self.keys_name_handle['inputs']:
                continue
            elif isinstance(value, dict):
                if key in kw:
                    assert set(kw[key].keys()).issubset(set(value.keys()))
                    value.update(kw.pop(key))
                for k in value.keys():
                    if k in kw: value[k] = kw.pop(k)
            else:
                if key in kw: paras[key] = kw.pop(key)
        return paras

    def set_default_parameters(self, **kw):
        self._modify_parameters(self.template_message, kw)

    def __call__(self, *args, **kw):
        if len(args) > 1:
            return super(__class__, self).__call__(package(args), **kw)
        return super(__class__, self).__call__(*args, **kw)

    def __repr__(self):
        return lazyllm.make_repr('Module', 'Url', name=self._module_name, url=self._url,
                                 stream=self._stream, return_trace=self._return_trace)

forward(__input=package(), *, llm_chat_history=None, lazyllm_files=None, tools=None, stream_output=False, **kw)

Defines the computation steps to be executed each time. All subclasses of ModuleBase need to override this function.

Examples:

>>> import lazyllm
>>> class MyModule(lazyllm.module.ModuleBase):
...    def forward(self, input):
...        return input + 1
...
>>> MyModule()(1)
2
Source code in lazyllm/module/module.py
    def forward(self, __input: Union[Tuple[Union[str, Dict], str], str, Dict] = package(),  # noqa C901
                *, llm_chat_history=None, lazyllm_files=None, tools=None, stream_output=False, **kw):
        """Defines the computation steps to be executed each time. All subclasses of ModuleBase need to override this function.



Examples:
    >>> import lazyllm
    >>> class MyModule(lazyllm.module.ModuleBase):
    ...    def forward(self, input):
    ...        return input + 1
    ...
    >>> MyModule()(1)
    2
    """
        assert self._url is not None, f'Please start {self.__class__} first'
        stream_output = stream_output or self._stream
        url = self._url

        if self.template_message:
            if isinstance(__input, package):
                assert not lazyllm_files, 'Duplicate `files` argument provided by args and kwargs'
                __input, lazyllm_files = __input
            if isinstance(__input, str) and __input.startswith(LAZYLLM_QUERY_PREFIX):
                assert not lazyllm_files, 'Argument `files` is already provided by query'
                deinput = decode_query_with_filepaths(__input)
                __input, files = deinput['query'], deinput['files']
            else:
                files = _lazyllm_get_file_list(lazyllm_files) if lazyllm_files else []

        query = __input
        __input = self._prompt.generate_prompt(query, llm_chat_history, tools)
        headers = {'Content-Type': 'application/json'}
        text_input_for_token_usage = __input

        if isinstance(self, ServerModule):
            assert llm_chat_history is None and tools is None
            headers['Global-Parameters'] = encode_request(globals._pickle_data)
            headers['Session-ID'] = encode_request(globals._sid)
            data = encode_request((__input, kw))
        elif self.template_message:
            data = self._modify_parameters(copy.deepcopy(self.template_message), kw)
            assert 'inputs' in self.keys_name_handle
            data[self.keys_name_handle['inputs']] = __input
            if 'image' in self.keys_name_handle and files:
                data[self.keys_name_handle['image']] = files
            elif 'audio' in self.keys_name_handle and files:
                data[self.keys_name_handle['audio']] = files
        else:
            if len(kw) != 0: raise NotImplementedError(f'kwargs ({kw}) are not allowed in UrlModule')
            data = __input

        if stream_output:
            if self._stream_url_suffix and not url.endswith(self._stream_url_suffix):
                url += self._stream_url_suffix
            if "stream" in data: data['stream'] = stream_output

            if isinstance(stream_output, dict):
                prefix, prefix_color = stream_output.get('prefix', ''), stream_output.get('prefix_color', '')
                if prefix: FileSystemQueue().enqueue(lazyllm.colored_text(prefix, prefix_color))

        parse_parameters = self._stream_parse_parameters if stream_output else {"delimiter": b"<|lazyllm_delimiter|>"}

        token = getattr(self, "_tool_start_token", '')
        cache = ""

        if kw.get("modality"):
            data["modality"] = kw["modality"]

        # context bug with httpx, so we use requests
        with requests.post(url, json=data, stream=True, headers=headers) as r:
            if r.status_code == 200:
                messages = ''
                for line in r.iter_lines(**parse_parameters):
                    if not line: continue
                    try:
                        line = pickle.loads(codecs.decode(line, "base64"))
                    except Exception:
                        line = line.decode('utf-8')
                    chunk = self._prompt.get_response(self._extract_result_func(line, data))
                    if isinstance(chunk, str):
                        if chunk.startswith(messages): chunk = chunk[len(messages):]
                        messages += chunk
                    else:
                        messages = chunk

                    if not stream_output: continue
                    color = stream_output.get('color') if isinstance(stream_output, dict) else None
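                    # buffer chunks that may form the tool-call start token so they are not streamed to the user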
                    if not cache:
                        if token.startswith(chunk.lstrip('\n') if not token.startswith('\n') else chunk) \
                           or token in chunk: cache = chunk
                        else: FileSystemQueue().enqueue(colored_text(chunk, color))
                    elif token in cache:
                        stream_output = False
                        if not cache.startswith(token):
                            FileSystemQueue().enqueue(colored_text(cache.split(token)[0], color))
                    else:
                        cache += chunk
                        if not (token.startswith(cache.lstrip('\n') if not token.startswith('\n') else cache)
                                or token in cache):
                            FileSystemQueue().enqueue(colored_text(cache, color))
                            cache = ""
                if isinstance(stream_output, dict):
                    suffix, suffix_color = stream_output.get('suffix', ''), stream_output.get('suffix_color', '')
                    if suffix: FileSystemQueue().enqueue(lazyllm.colored_text(suffix, suffix_color))
            else:
                raise requests.RequestException('\n'.join([c.decode('utf-8') for c in r.iter_content(None)]))
            temp_output = self._extract_and_format(messages)
            if isinstance(self, TrainableModule):
                usage = {"prompt_tokens": self._estimate_token_usage(text_input_for_token_usage)}
                usage["completion_tokens"] = self._estimate_token_usage(temp_output)
                self._record_usage(usage)
            return self._formatter(temp_output)

lazyllm.module.ServerModule

Bases: UrlModule

Using FastAPI, any callable object can be wrapped into an API service, allowing the simultaneous launch of one main service and multiple satellite services.

Parameters:

  • m (Callable) –

    The function to be wrapped as a service. It can be a function or a functor. When launching satellite services, it needs to be an object implementing __call__ (a functor).

  • pre (Callable, default: None ) –

    Preprocessing function executed in the service process. It can be a function or a functor, default is None.

  • post (Callable, default: None ) –

    Postprocessing function executed in the service process. It can be a function or a functor, default is None.

  • stream (bool, default: False ) –

    Whether to request and output in streaming mode, default is non-streaming.

  • return_trace (bool, default: False ) –

    Whether to record the results in trace, default is False.

  • port (int, default: None ) –

    Specifies the port after the service is deployed. The default is None, which will generate a random port.

  • launcher (LazyLLMLaunchersBase, default: None ) –

    Used to select the compute node for service execution, default is launchers.remote .

Examples:

>>> def demo(input): return input * 2
... 
>>> s = lazyllm.ServerModule(demo, launcher=launchers.empty(sync=False))
>>> s.start()
INFO:     Uvicorn running on http://0.0.0.0:35485
>>> print(s(1))
2
>>> class MyServe(object):
...     def __call__(self, input):
...         return 2 * input
...     
...     @lazyllm.FastapiApp.post
...     def server1(self, input):
...         return f'reply for {input}'
...
...     @lazyllm.FastapiApp.get
...     def server2(self):
...        return f'get method'
...
>>> m = lazyllm.ServerModule(MyServe(), launcher=launchers.empty(sync=False))
>>> m.start()
>>> print(m(1))
INFO:     Uvicorn running on http://0.0.0.0:32028
>>> print(m(1))
2  
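
The pre and post hooks run inside the service process. Below is a hedged sketch reusing demo from above; the post signature (result, original_input) is inferred from the TrialModule example later in this document, and the printed line is only the expected result under that assumption:

>>> s2 = lazyllm.ServerModule(demo, post=lambda x, ori: f'post({x})', launcher=launchers.empty(sync=False))
>>> s2.start()
>>> print(s2(3))
post(6)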

evalset(evalset, load_f=None, collect_f=<function ModuleBase.<lambda>>)

Set the evaluation set for the Module. Modules that have been set with an evaluation set will be evaluated during update or eval, and the evaluation results will be stored in the eval_result variable.

evalset(evalset, collect_f=lambda x: ...)→ None

Parameters:

  • evalset (list) –

    Evaluation set

  • collect_f (Callable) –

    Post-processing method for evaluation results, no post-processing by default.

evalset(evalset, load_f=None, collect_f=lambda x: ...)→ None

Parameters:

  • evalset (str) –

    Path to the evaluation set

  • load_f (Callable) –

    Method for loading the evaluation set, including parsing file formats and converting to a list

  • collect_f (Callable) –

    Post-processing method for evaluation results, no post-processing by default.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> m.eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]

restart()

Restart the module and all its submodules.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.restart()
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"

start()

Deploy the module and all its submodules.

Examples:

>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.start()
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
Source code in lazyllm/module/module.py
class ServerModule(UrlModule):
    """Using FastAPI, any callable object can be wrapped into an API service, allowing the simultaneous launch of one main service and multiple satellite services.

Args:
    m (Callable): The function to be wrapped as a service. It can be a function or a functor. When launching satellite services, it needs to be an object implementing ``__call__`` (a functor).
    pre (Callable): Preprocessing function executed in the service process. It can be a function or a functor, default is ``None``.
    post (Callable): Postprocessing function executed in the service process. It can be a function or a functor, default is ``None``.
    stream (bool): Whether to request and output in streaming mode, default is non-streaming.
    return_trace (bool): Whether to record the results in trace, default is ``False``.
    port (int): Specifies the port after the service is deployed. The default is ``None``, which will generate a random port.
    launcher (LazyLLMLaunchersBase): Used to select the compute node for service execution, default is ``launchers.remote`` .

**Examples:**

```python
>>> def demo(input): return input * 2
... 
>>> s = lazyllm.ServerModule(demo, launcher=launchers.empty(sync=False))
>>> s.start()
INFO:     Uvicorn running on http://0.0.0.0:35485
>>> print(s(1))
2
```

```python
>>> class MyServe(object):
...     def __call__(self, input):
...         return 2 * input
...     
...     @lazyllm.FastapiApp.post
...     def server1(self, input):
...         return f'reply for {input}'
...
...     @lazyllm.FastapiApp.get
...     def server2(self):
...        return f'get method'
...
>>> m = lazyllm.ServerModule(MyServe(), launcher=launchers.empty(sync=False))
>>> m.start()
>>> print(m(1))
INFO:     Uvicorn running on http://0.0.0.0:32028
>>> print(m(1))
2  
```

<span style="font-size: 20px;">**`evalset(evalset, load_f=None, collect_f=<function ModuleBase.<lambda>>)`**</span>

Set the evaluation set for the Module. Modules that have been set with an evaluation set will be evaluated during ``update`` or ``eval``, and the evaluation results will be stored in the eval_result variable. 


<span style="font-size: 18px;">&ensp;**`evalset(evalset, collect_f=lambda x: ...)→ None `**</span>


Args:
    evalset (list) :Evaluation set
    collect_f (Callable) :Post-processing method for evaluation results, no post-processing by default.



<span style="font-size: 18px;">&ensp;**`evalset(evalset, load_f=None, collect_f=lambda x: ...)→ None`**</span>


Args:
    evalset (str) :Path to the evaluation set
    load_f (Callable) :Method for loading the evaluation set, including parsing file formats and converting to a list
    collect_f (Callable) :Post-processing method for evaluation results, no post-processing by default.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.evalset([1, 2, 3])
>>> m.update()
INFO: (lazyllm.launcher) PID: dummy finetune!, and init-args is {}
>>> m.eval_result
["reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1}", "reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1}"]
```

<span style="font-size: 20px;">**`restart() `**</span>

Restart the module and all its submodules.

**Examples:**

```python
>>> import lazyllm
>>> m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
>>> m.restart()
>>> m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
```

<span style="font-size: 20px;">**`start() `**</span> 

Deploy the module and all its submodules.

**Examples:**

```python
import lazyllm
m = lazyllm.module.TrainableModule().deploy_method(deploy.dummy)
m.start()
m(1)
"reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1}"
```                                                                    
"""
    def __init__(self, m, pre=None, post=None, stream=False, return_trace=False,
                 port=None, pythonpath=None, launcher=None):
        assert stream is False or return_trace is False, 'Module with stream output has no trace'
        assert (post is None) or (stream is False), 'Stream cannot be true when post-action exists'
        super().__init__(url=None, stream=stream, return_trace=return_trace)
        self._set_template(
            copy.deepcopy(lazyllm.deploy.RelayServer.message_format),
            lazyllm.deploy.RelayServer.keys_name_handle,
            copy.deepcopy(lazyllm.deploy.RelayServer.default_headers),
        )
        self._impl = _ServerModuleImpl(m, pre, post, launcher, port, pythonpath, father=self)

    _url_id = property(lambda self: self._impl._module_id)

    def wait(self):
        self._impl._launcher.wait()

    def stop(self):
        self._impl.stop()

    @property
    def status(self):
        return self._impl._launcher.status

    def __repr__(self):
        return lazyllm.make_repr('Module', 'Server', subs=[repr(self._impl._m)], name=self._module_name,
                                 stream=self._stream, return_trace=self._return_trace)

lazyllm.module.TrialModule

Bases: object

Parameter grid search module will traverse all its submodules, collect all searchable parameters, and iterate over these parameters for fine-tuning, deployment, and evaluation.

Parameters:

  • m (Callable) –

    The submodule whose parameters will be grid-searched. Fine-tuning, deployment, and evaluation will be based on this module.

Examples:

>>> import lazyllm
>>> from lazyllm import finetune, deploy
>>> m = lazyllm.TrainableModule('b1', 't').finetune_method(finetune.dummy, **dict(a=lazyllm.Option(['f1', 'f2'])))
>>> m.deploy_method(deploy.dummy).mode('finetune').prompt(None)
>>> s = lazyllm.ServerModule(m, post=lambda x, ori: f'post2({x})')
>>> s.evalset([1, 2, 3])
>>> t = lazyllm.TrialModule(s)
>>> t.update()
>>>
dummy finetune!, and init-args is {a: f1}
dummy finetune!, and init-args is {a: f2}
[["post2(reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1})", "post2(reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1})", "post2(reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1})"], ["post2(reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1})", "post2(reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1})", "post2(reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1})"]]
Source code in lazyllm/module/trialmodule.py
class TrialModule(object):
    """Parameter grid search module will traverse all its submodules, collect all searchable parameters, and iterate over these parameters for fine-tuning, deployment, and evaluation.

Args:
    m (Callable): The submodule whose parameters will be grid-searched. Fine-tuning, deployment, and evaluation will be based on this module.


Examples:
    >>> import lazyllm
    >>> from lazyllm import finetune, deploy
    >>> m = lazyllm.TrainableModule('b1', 't').finetune_method(finetune.dummy, **dict(a=lazyllm.Option(['f1', 'f2'])))
    >>> m.deploy_method(deploy.dummy).mode('finetune').prompt(None)
    >>> s = lazyllm.ServerModule(m, post=lambda x, ori: f'post2({x})')
    >>> s.evalset([1, 2, 3])
    >>> t = lazyllm.TrialModule(s)
    >>> t.update()
    >>>
    dummy finetune!, and init-args is {a: f1}
    dummy finetune!, and init-args is {a: f2}
    [["post2(reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1})", "post2(reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1})", "post2(reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1})"], ["post2(reply for 1, and parameters is {'do_sample': False, 'temperature': 0.1})", "post2(reply for 2, and parameters is {'do_sample': False, 'temperature': 0.1})", "post2(reply for 3, and parameters is {'do_sample': False, 'temperature': 0.1})"]]
    """
    def __init__(self, m):
        self.m = m

    @staticmethod
    def work(m, q):
        # update option at module.update()
        m = copy.deepcopy(m)
        m.update()
        q.put(m.eval_result)

    def update(self):
        options = get_options(self.m)
        q = multiprocessing.Queue()
        ps = []
        for _ in OptionIter(options, get_options):
            p = ForkProcess(target=TrialModule.work, args=(self.m, q), sync=True)
            ps.append(p)
            p.start()
            time.sleep(1)
        [p.join() for p in ps]
        result = [q.get() for p in ps]
        LOG.info(f'{result}')

lazyllm.module.OnlineChatModule

Used to manage and create access modules for the large model platforms currently available on the market. It currently supports openai, sensenova, glm, kimi, qwen, doubao, and deepseek (deepseek access is temporarily unavailable because the platform does not currently allow recharges). For how to obtain each platform's API key, please visit Getting Started

Parameters:

  • model (str) –

    Specify the model to access (Note that you need to use Model ID or Endpoint ID when using Doubao. For details on how to obtain it, see Getting the Inference Access Point. Before using the model, you must first activate the corresponding service on the Doubao platform.), default is gpt-3.5-turbo(openai) / SenseChat-5(sensenova) / glm-4(glm) / moonshot-v1-8k(kimi) / qwen-plus(qwen) / mistral-7b-instruct-v0.2(doubao) .

  • source (str) –

    Specify the type of module to create. Options include openai / sensenova / glm / kimi / qwen / doubao / deepseek (not yet supported) .

  • base_url (str) –

    Specify the base link of the platform to be accessed. The default is the official link.

  • system_prompt (str) –

    Specify the requested system prompt. The default is the official system prompt.

  • stream (bool) –

    Whether to request and output in streaming mode, default is streaming.

  • return_trace (bool) –

    Whether to record the results in trace, default is False.

Examples:

>>> import lazyllm
>>> from functools import partial
>>> m = lazyllm.OnlineChatModule(source="sensenova", stream=True)
>>> query = "Hello!"
>>> with lazyllm.ThreadPoolExecutor(1) as executor:
...     future = executor.submit(partial(m, llm_chat_history=[]), query)
...     while True:
...         if value := lazyllm.FileSystemQueue().dequeue():
...             print(f"output: {''.join(value)}")
...         elif future.done():
...             break
...     print(f"ret: {future.result()}")
...
output: Hello
output: ! How can I assist you today?
ret: Hello! How can I assist you today?
>>> from lazyllm.components.formatter import encode_query_with_filepaths
>>> vlm = lazyllm.OnlineChatModule(source="sensenova", model="SenseChat-Vision")
>>> query = "what is it?"
>>> inputs = encode_query_with_filepaths(query, ["/path/to/your/image"])
>>> print(vlm(inputs))
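
A non-streaming module can also be called directly and its reply returned as a string. A minimal sketch, assuming an api_key is already configured for the chosen source; the source and query below are placeholders:

>>> chat = lazyllm.OnlineChatModule(source="openai", stream=False)
>>> print(chat("Hello!"))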
Source code in lazyllm/module/onlineChatModule/onlineChatModule.py
class OnlineChatModule(metaclass=_ChatModuleMeta):
    """Used to manage and create access modules for large model platforms currently available on the market. Currently, it supports openai, sensenova, glm, kimi, qwen, doubao and deepseek (since the platform does not allow recharges for the time being, access is not supported for the time being). For how to obtain the platform's API key, please visit [Getting Started](/#platform)

Args:
    model (str): Specify the model to access (Note that you need to use Model ID or Endpoint ID when using Doubao. For details on how to obtain it, see [Getting the Inference Access Point](https://www.volcengine.com/docs/82379/1099522). Before using the model, you must first activate the corresponding service on the Doubao platform.), default is ``gpt-3.5-turbo(openai)`` / ``SenseChat-5(sensenova)`` / ``glm-4(glm)`` / ``moonshot-v1-8k(kimi)`` / ``qwen-plus(qwen)`` / ``mistral-7b-instruct-v0.2(doubao)`` .
    source (str): Specify the type of module to create. Options include  ``openai`` /  ``sensenova`` /  ``glm`` /  ``kimi`` /  ``qwen`` / ``doubao`` / ``deepseek (not yet supported)`` .
    base_url (str): Specify the base link of the platform to be accessed. The default is the official link.
    system_prompt (str): Specify the requested system prompt. The default is the official system prompt.
    stream (bool): Whether to request and output in streaming mode, default is streaming.
    return_trace (bool): Whether to record the results in trace, default is False.      


Examples:
    >>> import lazyllm
    >>> from functools import partial
    >>> m = lazyllm.OnlineChatModule(source="sensenova", stream=True)
    >>> query = "Hello!"
    >>> with lazyllm.ThreadPoolExecutor(1) as executor:
    ...     future = executor.submit(partial(m, llm_chat_history=[]), query)
    ...     while True:
    ...         if value := lazyllm.FileSystemQueue().dequeue():
    ...             print(f"output: {''.join(value)}")
    ...         elif future.done():
    ...             break
    ...     print(f"ret: {future.result()}")
    ...
    output: Hello
    output: ! How can I assist you today?
    ret: Hello! How can I assist you today?
    >>> from lazyllm.components.formatter import encode_query_with_filepaths
    >>> vlm = lazyllm.OnlineChatModule(source="sensenova", model="SenseChat-Vision")
    >>> query = "what is it?"
    >>> inputs = encode_query_with_filepaths(query, ["/path/to/your/image"])
    >>> print(vlm(inputs))
    """
    MODELS = {'openai': OpenAIModule,
              'sensenova': SenseNovaModule,
              'glm': GLMModule,
              'kimi': KimiModule,
              'qwen': QwenModule,
              'doubao': DoubaoModule,
              'deepseek': DeepSeekModule}

    @staticmethod
    def _encapsulate_parameters(base_url: str,
                                model: str,
                                stream: bool,
                                return_trace: bool,
                                **kwargs) -> Dict[str, Any]:
        params = {"stream": stream, "return_trace": return_trace}
        if base_url is not None:
            params['base_url'] = base_url
        if model is not None:
            params['model'] = model
        params.update(kwargs)

        return params

    def __new__(self,
                model: str = None,
                source: str = None,
                base_url: str = None,
                stream: bool = True,
                return_trace: bool = False,
                **kwargs):
        if model in OnlineChatModule.MODELS.keys() and source is None: source, model = model, source

        params = OnlineChatModule._encapsulate_parameters(base_url, model, stream, return_trace, **kwargs)

        if source is None:
            if "api_key" in kwargs and kwargs["api_key"]:
                raise ValueError("No source is given but an api_key is provided.")
            for source in OnlineChatModule.MODELS.keys():
                if lazyllm.config[f'{source}_api_key']: break
            else:
                raise KeyError(f"No api_key is configured for any of the models {OnlineChatModule.MODELS.keys()}.")

        assert source in OnlineChatModule.MODELS.keys(), f"Unsupported source: {source}"
        return OnlineChatModule.MODELS[source](**params)

lazyllm.module.OnlineEmbeddingModule

Used to manage and create online Embedding service modules currently on the market, currently supporting openai, sensenova, glm, qwen, doubao.

Parameters:

  • source (str) –

    Specify the type of module to create. Options are openai / sensenova / glm / qwen / doubao.

  • embed_url (str) –

    Specify the base link of the platform to be accessed. The default is the official link.

  • embed_model_name (str) –

    Specify the model to access (Note that you need to use Model ID or Endpoint ID when using Doubao. For details on how to obtain it, see Getting the Inference Access Point. Before using the model, you must first activate the corresponding service on the Doubao platform.), default is text-embedding-ada-002(openai) / nova-embedding-stable(sensenova) / embedding-2(glm) / text-embedding-v1(qwen) / doubao-embedding-text-240715(doubao)

Examples:

>>> import lazyllm
>>> m = lazyllm.OnlineEmbeddingModule(source="sensenova")
>>> emb = m("hello world")
>>> print(f"emb: {emb}")
emb: [0.0010528564, 0.0063285828, 0.0049476624, -0.012008667, ..., -0.009124756, 0.0032043457, -0.051696777]
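
The same entry point can also build an online reranking module when type="rerank" is passed; per RERANK_MODELS in the source below, only qwen and glm currently provide this. A hedged sketch that only shows construction, since the call arguments depend on the concrete reranking backend:

>>> reranker = lazyllm.OnlineEmbeddingModule(source="qwen", type="rerank")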
Source code in lazyllm/module/onlineEmbedding/onlineEmbeddingModule.py
class OnlineEmbeddingModule(metaclass=__EmbedModuleMeta):
    """Used to manage and create online Embedding service modules currently on the market, currently supporting openai, sensenova, glm, qwen, doubao.

Args:
    source (str): Specify the type of module to create. Options are  ``openai`` /  ``sensenova`` /  ``glm`` /  ``qwen`` / ``doubao``.
    embed_url (str): Specify the base link of the platform to be accessed. The default is the official link.
    embed_model_name (str): Specify the model to access (Note that you need to use Model ID or Endpoint ID when using Doubao. For details on how to obtain it, see [Getting the Inference Access Point](https://www.volcengine.com/docs/82379/1099522). Before using the model, you must first activate the corresponding service on the Doubao platform.), default is ``text-embedding-ada-002(openai)`` / ``nova-embedding-stable(sensenova)`` / ``embedding-2(glm)`` / ``text-embedding-v1(qwen)`` / ``doubao-embedding-text-240715(doubao)``


Examples:
    >>> import lazyllm
    >>> m = lazyllm.OnlineEmbeddingModule(source="sensenova")
    >>> emb = m("hello world")
    >>> print(f"emb: {emb}")
    emb: [0.0010528564, 0.0063285828, 0.0049476624, -0.012008667, ..., -0.009124756, 0.0032043457, -0.051696777]
    """
    EMBED_MODELS = {'openai': OpenAIEmbedding,
                    'sensenova': SenseNovaEmbedding,
                    'glm': GLMEmbedding,
                    'qwen': QwenEmbedding,
                    'doubao': DoubaoEmbedding}
    RERANK_MODELS = {'qwen': QwenReranking,
                     'glm': GLMReranking}

    @staticmethod
    def _encapsulate_parameters(embed_url: str,
                                embed_model_name: str,
                                **kwargs) -> Dict[str, Any]:
        params = {}
        if embed_url is not None:
            params["embed_url"] = embed_url
        if embed_model_name is not None:
            params["embed_model_name"] = embed_model_name
        params.update(kwargs)
        return params

    @staticmethod
    def _check_available_source(available_models):
        for source in available_models.keys():
            if lazyllm.config[f'{source}_api_key']: break
        else:
            raise KeyError(f"No api_key is configured for any of the models {available_models.keys()}.")

        assert source in available_models.keys(), f"Unsupported source: {source}"
        return source

    def __new__(self,
                source: str = None,
                embed_url: str = None,
                embed_model_name: str = None,
                **kwargs):
        params = OnlineEmbeddingModule._encapsulate_parameters(embed_url, embed_model_name, **kwargs)

        if source is None and "api_key" in kwargs and kwargs["api_key"]:
            raise ValueError("No source is given but an api_key is provided.")

        if kwargs.get("type", "embed") == "embed":
            if source is None:
                source = OnlineEmbeddingModule._check_available_source(OnlineEmbeddingModule.EMBED_MODELS)
            return OnlineEmbeddingModule.EMBED_MODELS[source](**params)
        elif kwargs.get("type") == "rerank":
            if "type" in params:
                params.pop("type")
            if source is None:
                source = OnlineEmbeddingModule._check_available_source(OnlineEmbeddingModule.RERANK_MODELS)
            return OnlineEmbeddingModule.RERANK_MODELS[source](**params)
        else:
            raise ValueError("Unknown type of online embedding module.")

lazyllm.module.OnlineChatModuleBase

Bases: ModuleBase

OnlineChatModuleBase is a public component that manages the LLM interface for open platforms, and has key capabilities such as training, deployment, and inference. OnlineChatModuleBase itself does not support direct instantiation; it requires subclasses to inherit from this class and implement interfaces related to fine-tuning, such as uploading files, creating fine-tuning tasks, querying fine-tuning tasks, and deployment-related interfaces, such as creating deployment services and querying deployment tasks. If you need to support the capabilities of a new open platform's LLM, please extend your custom class from OnlineChatModuleBase:

  1. Consider post-processing the returned results based on the parameters returned by the new platform's model. If the model's return format is consistent with OpenAI, no processing is necessary.

  2. If the new platform supports model fine-tuning, you must also inherit from the FileHandlerBase class. This class primarily validates file formats and converts .jsonl formatted data into a format supported by the model for subsequent training.

  3. If the new platform supports model fine-tuning, you must implement interfaces for file upload, creating fine-tuning services, and querying fine-tuning services. Even if the new platform does not require deployment of the fine-tuned model, please implement dummy interfaces for creating and querying deployment services.

  4. If the new platform supports model fine-tuning, provide a list of models that support fine-tuning to facilitate judgment during the fine-tuning service process.

  5. Configure the api_key supported by the new platform as a global variable by using lazyllm.config.add(variable_name, type, default_value, environment_variable_name) .
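
For point 5, a hedged sketch of registering a hypothetical api key as a global configuration entry, following the lazyllm.config.add signature quoted above (the variable name and environment-variable name are placeholders):

>>> import lazyllm
>>> lazyllm.config.add("new_platform_api_key", str, "", "NEW_PLATFORM_API_KEY")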

Examples:

>>> import lazyllm
>>> from lazyllm.module import OnlineChatModuleBase
>>> from lazyllm.module.onlineChatModule.fileHandler import FileHandlerBase
>>> class NewPlatformChatModule(OnlineChatModuleBase):
...     def __init__(self,
...                   base_url: str = "<new platform base url>",
...                   model: str = "<new platform model name>",
...                   system_prompt: str = "<new platform system prompt>",
...                   stream: bool = True,
...                   return_trace: bool = False):
...         super().__init__(model_type="new_class_name",
...                          api_key=lazyllm.config['new_platform_api_key'],
...                          base_url=base_url,
...                          system_prompt=system_prompt,
...                          stream=stream,
...                          return_trace=return_trace)
...
>>> class NewPlatformChatModule1(OnlineChatModuleBase, FileHandlerBase):
...     TRAINABLE_MODELS_LIST = ['model_t1', 'model_t2', 'model_t3']
...     def __init__(self,
...                   base_url: str = "<new platform base url>",
...                   model: str = "<new platform model name>",
...                   system_prompt: str = "<new platform system prompt>",
...                   stream: bool = True,
...                   return_trace: bool = False):
...         OnlineChatModuleBase.__init__(self,
...                                       model_type="new_class_name",
...                                       api_key=lazyllm.config['new_platform_api_key'],
...                                       base_url=base_url,
...                                       system_prompt=system_prompt,
...                                       stream=stream,
...                                       trainable_models=NewPlatformChatModule1.TRAINABLE_MODELS_LIST,
...                                       return_trace=return_trace)
...         FileHandlerBase.__init__(self)
...     
...     def _convert_file_format(self, filepath:str) -> str:
...         pass
...         return data_str
...
...     def _upload_train_file(self, train_file):
...         pass
...         return train_file_id
...
...     def _create_finetuning_job(self, train_model, train_file_id, **kw):
...         pass
...         return fine_tuning_job_id, status
...
...     def _query_finetuning_job(self, fine_tuning_job_id):
...         pass
...         return fine_tuned_model, status
...
...     def _create_deployment(self):
...         pass
...         return self._model_name, "RUNNING"
... 
...     def _query_deployment(self, deployment_id):
...         pass
...         return "RUNNING"
...
Source code in lazyllm/module/onlineChatModule/onlineChatModuleBase.py
class OnlineChatModuleBase(ModuleBase):
    """OnlineChatModuleBase is a public component that manages the LLM interface for open platforms, and has key capabilities such as training, deployment, and inference. OnlineChatModuleBase itself does not support direct instantiation; it requires subclasses to inherit from this class and implement interfaces related to fine-tuning, such as uploading files, creating fine-tuning tasks, querying fine-tuning tasks, and deployment-related interfaces, such as creating deployment services and querying deployment tasks.
If you need to support the capabilities of a new open platform's LLM, please extend your custom class from OnlineChatModuleBase:

1. Consider post-processing the returned results based on the parameters returned by the new platform's model. If the model's return format is consistent with OpenAI, no processing is necessary.

2. If the new platform supports model fine-tuning, you must also inherit from the FileHandlerBase class. This class primarily validates file formats and converts .jsonl formatted data into a format supported by the model for subsequent training. 

3. If the new platform supports model fine-tuning, you must implement interfaces for file upload, creating fine-tuning services, and querying fine-tuning services. Even if the new platform does not require deployment of the fine-tuned model, please implement dummy interfaces for creating and querying deployment services.

4. If the new platform supports model fine-tuning, provide a list of models that support fine-tuning to facilitate judgment during the fine-tuning service process.

5. Configure the api_key supported by the new platform as a global variable by using ``lazyllm.config.add(variable_name, type, default_value, environment_variable_name)`` .


Examples:
    >>> import lazyllm
    >>> from lazyllm.module import OnlineChatModuleBase
    >>> from lazyllm.module.onlineChatModule.fileHandler import FileHandlerBase
    >>> class NewPlatformChatModule(OnlineChatModuleBase):
    ...     def __init__(self,
    ...                   base_url: str = "<new platform base url>",
    ...                   model: str = "<new platform model name>",
    ...                   system_prompt: str = "<new platform system prompt>",
    ...                   stream: bool = True,
    ...                   return_trace: bool = False):
    ...         super().__init__(model_type="new_class_name",
    ...                          api_key=lazyllm.config['new_platform_api_key'],
    ...                          base_url=base_url,
    ...                          system_prompt=system_prompt,
    ...                          stream=stream,
    ...                          return_trace=return_trace)
    ...
    >>> class NewPlatformChatModule1(OnlineChatModuleBase, FileHandlerBase):
    ...     TRAINABLE_MODELS_LIST = ['model_t1', 'model_t2', 'model_t3']
    ...     def __init__(self,
    ...                   base_url: str = "<new platform base url>",
    ...                   model: str = "<new platform model name>",
    ...                   system_prompt: str = "<new platform system prompt>",
    ...                   stream: bool = True,
    ...                   return_trace: bool = False):
    ...         OnlineChatModuleBase.__init__(self,
    ...                                       model_type="new_class_name",
    ...                                       api_key=lazyllm.config['new_platform_api_key'],
    ...                                       base_url=base_url,
    ...                                       system_prompt=system_prompt,
    ...                                       stream=stream,
    ...                                       trainable_models=NewPlatformChatModule1.TRAINABLE_MODELS_LIST,
    ...                                       return_trace=return_trace)
    ...         FileHandlerBase.__init__(self)
    ...     
    ...     def _convert_file_format(self, filepath:str) -> str:
    ...         pass
    ...         return data_str
    ...
    ...     def _upload_train_file(self, train_file):
    ...         pass
    ...         return train_file_id
    ...
    ...     def _create_finetuning_job(self, train_model, train_file_id, **kw):
    ...         pass
    ...         return fine_tuning_job_id, status
    ...
    ...     def _query_finetuning_job(self, fine_tuning_job_id):
    ...         pass
    ...         return fine_tuned_model, status
    ...
    ...     def _create_deployment(self):
    ...         pass
    ...         return self._model_name, "RUNNING"
    ... 
    ...     def _query_deployment(self, deployment_id):
    ...         pass
    ...         return "RUNNING"
    ...
    """

    def __init__(self,
                 model_series: str,
                 api_key: str,
                 base_url: str,
                 model_name: str,
                 stream: Union[bool, Dict[str, str]],
                 trainable_models: List[str],
                 return_trace: bool = False,
                 **kwargs):
        super().__init__(return_trace=return_trace)
        self._model_series = model_series
        if not api_key:
            raise ValueError("api_key is required")
        self._api_key = api_key
        self._base_url = base_url
        self._model_name = model_name
        self._stream = stream
        self.trainable_models = trainable_models
        self._set_headers()
        self._set_chat_url()
        self.prompt()
        self._is_trained = False
        self.formatter()
        self._field_extractor()
        self._model_optional_params = {}

    @property
    def series(self):
        return self._model_series

    @property
    def type(self):
        return "LLM"

    @property
    def stream(self):
        return self._stream

    @stream.setter
    def stream(self, v: Union[bool, Dict[str, str]]):
        self._stream = v

    def prompt(self, prompt=None, history: List[List[str]] = None):
        if prompt is None:
            self._prompt = ChatPrompter(history=history)
        elif isinstance(prompt, PrompterBase):
            assert not history, 'history is not supported in user defined prompter'
            self._prompt = prompt
        elif isinstance(prompt, (str, dict)):
            self._prompt = ChatPrompter(prompt, history=history)
        else:
            raise TypeError(f"{prompt} type is not supported.")
        self._prompt._set_model_configs(system=self._get_system_prompt())
        return self

    def share(self, prompt: PrompterBase = None, format: FormatterBase = None, stream: Optional[bool] = None,
              history: List[List[str]] = None):
        new = copy.copy(self)
        new._hooks = set()
        new._set_mid()
        if prompt is not None: new.prompt(prompt, history=history)
        if format is not None: new.formatter(format)
        if stream is not None: new.stream = stream
        return new

    def _get_system_prompt(self):
        raise NotImplementedError("_get_system_prompt is not implemented.")

    def _set_headers(self):
        self._headers = {
            'Content-Type': 'application/json',
            'Authorization': 'Bearer ' + self._api_key
        }

    def _set_chat_url(self):
        self._url = urljoin(self._base_url, 'chat/completions')

    def _get_models_list(self):
        url = urljoin(self._base_url, 'models')
        headers = {'Authorization': 'Bearer ' + self._api_key}
        with requests.get(url, headers=headers) as r:
            if r.status_code != 200:
                raise requests.RequestException('\n'.join([c.decode('utf-8') for c in r.iter_content(None)]))

            res_json = r.json()
            return res_json

    def _parse_output_by_key(self, key: str, data: Dict[str, Any]):
        if "choices" in data and isinstance(data["choices"], list):
            item = data['choices'][0]
            data = item.get("delta", {}) if "delta" in item else item.get("message", {})
            return data if not key or key == "." else data.get(key, "")
        else:
            raise ValueError(f"The response {data} does not contain a 'choices' field.")

    def formatter(self, format: FormatterBase = None):
        if isinstance(format, FormatterBase) or callable(format):
            self._formatter = format
        elif format is None:
            self._formatter = EmptyFormatter()
        else:
            raise TypeError("format must be a FormatterBase")

        return self

    def _field_extractor(self, key: Union[str, List[str]] = None):
        if key is None:
            self._extractor_fields = ["{content}" + globals['tool_delimiter'] + "{tool_calls|index}"]
        elif isinstance(key, str):
            self._extractor_fields = [key]
        elif isinstance(key, list):
            self._extractor_fields = key
        else:
            raise TypeError(f"Unsupported type: {type(key)}")

        return self

    def _convert_msg_format(self, msg: Dict[str, Any]):
        return msg

    def _str_to_json(self, msg: str, stream_output: bool):
        if isinstance(msg, bytes):
            pattern = re.compile(r"^data:\s*")
            msg = re.sub(pattern, "", msg.decode('utf-8'))
        try:
            chunk = json.loads(msg)
            message = self._convert_msg_format(chunk)
            if stream_output:
                color = stream_output.get('color') if isinstance(stream_output, dict) else None
                for item in message.get("choices", []):
                    delta = item.get("delta", {})
                    content = delta.get("content", '')
                    if content and "tool_calls" not in delta:
                        FileSystemQueue().enqueue(lazyllm.colored_text(content, color))
            lazyllm.LOG.debug(f"message: {message}")
            return message
        except Exception:
            return ""

    def _get_benchmark_data(self, data: Dict[str, Any]):
        if "choices" in data and isinstance(data["choices"], list):
            item = data['choices'][0]
            return item.get("delta", {}) if "delta" in item else item.get("message", {})
        else:
            raise ValueError(f"The response {data} does not contain a 'choices' field.")

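    # Fills a template made of "{field}" placeholders separated by "<|...|>"
    # delimiters (see the default built in _field_extractor): "{path.to.field}"
    # is resolved by dotted/index lookup into the message, an optional "|a,b"
    # suffix on a tool_calls placeholder lists sub-fields to drop, and each
    # delimiter is emitted only when the value that follows it is non-empty.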
    def _extract_and_format(self, data, template):  # noqa: C901
        # finding placeholders in template and removing rules
        placeholders = re.findall(r"{(.*?)(?:\|(.*?))?}", template)
        delimiters = re.findall(r"<\|.*?\|>", template)
        # extract and format the fields corresponding to the placeholders
        extracted_data = {}
        pkeys = []
        for placeholder, remove_fields in placeholders:
            placeholder_key = placeholder + "|" + remove_fields if remove_fields else placeholder
            pkeys.append(placeholder_key)
            if 'tool_calls' in placeholder:
                # handling remove_fields
                remove_fields = remove_fields.split(',') if remove_fields else []

                # extract the tool_calls field
                keys = placeholder.split('.')
                value = data
                try:
                    for key in (int(key) if key.isdigit() else key for key in keys):
                        value = value[key]

                    if isinstance(value, list):
                        for item in value:
                            for field in remove_fields:
                                item.pop(field, None)
                    # value = json.dumps(value).replace('\n', '').replace(' ', '')
                    value = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
                    extracted_data[placeholder_key] = value
                except (KeyError, IndexError, TypeError):
                    extracted_data[placeholder_key] = ""
            else:
                # extracting additional fields
                keys = placeholder.split('.')
                value = data
                try:
                    for key in (int(key) if key.isdigit() else key for key in keys):
                        value = value[key]
                    # convert the extracted value into a JSON string
                    value = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
                    extracted_data[placeholder_key] = value
                except (KeyError, IndexError, TypeError):
                    extracted_data[placeholder_key] = ""

        # populate the template with the extracted data
        assert len(extracted_data) == len(delimiters) + 1, \
               "The number of extracted fields must be exactly one more than the number of delimiters."
        result = extracted_data.get(pkeys[0])
        result += ''.join(delimiters[idx] + extracted_data[key]
                          for idx, key in enumerate(pkeys[1:]) if extracted_data.get(key))
        lazyllm.LOG.debug(f"result: {result}")
        return result

    def _extract_specified_key_fields(self, response: Dict[str, Any]):
        if len(self._extractor_fields) > 0:
            res = {}
            for key in self._extractor_fields:
                res[key] = (self._parse_output_by_key(key, response) if "{" not in key else self._extract_and_format(
                    self._get_benchmark_data(response), key) if key else "")
            return list(res.values())[0] if len(res) == 1 else json.dumps(res, ensure_ascii=False)
        else:
            return json.dumps(self._parse_output_by_key(".", response), ensure_ascii=False)

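    # Recursively merges the per-chunk values collected from a streaming response
    # into one value: strings are concatenated (or the last one is kept when the
    # backend already returns the full text), lists are merged element-wise, and
    # dicts are merged key-by-key, grouping entries that carry an "index" field.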
    def _merge_stream_result(self, src: List[str | int | list | dict]):
        types = set(type(ele) for ele in src if ele is not None)
        assert len(src) > 0 and len(types) <= 1, f"The elements in the list: {src} are of inconsistent types"
        if len(src) == 1:
            return src[0]
        if all(isinstance(ele, str) or ele is None for ele in src):
            if all(ele == src[-1] or ele is None for ele in src) or (self._model_optional_params
               and not self._model_optional_params.get("incremental_output", True)):
                return src[-1]
            else:
                return "".join(ele for ele in src if ele is not None)
        elif all(isinstance(ele, list) for ele in src):
            assert all(len(src[-1]) == len(ele) for ele in src), f"The lists of elements: {src} have different lengths."
            ret = [self._merge_stream_result([ele[idx] for ele in src]) for idx in range(len(src[-1]))]
            return ret[0] if isinstance(ret[0], list) else ret
        elif all(isinstance(ele, dict) for ele in src):
            if "index" in src[-1]:  # If there are multiple index values that need to be appended.
                data_sorted = sorted(src, key=lambda x: x['index'])
                grouped_data = [list(g) for k, g in groupby(data_sorted, key=lambda x: x['index'])]
                if len(grouped_data) > 1:
                    return [self._merge_stream_result(src) for src in grouped_data]
            return {k: "tool_calls" if k == "finish_reason" and "tool_calls" in [d[k] for d in src if k in d]
                    else self._merge_stream_result([d[k] for d in src if k in d]) for k in set().union(*src)}
        elif all(isinstance(ele, int) for ele in src):
            return src[-1]
        else:
            raise TypeError(f"The elements in list {src} are of inconsistent types.")

    def forward(self, __input: Union[Dict, str] = None, *, llm_chat_history: List[List[str]] = None, tools: List[Dict[str, Any]] = None, stream_output: bool = False, lazyllm_files=None, **kw):  # noqa: C901
        """LLM inference interface"""
        stream_output = stream_output or self._stream
        if lazyllm_files:
            __input = encode_query_with_filepaths(__input, lazyllm_files)
        params = {"input": __input, "history": llm_chat_history}
        if tools:
            params["tools"] = tools
        params["return_dict"] = True
        data = self._prompt.generate_prompt(**params)

        data["model"] = self._model_name
        data["stream"] = bool(stream_output)
        if len(kw) > 0:
            data.update(kw)

        if len(self._model_optional_params) > 0:
            data.update(self._model_optional_params)

        if isinstance(__input, str) and __input.startswith(LAZYLLM_QUERY_PREFIX):
            for idx, message in enumerate(data["messages"]):
                content = message["content"]
                if content.startswith(LAZYLLM_QUERY_PREFIX):
                    content = decode_query_with_filepaths(content)
                query_files = self._format_input_with_files(content)
                data["messages"][idx]["content"] = query_files

        with requests.post(self._url, json=data, headers=self._headers, stream=stream_output) as r:
            if r.status_code != 200:  # request error
                raise requests.RequestException('\n'.join([c.decode('utf-8') for c in r.iter_content(None)])) \
                    if stream_output else requests.RequestException(r.text)

            if isinstance(stream_output, dict):
                prefix, prefix_color = stream_output.get('prefix', ''), stream_output.get('prefix_color', '')
                if prefix: FileSystemQueue().enqueue(lazyllm.colored_text(prefix, prefix_color))
            msg_json = list(filter(lambda x: x, ([self._str_to_json(line, stream_output) for line in r.iter_lines()
                            if len(line)] if stream_output else [self._str_to_json(r.text, stream_output)]),))
            if isinstance(stream_output, dict):
                suffix, suffix_color = stream_output.get('suffix', ''), stream_output.get('suffix_color', '')
                if suffix: FileSystemQueue().enqueue(lazyllm.colored_text(suffix, suffix_color))

            usage = {"prompt_tokens": -1, "completion_tokens": -1}
            if len(msg_json) > 0 and "usage" in msg_json[-1] and isinstance(msg_json[-1]["usage"], dict):
                for k in usage:
                    usage[k] = msg_json[-1]["usage"].get(k, usage[k])
            self._record_usage(usage)
            extractor = self._extract_specified_key_fields(
                self._merge_stream_result(msg_json)
            )
            return self._formatter(extractor) if extractor else ""

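    # Records the token usage reported by the API under this module's id in the
    # session globals and accumulates it into the parent module's entry when this
    # module is used by another one; -1 marks usage the backend did not report.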
    def _record_usage(self, usage: dict):
        globals["usage"][self._module_id] = usage
        par_module_id = self._used_by_moduleid
        if par_module_id is None:
            return
        if par_module_id not in globals["usage"]:
            globals["usage"][par_module_id] = usage
            return
        existing_usage = globals["usage"][par_module_id]
        if existing_usage["prompt_tokens"] == -1 or usage["prompt_tokens"] == -1:
            globals["usage"][par_module_id] = {"prompt_tokens": -1, "completion_tokens": -1}
        else:
            for k in globals["usage"][par_module_id]:
                globals["usage"][par_module_id][k] += usage[k]

    def _set_template(self, template_message=None, keys_name_handle=None, template_headers=None):
        self.template_message = template_message
        self.keys_name_handle = keys_name_handle
        self.template_headers = template_headers

    def _upload_train_file(self, train_file) -> str:
        raise NotImplementedError(f"{self._model_series} not implemented _upload_train_file method in subclass")

    def _create_finetuning_job(self, train_model, train_file_id, **kw) -> Tuple[str, str]:
        raise NotImplementedError(f"{self._model_series} not implemented _create_finetuning_job method in subclass")

    def _query_finetuning_job(self, fine_tuning_job_id) -> Tuple[str, str]:
        raise NotImplementedError(f"{self._model_series} not implemented _query_finetuning_job method in subclass")

    def _query_finetuned_jobs(self) -> dict:
        raise NotImplementedError(f"{self._model_series} not implemented _query_finetuned_jobs method in subclass")

    def _get_finetuned_model_names(self) -> Tuple[List[str], List[str]]:
        raise NotImplementedError(f"{self._model_series} not implemented _get_finetuned_model_names method in subclass")

    def set_train_tasks(self, train_file, **kw):
        self._train_file = train_file
        self._train_parameters = kw

    def set_specific_finetuned_model(self, model_id):
        valid_jobs, _ = self._get_finetuned_model_names()
        valid_model_id = [model for _, model in valid_jobs]
        if model_id in valid_model_id:
            self._model_name = model_id
            self._is_trained = True
        else:
            raise ValueError(f"Cannot find modle({model_id}), in fintuned model list: {valid_model_id}")

    def _get_temp_save_dir_path(self):
        save_dir = os.path.join(lazyllm.config['temp_dir'], 'online_model_sft_log')
        if not os.path.exists(save_dir):
            os.makedirs(save_dir, exist_ok=True)
        else:
            delete_old_files(save_dir)
        return save_dir

    def _validate_api_key(self):
        try:
            self._query_finetuned_jobs()
            return True
        except Exception:
            return False

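    # Builds the fine-tuning pipeline for online models: upload the training file,
    # create a fine-tuning job, poll it until it succeeds (raising on failure),
    # then switch this module to the resulting fine-tuned model.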
    def _get_train_tasks(self):
        if not self._model_name or not self._train_file:
            raise ValueError("train_model and train_file is required")
        if self._model_name not in self.trainable_models:
            lazyllm.LOG.log_once(f"The current model {self._model_name} is not in the trainable \
                                  model list {self.trainable_models}. The deadline for this list is June 1, 2024. \
                                  This model may not be trainable. If your model is a new model, \
                                  you can ignore this warning.")

        def _create_for_finetuning_job():
            """
            create for finetuning job to finish
            """
            file_id = self._upload_train_file(train_file=self._train_file)
            lazyllm.LOG.info(f"{os.path.basename(self._train_file)} upload success! file id is {file_id}")
            (fine_tuning_job_id, status) = self._create_finetuning_job(self._model_name,
                                                                       file_id,
                                                                       **self._train_parameters)
            lazyllm.LOG.info(f"fine tuning job {fine_tuning_job_id} created, status: {status}")

            if status.lower() == "failed":
                raise ValueError(f"Fine tuning job {fine_tuning_job_id} failed")
            import random
            while status.lower() != "succeeded":
                try:
                    # wait 60 to 120 seconds before querying again
                    time.sleep(random.randint(60, 120))
                    (fine_tuned_model, status) = self._query_finetuning_job(fine_tuning_job_id)
                    lazyllm.LOG.info(f"fine tuning job {fine_tuning_job_id} status: {status}")
                    if status.lower() == "failed":
                        raise ValueError(f"Finetuning job {fine_tuning_job_id} failed")
                except ValueError:
                    raise ValueError(f"Finetuning job {fine_tuning_job_id} failed")

            lazyllm.LOG.info(f"fine tuned model: {fine_tuned_model} finished")
            self._model_name = fine_tuned_model
            self._is_trained = True

        return Pipeline(_create_for_finetuning_job)

    def _create_deployment(self) -> Tuple[str, str]:
        raise NotImplementedError(f"{self._model_series} not implemented _create_deployment method in subclass")

    def _query_deployment(self, deployment_id) -> str:
        raise NotImplementedError(f"{self._model_series} not implemented _query_deployment method in subclass")

    def _get_deploy_tasks(self):
        if not self._is_trained: return None

        def _start_for_deployment():
            (deployment_id, status) = self._create_deployment()
            lazyllm.LOG.info(f"deployment {deployment_id} created, status: {status}")

            if status.lower() == "failed":
                raise ValueError(f"Deployment task {deployment_id} failed")
            status = self._query_deployment(deployment_id)
            while status.lower() != "running":
                # wait 10 seconds before querying again
                time.sleep(10)
                status = self._query_deployment(deployment_id)
                lazyllm.LOG.info(f"deployment {deployment_id} status: {status}")
                if status.lower() == "failed":
                    raise ValueError(f"Deployment task {deployment_id} failed")
            lazyllm.LOG.info(f"deployment {deployment_id} finished")
        return Pipeline(_start_for_deployment)

    def _format_vl_chat_query(self, query: str):
        return [{"type": "text", "text": query}]

    def _format_vl_chat_image_url(self, image_url: str, mime: str) -> List[Dict[str, str]]:
        return [{"type": "image_url", "image_url": {"url": image_url}}]

    # for online vlm
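    # Converts a decoded query (a plain string, or a dict with "query" and "files"
    # as produced by decode_query_with_filepaths) into an OpenAI-style multimodal
    # "content" list; local files are inlined as base64 image URLs.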
    def _format_input_with_files(self, query_files: str) -> List[Dict[str, str]]:
        if isinstance(query_files, str):
            return self._format_vl_chat_query(query_files)
        assert isinstance(query_files, dict), "query_files must be a dict."
        output = [{"type": "text", "text": query_files["query"]}]
        files = query_files.get("files", [])
        assert isinstance(files, list), "files must be a list."
        for file in files:
            mime = None
            if not file.startswith("http"):
                file, mime = image_to_base64(file)
            output.extend(self._format_vl_chat_image_url(file, mime))
        return output

    def __repr__(self):
        return lazyllm.make_repr('Module', 'OnlineChat', name=self._module_name, url=self._base_url,
                                 stream=bool(self._stream), return_trace=self._return_trace)

lazyllm.module.OnlineEmbeddingModuleBase

Bases: ModuleBase

OnlineEmbeddingModuleBase is the base class for managing embedding model interfaces on open platforms, used for requesting text to obtain embedding vectors. It is not recommended to directly instantiate this class. Specific platform classes should inherit from this class for instantiation. If you need to support the capabilities of embedding models on a new open platform, please extend your custom class from OnlineEmbeddingModuleBase:

  1. If the request and response data formats of the new platform's embedding model are the same as OpenAI's, no additional processing is needed; simply pass the URL and model.

  2. If the request or response data formats of the new platform's embedding model differ from OpenAI's, you need to override the _encapsulated_data or _parse_response methods (see the sketch after the examples below).


  3. Configure the api_key supported by the new platform as a global variable by using lazyllm.config.add(variable_name, type, default_value, environment_variable_name).

Examples:

>>> import lazyllm
>>> from lazyllm.module import OnlineEmbeddingModuleBase
>>> class NewPlatformEmbeddingModule(OnlineEmbeddingModuleBase):
...     def __init__(self,
...                 embed_url: str = '<new platform embedding url>',
...                 embed_model_name: str = '<new platform embedding model name>'):
...         super().__init__(embed_url, lazyllm.config['new_platform_api_key'], embed_model_name)
...
>>> class NewPlatformEmbeddingModule1(OnlineEmbeddingModuleBase):
...     def __init__(self,
...                 embed_url: str = '<new platform embedding url>',
...                 embed_model_name: str = '<new platform embedding model name>'):
...         super().__init__(embed_url, lazyllm.config['new_platform_api_key'], embed_model_name)
...
...     def _encapsulated_data(self, text:str, **kwargs):
...         pass
...         return json_data
...
...     def _parse_response(self, response: dict[str, any]):
...         pass
...         return embedding
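
As a concrete sketch of point 2 above: suppose a hypothetical platform expects the request field "texts" instead of "input" and nests its vectors under "result" (both names are assumptions for illustration, not a real API). Following the constructor pattern shown above, the two hooks could be overridden roughly like this:

>>> class HypotheticalEmbeddingModule(OnlineEmbeddingModuleBase):
...     def __init__(self,
...                 embed_url: str = '<hypothetical embedding url>',
...                 embed_model_name: str = '<hypothetical embedding model name>'):
...         super().__init__(embed_url, lazyllm.config['new_platform_api_key'], embed_model_name)
...
...     def _encapsulated_data(self, text: str, **kwargs):
...         # assumed request format: the platform wants a list of texts
...         json_data = {"texts": [text], "model": self._embed_model_name}
...         json_data.update(kwargs)
...         return json_data
...
...     def _parse_response(self, response: dict):
...         # assumed response format: {"result": [{"vector": [...]}]}
...         return response['result'][0]['vector']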
Source code in lazyllm/module/onlineEmbedding/onlineEmbeddingModuleBase.py
class OnlineEmbeddingModuleBase(ModuleBase):
    """
OnlineEmbeddingModuleBase is the base class for managing embedding model interfaces on open platforms, used for requesting text to obtain embedding vectors. It is not recommended to directly instantiate this class. Specific platform classes should inherit from this class for instantiation.
If you need to support the capabilities of embedding models on a new open platform, please extend your custom class from OnlineEmbeddingModuleBase:

1. If the request and response data formats of the new platform's embedding model are the same as OpenAI's, no additional processing is needed; simply pass the URL and model.

2. If the request or response data formats of the new platform's embedding model differ from OpenAI's, you need to override the _encapsulated_data or _parse_response methods.

3. Configure the api_key supported by the new platform as a global variable by using ``lazyllm.config.add(variable_name, type, default_value, environment_variable_name)`` .


Examples:
    >>> import lazyllm
    >>> from lazyllm.module import OnlineEmbeddingModuleBase
    >>> class NewPlatformEmbeddingModule(OnlineEmbeddingModuleBase):
    ...     def __init__(self,
    ...                 embed_url: str = '<new platform embedding url>',
    ...                 embed_model_name: str = '<new platform embedding model name>'):
    ...         super().__init__(embed_url, lazyllm.config['new_platform_api_key'], embed_model_name)
    ...
    >>> class NewPlatformEmbeddingModule1(OnlineEmbeddingModuleBase):
    ...     def __init__(self,
    ...                 embed_url: str = '<new platform embedding url>',
    ...                 embed_model_name: str = '<new platform embedding model name>'):
    ...         super().__init__(embed_url, lazyllm.config['new_platform_api_key'], embed_model_name)
    ...
    ...     def _encapsulated_data(self, text:str, **kwargs):
    ...         pass
    ...         return json_data
    ...
    ...     def _parse_response(self, response: dict[str, any]):
    ...         pass
    ...         return embedding
    """

    def __init__(self,
                 model_series: str,
                 embed_url: str,
                 api_key: str,
                 embed_model_name: str,
                 return_trace: bool = False):
        super().__init__(return_trace=return_trace)
        self._model_series = model_series
        self._embed_url = embed_url
        self._api_key = api_key
        self._embed_model_name = embed_model_name
        self._set_headers()

    @property
    def series(self):
        return self._model_series

    @property
    def type(self):
        return "EMBED"

    def _set_headers(self) -> None:
        self._headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self._api_key}"
        }

    def forward(self, text: str, **kwargs) -> List[float]:
        data = self._encapsulated_data(text, **kwargs)
        with requests.post(self._embed_url, json=data, headers=self._headers) as r:
            if r.status_code == 200:
                return self._parse_response(r.json())
            else:
                raise requests.RequestException('\n'.join([c.decode('utf-8') for c in r.iter_content(None)]))

    def _encapsulated_data(self, text: str, **kwargs) -> Dict[str, str]:
        json_data = {
            "input": text,
            "model": self._embed_model_name
        }
        if len(kwargs) > 0:
            json_data.update(kwargs)

        return json_data

    def _parse_response(self, response: Dict[str, Any]) -> List[float]:
        return response['data'][0]['embedding']