jp6/cu126/: vllm-0.6.4.dev0+gfd47e57f.d20241015.cu126 metadata and description


A high-throughput and memory-efficient inference and serving engine for LLMs

author vLLM Team
classifiers
  • Programming Language :: Python :: 3.8
  • Programming Language :: Python :: 3.9
  • Programming Language :: Python :: 3.10
  • Programming Language :: Python :: 3.11
  • Programming Language :: Python :: 3.12
  • License :: OSI Approved :: Apache Software License
  • Intended Audience :: Developers
  • Intended Audience :: Information Technology
  • Intended Audience :: Science/Research
  • Topic :: Scientific/Engineering :: Artificial Intelligence
  • Topic :: Scientific/Engineering :: Information Analysis
description_content_type text/markdown
license Apache 2.0
project_urls
  • Homepage, https://github.com/vllm-project/vllm
  • Documentation, https://vllm.readthedocs.io/en/latest/
requires_dist
  • psutil
  • sentencepiece
  • numpy<2.0.0
  • requests>=2.26.0
  • tqdm
  • py-cpuinfo
  • transformers>=4.45.0
  • tokenizers>=0.19.1
  • protobuf
  • aiohttp
  • openai>=1.40.0
  • uvicorn[standard]
  • pydantic>=2.9
  • pillow
  • prometheus-client>=0.18.0
  • prometheus-fastapi-instrumentator>=7.0.0
  • tiktoken>=0.6.0
  • lm-format-enforcer==0.10.6
  • outlines<0.1,>=0.0.43
  • typing-extensions>=4.10
  • filelock>=3.10.4
  • partial-json-parser
  • pyzmq
  • msgspec
  • gguf==0.10.0
  • importlib-metadata
  • mistral-common[opencv]>=1.4.4
  • pyyaml
  • einops
  • ray>=2.9
  • nvidia-ml-py
  • fastapi<0.113.0,>=0.107.0; python_version < "3.9"
  • six>=1.16.0; python_version > "3.11"
  • setuptools>=74.1.1; python_version > "3.11"
  • fastapi!=0.113.*,!=0.114.0,>=0.107.0; python_version >= "3.9"
  • librosa; extra == "audio"
  • soundfile; extra == "audio"
  • tensorizer>=2.9.0; extra == "tensorizer"
requires_python >=3.8
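
The requires_dist list above mixes unconditional requirements, environment markers (the fastapi entries keyed on python_version), and two optional extras (audio and tensorizer). As a rough illustration (not part of vLLM itself), the installed metadata can be inspected at runtime with the standard library plus the third-party packaging package:

from importlib.metadata import requires, version
from packaging.markers import Marker

# Assumes vllm and packaging are installed in the current environment.
print(version("vllm"))  # e.g. 0.6.4.dev0+gfd47e57f.d20241015.cu126

# Requirements guarded by 'extra == "audio"' are only installed when
# the extra is requested, e.g. with: pip install "vllm[audio]"
for req in requires("vllm"):
    if 'extra == "audio"' in req:
        print(req)

# Environment markers evaluate against the running interpreter:
print(Marker('python_version >= "3.9"').evaluate())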

Because this project isn't in the mirror_whitelist, no releases from root/pypi are included.

File: vllm-0.6.4.dev0+gfd47e57f.d20241015.cu126-cp310-cp310-linux_aarch64.whl
Size: 110 MB
Type: Python Wheel
Python: 3.10
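
The wheel's cp310-cp310-linux_aarch64 tags restrict it to CPython 3.10 on aarch64 Linux. A quick compatibility check against the running interpreter, sketched with the third-party packaging library (independent of this index):

from packaging.tags import sys_tags
from packaging.utils import parse_wheel_filename

wheel = ("vllm-0.6.4.dev0+gfd47e57f.d20241015.cu126"
         "-cp310-cp310-linux_aarch64.whl")
name, ver, build, tags = parse_wheel_filename(wheel)

# pip will only install the wheel if one of its tags appears in the
# interpreter's supported tag set.
print(any(tag in tags for tag in sys_tags()))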

vLLM

Easy, fast, and cheap LLM serving for everyone

| Documentation | Blog | Paper | Discord | Twitter/X | Developer Slack |

About

vLLM is a fast and easy-to-use library for LLM inference and serving.

vLLM is fast.

Performance benchmark: we include a performance benchmark at the end of our blog post. It compares vLLM against other LLM serving engines (TensorRT-LLM, SGLang, and LMDeploy). The implementation lives under the nightly-benchmarks folder, and you can reproduce the benchmark with our one-click runnable script.

vLLM is flexible and easy to use.

vLLM seamlessly supports most popular open-source models on Hugging Face. Find the full list of supported models here.

Getting Started

Install vLLM with pip or from source:

pip install vllm
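
Once installed, a minimal offline-inference sketch using vLLM's Python API (the model name below is only an example; any supported model works):

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # example model
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)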

Visit our documentation to learn more.
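
vLLM also ships an OpenAI-compatible HTTP server (hence the openai, fastapi, and uvicorn dependencies above). A sketch of querying a locally running server with the openai client, assuming the server was started with "vllm serve <model>" on the default port 8000:

from openai import OpenAI

# "EMPTY" is a placeholder; the local server does not require a real key
# unless one was configured.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.completions.create(
    model="facebook/opt-125m",  # must match the model the server serves
    prompt="Hello, vLLM!",
    max_tokens=16,
)
print(resp.choices[0].text)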

Contributing

We welcome and value any contributions and collaborations. Please check out CONTRIBUTING.md for how to get involved.

Sponsors

vLLM is a community project. Our compute resources for development and testing are supported by the following organizations. Thank you for your support!

We also have an official fundraising venue through OpenCollective. We plan to use the fund to support the development, maintenance, and adoption of vLLM.

Citation

If you use vLLM for your research, please cite our paper:

@inproceedings{kwon2023efficient,
  title={Efficient Memory Management for Large Language Model Serving with PagedAttention},
  author={Woosuk Kwon and Zhuohan Li and Siyuan Zhuang and Ying Sheng and Lianmin Zheng and Cody Hao Yu and Joseph E. Gonzalez and Hao Zhang and Ion Stoica},
  booktitle={Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles},
  year={2023}
}

Contact Us