Puget Systems Debuts “On-Premise” Generative AI and Machine Learning Custom Server at SIGGRAPH 2023

Puget Systems’ Specialized AI Training and Inference Server Supports Up to Four NVIDIA RTX 6000 Ada Graphics Cards to Host Web-Based Chat Interface for Large Language Models

Puget Systems (www.pugetsystems.com) today announced it will debut a custom Generative AI and Machine Learning server at SIGGRAPH 2023 in Los Angeles this week. At booth #630 in the LA Convention Center, the Puget Systems team will demonstrate its new specialized AI Training and Inference server, configured with four NVIDIA RTX 6000 Ada graphics cards to handle intensive generative AI and machine learning workloads and to manage real-time rendering, graphics, AR/MR/VR/XR, compute, and deep learning processing.

The Puget Systems AI Training and Inference server is a rackmount workstation capable of hosting a web-based chat server for multiple simultaneous users, using state-of-the-art (SOTA) large language models (LLMs) such as Meta’s Llama-2-70b. Puget Systems Labs conducted extensive testing of this configuration with Llama-2-70b and Falcon-40b. (Falcon-40b requires less memory and can run with only two RTX 6000 Ada GPUs.) In addition to running a chat interface, this hardware is also suitable for base model fine-tuning within the available GPU memory limits.

Puget Labs Testing Processes and Results

The Puget Systems Labs team conducted extensive testing of the new AI Training and Inference server, using a full set of four NVIDIA RTX 6000 Ada graphics cards. Labs tested the system with Meta’s Llama-2-70b-chat-hf, using the HuggingFace Text-Generation-Inference (TGI) server and HuggingFace ChatUI. The test model used approximately 130GB of video memory (VRAM), and Labs confirmed that the system should work well with other LLMs that fit within available GPU memory (192GB with four cards installed).
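As a rough illustration of the memory figures above, the following Python sketch estimates how much VRAM the model weights alone would occupy. It is a back-of-the-envelope calculation, not the method Puget Labs used. The 2 bytes per parameter (16-bit weights), the nominal parameter counts read off the model names, and the 48GB per card (implied by 192GB across four cards) are all assumptions.

```python
# Back-of-the-envelope estimate of weight memory versus pooled GPU memory.
# Assumptions (not stated in the announcement):
#   - weights are stored in 16 bits (2 bytes per parameter)
#   - parameter counts are the nominal ones in the model names (70B, 40B)
#   - each RTX 6000 Ada contributes 48 GB (192 GB / 4 cards)

VRAM_PER_CARD_GB = 48  # assumed from the 192GB four-card total


def weight_memory_gb(params_billions, bytes_per_param=2):
    """Memory for the weights alone, in decimal gigabytes."""
    return params_billions * 1e9 * bytes_per_param / 1e9


def gb_to_gib(gb):
    """Convert decimal gigabytes to binary gibibytes."""
    return gb * 1e9 / 2**30


def total_vram_gb(num_cards):
    """Pooled VRAM across identical cards."""
    return num_cards * VRAM_PER_CARD_GB


def headroom_gb(params_billions, num_cards):
    """VRAM left for KV cache and activations (negative means no fit)."""
    return total_vram_gb(num_cards) - weight_memory_gb(params_billions)


if __name__ == "__main__":
    for name, params, cards in [("Llama-2-70b", 70, 4), ("Falcon-40b", 40, 2)]:
        weights = weight_memory_gb(params)
        print(f"{name}: ~{weights:.0f} GB of weights (~{gb_to_gib(weights):.0f} GiB), "
              f"{headroom_gb(params, cards):.0f} GB headroom on {cards} cards")
```

Under these assumptions, the 70-billion-parameter weights come to about 140 GB, or roughly 130 GiB, which is close to the reported usage, although the announcement does not say which unit or precision was measured. A 40-billion-parameter model at the same precision would fit within two cards, consistent with the note about Falcon-40b.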

Following are some notable performance stats from the testing:

Typical usage measured response:
- Validation Time = 0.59673 ms
- Queue Time = 0.17409 ms
- Time per Token = 54.558 ms

Stress tested with multiple concurrent users:
- Data below is from a session with 114 prompts (20-30 users) over 5 minutes

Average prompt response under multi-user load:
- Validation Time = 3.0312 ms
- Queue Time = 4687.9 ms
- Time per Token = 68.076 ms
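
To put the timings above in more familiar terms, the short Python sketch below converts time per token into tokens per second and estimates an end-to-end response time. It assumes that total latency is approximately queue time plus validation time plus time per token multiplied by the number of generated tokens; that decomposition and the 200-token response length are illustrative assumptions, not figures from the Labs testing.

```python
# Converts the reported timings into throughput and a rough response-time estimate.
# Assumptions (not from the announcement):
#   - total latency ~= queue + validation + (time per token * tokens generated)
#   - a hypothetical response length of 200 generated tokens

TYPICAL = {"validation_ms": 0.59673, "queue_ms": 0.17409, "ms_per_token": 54.558}
LOADED = {"validation_ms": 3.0312, "queue_ms": 4687.9, "ms_per_token": 68.076}


def tokens_per_second(ms_per_token):
    """Generation rate implied by a per-token latency in milliseconds."""
    return 1000.0 / ms_per_token


def estimated_response_seconds(stats, new_tokens):
    """Rough end-to-end time for one response, in seconds."""
    total_ms = (stats["queue_ms"] + stats["validation_ms"]
                + stats["ms_per_token"] * new_tokens)
    return total_ms / 1000.0


def prompts_per_minute(prompts, minutes):
    """Average prompt arrival rate over the test session."""
    return prompts / minutes


if __name__ == "__main__":
    hypothetical_tokens = 200  # illustrative response length
    for label, stats in [("typical", TYPICAL), ("multi-user", LOADED)]:
        rate = tokens_per_second(stats["ms_per_token"])
        total = estimated_response_seconds(stats, hypothetical_tokens)
        print(f"{label}: {rate:.1f} tokens/s, ~{total:.1f} s for "
              f"{hypothetical_tokens} tokens")
    print(f"load: {prompts_per_minute(114, 5):.1f} prompts per minute")
```

With the reported numbers, generation runs at roughly 18 tokens per second in typical use and roughly 15 tokens per second under the multi-user load. For a response of the hypothetical length used here, the queue time contributes more of the added delay under load than the slower per-token rate does.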

For more information on the Puget Systems AI Training and Inference server, please visit here.

Pricing and Availability

Puget Systems’ custom AI Training and Inference servers will be available for configuration for a wide range of generative AI applications in the coming weeks. To learn more or to join the waitlist, please visit here. To learn more about Puget Systems’ Canadian consulting and sales operations, please visit here.
