add image_loader and fix some image_url bugs in tool results

main
hkr04 2025-04-10 00:10:05 +08:00
parent 26c45d966e
commit f539edc717
31 changed files with 298 additions and 816 deletions

View File

@ -1,194 +0,0 @@
# Humanus.cpp Docker Development Environment Guide

This document describes how to build and run the Humanus.cpp project using the Docker environment.

## Environment

The Docker environment uses a multi-stage build and contains the following components:

- Ubuntu 20.04 as the base operating system
- C++ toolchain (GCC, G++, CMake)
- OpenSSL development libraries
- Python3 development environment
- Node.js 18.x and npm
- Pre-installed npm packages:
- @modelcontextprotocol/server-puppeteer
- @modelcontextprotocol/server-filesystem
- @kevinwatt/shell-mcp
- @modelcontextprotocol/server-everything

### Advantages of the Multi-Stage Build

The Dockerfile uses a multi-stage build, which has the following benefits:

1. **Smaller image size**: the final image contains only the components needed at runtime
2. **Simpler dependency management**: all dependencies are resolved at build time, so no network access is needed at runtime
3. **Higher build success rate**: separating the build and runtime environments reduces the risk of build failures
4. **Faster development**: the pre-built toolchain cuts down the preparation time for each container start

## Usage

### Build and Start the Development Environment

The easiest way is to use the provided script:

```bash
# Start the development environment with the helper script
./.devops/scripts/start-dev.sh
```

This script will:

1. Build the Docker image (multi-stage build)
2. Start the container
3. Ask whether to enter the container

You can also run the steps manually:

```bash
# Go to the project root
cd /path/to/humanus.cpp

# Build and start the container
docker-compose -f .devops/docker-compose.yml build
docker-compose -f .devops/docker-compose.yml up -d

# Open an interactive shell in the container
docker-compose -f .devops/docker-compose.yml exec humanus bash
```

### Build the Project Inside the Container

Use the provided script:

```bash
# Build the project with the helper script
./.devops/scripts/build-project.sh
```

Or run the steps manually:

```bash
# Enter the container
docker-compose -f .devops/docker-compose.yml exec humanus bash

# Inside the container, run:
cd /app/build
cmake ..
make -j$(nproc)
```

After the build finishes, the binaries are located in `/app/build/bin/`.

### Run the Project

You can run the compiled project in either of the following ways:

```bash
# From outside the container
docker-compose -f .devops/docker-compose.yml exec humanus /app/build/bin/humanus_cli

# Or inside the container:
# first enter the container
docker-compose -f .devops/docker-compose.yml exec humanus bash
# then run
/app/build/bin/humanus_cli
```

## Development Workflow

1. Edit the code on the host machine
2. The code is synced into the container automatically via the mounted volume
3. Rebuild the project inside the container
4. Test inside the container

## Notes

- Build artifacts are stored in the Docker volume `humanus_build` and do not affect the build directory on the host
- Node.js and npm are pre-installed in the image; no extra setup is required
- Port 8818 is exposed by default; edit `docker-compose.yml` if you need other ports

## Troubleshooting Network Issues

If you still run into network problems during the build, try the following:

### Fixing EOF Errors

If you see an EOF error like this:

```
failed to solve: ubuntu:20.04: failed to resolve source metadata for docker.io/library/ubuntu:20.04: failed to authorize: failed to fetch anonymous token: Get "https://auth.docker.io/token?scope=repository%3Alibrary%2Fubuntu%3Apull&service=registry.docker.io": EOF
```

it is usually caused by an unstable network connection or Docker DNS resolution problems. Possible fixes:

1. **Configure Docker registry mirrors**

Add the following to the Docker Desktop settings:

```json
{
  "registry-mirrors": [
    "https://registry.docker-cn.com",
    "https://docker.mirrors.ustc.edu.cn",
    "https://hub-mirror.c.163.com"
  ]
}
```

2. **Use build arguments and flags**

```bash
# Use option 3 in the start-dev.sh script,
# or run manually:
docker-compose -f .devops/docker-compose.yml build --build-arg BUILDKIT_INLINE_CACHE=1 --network=host
```

3. **Pull the base image first**

Sometimes pulling the base image separately resolves the problem:

```bash
docker pull ubuntu:20.04
```

### Setting a Proxy

Set an HTTP proxy in the terminal before running the build:

```bash
# Set the HTTP proxy environment variables
export HTTP_PROXY=http://your-proxy-server:port
export HTTPS_PROXY=http://your-proxy-server:port
export NO_PROXY=localhost,127.0.0.1

# Then run the build
docker-compose -f .devops/docker-compose.yml build
```

### Using a Docker Registry Mirror

Add a registry mirror in the Docker Desktop settings:

1. Open Docker Desktop
2. Go to Settings -> Docker Engine
3. Add the following configuration:

```json
{
  "registry-mirrors": [
    "https://registry.docker-cn.com",
    "https://docker.mirrors.ustc.edu.cn",
    "https://hub-mirror.c.163.com"
  ]
}
```

4. Click "Apply & Restart"

## Troubleshooting

If you run into problems, try the following steps:

1. Check the container logs: `docker-compose -f .devops/docker-compose.yml logs humanus`
2. Rebuild the image: `docker-compose -f .devops/docker-compose.yml build --no-cache`
3. Recreate the container: `docker-compose -f .devops/docker-compose.yml up -d --force-recreate`
4. Network issues: make sure Docker can reach the internet, or configure an appropriate proxy
5. Disk space: make sure there is enough disk space to build and run the container
6. If you see a "Read-only file system" error, do not try to modify read-only files inside the container; configure via environment variables instead

View File

@ -1,102 +0,0 @@
# Stage 1: build environment
FROM ubuntu:20.04 AS builder

# Avoid interactive prompts
ENV DEBIAN_FRONTEND=noninteractive

# Set DNS-related environment variables to avoid network issues
ENV RES_OPTIONS="timeout:1 attempts:1 rotate"
ENV GETDNS_STUB_TIMEOUT=100

# Use the Aliyun mirror to speed up apt
RUN sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list

# Install build tools and dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    git \
    curl \
    libssl-dev \
    python3-dev \
    python3-pip \
    ca-certificates \
    gnupg \
    --no-install-recommends \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Configure pip to use the Aliyun mirror
RUN python3 -m pip install -i https://mirrors.aliyun.com/pypi/simple/ --upgrade pip && \
    pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

# Install Node.js
RUN mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
    echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_18.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list && \
    apt-get update && \
    apt-get install -y nodejs && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Configure npm to use the npmmirror (Taobao) registry
RUN npm config set registry https://registry.npmmirror.com

# Install the npm packages required by the project
RUN npm install -g @modelcontextprotocol/server-puppeteer \
    @modelcontextprotocol/server-filesystem \
    @kevinwatt/shell-mcp \
    @modelcontextprotocol/server-everything

# Create the working directory
WORKDIR /app

# Stage 2: runtime environment (all dependencies, but no build tools)
FROM ubuntu:20.04 AS release

# Avoid interactive prompts
ENV DEBIAN_FRONTEND=noninteractive

# Set DNS-related environment variables to avoid network issues
ENV RES_OPTIONS="timeout:1 attempts:1 rotate"
ENV GETDNS_STUB_TIMEOUT=100

# Use the Aliyun mirror to speed up apt
RUN sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list

# Install runtime dependencies (minimal)
RUN apt-get update && apt-get install -y \
    libssl-dev \
    python3 \
    ca-certificates \
    curl \
    --no-install-recommends \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Install Node.js
RUN mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg && \
    echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_18.x nodistro main" | tee /etc/apt/sources.list.d/nodesource.list && \
    apt-get update && \
    apt-get install -y nodejs && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Configure npm to use the npmmirror (Taobao) registry
RUN npm config set registry https://registry.npmmirror.com

# Copy the global npm packages from the build stage
COPY --from=builder /usr/local/lib/node_modules /usr/local/lib/node_modules
COPY --from=builder /usr/local/bin /usr/local/bin

# Create the working directory
WORKDIR /app

# Create the build directory
RUN mkdir -p /app/build

# Default to a bash shell
CMD ["/bin/bash"]

View File

@ -1,35 +0,0 @@
services:
  humanus:
    build:
      context: ..
      dockerfile: .devops/Dockerfile
      target: release # Use the second stage as the final image
      args:
        # BuildKit args to improve build stability
        BUILDKIT_INLINE_CACHE: 1
        DOCKER_BUILDKIT: 1
    container_name: humanus_cpp
    volumes:
      # Mount the source tree so code edited on the host is visible in the container
      - ..:/app
      # Use a separate build directory so the local build is not overwritten
      - humanus_build:/app/build
    ports:
      # Add any ports the project needs to expose here
      - "8818:8818"
    environment:
      # Environment variables can be set here
      - PYTHONPATH=/app
      # DNS-related variables to avoid network issues inside the container
      - DNS_OPTS=8.8.8.8,8.8.4.4
    # Use an interactive terminal in development mode
    stdin_open: true
    tty: true
    # Default command
    command: /bin/bash
    # Optional: use host network mode to work around some network issues (Linux only)
    # network_mode: "host"

volumes:
  humanus_build:
    # Named volume for storing build artifacts

View File

@ -1,35 +0,0 @@
#!/bin/bash
# This script runs inside the container and sets up Node.js and npm

echo "=== Installing Node.js and npm ==="

# Install curl first (if missing)
if ! command -v curl &> /dev/null; then
    apt-get update
    apt-get install -y curl
fi

# Install Node.js
echo "Installing Node.js..."
curl -fsSL https://deb.nodesource.com/setup_18.x | bash -
apt-get install -y nodejs

# Verify the installation
echo "Node.js version:"
node --version
echo "npm version:"
npm --version

# Configure npm to use the npmmirror (Taobao) registry
echo "Configuring npm to use the npmmirror registry..."
npm config set registry https://registry.npmmirror.com

# Install the npm packages required by the project
echo "Installing the npm packages required by the project..."
npm install -g @modelcontextprotocol/server-puppeteer \
    @modelcontextprotocol/server-filesystem \
    @kevinwatt/shell-mcp \
    @modelcontextprotocol/server-everything

echo "Node.js and npm setup complete."

View File

@ -1,90 +0,0 @@
#!/bin/sh
# Script directory
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# Project root
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

echo "=== Humanus.cpp Development Environment Startup Script ==="
echo "Project root: $PROJECT_ROOT"

# Make sure we run from the project root
cd "$PROJECT_ROOT" || { echo "Failed to enter the project root"; exit 1; }

# Make sure the scripts are executable
chmod +x .devops/scripts/*.sh

# Check network connectivity
echo "Checking network connectivity..."
if ! ping -c 1 -W 1 auth.docker.io > /dev/null 2>&1; then
    echo "Warning: cannot reach the Docker auth server; this may cause EOF errors"
    echo "Recommended fixes:"
    echo "1. Check the DNS settings in Docker Desktop"
    echo "2. Add a Docker registry mirror"
    echo "3. Check your network connection and proxy settings"
    echo ""
    echo "Continue with the build anyway? (it may fail)"
    read -p "Continue? (y/n): " CONTINUE_BUILD
    if [ "$CONTINUE_BUILD" != "y" ] && [ "$CONTINUE_BUILD" != "Y" ]; then
        echo "Build cancelled"
        exit 1
    fi
fi

# Offer alternative build options
echo "Choose a build method:"
echo "1. Standard build (docker-compose build)"
echo "2. Build with --no-cache (for previously failed builds)"
echo "3. Build with the host network (for network issues)"
read -p "Select a build method [1-3, default 1]: " BUILD_OPTION
BUILD_OPTION=${BUILD_OPTION:-1}

# Build the Docker image
echo "Building the Docker image (multi-stage build)..."
case $BUILD_OPTION in
    1)
        docker-compose -f .devops/docker-compose.yml build
        ;;
    2)
        docker-compose -f .devops/docker-compose.yml build --no-cache
        ;;
    3)
        docker-compose -f .devops/docker-compose.yml build --build-arg BUILDKIT_INLINE_CACHE=1 --network=host
        ;;
    *)
        echo "Invalid option, using the standard build"
        docker-compose -f .devops/docker-compose.yml build
        ;;
esac

# Check the build result
if [ $? -ne 0 ]; then
    echo "Build failed! Please check the error messages."
    echo "If you see EOF errors, see the network troubleshooting section in .devops/DOCKER_README.md."
    exit 1
fi

# Start the container
echo "Starting the development container..."
docker-compose -f .devops/docker-compose.yml up -d

# Show container status
echo "Container status:"
docker-compose -f .devops/docker-compose.yml ps

echo ""
echo "The development environment is up. All dependencies, including Node.js and npm, are pre-installed in the image."
echo ""
echo "Enter the container with:"
echo "docker-compose -f .devops/docker-compose.yml exec humanus bash"
echo ""
echo "Stop the environment with:"
echo "docker-compose -f .devops/docker-compose.yml down"
echo ""

# Ask whether to enter the container
read -p "Enter the container now? (y/n): " ENTER_CONTAINER
if [ "$ENTER_CONTAINER" = "y" ] || [ "$ENTER_CONTAINER" = "Y" ]; then
    echo "Entering the container..."
    docker-compose -f .devops/docker-compose.yml exec humanus bash
fi

View File

@ -1,32 +0,0 @@
#!/bin/sh
# Script directory
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# Project root
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

echo "=== Humanus.cpp Development Environment Stop Script ==="
echo "Project root: $PROJECT_ROOT"

# Make sure we run from the project root
cd "$PROJECT_ROOT" || { echo "Failed to enter the project root"; exit 1; }

# Stop and remove the container
echo "Stopping and removing the container..."
docker-compose -f .devops/docker-compose.yml down

# Show container status
echo "Container status:"
docker-compose -f .devops/docker-compose.yml ps

echo ""
echo "The development environment has been stopped."
echo ""

# Ask whether to remove the build volume
read -p "Remove the build volume? (y/n): " REMOVE_VOLUME
if [ "$REMOVE_VOLUME" = "y" ] || [ "$REMOVE_VOLUME" = "Y" ]; then
    echo "Removing the build volume..."
    docker volume rm humanus_cpp_humanus_build
    echo "Build volume removed."
fi

View File

@ -1,55 +0,0 @@
# Version control
.git
.gitignore
.gitmodules

# Build directories
build/
*/build/

# Log directory
logs/

# macOS files
.DS_Store

# IDE directories
.vscode/
.idea/

# Temporary files
*.log
*.temp
*.tmp
*.o
*.a
.cache/

# Node.js
node_modules/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
ENV/

# Docker-related files
.dockerignore

# Do not ignore .git directory, otherwise the reported build number will always be 0
.github/
.vs/
models/*
/llama-cli
/llama-quantize
arm_neon.h
compile_commands.json
Dockerfile

View File

@ -1,40 +0,0 @@
# Humanus.cpp Docker Development Environment

This file is a quick-start guide for developing Humanus.cpp with the Docker environment.

## Quick Start

### Start the development environment

```bash
# Start the development environment with the helper script
./.devops/scripts/start-dev.sh
```

All dependencies, including the C++ toolchain, Node.js, and npm, are pre-installed in the image; no extra setup is required.

### Build the project inside the container

```bash
# Build the project with the helper script
./.devops/scripts/build-project.sh
```

### Stop the development environment

```bash
# Stop the development environment with the helper script
./.devops/scripts/stop-dev.sh
```

## Advantages

- **Multi-stage build**: keeps the image small and ships only the necessary components
- **Pre-installed dependencies**: all required tools and libraries, including Node.js and the npm packages, are pre-installed
- **Simplified development**: start developing right away without setting up an environment by hand
- **Stable and reliable**: uses Ubuntu 20.04 as the base image
- **Convenient scripts**: helper scripts so you do not have to memorize Docker commands

## Details

For a detailed description of the Docker development environment, including the advantages of the multi-stage build, see [.devops/DOCKER_README.md](.devops/DOCKER_README.md).

View File

@ -90,7 +90,8 @@ struct BaseAgent : std::enable_shared_from_this<BaseAgent> {
memory->current_request = request;
if (state != AgentState::IDLE) {
throw std::runtime_error("Cannot run agent from state " + agent_state_map[state]);
logger->error("Cannot run agent from state " + agent_state_map[state]);
return "Cannot run agent from state " + agent_state_map[state];
}
if (!request.empty()) {

View File

@ -6,11 +6,10 @@
#include "prompt.h"
#include "tool/tool_collection.h"
#include "tool/python_execute.h"
#include "tool/terminate.h"
#include "tool/puppeteer.h"
#include "tool/playwright.h"
#include "tool/filesystem.h"
#include "tool/image_loader.h"
namespace humanus {
/**
@ -27,6 +26,8 @@ struct Humanus : ToolCallAgent {
std::make_shared<PythonExecute>(),
std::make_shared<Filesystem>(),
std::make_shared<Playwright>(),
std::make_shared<ImageLoader>(),
std::make_shared<ContentProvider>(),
std::make_shared<Terminate>()
}
),

View File

@ -16,7 +16,7 @@ bool ToolCallAgent::think() {
tool_calls = ToolCall::from_json_list(response["tool_calls"]);
// Log response info
logger->info("" + name + "'s thoughts: " + response["content"].get<std::string>());
logger->info("" + name + "'s thoughts: " + (response["content"].empty() ? "<no content>" : response["content"].get<std::string>()));
logger->info(
"🛠️ " + name + " selected " + std::to_string(tool_calls.size()) + " tool(s) to use"
);
@ -55,11 +55,6 @@ bool ToolCallAgent::think() {
return true; // Will be handled in act()
}
// For 'auto' mode, continue with content if no commands but content exists
if (tool_choice == "auto" && !tool_calls.empty()) {
return !response["content"].empty();
}
return !tool_calls.empty();
} catch (const std::exception& e) {
logger->error("🚨 Oops! The " + name + "'s thinking process hit a snag: " + std::string(e.what()));
@ -78,50 +73,77 @@ std::string ToolCallAgent::act() {
return memory->get_messages().empty() || memory->get_messages().back().content.empty() ? "No content or commands to execute" : memory->get_messages().back().content.dump();
}
std::vector<std::string> results;
std::vector<ToolResult> results;
std::string result_str;
for (const auto& tool_call : tool_calls) {
if (state != AgentState::RUNNING) {
result_str += "Agent is not running, so no more tool calls will be executed.\n\n";
break;
}
auto result = execute_tool(tool_call);
logger->info(
"🎯 Tool `" + tool_call.function.name + "` completed its mission! Result: " + result.substr(0, 500) + (result.size() > 500 ? "..." : "")
"🎯 Tool `" + tool_call.function.name + "` completed its mission! Result: " + result.to_string(500)
);
if (result.to_string().size() > 12288) { // Pre-check before tokenization (will be done in Message constructor)
if (!(result.output.size() == 1 && result.output[0]["type"] == "image_url")) {
// If the result is not an image, split the result into multiple chunks and save to memory
// Might be long text or mixture of text and image
result = content_provider->handle_write({
{"content", result.output}
});
logger->info("🔍 Tool result for `" + tool_call.function.name + "` has been split into multiple chunks and saved to memory.");
result.output = "This tool call has been split into multiple chunks and saved to memory. Please refer to below information to use the `content_provider` tool to read the chunks:\n" + result.to_string();
}
}
// Add tool response to memory
Message tool_msg = Message::tool_message(
result, tool_call.id, tool_call.function.name
result.error.empty() ? result.output : result.error, tool_call.id, tool_call.function.name
);
// If the tool message is too long, use the `content_provider` tool to split the message into multiple chunks
if (tool_msg.num_tokens > 4096) { // TODO: Make this configurable
auto result = content_provider->handle_write({
{"content", tool_msg.content}
});
logger->info("🔍 Tool result for `" + tool_call.function.name + "` has been split into multiple chunks and saved to memory.");
tool_msg = Message::tool_message(
"This tool call has been split into multiple chunks and saved to memory. Please refer to below information to use the `content_provider` tool to read the chunks:\n" + result.to_string(),
tool_call.id,
tool_call.function.name
);
}
memory->add_message(tool_msg);
results.push_back(result);
}
auto observation = result.empty() ?
"Tool `" + tool_msg.name + "` completed with no output" :
"Observed output of tool `" + tool_msg.name + "` executed:\n" + result.to_string();
std::string result_str;
for (const auto& result : results) {
result_str += result + "\n\n";
}
if (state != AgentState::RUNNING) {
result_str += "Agent is not running, so no more tool calls will be executed.\n\n";
result_str += observation + "\n\n";
}
return result_str;
}
// Execute a single tool call with robust error handling
std::string ToolCallAgent::execute_tool(ToolCall tool_call) {
ToolResult ToolCallAgent::execute_tool(ToolCall tool_call) {
if (tool_call.empty() || tool_call.function.empty() || tool_call.function.name.empty()) {
return "Error: Invalid command format";
return ToolError("Invalid command format");
}
std::string name = tool_call.function.name;
if (available_tools.tools_map.find(name) == available_tools.tools_map.end()) {
return "Error: Unknown tool `" + name + "`. Please use one of the following tools: " +
return ToolError("Unknown tool `" + name + "`. Please use one of the following tools: " +
std::accumulate(available_tools.tools_map.begin(), available_tools.tools_map.end(), std::string(),
[](const std::string& a, const auto& b) {
return a + (a.empty() ? "" : ", ") + b.first;
});
})
);
}
try {
@ -136,25 +158,20 @@ std::string ToolCallAgent::execute_tool(ToolCall tool_call) {
logger->info("🔧 Activating tool: `" + name + "`...");
ToolResult result = available_tools.execute(name, args);
// Format result for display
auto observation = result.empty() ?
"Cmd `" + name + "` completed with no output" :
"Observed output of cmd `" + name + "` executed:\n" + result.to_string();
// Handle special tools like `finish`
_handle_special_tool(name, result);
return observation;
return result;
} catch (const json::exception& /* e */) {
std::string error_msg = "Error parsing arguments for " + name + ": Invalid JSON format";
logger->error(
"📝 Oops! The arguments for `" + name + "` don't make sense - invalid JSON"
);
return "Error: " + error_msg;
return ToolError(error_msg);
} catch (const std::exception& e) {
std::string error_msg = "⚠️ Tool `" + name + "` encountered a problem: " + std::string(e.what());
logger->error(error_msg);
return "Error: " + error_msg;
return ToolError(error_msg);
}
}
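With `execute_tool` now returning a structured `ToolResult` instead of a string with an `Error: ` prefix, call sites can branch on the `error` field directly. A minimal sketch of the new error path (the surrounding members are taken from the diff above; the logging text is illustrative):

```cpp
// Sketch only: branch on ToolResult::error rather than string-matching "Error: ".
ToolResult result = execute_tool(tool_call);
if (!result.error.empty()) {
    logger->error("Tool `" + tool_call.function.name + "` failed: " + result.to_string(500));
} else {
    // Same shape as the tool_message construction in act() above
    memory->add_message(Message::tool_message(result.output, tool_call.id, tool_call.function.name));
}
```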

View File

@ -5,6 +5,7 @@
#include "prompt.h"
#include "tool/tool_collection.h"
#include "tool/terminate.h"
#include "tool/content_provider.h"
namespace humanus {
@ -15,9 +16,12 @@ struct ToolCallAgent : ReActAgent {
std::string tool_choice;
std::set<std::string> special_tool_names;
std::shared_ptr<ContentProvider> content_provider;
ToolCallAgent(
const ToolCollection& available_tools = ToolCollection(
{
std::make_shared<ContentProvider>(),
std::make_shared<Terminate>()
}
),
@ -44,8 +48,14 @@ struct ToolCallAgent : ReActAgent {
available_tools(available_tools),
tool_choice(tool_choice),
special_tool_names(special_tool_names) {
if (available_tools.tools_map.find("terminate") == available_tools.tools_map.end()) {
throw std::runtime_error("terminate tool must be present in available_tools");
if (this->available_tools.tools_map.find("terminate") == this->available_tools.tools_map.end()) {
this->available_tools.add_tool(std::make_shared<Terminate>());
}
if (this->available_tools.tools_map.find("content_provider") == this->available_tools.tools_map.end()) {
content_provider = std::make_shared<ContentProvider>();
this->available_tools.add_tool(content_provider);
} else {
content_provider = std::dynamic_pointer_cast<ContentProvider>(this->available_tools.tools_map["content_provider"]);
}
}
@ -56,7 +66,7 @@ struct ToolCallAgent : ReActAgent {
std::string act() override;
// Execute a single tool call with robust error handling
std::string execute_tool(ToolCall tool_call);
ToolResult execute_tool(ToolCall tool_call);
// Handle special tool execution and state changes
void _handle_special_tool(const std::string& name, const ToolResult& result, const json& kwargs = {});

View File

@ -1,4 +1,4 @@
[default]
[memory]
model = "qwen-max"
base_url = "https://dashscope.aliyuncs.com"
endpoint = "/compatible-mode/v1/chat/completions"
@ -10,18 +10,19 @@ base_url = "https://open.bigmodel.cn"
endpoint = "/api/paas/v4/chat/completions"
api_key = "7e12e1cb8fe5786d83c74d2ef48db511.xPVWzEZt8RvIciW9"
[qwen-vl-max]
model = "qwen-vl-max"
[default]
model = "qwen-vl-max-latest"
base_url = "https://dashscope.aliyuncs.com"
endpoint = "/compatible-mode/v1/chat/completions"
api_key = "sk-cb1bb2a240d84182bb93f6dd0fe03600"
enable_vision = true
[claude-3.5-sonnet]
model = "anthropic/claude-3.5-sonnet"
base_url = "https://openrouter.ai"
endpoint = "/api/v1/chat/completions"
api_key = "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"
max_tokens = 8192
enable_vision = true
[deepseek-chat]
model = "deepseek-chat"
@ -34,4 +35,4 @@ model = "deepseek-reasoner"
base_url = "https://api.deepseek.com"
endpoint = "/v1/chat/completions"
api_key = "sk-93c5bfcb920c4a8aa345791d429b8536"
oai_tool_support = false
enable_tool = false

View File

@ -1,36 +0,0 @@
[llm]
model = "anthropic/claude-3.7-sonnet"
base_url = "https://openrouter.ai"
endpoint = "/api/v1/chat/completions"
api_key = "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"
max_tokens = 8192
[llm]
model = "deepseek-chat"
base_url = "https://api.deepseek.com"
endpoint = "/v1/chat/completions"
api_key = "sk-93c5bfcb920c4a8aa345791d429b8536"
max_tokens = 8192
[llm]
model = "qwen-max"
base_url = "https://dashscope.aliyuncs.com"
endpoint = "/compatible-mode/v1/chat/completions"
api_key = "sk-cb1bb2a240d84182bb93f6dd0fe03600"
max_tokens = 8192
[llm]
model = "deepseek-reasoner"
base_url = "https://api.deepseek.com"
endpoint = "/v1/chat/completions"
api_key = "sk-93c5bfcb920c4a8aa345791d429b8536"
max_tokens = 8192
oai_tool_support = false
[llm]
model = "claude-3-5-sonnet-20241022"
base_url = "https://gpt.soruxgpt.com"
endpoint = "/api/api/v1/chat/completions"
api_key = "sk-o38PVgxNjzt8bYsfruSlKq9DqoPeiOwKytlOzN7fakJ4YRDF"
max_tokens = 8192
oai_tool_support = false

View File

@ -0,0 +1,6 @@
[memory]
max_messages = 32
max_tokens_message = 32768
max_tokens_messages = 65536
max_tokens_context = 131072
retrieval_limit = 32

View File

@ -55,9 +55,11 @@ int main() {
if (agent.current_step == agent.max_steps) {
std::cout << "Automatically paused after " << agent.max_steps << " steps." << std::endl;
std::cout << "Enter your prompt (enter an empty line to resume or 'exit' to quit): ";
std::cout.flush();
agent.reset(false);
} else {
std::cout << "Enter your prompt (or 'exit' to quit): ";
std::cout.flush();
}
std::string prompt;

View File

@ -305,7 +305,7 @@ std::string PlanningFlow::_get_plan_text() {
{"plan_id", active_plan_id}
});
return !result.output.empty() ? result.output.dump() : result.to_string();
return result.to_string();
} catch (const std::exception& e) {
LOG_ERROR("Error getting plan: " + std::string(e.what()));
return _generate_plan_text_from_storage();

View File

@ -24,7 +24,7 @@ struct LLMConfig {
int timeout;
double temperature;
bool enable_vision;
bool oai_tool_support;
bool enable_tool;
LLMConfig(
const std::string& model = "deepseek-chat",
@ -36,9 +36,9 @@ struct LLMConfig {
int timeout = 120,
double temperature = -1, // -1 for default
bool enable_vision = false,
bool oai_tool_support = true
bool enable_tool = true
) : model(model), api_key(api_key), base_url(base_url), endpoint(endpoint), vision_details(vision_details),
max_tokens(max_tokens), timeout(timeout), temperature(temperature), enable_vision(enable_vision), oai_tool_support(oai_tool_support) {}
max_tokens(max_tokens), timeout(timeout), temperature(temperature), enable_vision(enable_vision), enable_tool(enable_tool) {}
json to_json() const {
json j;
@ -195,9 +195,11 @@ struct VectorStoreConfig {
struct MemoryConfig {
// Base config
int max_messages = 16; // Short-term memory capacity
int max_tokens_context = 32768; // Maximum number of tokens in short-term memory
int retrieval_limit = 32; // Number of results to retrieve from long-term memory
int max_messages = 32; // Maximum number of messages in short-term memory
int max_tokens_message = 1 << 15; // Maximum number of tokens in a single message
int max_tokens_messages = 1 << 19; // Maximum number of tokens in short-term memory
int max_tokens_context = 1 << 20; // Maximum number of tokens in context (short-term memory plus retrieved memories)
int retrieval_limit = 32; // Number of results to retrieve from long-term memory
// Prompt config
std::string fact_extraction_prompt = prompt::FACT_EXTRACTION_PROMPT;

View File

@ -31,7 +31,7 @@ private:
public:
// Constructor
LLM(const std::string& config_name, const std::shared_ptr<LLMConfig>& config = nullptr, const std::shared_ptr<ToolParser>& tool_parser = nullptr) : llm_config_(config), tool_parser_(tool_parser) {
if (!llm_config_->oai_tool_support && !tool_parser_) {
if (!llm_config_->enable_tool && !tool_parser_) {
if (Config::get_instance().tool_parser().find(config_name) == Config::get_instance().tool_parser().end()) {
logger->warn("Tool helper config not found: " + config_name + ", falling back to default tool helper config.");
tool_parser_ = std::make_shared<ToolParser>(Config::get_instance().tool_parser().at("default"));

View File

@ -3,6 +3,7 @@
#include "mcp_message.h"
#include "utils.h"
#include "httplib.h"
#include "tokenizer/utils.h"
#include "tokenizer/bpe.h"
#include <string>
@ -173,14 +174,22 @@ struct MemoryItem {
long long updated_at; // The last update time of the memory
float score; // The score associated with the text data, used for ranking and sorting
MemoryItem(size_t id = -1, const std::string& memory = "", const std::string& hash = "")
: id(id), memory(memory), hash(hash) {
MemoryItem(size_t id = -1, const std::string& memory = "")
: id(id), memory(memory) {
hash = httplib::detail::MD5(memory);
auto now = std::chrono::system_clock::now().time_since_epoch().count();
created_at = now;
updated_at = now;
score = -1.0f;
}
void update_memory(const std::string& memory) {
this->memory = memory;
hash = httplib::detail::MD5(memory);
auto now = std::chrono::system_clock::now().time_since_epoch().count();
updated_at = now;
}
bool empty() const {
return memory.empty();
}

View File

@ -65,6 +65,8 @@ struct Memory : BaseMemory {
std::string fact_extraction_prompt;
std::string update_memory_prompt;
int max_messages;
int max_tokens_message;
int max_tokens_messages;
int max_tokens_context;
int retrieval_limit;
FilterFunc filter;
@ -77,12 +79,15 @@ struct Memory : BaseMemory {
bool retrieval_enabled;
int num_tokens_context;
int num_tokens_messages;
Memory(const MemoryConfig& config) : config(config) {
fact_extraction_prompt = config.fact_extraction_prompt;
update_memory_prompt = config.update_memory_prompt;
max_messages = config.max_messages;
max_tokens_message = config.max_tokens_message;
max_tokens_messages = config.max_tokens_messages;
max_tokens_context = config.max_tokens_context;
retrieval_limit = config.retrieval_limit;
filter = config.filter;
@ -125,25 +130,25 @@ struct Memory : BaseMemory {
}
bool add_message(const Message& message) override {
if (message.num_tokens > config.max_tokens_context) {
if (message.num_tokens > config.max_tokens_message) {
logger->warn("Message is too long, skipping"); // TODO: use content_provider to handle this
return false;
}
messages.push_back(message);
num_tokens_context += message.num_tokens;
num_tokens_messages += message.num_tokens;
std::vector<Message> messages_to_memory;
while (messages.size() > max_messages || num_tokens_context > config.max_tokens_context) {
while (messages.size() > max_messages || num_tokens_messages > config.max_tokens_messages) {
messages_to_memory.push_back(messages.front());
num_tokens_context -= messages.front().num_tokens;
num_tokens_messages -= messages.front().num_tokens;
messages.pop_front();
}
if (!messages.empty()) { // Ensure the first message is always a user or system message
if (messages.front().role == "assistant") {
messages.push_front(Message::user_message("Current request: " + current_request + "\n\nDue to limited memory, some previous messages are not shown."));
num_tokens_context += messages.front().num_tokens;
num_tokens_messages += messages.front().num_tokens;
} else if (messages.front().role == "tool") {
messages_to_memory.push_back(messages.front());
num_tokens_context -= messages.front().num_tokens;
num_tokens_messages -= messages.front().num_tokens;
messages.pop_front();
}
}
@ -167,7 +172,7 @@ struct Memory : BaseMemory {
if (retrieval_enabled && !query.empty()) {
auto embeddings = embedding_model->embed(
query.size() > 8192 ? query.substr(0, validate_utf8(query.substr(0, 8192))) : query, // TODO: split to chunks instead of truncating
query,
EmbeddingType::SEARCH
);
std::vector<MemoryItem> memories;
@ -179,17 +184,24 @@ struct Memory : BaseMemory {
if (!memories.empty()) {
sort(memories.begin(), memories.end(), [](const MemoryItem& a, const MemoryItem& b) {
return a.updated_at < b.updated_at;
return a.updated_at > b.updated_at;
});
std::string memory_prompt;
for (const auto& memory_item : memories) {
memory_prompt += "<memory>" + memory_item.memory + "</memory>";
int num_tokens_context = num_tokens_messages;
std::deque<Message> memory_messages;
for (const auto& memory_item : memories) { // Make sure the oldest memory is at the front of the deque and the tokens within the limit
auto memory_message = Message::user_message("<memory>" + memory_item.memory + "</memory>");
if (num_tokens_context + memory_message.num_tokens > config.max_tokens_context) {
break;
}
num_tokens_context += memory_message.num_tokens;
memory_messages.push_front(memory_message);
}
messages_with_memory.push_back(Message::user_message(memory_prompt));
logger->info("📤 Total retreived memories: " + std::to_string(memory_messages.size()));
logger->info("📤 Total retreived memories: " + std::to_string(memories.size()));
messages_with_memory.insert(messages_with_memory.end(), memory_messages.begin(), memory_messages.end());
}
}
@ -388,8 +400,7 @@ struct Memory : BaseMemory {
MemoryItem metadata{
memory_id,
data,
httplib::detail::MD5(data)
data
};
vector_store->insert(
@ -423,9 +434,7 @@ struct Memory : BaseMemory {
embedding = embedding_model->embed(data, EmbeddingType::ADD);
}
existing_memory.memory = data;
existing_memory.hash = httplib::detail::MD5(data);
existing_memory.updated_at = std::chrono::system_clock::now().time_since_epoch().count();
existing_memory.update_memory(data);
vector_store->update(
memory_id,

View File

@ -24,7 +24,7 @@ std::vector<float> OAIEmbeddingModel::embed(const std::string& text, EmbeddingTy
json json_data = json::parse(res->body);
return json_data["data"][0]["embedding"].get<std::vector<float>>();
} catch (const std::exception& e) {
logger->error(std::string(__func__) + ": Failed to parse response: " + std::string(e.what()));
logger->error(std::string(__func__) + ": Failed to parse response: error=" + std::string(e.what()) + ", body=" + res->body);
}
} else {
logger->error(std::string(__func__) + ": Failed to send request: status=" + std::to_string(res->status) + ", body=" + res->body);

View File

@ -123,13 +123,13 @@ void Config::_load_initial_llm_config() {
llm_config.enable_vision = llm_table["enable_vision"].as_boolean()->get();
}
if (llm_table.contains("oai_tool_support") && llm_table["oai_tool_support"].is_boolean()) {
llm_config.oai_tool_support = llm_table["oai_tool_support"].as_boolean()->get();
if (llm_table.contains("enable_tool") && llm_table["enable_tool"].is_boolean()) {
llm_config.enable_tool = llm_table["enable_tool"].as_boolean()->get();
}
_config.llm[std::string(key.str())] = llm_config;
if (!llm_config.oai_tool_support) {
if (!llm_config.enable_tool) {
// Load tool helper configuration
ToolParser tool_parser;
if (llm_table.contains("tool_parser") && llm_table["tool_parser"].is_table()) {

View File

@ -43,17 +43,14 @@ json LLM::format_messages(const std::vector<Message>& messages) {
continue;
}
formatted_messages.push_back(message.to_json());
if (!llm_config_->oai_tool_support) {
if (!llm_config_->enable_tool) {
if (formatted_messages.back()["content"].is_null()) {
formatted_messages.back()["content"] = "";
}
if (formatted_messages.back()["role"] == "tool") {
std::string tool_results_str = formatted_messages.back().dump(2);
formatted_messages.back() = {
{"role", "user"},
{"content", tool_results_str}
};
formatted_messages.back()["role"] = "user";
formatted_messages.back()["content"] = concat_content("Tool result for `" + message.name + "`:\n\n", formatted_messages.back()["content"]);
} else if (!formatted_messages.back()["tool_calls"].empty()) {
if (formatted_messages.back()["content"].is_null()) {
formatted_messages.back()["content"] = "";
}
std::string tool_calls_str = tool_parser_->dump(formatted_messages.back()["tool_calls"]);
formatted_messages.back().erase("tool_calls");
formatted_messages.back()["content"] = concat_content(formatted_messages.back()["content"], tool_calls_str);
@ -150,7 +147,7 @@ std::string LLM::ask(
total_completion_tokens_ += json_data["usage"]["completion_tokens"].get<size_t>();
return json_data["choices"][0]["message"]["content"].get<std::string>();
} catch (const std::exception& e) {
logger->error(std::string(__func__) + ": Failed to parse response: " + std::string(e.what()));
logger->error(std::string(__func__) + ": Failed to parse response: error=" + std::string(e.what()) + ", body=" + res->body);
}
} else {
logger->error(std::string(__func__) + ": Failed to send request: status=" + std::to_string(res->status) + ", body=" + res->body);
@ -253,7 +250,7 @@ json LLM::ask_tool(
body["max_tokens"] = llm_config_->max_tokens;
}
if (llm_config_->oai_tool_support) {
if (llm_config_->enable_tool) {
body["tools"] = tools;
body["tool_choice"] = tool_choice;
} else {
@ -286,14 +283,14 @@ json LLM::ask_tool(
try {
json json_data = json::parse(res->body);
json message = json_data["choices"][0]["message"];
if (!llm_config_->oai_tool_support && message["content"].is_string()) {
if (!llm_config_->enable_tool && message["content"].is_string()) {
message = tool_parser_->parse(message["content"].get<std::string>());
}
total_prompt_tokens_ += json_data["usage"]["prompt_tokens"].get<size_t>();
total_completion_tokens_ += json_data["usage"]["completion_tokens"].get<size_t>();
return message;
} catch (const std::exception& e) {
logger->error(std::string(__func__) + ": Failed to parse response: " + std::string(e.what()));
logger->error(std::string(__func__) + ": Failed to parse response: error=" + std::string(e.what()) + ", body=" + res->body);
}
} else {
logger->error(std::string(__func__) + ": Failed to send request: status=" + std::to_string(res->status) + ", body=" + res->body);

View File

@ -8,10 +8,12 @@ namespace humanus {
const char* SYSTEM_PROMPT = "\
You are Humanus, an all-capable AI assistant, aimed at solving any task presented by the user. You have various tools at your disposal that you can call upon to efficiently complete complex requests. Whether it's programming, information retrieval, file processing or web browsing, you can handle it all.";
const char* NEXT_STEP_PROMPT = R"(You can interact with the computer using python_execute, save important content and information files through filesystem, open browsers and retrieve information with playwright.
const char* NEXT_STEP_PROMPT = R"(You can interact with the computer using python_execute, save important content and information files through filesystem, get base64 image from file or url with image_loader, save and load content with content_provider, open browsers and retrieve information with playwright.
- python_execute: Execute Python code to interact with the computer system, data processing, automation tasks, etc.
- filesystem: Read/write files locally, such as txt, py, html, etc. Create/list/delete directories, move files/directories, search for files and get file metadata.
- playwright: Interact with web pages, take screenshots, generate test code, scrape web pages and execute JavaScript in a real browser environment. Note: Most of the time you need to observe the page before executing other actions.
- image_loader: Get base64 image from file or url.
- content_provider: Save content and retrieve by chunks.
Remember the following:
- Today's date is {current_date}.

View File

@ -54,13 +54,32 @@ struct ToolResult {
static std::string parse_json_content(const json& content) {
if (content.is_string()) {
return content.get<std::string>();
} else if (content.is_array()) {
std::string result;
for (const auto& item : content) {
if (item["type"] == "text") {
result += item["text"].get<std::string>();
} else if (item["type"] == "image_url") {
result += "<image>" + item["image_url"]["url"].get<std::string>() + "</image>";
}
}
return result;
} else {
return content.dump(2);
}
}
std::string to_string() const {
return !error.empty() ? "Error: " + parse_json_content(error) : parse_json_content(output);
std::string to_string(int max_length = -1) const {
std::string result;
if (!error.empty()) {
result = "Error: " + parse_json_content(error);
} else {
result = parse_json_content(output);
}
if (max_length > 0 && result.length() > static_cast<size_t>(max_length)) {
result = result.substr(0, max_length) + "...";
}
return result;
}
};
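For illustration, a hedged usage sketch of the new `to_string(max_length)` overload (the content values are made up; `parse_json_content` renders `image_url` items as `<image>…</image>` markers):

```cpp
// Mixed text/image output as produced by tools such as playwright or image_loader
ToolResult result(json::array({
    {{"type", "text"}, {"text", "Took a screenshot of the page."}},
    {{"type", "image_url"}, {"image_url", {{"url", "data:image/png;base64,iVBOR..."}}}}
}));
// Truncates the rendered string to 500 characters and appends "..." (handy for logging)
logger->info("Result: " + result.to_string(500));
```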

View File

@ -22,13 +22,13 @@ struct ContentProvider : BaseTool {
},
"content": {
"type": "array",
"description": "The content to store. Required when operation is `write`. Format: [{`type`: `text`, `text`: `content`}, {`type`: `image`, `image_url`: {`url`: `image_url`}}]",
"description": "The content to store. Required when operation is `write` (the `read` operation will return the same format). Format: [{'type': 'text', 'text': <content>}, {'type': 'image_url', 'image_url': {'url': <image_url>}}]",
"items": {
"type": "object",
"properties": {
"type": {
"type": "string",
"enum": ["text", "image"]
"enum": ["text", "image_url"]
},
"text": {
"type": "string",
@ -36,7 +36,7 @@ struct ContentProvider : BaseTool {
},
"image_url": {
"type": "object",
"description": "Image URL information. Required when type is `image`.",
"description": "Image URL information. Required when type is `image_url`.",
"properties": {
"url": {
"type": "string",
@ -53,7 +53,7 @@ struct ContentProvider : BaseTool {
},
"max_chunk_size": {
"type": "integer",
"description": "Maximum size in characters for each text chunk. Default is 4000.",
"description": "Maximum size in characters for each text chunk. Default is 4000. Used by `write` operation.",
"default": 4000
}
},
@ -111,21 +111,14 @@ struct ContentProvider : BaseTool {
// If a suitable break point was found and it is not the original position
if (break_pos > min_pos) {
// Move forward past the separator
break_pos++;
// Check whether the new break point would truncate a UTF-8 character
break_pos++; // Include the last character
std::string new_chunk = text.substr(offset, break_pos - offset);
size_t new_valid_length = validate_utf8(new_chunk);
if (new_valid_length == new_chunk.size()) {
// Only use the new break point if it does not truncate a UTF-8 character
chunk_size = break_pos - offset;
}
size_t new_valid_length = validate_utf8(new_chunk); // Validate the new chunk
chunk_size = break_pos - offset;
}
}
// Create a text chunk
json chunk;
chunk["type"] = "text";
chunk["text"] = text.substr(offset, chunk_size);
@ -146,6 +139,8 @@ struct ContentProvider : BaseTool {
}
std::vector<json> processed_content;
std::string text_content;
// Process the content, splitting large text blocks
for (const auto& item : args["content"]) {
@ -153,17 +148,21 @@ struct ContentProvider : BaseTool {
return ToolError("Each content item must have a `type` field");
}
std::string type = item["type"];
std::string type = item["type"].get<std::string>();
if (type == "text") {
if (!item.contains("text") || !item["text"].is_string()) {
return ToolError("Text items must have a `text` field with string value");
}
std::string text = item["text"];
auto chunks = split_text_into_chunks(text, max_chunk_size);
processed_content.insert(processed_content.end(), chunks.begin(), chunks.end());
} else if (type == "image") {
text_content += item["text"].get<std::string>() + "\n\n"; // Handle them together
} else if (type == "image_url") {
if (!text_content.empty()) {
auto chunks = split_text_into_chunks(text_content, max_chunk_size);
processed_content.insert(processed_content.end(), chunks.begin(), chunks.end());
text_content.clear();
}
if (!item.contains("image_url") || !item["image_url"].is_object() ||
!item["image_url"].contains("url") || !item["image_url"]["url"].is_string()) {
return ToolError("Image items must have an `image_url` field with a `url` property");
@ -175,9 +174,20 @@ struct ContentProvider : BaseTool {
return ToolError("Unsupported content type: " + type);
}
}
if (!text_content.empty()) {
auto chunks = split_text_into_chunks(text_content, max_chunk_size);
processed_content.insert(processed_content.end(), chunks.begin(), chunks.end());
text_content.clear();
}
// Generate a unique store ID
std::string store_id = "content_" + std::to_string(current_id_);
if (content_store_.find(store_id) != content_store_.end()) {
logger->warn("Store ID `" + store_id + "` already exists, it will be overwritten");
}
current_id_ = (current_id_ + 1) % MAX_STORE_ID;
// Store the processed content
@ -220,29 +230,8 @@ struct ContentProvider : BaseTool {
return ToolResult(result);
} else if (cursor == "select_store") {
// The user needs to select a store ID
return ToolError("Please provide a store_id as cursor in format `store_id:content_X`");
} else if (cursor.find("store_id:") == 0) {
// The user selected a store ID
std::string store_id = cursor.substr(9); // Strip the "store_id:" prefix
if (content_store_.find(store_id) == content_store_.end()) {
return ToolError("Store ID `" + store_id + "` not found");
}
// Return the first content item of this store
json result = content_store_[store_id][0];
// Add navigation info
if (content_store_[store_id].size() > 1) {
result["next_cursor"] = store_id + ":1";
result["remaining_items"] = content_store_[store_id].size() - 1;
} else {
result["next_cursor"] = "end";
result["remaining_items"] = 0;
}
return ToolResult(result);
} else if (cursor.find(":") != std::string::npos) {
return ToolError("Please provide a store_id as cursor in format `content_X:Y`");
} else if (cursor.find(":") != std::string::npos) { // content_X:Y
// The user is browsing content within a specific store
size_t delimiter_pos = cursor.find(":");
std::string store_id = cursor.substr(0, delimiter_pos);
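To make the cursor protocol concrete, here is a hedged round-trip sketch: `handle_write` appears in this commit; `handle_read` and the exact store ID are assumptions based on the `content_X:Y` cursor format in the error message above.

```cpp
ContentProvider provider;
std::string long_text(10000, 'x'); // Placeholder payload larger than one chunk

// Write: the text is split into chunks of at most `max_chunk_size` characters
ToolResult stored = provider.handle_write({
    {"content", json::array({{{"type", "text"}, {"text", long_text}}})},
    {"max_chunk_size", 4000}
});

// Read: walk the chunks with cursors of the form `content_X:Y` (assumed reader entry point),
// following `next_cursor` in each result until it equals "end"
ToolResult chunk = provider.handle_read({{"cursor", "content_0:0"}});
```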

View File

@ -0,0 +1,95 @@
#ifndef HUMANUS_IMAGE_LOADER_H
#define HUMANUS_IMAGE_LOADER_H
#include "httplib.h"
namespace humanus {
struct ImageLoader : BaseTool {
inline static const std::string name_ = "image_loader";
inline static const std::string description_ = "Load an image from a URL or local file. Returns the image as a base64 encoded string in the format 'data:<mime_type>;base64,<base64_image_data>'.";
inline static const json parameters_ = json::parse(R"json({
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "The URL of the image to load. Supports HTTP/HTTPS URLs and local file paths. If the URL is a local file path, it must start with file://"
}
},
"required": ["url"]
})json");
inline static const std::map<std::string, std::string> mime_type_map = {
{".bmp", "bmp"}, {".dib", "bmp"},
{".icns", "icns"},
{".ico", "x-icon"},
{".jfif", "jpeg"}, {".jpe", "jpeg"}, {".jpeg", "jpeg"}, {".jpg", "jpeg"},
{".j2c", "jp2"}, {".j2k", "jp2"}, {".jp2", "jp2"}, {".jpc", "jp2"}, {".jpf", "jp2"}, {".jpx", "jp2"},
{".apng", "png"}, {".png", "png"},
{".bw", "sgi"}, {".rgb", "sgi"}, {".rgba", "sgi"}, {".sgi", "sgi"}, {".tif", "tiff"}, {".tiff", "tiff"}, {".webp", "webp"},
{".gif", "gif"}
};
ImageLoader() : BaseTool(name_, description_, parameters_) {}
std::string get_mime_type(const std::string& path) {
size_t dot_pos = path.find_last_of(".");
if (dot_pos != std::string::npos) {
std::string extension = path.substr(dot_pos); // Keep the leading dot to match the map keys
if (mime_type_map.find(extension) != mime_type_map.end()) {
return "image/" + mime_type_map.at(extension);
}
}
return "image/png"; // Fall back to PNG for unknown extensions
}
ToolResult execute(const json& args) override {
if (!args.contains("url")) {
return ToolError("`url` is required");
}
std::string binary_content;
std::string url = args["url"];
if (url.find("http") == 0) {
std::string base_url, endpoint;
size_t pos = url.find("://");
if (pos == std::string::npos) {
return ToolError("Invalid URL");
}
pos = url.find("/", pos + 3);
if (pos == std::string::npos) {
return ToolError("Invalid URL");
}
base_url = url.substr(0, pos);
endpoint = url.substr(pos);
httplib::Client client(base_url);
auto res = client.Get(endpoint.c_str());
if (!res || res->status != 200) { // Guard against a null result when the request itself fails
return ToolError("Failed to load image from URL");
}
binary_content = res->body;
} else if (url.find("file://") == 0) {
std::ifstream file(url.substr(7));
if (!file.is_open()) {
return ToolError("Invalid file path");
}
binary_content = std::string((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
} else {
return ToolError("Invalid URL");
}
std::string base64_image = httplib::detail::base64_encode(binary_content);
std::string mime_type = get_mime_type(url);
std::string image_data = "data:" + mime_type + ";base64," + base64_image;
json result = {{
{"type", "image_url"},
{"image_url", {
{"url", image_data}
}}
}};
return ToolResult(result);
}
};
} // namespace humanus
#endif // HUMANUS_IMAGE_LOADER_H
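A hedged call-site sketch for the new tool (the file path is illustrative; the output shape matches the `image_url` content items used elsewhere in this commit):

```cpp
ImageLoader loader;
ToolResult result = loader.execute({{"url", "file:///tmp/screenshot.png"}});
// On success, result.output is:
// [{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}]
```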

View File

@ -1,59 +0,0 @@
#ifndef HUMANUS_TOOL_MEMORY_UPDATE_H
#define HUMANUS_TOOL_MEMORY_UPDATE_H
namespace humanus {
struct MemoryUpdate : BaseTool {
inline static const std::string name_ = "memory_update";
inline static const std::string description_ = "Compare newly retrieved facts with the existing memory. For each new fact, decide whether to:\n- ADD: Add it to the memory as a new element\n- UPDATE: Update an existing memory element\n- DELETE: Delete an existing memory element\n- NONE: Make no change (if the fact is already present or irrelevant)";
inline static const json parameters = json::parse(R"json({
"type": "object",
"properties": {
"memory": {
"description": "List of memory operations.",
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"description": "Unique integer ID of the memory item, required by event UPDATE and DELETE",
"type": "number"
},
"text": {
"description": "Plain text fact to ADD, UPDATE or DELETE",
"type": "string"
},
"event": {
"description": "The type of the operation",
"type": "string",
"enum": [
"ADD",
"UPDATE",
"DELETE",
"NONE"
]
}
},
"required": [
"text",
"event"
]
}
}
},
"required": [
"memory"
],
"additionalProperties": false
})json");
MemoryUpdate() : BaseTool(name_, description_, parameters) {}
ToolResult execute(const json& arguments) override {
return ToolResult(arguments["memory"]);
}
};
} // namespace humanus
#endif // HUMANUS_TOOL_MEMORY_UPDATE_H

View File

@ -224,11 +224,11 @@ struct Playwright : BaseMCPTool {
for (size_t i = 0; i < result["content"].size(); i++) {
if (result["content"][i]["type"] == "image") {
std::string data = result["content"][i]["data"].get<std::string>();
std::string mimeType = result["content"][i].value("mimeType", "image/png");
std::string mime_type = result["content"][i].value("mimeType", "image/png");
// Convert to OAI-compatible image_url format
result["content"][i] = {
{"type", "image_url"},
{"image_url", {{"url", "data:" + mimeType + ";base64," + data}}}
{"image_url", {{"url", "data:" + mime_type + ";base64," + data}}}
};
}
}

View File

@ -102,11 +102,11 @@ struct Puppeteer : BaseMCPTool {
for (size_t i = 0; i < result["content"].size(); i++) {
if (result["content"][i]["type"] == "image") {
std::string data = result["content"][i]["data"].get<std::string>();
std::string mimeType = result["content"][i].value("mimeType", "image/png");
std::string mime_type = result["content"][i].value("mimeType", "image/png");
// Convert to OAI-compatible image_url format
result["content"][i] = {
{"type", "image_url"},
{"image_url", {{"url", "data:" + mimeType + ";base64," + data}}}
{"image_url", {{"url", "data:" + mime_type + ";base64," + data}}}
};
}
}