Extractors
Extractors turn the raw ActionRecord chat log into the structured JSON the frontend Data
tab renders. They are pure functions, grouped into sibling modules and re-exported from
simswarm/extractor.py so callers keep importing from simswarm.extractor import ....
from simswarm.extractor import (
extract_posts, extract_top_posts,
extract_engagement_summary, extract_agent_trajectories, extract_profiles,
agent_sentiment_from_trajectories,
extract_social_graph, extract_market_data,
)
Shared helpers — extractor_common.py
The shared module holds action-type predicates and the post-body accessor. All extractors
read post text through post_text(action_args), which prefers action_args["text"] and
falls back to ["content"] (the social env writes bodies under text; older fixtures use
content). Predicates: is_post (create_post), is_like (like_post), is_comment
(create_comment), is_follow (follow), is_trade (buy_shares/sell_shares). It also
houses score_sentiment (the legacy keyword-bag scorer — see Stance scoring).
Posts — extractor_posts.py
extract_posts(chat_log) returns one dict per create_post record:
agent_id, agent_name, platform, content, round_num, action_type, timestamp,
success.
extract_top_posts(chat_log, limit=20) ranks posts by engagement. It stamps each post with a
stable post_id (preferring an explicit id from action_result["post_id"] /
action_args["post_id"] / ["id"], else a synthetic f"{agent_id}-r{round}-{counter}"),
then tallies likes, dislikes, shares, and comments from actions whose
action_args["post_id"] or ["target_id"] matches. vote direction follows the social env:
value > 0 is a like, anything else (including 0, missing, or non-numeric) is a dislike.
Output fields: post_id, agent_id, agent_name, platform, content, round_num,
timestamp, num_likes, num_dislikes, num_shares (reposts plus comments),
engagement (num_likes + num_shares), sorted descending, truncated to limit.
Activity — extractor_activity.py
extract_engagement_summary(chat_log)— per-round metrics:round,total_posts,total_likes(withvotedirection matchingextract_top_posts),total_comments,active_agents.extract_agent_trajectories(chat_log)— per-agentroundslist of{round, posts, actions, sentiment}, wheresentimentisstance.score_stanceover the agent's combined post/comment text that round (VADER,[-1, 1]).extract_profiles(chat_log)— one card per agent:agent_id,name,persona(a one-line activity summary; the richer LLM persona is layered on bypersonas.py),total_posts,total_actions,rounds_active,platforms.agent_sentiment_from_trajectories(trajectories)—{agent_id: mean_sentiment}(unclamped; agents with no rounds skipped).
Market & social graph — extractor_market_social.py
extract_social_graph(chat_log) builds follow edges from successful follow actions. The
followee id is read from action_args["agent_id"] (the engine tool schema's key), falling
back to ["target_id"] for older logs; the followee name resolves through an id→name map.
Returns {"edges": [...], "mutual_follows": [...]}.
extract_market_data(chat_log) emits one trade record per buy_shares/sell_shares. It
reads executed values from action_result (populated by the engine from the env's
ActionResult.data) and falls back to action_args. Buys expose cost (USD spent) and
shares; sells expose proceeds under the cost key so the UI column stays non-negative.
Each record carries a synthetic trade_id (f"{agent_id}-r{round}-{idx}"), side,
market_id, outcome, price, cost, shares, amount_requested, and metadata.
Observed JSON shapes
These are the keys that downstream consumers actually read:
- top posts:
content(nottext),post_id,num_likes,num_shares,engagement. - agent trajectories:
rounds[].round,rounds[].posts,rounds[].actions,rounds[].sentiment— no per-round stance field.
Vocabulary-drift caution
Extractors match action_type strings and action_args keys by literal string. Every
change to an environment's tools must be mirrored here and in graph.py. If a social env
renames a vote arg or a market env changes a result key, the matching extractor silently
tallies zeros — no error, just empty charts (the failure mode behind sims 127 and 128). When
auditing an env change, grep both the extractor family and graph.py for the affected
action-type and arg-key, and verify the env actually emits the key the extractor reads. Also
beware: sibling extractors with similar names exist — confirm which one actually runs before
diagnosing a data gap.