Constructs a Knowledge object optionally initialized with a data frame and
extended with variable relationships expressed via formulas, selectors, or infix operators:
tier(1 ~ V1 + V2, exposure ~ E)
V1 %-->% V3 # infix syntax for required edge from V1 to V3
V2 %!-->% V3 # infix syntax for an edge from V2 to V3 that is forbidden
exogenous(V1, V2)Arguments
- ...
Arguments to define the
Knowledgeobject:Optionally, a single data frame (first argument) whose column names initialize and freeze the variable set.
Zero or more mini-DSL calls:
tier(),exogenous(), (shorthandexo()), or infix operators%-->%,%!-->%.tier(): One or more two-sided formulas (tier(1 ~ x + y)), or a numeric vector.exogenous()/exo(): Variable names or tidyselect selectors. Arguments are evaluated in order; only these calls are allowed.
Details
Create a Knowledge object using a concise mini-DSL with tier(), exogenous() and infix edge operators
%-->% and %!-->%.
The first argument can be a data frame, which will be used to populate the
Knowledge object with variable names. If you later add variables with
add_* verbs, this will throw a warning, since the Knowledge object will
be frozen. You can unfreeze a Knowledge object by using the function
unfreeze(knowledge).
If no data frame is provided, the object is initially empty. Variables can
then be added via tier(), forbidden(), required(), infix operators, or add_vars().
tier(): Assigns variables to tiers. Tiers may be numeric or string labels. The left-hand side (LHS) of the formula is the tier; the right-hand side (RHS) specifies variables. Variables can also be selected using tidyselect syntax:tier(1 ~ starts_with("V")).%-->%and%!-->%: Infix operators to define required and forbidden edges, respectively. Both sides of the operator can use tidyselect syntax to select multiple variables.exogenous()/exo(): Mark variables as exogenous.Numeric vector shortcut for
tier():tier(c(1, 2, 1))assigns tiers by index to all existing variables.
Multiple calls or operators are additive: each call adds new edges to the Knowledge object.
For example:
V1 %-->% V3
V2 %-->% V3results in both edges being required - i.e., the union of all specified required edges.
See also
Other knowledge functions:
+.Knowledge(),
add_exogenous(),
add_tier(),
add_to_tier(),
add_vars(),
as_bnlearn_knowledge(),
as_pcalg_constraints(),
as_tetrad_knowledge(),
convert_tiers_to_forbidden(),
deparse_knowledge(),
forbid_edge(),
get_tiers(),
knowledge_to_caugi(),
remove_edge(),
remove_tiers(),
remove_vars(),
reorder_tiers(),
reposition_tier(),
require_edge(),
seq_tiers(),
unfreeze()
Examples
data(tpc_example)
# Knowledge objects can contain tier information, forbidden and required edges
kn <- knowledge(
tier(
1 ~ V1 + V2,
2 ~ V3
),
V1 %-->% V2,
V3 %!-->% V1
)
# If a data frame is provided, variable names are checked against it
kn <- knowledge(
tpc_example,
tier(
1 ~ child_x1 + child_x2,
2 ~ youth_x3 + youth_x4,
3 ~ oldage_x5 + oldage_x6
)
)
# Throws error if variable not in data
try(
knowledge(
tpc_example,
tier(
1 ~ child_x1 + child_x2,
2 ~ youth_x3 + youth_x4,
3 ~ oldage_x5 + woops
)
)
)
#>
#> ── Knowledge object ────────────────────────────────────────────────────────────
#>
#> ── Tiers ──
#>
#> tier
#> <chr>
#> 1 1
#> 2 2
#> 3 3
#> ── Variables ──
#>
#> var tier
#> <chr> <chr>
#> 1 child_x1 1
#> 2 child_x2 1
#> 3 youth_x3 2
#> 4 youth_x4 2
#> 5 oldage_x5 3
#> 6 oldage_x6 NA
# Using tidyselect helpers
kn <- knowledge(
tpc_example,
tier(
1 ~ starts_with("child"),
2 ~ ends_with(c("_x3", "_x4")),
3 ~ starts_with("oldage")
)
)
# Numeric vector shortcut
kn <- knowledge(
tpc_example,
tier(c(1, 1, 2, 2, 3, 3))
)
# Custom tier naming
kn <- knowledge(
tpc_example,
tier(
child ~ starts_with("child"),
youth ~ starts_with("youth"),
elderly ~ starts_with("oldage")
)
)
# There is also required and forbidden edges, which are specified like so
kn <- knowledge(
tpc_example,
child_x1 %-->% youth_x3,
oldage_x6 %!-->% child_x1
)
# You can also add exogenous variables
kn <- knowledge(
tpc_example,
exogenous(child_x1),
exo(child_x2) # shorthand
)
# Mix different operators
kn <- knowledge(
tpc_example,
tier(
1 ~ starts_with("child") + youth_x4,
2 ~ youth_x3 + starts_with("oldage")
),
child_x1 %-->% youth_x3,
oldage_x6 %!-->% oldage_x5,
exo(child_x2)
)
# You can also build knowledge with a verb pipeline
kn <-
knowledge() |>
add_vars(c("A", "B", "C", "D")) |> # Knowledge now only takes these variables
add_tier(One) |>
add_to_tier(One ~ A + B) |>
add_tier(2, after = One) |>
add_to_tier(2 ~ C + D) |>
forbid_edge(A ~ C) |>
require_edge(A ~ B)
# Mix DSL start + verb refinement
kn <-
knowledge(
tier(1 ~ V5, 2 ~ V6),
V5 %!-->% V6
) |>
add_tier(3, after = "2") |>
add_to_tier(3 ~ V7) |>
add_exo(V2) |>
add_exogenous(V3)
# Using seq_tiers for larger datasets
large_data <- as.data.frame(
matrix(
runif(100),
nrow = 1,
ncol = 100,
byrow = TRUE
)
)
names(large_data) <- paste0("X_", 1:100)
kn <- knowledge(
large_data,
tier(
seq_tiers(
1:100,
ends_with("_{i}")
)
),
X_1 %-->% X_2
)
small_data <- data.frame(
X_1 = 1,
X_2 = 2,
tier3_A = 3,
Y5_ok = 4,
check.names = FALSE
)
kn <- knowledge(
small_data,
tier(
seq_tiers(1:2, ends_with("_{i}")),
seq_tiers(3, starts_with("tier{i}")),
seq_tiers(5, matches("Y{i}_ok"))
)
)
