Slide 1
Data Mining
Magister Teknologi Informasi, Universitas Indonesia, 2012
Slide 2
More data is generated:
- Bank, telecom, and other business transactions
- Scientific data: astronomy, biology, etc.
- Web, text, and e-commerce
More data is captured:
- Storage technology is faster and cheaper
- DBMSs are capable of handling bigger databases
Slide 3
We have large amounts of data stored in one or more databases. We are starved for new information within those data (for research use, competitive edge, etc.). We want to identify patterns or rules (trends and relationships) in those data. We know that certain data exist inside a database, but what are the consequences of that data's existence?
Slide 4
There is often information "hidden" in the data that is not readily evident. Human analysts may take weeks to discover useful information, and much of the data is never analyzed at all.
The Data Gap (figure): the total new disk storage (TB) produced since 1995 grows far faster than the number of analysts.
From: R. Grossman, C. Kamath, V. Kumar, "Data Mining for Scientific and Engineering Applications"
Slide 5
Data mining is a process of extracting previously unknown, valid, and actionable information from large databases and then using the information to make crucial business decisions (Cabena et al., 1998).
Data mining: discovering interesting patterns from large amounts of data (Han and Kamber, 2001).
Slide 6
Definition from [Connolly, 2005]: the process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.
The common thread of data mining: it is all about finding patterns or rules by extracting data from large databases in order to find new information that could lead to new knowledge.
Slide 7
What is data mining?
- Certain names are more prevalent in certain US locations (O'Brien, O'Rourke, O'Reilly... in the Boston area).
- Grouping together similar documents returned by a search engine according to their context (e.g., Amazon rainforest vs. Amazon.com).
What is not data mining?
- Looking up a phone number in a phone directory.
- Querying a Web search engine for information about "Amazon".
Slide 8
Data Mining in the Knowledge Discovery Process (figure): Raw Data -> (Selection & Cleaning) -> Target Data -> (Transformation, with Integration via a Data Warehouse) -> Transformed Data -> (Data Mining) -> Patterns and Rules -> (Interpretation & Evaluation, supported by domain Understanding) -> Knowledge.
Slide 9
Main methods:
- Clustering
- Classification
- Association rules
Other methods:
- Outlier detection
- Sequential patterns
- Prediction
- Trends and analysis of changes
- Methods for special data types, e.g., spatial data mining, web mining
- ...
Association rules try to find associations between items in a set of transactions. For example, for associations between items bought by customers in a supermarket:
90% of transactions that purchase bread and butter also purchase milk.
Antecedent: bread and butter. Consequent: milk. Confidence factor: 90%.
A transaction is a set of items: T = {i_a, i_b, ..., i_t}, with T ⊆ I, where I is the set of all possible items {i_1, i_2, ..., i_n}.
D, the task-relevant data, is a set of transactions (a database of transactions).
Example:
- Items sold by a supermarket (the itemset I): {sugar, parsley, onion, tomato, salt, bread, olives, cheese, butter}
- A transaction by a customer (T): T1 = {sugar, onion, salt}
- A database (D): {T1 = {salt, bread, olives}, T2 = {sugar, onion, salt}, T3 = {bread}, T4 = {cheese, butter}, T5 = {tomato}, ...}
An association rule has the form P → Q, where P ⊂ I, Q ⊂ I, and P ∩ Q = ∅.
Examples: {bread} → {butter, cheese}; {onion, tomato} → {salt}
Support of a rule P → Q is the support of (P ∪ Q) in D:
s_D(P → Q) = s_D(P ∪ Q)
i.e., the percentage of transactions in D containing both P and Q: the number of transactions containing P and Q divided by the cardinality of D.
Confidence of a rule P → Q:
c_D(P → Q) = s_D(P ∪ Q) / s_D(P)
i.e., the percentage of transactions that contain both P and Q among the transactions that already contain P.
Thresholds:
- minimum support: minsup
- minimum confidence: minconf
Frequent itemset P: the support of P is larger than the minimum support.
Strong rule P → Q (c%): (P ∪ Q) is frequent and c is larger than the minimum confidence.
Example, with min. support 50% and min. confidence 50%:

Transaction ID | Items Bought
2000 | A, B, C
1000 | A, C
4000 | A, D
5000 | B, E, F

Frequent Itemset | Support
{A} | 75%
{B} | 50%
{C} | 50%
{A,C} | 50%

For rule {A} → {C}:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%
For rule {C} → {A}:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({C}) = 100.0%
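As a quick sanity check on these numbers, support and confidence can be computed in a few lines of Python (a minimal sketch, not any particular library's API; the transaction data is the table above):

```python
# Transactions from the table above (TIDs 2000, 1000, 4000, 5000).
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, db):
    """Fraction of transactions in db that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    """support(P ∪ Q) / support(P) for the rule P -> Q."""
    return support(set(antecedent) | set(consequent), db) / support(antecedent, db)

print(support({"A", "C"}, transactions))       # 0.5  (i.e. 50%)
print(confidence({"A"}, {"C"}, transactions))  # 0.666... (i.e. 66.6%)
print(confidence({"C"}, {"A"}, transactions))  # 1.0  (i.e. 100%)
```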
Slide 16
Input: a database of transactions, where each transaction is a list of items (e.g., items purchased by a customer in one visit).
Goal: find all strong rules that associate the presence of one set of items with that of another set of items. Example: 98% of people who purchase tires and auto accessories also get automotive services done.
There are no restrictions on the number of items in the head or body of the rule.
The most famous algorithm is Apriori:
- Find the frequent itemsets: the sets of items that have minimum support. A subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets.
- Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets).
- Use the frequent itemsets to generate association rules.
Source: [Sunysb, 2009]
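The iteration sketched above can be written as a bare-bones Python function (a teaching sketch, not a production implementation; the function and variable names are mine, and the data is the slides' 9-transaction example):

```python
from itertools import combinations

def apriori(db, min_support_count):
    """Bare-bones Apriori: returns {frozenset: support_count} for every
    frequent itemset in db (a list of sets), given an absolute minimum
    support count."""
    def count(itemset):
        return sum(itemset <= t for t in db)

    frequent = {}
    # Frequent 1-itemsets: every item that meets minimum support.
    items = {frozenset([i]) for t in db for i in t}
    current = {s for s in items if count(s) >= min_support_count}
    while current:
        for s in current:
            frequent[s] = count(s)
        k = len(next(iter(current))) + 1
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        current = {c for c in candidates if count(c) >= min_support_count}
    return frequent

# The 9-transaction database from the slides, min. support count 2.
db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
freq = apriori(db, 2)
print(freq[frozenset({"I1", "I2", "I3"})])  # 2
print(freq[frozenset({"I1", "I2", "I5"})])  # 2
```

Running this reproduces the frequent itemsets found step by step in the following slides.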
Slide 18
Consider a database, D, consisting of 9 transactions. Suppose the minimum support count required is 2 (i.e., min_sup = 2/9 = 22%) and the minimum confidence required is 70%. We first find the frequent itemsets using the Apriori algorithm; then association rules are generated using min. support and min. confidence.

TID | List of Items
T100 | I1, I2, I5
T200 | I2, I4
T300 | I2, I3
T400 | I1, I2, I4
T500 | I1, I3
T600 | I2, I3
T700 | I1, I3
T800 | I1, I2, I3, I5
T900 | I1, I2, I3
Slide 19
Step 1: Generating the 1-itemset frequent pattern. Scan D for the count of each candidate (C1), then compare each candidate's support count with the minimum support count to obtain L1.

C1 = L1:
Itemset | Sup. Count
{I1} | 6
{I2} | 7
{I3} | 6
{I4} | 2
{I5} | 2

The set of frequent 1-itemsets, L1, consists of the candidate 1-itemsets satisfying minimum support. In the first iteration of the algorithm, each item is a member of the set of candidates.
Slide 20
Step 2: Generating the 2-itemset frequent pattern. Generate C2 candidates from L1, scan D for the count of each candidate, then compare each candidate's support count with the minimum support count to obtain L2.

C2 (with counts):
Itemset | Sup. Count
{I1,I2} | 4
{I1,I3} | 4
{I1,I4} | 1
{I1,I5} | 2
{I2,I3} | 4
{I2,I4} | 2
{I2,I5} | 2
{I3,I4} | 0
{I3,I5} | 1
{I4,I5} | 0

L2:
Itemset | Sup. Count
{I1,I2} | 4
{I1,I3} | 4
{I1,I5} | 2
{I2,I3} | 4
{I2,I4} | 2
{I2,I5} | 2
Slide 21
To discover the set of frequent 2-itemsets, L2, the algorithm uses L1 join L1 to generate a candidate set of 2-itemsets, C2. Next, the transactions in D are scanned and the support count for each candidate itemset in C2 is accumulated (as shown in the middle table). The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.
Note: we haven't used the Apriori property yet.
Slide 22
Step 3: Generating the 3-itemset frequent pattern. Generate C3 candidates from L2, scan D for the count of each candidate, then compare each candidate's support count with the minimum support count to obtain L3.

C3 = L3:
Itemset | Sup. Count
{I1,I2,I3} | 2
{I1,I2,I5} | 2
Slide 23
The generation of the set of candidate 3-itemsets, C3, involves the use of the Apriori property. To find C3, we compute L2 join L2:
C3 = L2 join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
Slide 24
Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the four latter candidates cannot possibly be frequent. For example, take {I1, I2, I3}: its 2-item subsets are {I1, I2}, {I1, I3}, and {I2, I3}. Since all 2-item subsets of {I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3. Take another example, {I2, I3, I5}, which shows how pruning is performed: its 2-item subsets are {I2, I3}, {I2, I5}, and {I3, I5}.
Slide 25
But {I3, I5} is not a member of L2 and hence is not frequent, violating the Apriori property. Thus we have to remove {I2, I3, I5} from C3. Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all members of the join result for pruning.
Now the transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.
Slide 26
Step 4: Generating the 4-itemset frequent pattern. The algorithm uses L3 join L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {{I1, I2, I3, I5}}, this itemset is pruned since its subset {I2, I3, I5} is not frequent. Thus C4 = ∅ and the algorithm terminates, having found all of the frequent itemsets. This completes the Apriori algorithm.
What's next? These frequent itemsets will be used to generate strong association rules (rules that satisfy both minimum support and minimum confidence).
Slide 27
Step 5: Generating association rules from frequent itemsets.
Procedure:
- For each frequent itemset I, generate all nonempty subsets of I.
- For every nonempty subset s of I, output the rule "s → (I − s)" if support_count(I) / support_count(s) ≥ min_conf, where min_conf is the minimum confidence threshold.
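This procedure can be sketched directly in Python (a minimal illustration; the `frequent` dictionary below hard-codes only the support counts from the worked example, as an Apriori pass would produce them):

```python
from itertools import combinations

def generate_rules(frequent, min_conf):
    """For each frequent itemset I and each nonempty proper subset s,
    emit the rule s -> (I - s) when support_count(I)/support_count(s)
    >= min_conf. `frequent` maps frozenset -> support count."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for subset in combinations(itemset, r):
                s = frozenset(subset)
                conf = count / frequent[s]
                if conf >= min_conf:
                    rules.append((s, itemset - s, conf))
    return rules

# Support counts from the worked example (only the itemsets needed here).
frequent = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}
triple = frozenset({"I1", "I2", "I5"})
strong = [(sorted(s), sorted(q), c) for s, q, c in generate_rules(frequent, 0.7)
          if s | q == triple]
print(strong)  # the three strong rules R2, R3, R6, each with confidence 1.0
```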
Slide 28
In our example, L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
Take I = {I1, I2, I5}. All its nonempty proper subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.
Slide 29
Let the minimum confidence threshold be, say, 70%. The resulting association rules are shown below, each listed with its confidence.
R1: {I1,I2} → {I5}. Confidence = sc{I1,I2,I5}/sc{I1,I2} = 2/4 = 50%. R1 is rejected.
R2: {I1,I5} → {I2}. Confidence = sc{I1,I2,I5}/sc{I1,I5} = 2/2 = 100%. R2 is selected.
R3: {I2,I5} → {I1}. Confidence = sc{I1,I2,I5}/sc{I2,I5} = 2/2 = 100%. R3 is selected.
Slide 30
R4: {I1} → {I2,I5}. Confidence = sc{I1,I2,I5}/sc{I1} = 2/6 = 33%. R4 is rejected.
R5: {I2} → {I1,I5}. Confidence = sc{I1,I2,I5}/sc{I2} = 2/7 = 29%. R5 is rejected.
R6: {I5} → {I1,I2}. Confidence = sc{I1,I2,I5}/sc{I5} = 2/2 = 100%. R6 is selected.
In this way, we have found three strong association rules.
Slide 31
Classification: learn a method for predicting the instance class from pre-labeled (classified) instances.
Many approaches: statistics, decision trees, neural networks, ...
Slide 32
Prepare a collection of records (the training set). Each record contains a set of attributes; one of the attributes is the class. Find a model for the class attribute as a function of the values of the other attributes (decision tree, neural network, etc.).
Prepare a test set to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
Once you are happy with the accuracy, use your model to classify new instances.
From: http://www-users.cs.umn.edu/~kumar/dmbook/index.php
Slide 33 (figure): a training set (with categorical, continuous, and class attributes) is fed to a learning algorithm to produce a classifier model, which is then evaluated on a test set.
From: http://www-users.cs.umn.edu/~kumar/dmbook/index.php
Slide 34 (figure): training data with categorical attributes (Refund, MarSt), a continuous attribute (TaxInc), and a class attribute.
Model: Decision Tree (splitting attributes Refund, MarSt, TaxInc):
- Refund = Yes → NO
- Refund = No → test MarSt
  - MarSt = Married → NO
  - MarSt = Single or Divorced → test TaxInc
    - TaxInc < 80K → NO
    - TaxInc > 80K → YES
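A learned decision tree is just nested if/else tests; here is a sketch of this particular tree in Python (the attribute names Refund, MarSt, TaxInc and the 80K threshold come from the slide; encoding a record as a dict is my own assumption):

```python
def classify(record):
    """Apply the slide's decision tree to one record.
    record: dict with 'Refund' ('Yes'/'No'), 'MarSt' ('Married'/'Single'/
    'Divorced'), and 'TaxInc' (taxable income in thousands).
    Returns the predicted class label 'YES' or 'NO'."""
    if record["Refund"] == "Yes":
        return "NO"
    if record["MarSt"] == "Married":
        return "NO"
    # Single or Divorced: split on taxable income at 80K.
    return "YES" if record["TaxInc"] > 80 else "NO"

print(classify({"Refund": "No", "MarSt": "Single", "TaxInc": 95}))  # YES
print(classify({"Refund": "Yes", "MarSt": "Single", "TaxInc": 95}))  # NO
```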
Slide 35 (figure): an alternative decision tree for the same training data, with MarSt at the root, then Refund, then TaxInc (same splits: Married vs. Single/Divorced, Yes/No, < 80K vs. > 80K). There can be more than one tree that fits the same data!
Slide 36 (figure): applying the model to test data. Start from the root of the tree (Refund, then MarSt, then TaxInc) and follow the branches that match the test record until a leaf (YES/NO) is reached.
Slide 37
Direct marketing. Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.
Approach: use the data for a similar product introduced before. We know which customers decided to buy and which decided otherwise; this {buy, don't buy} decision forms the class attribute. Collect various demographic, lifestyle, and company-interaction information about all such customers: type of business, where they live, how much they earn, etc. Use this information as input attributes to learn a classifier model.
From: [Berry & Linoff] Data Mining Techniques, 1997
Slide 38
Fraud detection. Goal: predict fraudulent cases in credit card transactions.
Approach: use credit card transactions and the information on the account holder as attributes: when the customer buys, what they buy, how often they pay on time, etc. Label past transactions as fraudulent or fair; this forms the class attribute. Learn a model for the class of the transactions, and use this model to detect fraud by observing credit card transactions on an account.
From: http://www-users.cs.umn.edu/~kumar/dmbook/index.php
Slide 39
Customer attrition/churn. Goal: predict whether a customer is likely to be lost to a competitor.
Approach: use detailed records of transactions with each of the past and present customers to find attributes: how often the customer calls, where they call, what time of day they call most, their financial status, marital status, etc. Label the customers as loyal or disloyal, and find a model for loyalty.
From: [Berry & Linoff] Data Mining Techniques, 1997
Slide 40
Clustering helps users understand the natural grouping or structure in a data set.
Cluster: a collection of data objects that are "similar" to one another and thus can be treated collectively as one group.
Clustering is unsupervised classification: there are no predefined classes. It is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.
Slide 41
Find the "natural" grouping of instances, given unlabeled data.
Slide 42
A good clustering method will produce high-quality clusters in which:
- the intra-class similarity (within a cluster) is high;
- the inter-class similarity (between clusters) is low.
The quality of a clustering result depends on the similarity measure used by the method and its implementation, and on the definition and representation of cluster chosen. The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns.
Partitioning algorithms: construct various partitions and then evaluate them by some criterion.
Hierarchical algorithms: create a hierarchical decomposition of the set of data (or objects) using some criterion; there are agglomerative and divisive approaches.
Partitioning method: given a number k, partition a database D of n objects into a set of k clusters so that a chosen objective function is minimized (e.g., the sum of distances to the centers of the clusters). Finding the global optimum by exhaustively enumerating all partitions is too expensive, so heuristic methods are based on iterative refinement of an initial partition.
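One such iterative-refinement heuristic is the classic k-means procedure; a self-contained sketch on 2-D points with squared Euclidean distance (these details, and the random initialization, are choices of mine, not fixed by the slide):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: pick k initial centers, then alternately assign each
    point to its nearest center and recompute each center as the mean of
    its cluster. points: list of (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centers[i][0]) ** 2
                                                  + (p[1] - centers[i][1]) ** 2)
            clusters[nearest].append(p)
        # Recompute centers; keep the old center if a cluster went empty.
        centers = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                   if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two well-separated groups of three points each.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(pts, k=2)
```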
Hierarchical clustering: a hierarchical decomposition of the data set (with respect to a given similarity measure) into a set of nested clusters. The result is represented by a so-called dendrogram, whose nodes represent possible clusters. It can be constructed bottom-up (agglomerative approach) or top-down (divisive approach). A clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster.
Slide 46
Single link: cluster similarity = similarity of the two most similar members.
+ fast
- potentially long and skinny clusters
Slides 47-49 (figures): single-link clustering of five example points, merged step by step into a dendrogram.
Slide 50
Complete link: cluster similarity = similarity of the two least similar members.
+ tight clusters
- slow
Slides 51-53 (figures): complete-link clustering of the same five points, merged step by step.
Slide 54
Dendrogram: hierarchical clustering. A clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster.
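Bottom-up single-link clustering as described above can be sketched as follows (a naive O(n^3) illustration on 2-D points that stops at a requested number of clusters instead of building the full dendrogram; real implementations are far more efficient):

```python
def single_link(points, num_clusters):
    """Agglomerative clustering with single-link similarity: the distance
    between two clusters is the distance between their two closest members.
    Repeatedly merge the closest pair of clusters."""
    clusters = [[p] for p in points]

    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist2(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
result = sorted(sorted(c) for c in single_link(pts, 3))
print(result)  # [[(0, 0), (0, 1)], [(5, 5), (5, 6)], [(10, 0)]]
```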
Data preprocessing:
- Understanding the data
- Data cleaning: missing values, noisy values, outliers
- Dates
- Nominal/numeric conversion, discretization, normalization
- Smoothing
- Transformation
- Attribute selection
You can't be expected to be an expert in all fields, but understanding the data can be extremely useful for data mining. What data is available? What available data is actually relevant or useful? Can the data be enriched from other sources? Are there historical datasets available? Who is the real expert to ask questions of? (Are the results at all sensible, or are they completely obvious?) Answering these questions before embarking on a data mining project is invaluable later on.
Number of instances available: 5000 or more for reliable results.
Number of attributes: depends on the data set, but any attribute with fewer than 10 instances is typically not worth including.
Number of instances per class: more than 100 per class; if very unbalanced, consider stratified sampling.
Goal: maximizing data quality. Assess data quality, then correct the noted deficiencies.
Assessing data quality:
- Analyze the data distribution: is there any strange distribution?
- Analyze data elements: check for inconsistencies, redundancy, missing values, outliers, etc.
- Conduct a physical audit: ensure data is recorded properly, for example by cross-checking data with the customer.
- Analyze business rules: check whether data violates business rules.
Slide 59
Options for handling missing data:
- Exclude the attribute for which data is frequently missing.
- Exclude records that have missing data.
- Extrapolate missing values from other known values.
- Use a predictive model to determine a value.
- For quantitative values, use a generic figure, such as the average.
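The last option (substituting the average) is a one-liner; a sketch assuming missing values are encoded as `None`:

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known values,
    the 'generic figure' option for quantitative attributes."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

print(impute_mean([3.0, None, 5.0, 4.0]))  # [3.0, 4.0, 5.0, 4.0]
```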
Slide 60
We want all dates in the same format. YYYYMMDD is an ISO standard, but it has some issues for data mining:
- Year 10,000 AD (we only have 4 digits).
- Dates BC[E], e.g., -0300-02-03, are not valid YYYY-MM-DD dates.
- Most importantly, it does not preserve intervals: the difference between two YYYYMMDD values is not a meaningful duration.
Other representations: POSIX/Unix system date (number of seconds since 1970), etc.
Nominal data is data without ordering, e.g., sex, country, etc. Some algorithms can't deal with nominal or numeric attributes: decision trees deal best with nominal attributes, but neural networks and many clustering algorithms require only numeric attribute values.
In case the algorithm requires converting nominal to numeric:
- Binary field: one value is 0, the other is 1 (e.g., gender).
- Ordered fields: convert to numbers to preserve order (e.g., grades A vs. C become 4 and 2 respectively).
- Few values: convert each value into a new binary attribute. For example, if the possible values for attribute AT are A, B, C, D, create 4 new attributes ATa, ATb, ATc, ATd, each with value 0 or 1.
- Many values: convert into groups of values, each with its own (binary) attribute, e.g., group the states of the US into 5 groups of 10.
- Unique values: ignore identifier-like attributes (drop the attribute).
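The "few values" option is often called one-hot encoding; a minimal sketch (representing each record as a dict of new binary attributes is my own choice of encoding):

```python
def one_hot(values):
    """Convert a nominal attribute with few values into one binary
    attribute per distinct value (AT -> ATa, ATb, ... as in the slide)."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]

print(one_hot(["A", "C", "A"]))
# [{'A': 1, 'C': 0}, {'A': 0, 'C': 1}, {'A': 1, 'C': 0}]
```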
Some algorithms require nominal or discrete values. How can we turn a numeric value into a nominal value, or into a numeric value with a smaller range of values? There are several discretization techniques, often called 'binning':
- Equal width
- Equal depth
- Class-dependent
- Entropy
- Fuzzy (allow some fuzziness at the edges of the bins)
- Non-disjoint (allow overlapping intervals)
- ChiMerge (use the chi-squared test in the same way as entropy)
- Iterative (use some technique, then minimize classifier error)
- Lazy (only discretize during classification: VERY lazy!)
- Proportional k-interval (number of bins = square root of the number of instances)
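The first two techniques can be sketched in plain Python; the functions below return a bin index per value (the edge handling and tie-breaking are my own choices):

```python
def equal_width_bins(values, k):
    """Equal width: split the value range [min, max] into k bins of the
    same width, and assign each value its bin index 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_depth_bins(values, k):
    """Equal depth (frequency): sort the values, then put roughly the same
    number of values in each bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

print(equal_width_bins([0, 5, 10], 2))   # [0, 1, 1]
print(equal_depth_bins([4, 1, 3, 2], 2)) # [1, 0, 1, 0]
```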
We might want to normalize our data so that two numeric attributes are comparable, for example to compare age and income.
- Decimal scaling: v' = v / 10^k for the smallest k such that max(|v'|) < 1. E.g., for -991 and 99, k is 3, and -991 becomes -0.991.
- Min/max normalization: v' = (v - min)/(max - min) * (newMax - newMin) + newMin. E.g., 73600 in [12000, 98000] mapped to [0, 1] is 0.716.
- Zero-mean (z-score) normalization: v' = (v - mean)/stddev. E.g., with mean 54000 and stddev 16000, v = 73600 gives v' = 1.225.
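The three formulas transcribe directly into Python, reproducing the example numbers above (a plain transcription, nothing library-specific):

```python
def decimal_scaling(v, k):
    """v' = v / 10^k, with k the smallest integer making max(|v'|) < 1."""
    return v / 10 ** k

def min_max(v, lo, hi, new_lo=0.0, new_hi=1.0):
    """Map v from [lo, hi] into [new_lo, new_hi]."""
    return (v - lo) / (hi - lo) * (new_hi - new_lo) + new_lo

def z_score(v, mean, stddev):
    """Zero-mean normalization: standard deviations from the mean."""
    return (v - mean) / stddev

print(decimal_scaling(-991, 3))                # -0.991
print(round(min_max(73600, 12000, 98000), 3))  # 0.716
print(z_score(73600, 54000, 16000))            # 1.225
```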
Slide 64
In the case of noisy data, we might want to smooth the data so that it is more uniform. Some possible techniques:
- Regression: find a function for the data, and move each value some amount closer to what the function predicts (see classification).
- Clustering: some clustering techniques remove outliers; we could cluster the data to remove these noisy values.
- Binning: use some technique to discretize the data, and then smooth based on those 'bins'.
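Smoothing by bin means, the classic binning variant, can be sketched as follows (equal-depth bins; note the simplification that the smoothed values come back in sorted order):

```python
def smooth_by_bin_means(values, k):
    """Sort the data into k equal-depth bins, then replace every value in
    a bin with that bin's mean. Returns the smoothed values in sorted order."""
    srt = sorted(values)
    n = len(srt)
    smoothed = []
    for i in range(k):
        bin_vals = srt[i * n // k:(i + 1) * n // k]
        mean = sum(bin_vals) / len(bin_vals)
        smoothed.extend([mean] * len(bin_vals))
    return smoothed

print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```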
Slide 65
Transform data to a more meaningful form. For example:
- Birth date is transformed to age.
- Date of the first transaction is transformed to the number of days since the customer became a member.
- Grades of each course are transformed to a cumulative GPA.
Slide 66
Before getting to the data mining, we may want to either remove instances or select only a portion of the complete data set to work with. Why? Perhaps our algorithms don't scale well to the amount of data we have.
Techniques:
Record selection:
- Partitioning: split the database into sections and work with each in turn. Often not appropriate unless the algorithm is designed for it.
- Sampling: select a random subset of the data, which is hopefully representative, and use that.
Attribute selection:
- Stepwise forward selection: find the best attribute and add it.
- Stepwise backward elimination: find the worst attribute and remove it.
- Genetic algorithms: use a 'survival of the fittest' approach along with random cross-breeding.
- etc.
Slide 67
What technique would you use to solve each of these problems?
- Given a set of applicant attributes (name, salary, age, etc.), you want to decide whether to approve a customer's credit card application.
- Given national examination scores, you want to group Kabupatens (regencies) into three educational levels: good, average, poor.
- You want to suggest suitable pants to your customer given their choice of shirt.
- You want to estimate the economic growth of Indonesia given some data (GNP, GDP, etc.).
Slide 68
Pak Bedu is a doctor who wants to use IT to help him decide whether a patient has cancer or not. To decide this, Pak Bedu already has data on each patient, comprising the results of 5 laboratory tests along with the decision of whether the patient has cancer. Sample data:

ID | T1 | T2 | T3 | T4 | T5 | Cancer?
P1 | 1 | 3 | 4 | 2 | 3 | Yes
P2 | 0 | 5 | 1 | 1 |   | No
P3 | 1 | 2 | 4 | 2 | 2 | No
P4 | 2 | 2 | 3 | 1 | 2 | Yes
P5 | 2 | 2 | 3 | 1 | 2 | No

What problems can you find in Pak Bedu's data?
Slide 69
When Pak Bedu examines one attribute (say T1), he finds 5 distinct values with the following counts:
- 1 with count 1234
- 2 with count 2037
- 3 with count 1659
- 4 with count 19011
- 11 with count 1
What can you conclude by looking at these data?
Slide 70
Pak Bedu has expanded his clinic business to 3 locations, leaving patient data management entirely to each clinic. He wants to learn the characteristics of his patients by combining the data from all three clinics, but he is confused because each clinic has a different data schema. What should Pak Bedu do?
- Clinic 1 schema: Pasien(Nama, TglLahir, Tinggi (meters), Berat (kg), JenisKelamin (L/P), Alamat, Provinsi)
- Clinic 2 schema: Pasien(Nama, TglLahir, Tinggi (centimeters), Berat (kg), JenisKelamin (P/W), Alamat, Kota)
- Clinic 3 schema: Pasien(Nama, Umur, Tinggi (meters), Berat (kg), JenisKelamin (L/P), Kota, Provinsi)
Slide 71
Pak Bedu also runs a College of Health Sciences. Being generous, he wants to give scholarships to students working on their theses, but he wants to find students with the potential to finish the thesis within one semester. Pak Bedu has grade data for students who have already graduated, along with how long they took to complete their studies. His schemas include: Mahasiswa(NPM, Nama, Umur, AsalDaerah, LamaSkripsi, LamaStudi), MataKuliah(Kode, Nama, Kelompok), Nilai(NPM, KodeMK, Nilai).
Discuss what you could do to help Pak Bedu.
Slide 72
Pak Bedu has expanded his clinics to 100 branches. He wants to see patterns of patient visits: which regions have many sick people, when many people get sick, and so on. He has data about each clinic and the total number of patient visits per month.
What can you do?